#refarch (2023-06)
Cloud Posse Reference Architecture
2023-06-01
set up an aurora rds instance using aurora-postgres but can’t seem to access it over the internet. i’ve added 0.0.0.0/0 to allowed_cidr_blocks and set publicly_accessible to true, but still nothing. is there anything obvious that i’m missing?
must be in a public subnet as well
for RDS, there are three things that need to be addressed to access it from the internet:
- Network level - it must be in a public subnet or otherwise reachable from the internet. Also check that the subnet where the RDS is deployed has "assign public IPs at launch" enabled
- Security group level - check if the SG allows the access
- DNS level - publicly_accessible set to true (which creates a public DNS endpoint for the RDS instance), which you did
i guess you are missing #1 (or related to it)
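For reference, a minimal hedged sketch of all three settings together, assuming the cloudposse/rds-cluster/aws module that the aurora-postgres component wraps; every input other than the three flagged ones is illustrative:

module "aurora_postgres" {
  source = "cloudposse/rds-cluster/aws"
  # version = "x.y.z" # pin to the version your component already uses

  name           = "postgres"
  engine         = "aurora-postgresql"
  cluster_family = "aurora-postgresql14"
  cluster_size   = 1
  instance_type  = "db.t3.medium"
  admin_user     = "dbadmin"
  admin_password = var.admin_password
  db_name        = "app"
  db_port        = 5432

  vpc_id  = var.vpc_id
  subnets = local.public_subnet_ids # 1. network level: public subnets (with public IP assignment at launch)

  allowed_cidr_blocks = ["0.0.0.0/0"] # 2. security group level: open ingress (avoid in production)
  publicly_accessible = true          # 3. DNS level: public endpoint
}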
i’ll double check everything
it looks like it’s associated with the private subnets in the code. is that intentional? https://github.com/cloudposse/terraform-aws-components/blob/main/modules/aurora-postgres/cluster-regional.tf#L24
subnets = local.private_subnet_ids
the component is opinionated, it’s intentional since it’s best practice to deploy it into private subnets (and (almost) never in public)
the module supports both private and public
the component can def be updated to support both using a flag
i’m still struggling to get public access. i updated that line to concat the public and private subnets
subnets = concat(local.private_subnet_ids, local.public_subnet_ids)
and the SG is allowing inbound and outbound traffic from/to 0.0.0.0/0
does this network acl (for public subnet) look normal?
after concatenating, is the instance in a public subnet?
under rds > subnet groups > xxx-ue1-dev-xxx-db-shared? yes, i see the public subnets
public and private
strange. i destroyed and recreated it, and it seems to work now
2023-06-07
Trying to cold start an AWS account with atmos, and I’m running into a problem with cloudposse/terraform-aws-components/tfstate-backend (git tag 1.210.0):
Error: Unsupported argument
│
│ on .terraform/modules/tfstate_backend.log_storage/main.tf line 158, in data "aws_iam_policy_document" "aggregated_policy":
│ 158: source_json = var.policy
│
│ An argument named "source_json" is not expected here.
Looks like source_json and override_json were deprecated in AWS provider v4 and removed in v5. Any good workaround for this?
Looks like the tfstate-backend root module in the cloudposse/terraform-aws-components repo still points to an old version of cloudposse/tfstate-backend/aws in the terraform registry. Hmm.
I guess this means I just have to write my own root module to use the newer version of the tfstate-backend?
You could always vendor the tfstate-backend component and then bump the version of the downstreamed module to the latest version. The components are a reusable base and are owned by the consumer. The modules are reusable and unchangeable except by PR. This gives you freedom in the components (root tf module/dir) to modify as desired.
https://github.com/cloudposse/terraform-aws-tfstate-backend/releases
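A hedged sketch of what that looks like once the component is vendored into your repo: bump the registry module inside the component to a release that supports AWS provider v5 (the version shown is illustrative; pick one from the releases link above):

module "tfstate_backend" {
  source  = "cloudposse/tfstate-backend/aws"
  version = "1.1.1" # illustrative; choose a release that supports AWS provider v5

  namespace  = var.namespace
  stage      = var.stage
  name       = "tfstate-backend"
  attributes = ["state"]

  force_destroy = false
}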
@Jeremy G (Cloud Posse) @Max Lobur (Cloud Posse)
Correct, I’m about to release that fix for V5, expect today
The workaround would be to pin the aws provider to “<5” if it doesn’t break anything else
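A hedged sketch of that workaround, i.e. pinning the AWS provider below v5 in the component’s versions.tf until the upstream fix lands:

terraform {
  required_version = ">= 1.0"

  required_providers {
    aws = {
      source  = "hashicorp/aws"
      version = ">= 4.0, < 5.0" # temporary pin until the module supports provider v5
    }
  }
}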
@Dan Miller (Cloud Posse) @Max Lobur (Cloud Posse) I uploaded the latest changes.
2023-06-08
I’m using atmos and cloudposse/cloud-infrastructure-automation/spacelift, and trying to figure out if the “component instance name” is passed to the spacelift stack anywhere. I’m not super familiar with atmos tbh, the name I’m referring to is the actual key under components.terraform in the stack yaml (not necessarily the name of the component itself). Anyone know if this is something I can get at?
@Nat Williams before we can answer in the best way, can you instead lead by what you want to accomplish (want to avoid xyproblem.info)
in general, a Spacelift stack name is calculated from the context + the Atmos component name.
The context is: namespace, tenant, environment, stage
let’s say you have:
vars:
  namespace: eg
  tenant: core
  environment: ue2
  stage: prod

components:
  terraform:
    vpc-1:
      metadata:
        component: vpc # Point to terraform component
      vars: {}
    vpc-2:
      metadata:
        component: vpc # Point to terraform component
      vars: {}
the Spacelift stack names for these two Atmos components will be:
eg-core-ue2-prod-vpc-1
eg-core-ue2-prod-vpc-2
although:
- the Spacelift stack name can be overridden (per Atmos component)
- the order of the context variables can be overridden in atmos.yaml
- which context variables are used in the stack names is set in atmos.yaml (e.g. you might not use namespace and tenant, in which case the Spacelift stack names will be ue2-prod-vpc-1 and ue2-prod-vpc-2); see the sketch below
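A hedged sketch of the relevant atmos.yaml setting, assuming the stack name pattern key shown later in this thread (your Spacelift component settings may expose their own override on top of this):

stacks:
  # which context variables appear in stack names, and in what order
  name_pattern: "{namespace}-{tenant}-{environment}-{stage}"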
Yeah, considering it a bit more, I guess it’s not really a Spacelift thing at all. I think ultimately I’m just surprised that vpc-1 doesn’t show up in the output of atmos terraform generate varfile. I would expect that to be in the name var.
the name var - you set it as well
it’s part of the context which is used to uniquely and consistently name the AWS resources
yeah, I just didn’t want to have to do it myself
in many cases the name var can be the same as the Atmos component name
but in some cases they are different
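A hedged sketch, building on the vpc-1/vpc-2 example above, of setting the name var explicitly per Atmos component:

components:
  terraform:
    vpc-1:
      metadata:
        component: vpc
      vars:
        name: vpc-1 # part of the context used to name AWS resources; often, but not always, the same as the Atmos component name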
After you deploy the account module, each member account gets an email like aws+<account_name>@mycompany.com. How do you access each email to get the reset password link? In the docs you mention automation to get it forwarded to a shared slack channel. Did I miss a step?
For each new account:
Perform a password reset by attempting to log in to the AWS console as a "root user", using that account's email address, and then clicking the "Forgot password?" link. You will receive a password reset link via email, which should be forwarded to the shared Slack channel for automated messages. Click the link and enter a new password. (Use 1Password or Random.org to create a password 26-38 characters long, including at least 3 of each class of character: lower case, uppercase, digit, and symbol. You may need to manually combine or add to the generated password to ensure 3 symbols and digits are present.) Save the email address and generated password as web login credentials in 1Password. While you are at it, save the account number in a separate field.
are we supposed to create the emails ahead of time?
if you’re using plus addressing, these emails will all go to the same address. And we typically will create a new Slack channel for notifications and then set up an email integration with the primary email address
Ah okay thanks!
Is the primary email the email in the management account?
it should be the base email that you used. [email protected] for example, so then each account would be [email protected], [email protected], etc
this page also has some of our initial set up before even the cold start https://docs.cloudposse.com/reference-architecture/quickstart/kick-off/#slack
(full disclosure, those are behind our paywall)
@Dan Miller (Cloud Posse) now knowing how this works, is it possible to redeploy the component and update the emails?
I am not sure if Terraform can update account emails via the API. It’s now supported on the AWS side (they added support for that in the past ~6 months), however, we’ve not tested it.
Historically, it’s a MAJOR PIA to get your account emails wrong.
crap
(it’s bitten us too)
I deployed them as aws+%[email protected]. It should have been devops+%[email protected]
Aha
not sure what to do
Well, the worst case is just create the [email protected] alias on the devops group
Then it will just work as expected. You can do resets.
what do you mean by devops group?
devops+%[email protected]
Our standard recommendation is to make this a group (aka distribution list, aka google group)
Not a user account.
ah ha okay
Since nothing is deployed on the member accounts, could I just delete them all and deploy again with the correct email?
@Erik Osterman (Cloud Posse) nvm got it working with the alias! Thank you so much!
you can put in an AWS service request to change that email, but it can take several days. But since you have the alias now and it’s not blocking you, you could put in that request now to eventually get those emails corrected
Did you get it resolved?
2023-06-09
I have a question related to bootstrapping a new account with atmos. When you have been given a fresh account with no S3 or DynamoDB to store your backend, is there a special case or set of actions that can be done by atmos to create those resources using the local backend, before it is configured to use the S3 bucket and DynamoDB table as the backend for all subsequent runs? If so, is there an example of such a setup that I can refer to? So what I want is to run Atmos to create the remote backend, then use the remote backend set up by atmos for further provisioning.
this is not directly related to Atmos. You have to first create a backend before you have a backend to store the TF state in. So you create it using the local backend (TF will store the state on your local computer), then you define (or uncomment) the S3 backend, and TF will ask you to copy the local state to the remote S3 backend
Terraform module that provisions an S3 bucket to store the terraform.tfstate file and a DynamoDB table to lock the state file to prevent concurrent modifications and state corruption.
(in Atmos, you can comment out the S3 backend definition first, create the backend, then uncomment it, then run terraform plan again, and TF will ask you to copy the state from local to S3)
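A hedged sketch of the backend block in question; the bucket, table, and region names are illustrative. This is the block you keep commented out for the very first (local-state) apply, then uncomment once the bucket and lock table exist:

terraform {
  backend "s3" {
    bucket         = "acme-core-ue1-root-tfstate"      # illustrative name
    key            = "terraform.tfstate"
    region         = "us-east-1"
    dynamodb_table = "acme-core-ue1-root-tfstate-lock" # illustrative name
    encrypt        = true
  }
}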
Thanks for the quick response.
Conceptually, this is how it works: https://github.com/cloudposse/terraform-aws-tfstate-backend/tree/main#create
In our reference architecture, here’s how we do it:
Create a workflow file like this to automate it
workflows:
  init/tfstate:
    description: Provision Terraform State Backend for initial deployment.
    steps:
      - command: terraform deploy tfstate-backend -var=access_roles_enabled=false --stack core-{{ region_0 }}-root --auto-generate-backend-file=false
      - command: until aws s3 ls {{ cookiecutter.namespace }}-core-{{ region_0 }}-root-tfstate; do sleep 5; done
        type: shell
      - command: terraform deploy tfstate-backend -var=access_roles_enabled=false --stack core-{{ region_0 }}-root --init-run-reconfigure=false
Workflows are a way of combining multiple commands into one executable unit of work.
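A hedged usage example, assuming the workflow above is saved as stacks/workflows/init.yaml:

atmos workflow init/tfstate -f init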
Oh, replace that {{ region_0 }} with your region.
I will have a try once I am out of all the Monday meetings to see how far I get. I am sure I will have further questions.
Maybe I am missing something here. I have the following structure
.
├── components
│ └── terraform
│ └── infra-init
│ └── main.tf
└── stacks
└── workflows
└── env-init.workflow
I use an .envrc in the root to define my environment variables
export ATMOS_CLI_CONFIG_PATH=${PWD}/.atmos
export ATMOS_BASE_PATH=${PWD}
so when I cd in to the directory my environment variables get set
ATMOS_BASE_PATH=atmos
ATMOS_CLI_CONFIG_PATH=/atmos/.atmos
My atmos.yaml is the default one
base_path: ""
components:
terraform:
# Can also be set using 'ATMOS_COMPONENTS_TERRAFORM_BASE_PATH' ENV var, or '--terraform-dir' command-line argument
# Supports both absolute and relative paths
base_path: "components/terraform"
# Can also be set using 'ATMOS_COMPONENTS_TERRAFORM_APPLY_AUTO_APPROVE' ENV var
apply_auto_approve: false
# Can also be set using 'ATMOS_COMPONENTS_TERRAFORM_DEPLOY_RUN_INIT' ENV var, or '--deploy-run-init' command-line argument
deploy_run_init: true
# Can also be set using 'ATMOS_COMPONENTS_TERRAFORM_INIT_RUN_RECONFIGURE' ENV var, or '--init-run-reconfigure' command-line argument
init_run_reconfigure: true
# Can also be set using 'ATMOS_COMPONENTS_TERRAFORM_AUTO_GENERATE_BACKEND_FILE' ENV var, or '--auto-generate-backend-file' command-line argument
auto_generate_backend_file: false
stacks:
# Can also be set using 'ATMOS_STACKS_BASE_PATH' ENV var, or '--config-dir' and '--stacks-dir' command-line arguments
# Supports both absolute and relative paths
base_path: "stacks"
# Can also be set using 'ATMOS_STACKS_INCLUDED_PATHS' ENV var (comma-separated values string)
included_paths:
- "orgs/**/*"
# Can also be set using 'ATMOS_STACKS_EXCLUDED_PATHS' ENV var (comma-separated values string)
excluded_paths:
- "**/_defaults.yaml"
# Can also be set using 'ATMOS_STACKS_NAME_PATTERN' ENV var
name_pattern: "{tenant}-{environment}-{stage}"
workflows:
# Can also be set using 'ATMOS_WORKFLOWS_BASE_PATH' ENV var, or '--workflows-dir' command-line arguments
# Supports both absolute and relative paths
base_path: "stacks/workflows"
logs:
file: "/dev/stdout"
# Supported log levels: Trace, Debug, Info, Warning, Off
level: Info
I create a workflow under stacks to bootstrap the environment
more env-init.workflow
workflows:
  init/tfstate:
    description: Provision Terraform State Backend for initial deployment.
    steps:
      - command: terraform deploy tfstate-backend -var=access_roles_enabled=false --stack core-eu-west-1-root --auto-generate-backend-file=false
      - command: until aws s3 ls test-core-eu-west-1-root-tfstate; do sleep 5; done
        type: shell
      - command: terraform deploy tfstate-backend -var=access_roles_enabled=false --stack core-eu-west-1-root --init-run-reconfigure=false
I create a stack/component
module "terraform_state_backend" {
source = "cloudposse/tfstate-backend/aws"
# Cloud Posse recommends pinning every module to a specific version
vrsion = "1.1.1"
namespace = var.namespace
stage = var.stage
name = "env-init"
attributes = ["state"]
terraform_backend_config_file_path = "."
terraform_backend_config_file_name = "backend.tf"
force_destroy = false
}
I run
atmos workflow init/tfstate -f env-init.workflow -s ../../components/terraform/infra-init/main.tf --dry-run
and get the following error
failed to find a match for the import '/stacks/orgs/**/*.yaml' ('stacks/orgs' + '**/*.yaml')
So the first issue is this include path in atmos.yaml:
included_paths:
  - "orgs/**/*"
If I remove it I get
at least one path must be provided in 'stacks.included_paths' config or ATMOS_STACKS_INCLUDED_PATHS' ENV variable
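A hedged guess at what is missing given the config above: included_paths points at stacks/orgs/**, but the tree shown only has stacks/workflows, so there are no stack manifests for Atmos to import. A minimal illustrative manifest (path and names are assumptions) that would match the pattern and resolve the core-eu-west-1-root stack via the {tenant}-{environment}-{stage} name_pattern could live at stacks/orgs/acme/core/root/eu-west-1.yaml:

vars:
  tenant: core
  environment: eu-west-1
  stage: root

components:
  terraform:
    tfstate-backend:
      metadata:
        component: infra-init # points at components/terraform/infra-init from the tree above
      vars: {}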
2023-06-19
I’ve deployed the account component but it had a failure. However all the accounts were actually created.
Error: error creating Organizations Policy (acme-gbl-root-organization): DuplicatePolicyException: A policy with the specified name and type already exists.
│
│ with module.account.module.organization_service_control_policies[0].aws_organizations_policy.this[0],
│ on .terraform/modules/account.organization_service_control_policies/main.tf line 37, in resource "aws_organizations_policy" "this":
│ 37: resource "aws_organizations_policy" "this" {
Now if I try to run the deploy again it fails because the accounts already exist. How can I fix my terraform state? Do I need to try and import all those accounts now? Here’s the output from the plan when I try to deploy it now. It should be 0 to add, but it thinks the accounts aren’t deployed so it wants to add them.
It appears your workspace is misconfigured. There shouldn’t be a plat-gbl-root, since root is not a platform account and is typically in core. But that’s difficult to debug without seeing code
Do you have a single account component? Where is that deployed? Could you perhaps share that catalog configuration?
In particular, I’d like to check what your organization_config looks like. It should be something like this:
organization_config:
  root_account:
    name: core-root
    stage: root
    tenant: core
    tags:
      eks: false
  accounts: []
  organization:
    service_control_policies: []
  organizational_units:
    - name: core
      accounts:
        - name: core-auto
          tenant: core
          stage: auto
          tags:
            eks: true
        - name: core-identity
          tenant: core
          stage: identity
          tags:
            eks: false
        - name: core-network
          tenant: core
          stage: network
          tags:
            eks: false
      service_control_policies:
        - DenyLeavingOrganization
    - name: plat
      accounts:
        - name: plat-dev
          tenant: plat
          stage: dev
          tags:
            eks: true
        - name: plat-sandbox
          tenant: plat
          stage: sandbox
          tags:
            eks: true
        - name: plat-staging
          tenant: plat
          stage: staging
          tags:
            eks: true
        - name: plat-prod
          tenant: plat
          stage: prod
          tags:
            eks: true
      service_control_policies:
        - DenyLeavingOrganization
and that component (account) should only be deployed once by the core-gbl-root stack