#refarch (2023-06)
Cloud Posse Reference Architecture
2023-06-01
set up an aurora rds instance using aurora-postgres but can’t seem to access it over the internet. i’ve added 0.0.0.0/0 to allowed_cidr_blocks and set publicly_accessible to true, but still nothing. is there anything obvious that i’m missing?
must be in a public subnet as well
for RDS, there are three things that need to be addressed to access it from the internet:
- Network level - it must be in a public subnet or otherwise reachable from the internet. Also check that the subnet where the RDS is deployed has "assign public IPs at launch" enabled
- Security group level - check if the SG allows the access
- DNS level - publicly_accessible set to true (which creates a public DNS endpoint for the RDS instance), which you did
i guess you are missing #1 (or related to it)
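For reference, a minimal hedged sketch of all three settings together, assuming the cloudposse/rds-cluster/aws module that the aurora-postgres component wraps; every input other than the three flagged ones is illustrative:

module "aurora_postgres" {
  source = "cloudposse/rds-cluster/aws"
  # version = "x.y.z" # pin to the version your component already uses

  name           = "postgres"
  engine         = "aurora-postgresql"
  cluster_family = "aurora-postgresql14"
  cluster_size   = 1
  instance_type  = "db.t3.medium"
  admin_user     = "dbadmin"
  admin_password = var.admin_password
  db_name        = "app"
  db_port        = 5432

  vpc_id  = var.vpc_id
  subnets = local.public_subnet_ids # 1. network level: public subnets (with public IP assignment at launch)

  allowed_cidr_blocks = ["0.0.0.0/0"] # 2. security group level: open ingress (avoid in production)
  publicly_accessible = true          # 3. DNS level: public endpoint
}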
i’ll double check everything
it looks like it’s associated with the private subnets in the code. is that intentional? https://github.com/cloudposse/terraform-aws-components/blob/main/modules/aurora-postgres/cluster-regional.tf#L24
subnets = local.private_subnet_ids
the component is opinionated, it’s intentional since it’s best practice to deploy it into private subnets (and (almost) never in public)
the module supports both private and public
the component can def be updated to support both using a flag
i’m still struggling to get public access. i updated that line to concat the public and private subnets
subnets = concat(local.private_subnet_ids, local.public_subnet_ids)
and the SG is allowing inbound and outbound traffic from/to 0.0.0.0/0
does this network acl (for public subnet) look normal?
after concatenating, is the instance in a public subnet?
under rds > subnet groups > xxx-ue1-dev-xxx-db-shared? yes, i see the public subnets
public and private
strange. i destroyed and recreated it, and it seems to work now
2023-06-07
Trying to cold start an AWS account with atmos, and I’m running into a problem with cloudposse/terraform-aws-components/tfstate-backend (git tag 1.210.0):
Error: Unsupported argument
│
│ on .terraform/modules/tfstate_backend.log_storage/main.tf line 158, in data "aws_iam_policy_document" "aggregated_policy":
│ 158: source_json = var.policy
│
│ An argument named "source_json" is not expected here.
Looks like source_json and override_json were deprecated in AWS provider v4 and removed in v5. Any good workaround for this?
Looks like the tfstate-backend root module in the cloudposse/terraform-aws-components repo still points to an old version of cloudposse/tfstate-backend/aws in the terraform registry. Hmm.
I guess this means I just have to write my own root module to use the newer version of the tfstate-backend?
You could always vendor the tfstate-backend component and then bump the version of the downstreamed module to the latest version. The components are a reusable base and are owned by the consumer. The modules are reusable and unchangeable except by PR. This gives you freedom in the components (root tf module/dir) to modify as desired.
https://github.com/cloudposse/terraform-aws-tfstate-backend/releases
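A hedged sketch of what that looks like once the component is vendored into your repo: bump the registry module inside the component to a release that supports AWS provider v5 (the version shown is illustrative; pick one from the releases link above):

module "tfstate_backend" {
  source  = "cloudposse/tfstate-backend/aws"
  version = "1.1.1" # illustrative; choose a release that supports AWS provider v5

  namespace  = var.namespace
  stage      = var.stage
  name       = "tfstate-backend"
  attributes = ["state"]

  force_destroy = false
}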
@Jeremy G (Cloud Posse) @Max Lobur (Cloud Posse)
Correct, I’m about to release that fix for V5, expect today
The workaround would be to pin the aws provider to “<5” if it doesn’t break anything else
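A hedged sketch of that workaround, i.e. pinning the AWS provider below v5 in the component’s versions.tf until the upstream fix lands:

terraform {
  required_version = ">= 1.0"

  required_providers {
    aws = {
      source  = "hashicorp/aws"
      version = ">= 4.0, < 5.0" # temporary pin until the module supports provider v5
    }
  }
}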
@Dan Miller (Cloud Posse) @Max Lobur (Cloud Posse) I uploaded the latest changes.
2023-06-08
I’m using atmos and cloudposse/cloud-infrastructure-automation/spacelift, and trying to figure out if the “component instance name” is passed to the spacelift stack anywhere. I’m not super familiar with atmos tbh, the name I’m referring to is the actual key under components.terraform in the stack yaml (not necessarily the name of the component itself). Anyone know if this is something I can get at?
@Nat Williams before we can answer in the best way, can you instead lead by what you want to accomplish (want to avoid xyproblem.info)
in general, a Spacelift stack name is calculated from the context + the Atmos component name.
The context is: namespace, tenant, environment, stage
let’s say you have:
vars:
  namespace: eg
  tenant: core
  environment: ue2
  stage: prod

components:
  terraform:
    vpc-1:
      metadata:
        component: vpc # Point to terraform component
      vars: {}
    vpc-2:
      metadata:
        component: vpc # Point to terraform component
      vars: {}
the Spacelift stack names for these two Atmos components will be:
eg-core-ue2-prod-vpc-1
eg-core-ue2-prod-vpc-2
although:
- the Spacelift stack name can be overridden (per Atmos component)
- the order of the context variables can be overridden in atmos.yaml
- which context variables are used in the stack names is set in atmos.yaml (e.g. you might not use namespace and tenant, in which case the Spacelift stack names will be ue2-prod-vpc-1 and ue2-prod-vpc-2); see the sketch below
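A hedged sketch of the relevant atmos.yaml setting, assuming the stack name pattern key shown later in this thread (your Spacelift component settings may expose their own override on top of this):

stacks:
  # which context variables appear in stack names, and in what order
  name_pattern: "{namespace}-{tenant}-{environment}-{stage}"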
Yeah, considering it a bit more, I guess it’s not really a Spacelift thing at all. I think ultimately I’m just surprised that vpc-1 doesn’t show up in the output of atmos terraform generate varfile. I would expect that to be in the name var.
the name var - you set it as well
it’s part of the context which is used to uniquely and consistently name the AWS resources
yeah, I just didn’t want to have to do it myself
in many cases the name var can be the same as the Atmos component name
but in some cases they are different
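A hedged sketch, building on the vpc-1/vpc-2 example above, of setting the name var explicitly per Atmos component:

components:
  terraform:
    vpc-1:
      metadata:
        component: vpc
      vars:
        name: vpc-1 # part of the context used to name AWS resources; often, but not always, the same as the Atmos component name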
After you deploy the account module, each member account gets an email like aws+<account_name>@mycompany.com. How do you access each email to get the reset password link? In the docs you mention automation to get it forwarded to a shared slack channel. Did I miss a step?
For each new account:
Perform a password reset by attempting to log in to the AWS console as a "root user", using that account's email address, and then clicking the "Forgot password?" link. You will receive a password reset link via email, which should be forwarded to the shared Slack channel for automated messages. Click the link and enter a new password. (Use 1Password or Random.org to create a password 26-38 characters long, including at least 3 of each class of character: lower case, uppercase, digit, and symbol. You may need to manually combine or add to the generated password to ensure 3 symbols and digits are present.) Save the email address and generated password as web login credentials in 1Password. While you are at it, save the account number in a separate field.
are we supposed to create the emails ahead of time?
if you’re using plus addressing, these emails will all go to the same address. And we typically will create a new Slack channel for notifications and then set up an email integration with the primary email address
Ah okay thanks!
Is the primary email the email in the management account?
it should be the base email that you used. [email protected] for example, so then each account would be [email protected], [email protected], etc
this page also has some of our initial set up before even the cold start https://docs.cloudposse.com/reference-architecture/quickstart/kick-off/#slack
(full disclosure, those are behind our paywall)
@Dan Miller (Cloud Posse) now knowing how this works, is it possible to redeploy the component and update the emails?
I am not sure if Terraform can update account emails via the API. It’s now supported on the AWS side (they added support for that in the past ~6 months), however, we’ve not tested it.
Historically, it’s a MAJOR PIA to get your account emails wrong.
crap
(it’s bitten us too)
I deployed them as aws+%[email protected]. It should have been devops+%[email protected]
Aha
not sure what to do
Well, the worst case is just create the [email protected] alias on the devops group
Then it will just work as expected. You can do resets.
what do you mean by devops group?
devops+%[email protected]
Our standard recommendation is to make this a group (aka distribution list, aka google group)
Not a user account.
ah ha okay
Since nothing is deployed on the member accounts, could I just delete them all and deploy again with the correct email?
@Erik Osterman (Cloud Posse) nvm got it working with the alias! Thank you so much!
you can put in an AWS service request to change that email, but it can take several days. But since you have the alias now and it’s not blocking you, you could put in that request now to eventually get those emails corrected
Did you get it resolved?
2023-06-09
I have a question related to bootstrapping a new account with atmos. When you have been given a fresh account with no S3 or DynamoDB to store your backend, is there a special case or set of actions that can be done by atmos to create those resources using the local backend, before it is configured to use the S3 bucket and DynamoDB table as the backend for all subsequent runs? If so, is there an example of such a setup that I can refer to? So what I want is to run Atmos to create the remote backend, then use the remote backend set up by atmos for further provisioning.
this is not directly related to Atmos. You have to first create a backend before you have a backend to store the TF state in. So you create it using the local backend (TF will store the state on your local computer), then you define (or uncomment) the S3 backend, and TF will ask you to copy the local state to the remote S3 backend
Terraform module that provisions an S3 bucket to store the terraform.tfstate file and a DynamoDB table to lock the state file to prevent concurrent modifications and state corruption.
(in Atmos, you can comment out the S3 backend definition first, create the backend, then uncomment it, then run terraform plan again, and TF will ask you to copy the state from local to S3)
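A hedged sketch of the backend block in question; the bucket, table, and region names are illustrative. This is the block you keep commented out for the very first (local-state) apply, then uncomment once the bucket and lock table exist:

terraform {
  backend "s3" {
    bucket         = "acme-core-ue1-root-tfstate"      # illustrative name
    key            = "terraform.tfstate"
    region         = "us-east-1"
    dynamodb_table = "acme-core-ue1-root-tfstate-lock" # illustrative name
    encrypt        = true
  }
}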
Thanks for the quick response.
Conceptually, this is how it works: https://github.com/cloudposse/terraform-aws-tfstate-backend/tree/main#create
In our reference architecture, here’s how we do it:
Create a workflow file like this to automate it
workflows:
  init/tfstate:
    description: Provision Terraform State Backend for initial deployment.
    steps:
      - command: terraform deploy tfstate-backend -var=access_roles_enabled=false --stack core-{{ region_0 }}-root --auto-generate-backend-file=false
      - command: until aws s3 ls {{ cookiecutter.namespace }}-core-{{ region_0 }}-root-tfstate; do sleep 5; done
        type: shell
      - command: terraform deploy tfstate-backend -var=access_roles_enabled=false --stack core-{{ region_0 }}-root --init-run-reconfigure=false
Workflows are a way of combining multiple commands into one executable unit of work.
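A hedged usage example, assuming the workflow above is saved as stacks/workflows/init.yaml:

atmos workflow init/tfstate -f init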
Oh, replace that {{ region_0 }} with your region.
I will have a try once I am out of all the Monday meetings to see how far I get. I am sure I will have further questions.
Maybe I am missing something here. I have the following structure
.
├── components
│ └── terraform
│ └── infra-init
│ └── main.tf
└── stacks
└── workflows
└── env-init.workflow
I use an .envrc in the root to define my environment variables
export ATMOS_CLI_CONFIG_PATH=${PWD}/.atmos
export ATMOS_BASE_PATH=${PWD}
so when I cd in to the directory my environment variables get set
ATMOS_BASE_PATH=atmos
ATMOS_CLI_CONFIG_PATH=/atmos/.atmos
My atmos.yaml is the default one
base_path: ""
components:
terraform:
# Can also be set using 'ATMOS_COMPONENTS_TERRAFORM_BASE_PATH' ENV var, or '--terraform-dir' command-line argument
# Supports both absolute and relative paths
base_path: "components/terraform"
# Can also be set using 'ATMOS_COMPONENTS_TERRAFORM_APPLY_AUTO_APPROVE' ENV var
apply_auto_approve: false
# Can also be set using 'ATMOS_COMPONENTS_TERRAFORM_DEPLOY_RUN_INIT' ENV var, or '--deploy-run-init' command-line argument
deploy_run_init: true
# Can also be set using 'ATMOS_COMPONENTS_TERRAFORM_INIT_RUN_RECONFIGURE' ENV var, or '--init-run-reconfigure' command-line argument
init_run_reconfigure: true
# Can also be set using 'ATMOS_COMPONENTS_TERRAFORM_AUTO_GENERATE_BACKEND_FILE' ENV var, or '--auto-generate-backend-file' command-line argument
auto_generate_backend_file: false
stacks:
# Can also be set using 'ATMOS_STACKS_BASE_PATH' ENV var, or '--config-dir' and '--stacks-dir' command-line arguments
# Supports both absolute and relative paths
base_path: "stacks"
# Can also be set using 'ATMOS_STACKS_INCLUDED_PATHS' ENV var (comma-separated values string)
included_paths:
- "orgs/**/*"
# Can also be set using 'ATMOS_STACKS_EXCLUDED_PATHS' ENV var (comma-separated values string)
excluded_paths:
- "**/_defaults.yaml"
# Can also be set using 'ATMOS_STACKS_NAME_PATTERN' ENV var
name_pattern: "{tenant}-{environment}-{stage}"
workflows:
# Can also be set using 'ATMOS_WORKFLOWS_BASE_PATH' ENV var, or '--workflows-dir' command-line arguments
# Supports both absolute and relative paths
base_path: "stacks/workflows"
logs:
file: "/dev/stdout"
# Supported log levels: Trace, Debug, Info, Warning, Off
level: Info
I create a workflow under stacks to bootstrap the environment
more env-init.workflow
workflows:
  init/tfstate:
    description: Provision Terraform State Backend for initial deployment.
    steps:
      - command: terraform deploy tfstate-backend -var=access_roles_enabled=false --stack core-eu-west-1-root --auto-generate-backend-file=false
      - command: until aws s3 ls test-core-eu-west-1-root-tfstate; do sleep 5; done
        type: shell
      - command: terraform deploy tfstate-backend -var=access_roles_enabled=false --stack core-eu-west-1-root --init-run-reconfigure=false
I create a stack/component
module "terraform_state_backend" {
source = "cloudposse/tfstate-backend/aws"
# Cloud Posse recommends pinning every module to a specific version
vrsion = "1.1.1"
namespace = var.namespace
stage = var.stage
name = "env-init"
attributes = ["state"]
terraform_backend_config_file_path = "."
terraform_backend_config_file_name = "backend.tf"
force_destroy = false
}
I run
atmos workflow init/tfstate -f env-init.workflow -s ../../components/terraform/infra-init/main.tf --dry-run
and get the following error
failed to find a match for the import '/stacks/orgs/**/*.yaml' ('stacks/orgs' + '**/*.yaml')
So the first issue is this include path in atmos.yaml:
included_paths:
  - "orgs/**/*"
If I remove it I get
at least one path must be provided in 'stacks.included_paths' config or ATMOS_STACKS_INCLUDED_PATHS' ENV variable
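A hedged guess at what is missing given the config above: included_paths points at stacks/orgs/**, but the tree shown only has stacks/workflows, so there are no stack manifests for Atmos to import. A minimal illustrative manifest (path and names are assumptions) that would match the pattern and resolve the core-eu-west-1-root stack via the {tenant}-{environment}-{stage} name_pattern could live at stacks/orgs/acme/core/root/eu-west-1.yaml:

vars:
  tenant: core
  environment: eu-west-1
  stage: root

components:
  terraform:
    tfstate-backend:
      metadata:
        component: infra-init # points at components/terraform/infra-init from the tree above
      vars: {}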
2023-06-19
I’ve deployed the account component but it had a failure. However all the accounts were actually created.
Error: error creating Organizations Policy (acme-gbl-root-organization): DuplicatePolicyException: A policy with the specified name and type already exists.
│
│ with module.account.module.organization_service_control_policies[0].aws_organizations_policy.this[0],
│ on .terraform/modules/account.organization_service_control_policies/main.tf line 37, in resource "aws_organizations_policy" "this":
│ 37: resource "aws_organizations_policy" "this" {
Now if I try to run the deploy again it fails because the accounts already exist. How can I fix my terraform state? Do I need to try and import all those accounts now? Here’s the output from the plan when I try to deploy it now. It should be 0 to add, but it thinks the accounts aren’t deployed so it wants to add them.
It appears your workspace is misconfigured. There shouldn’t be a plat-gbl-root, since root is not a platform account and is typically in core. But that’s difficult to debug without seeing code
Do you have a single account component? Where is that deployed? Could you perhaps share that catalog configuration?
In particular, I’d like to check what your organization_config looks like. It should be something like this:
organization_config:
  root_account:
    name: core-root
    stage: root
    tenant: core
    tags:
      eks: false
  accounts: []
  organization:
    service_control_policies: []
  organizational_units:
    - name: core
      accounts:
        - name: core-auto
          tenant: core
          stage: auto
          tags:
            eks: true
        - name: core-identity
          tenant: core
          stage: identity
          tags:
            eks: false
        - name: core-network
          tenant: core
          stage: network
          tags:
            eks: false
      service_control_policies:
        - DenyLeavingOrganization
    - name: plat
      accounts:
        - name: plat-dev
          tenant: plat
          stage: dev
          tags:
            eks: true
        - name: plat-sandbox
          tenant: plat
          stage: sandbox
          tags:
            eks: true
        - name: plat-staging
          tenant: plat
          stage: staging
          tags:
            eks: true
        - name: plat-prod
          tenant: plat
          stage: prod
          tags:
            eks: true
      service_control_policies:
        - DenyLeavingOrganization
and that component (account) should only be deployed once by the core-gbl-root stack