#refarch (2024-12)
Cloud Posse Reference Architecture
2024-12-03
It looks like the order of events is:

- Review Design
- Prepare AWS Org
- Init TF backend
- Deploy Account Baseline
- Deploy Accounts
But the “Deploy Account Baseline” step (which precedes “Deploy Accounts”) starts with the text “Now that all accounts have been deployed”, and when we run it we see an error that the account-map does not exist, which seems to make sense since we are still working towards deploying accounts.
Based on this, should “Deploy Accounts” come before the baseline?
@seanlongnyc you’re right, this is a mistake in the sidebar. “Deploy Accounts” should come before “Deploy Account Baseline”. I’ll create a task to fix this.
2024-12-04
Hi folks. We’re trying to swap the transit gateway for VPC peering.
We pulled the vpc-peering component - [github.com/cloudposse/terraform-aws-components.git//modules/vpc-peering](https://github.com/cloudposse/terraform-aws-components.git//modules/vpc-peering)
We’re getting this error:
```
╷
│ Error: Cannot assume IAM Role
│
│   with module.vpc_peering.provider["registry.terraform.io/hashicorp/aws"].accepter,
│   on .terraform/modules/vpc_peering/accepter.tf line 2, in provider "aws":
│    2: provider "aws" {
│
│ IAM Role (arn:aws:iam::216989127967:role/inno-plat-gbl-dev-terraform) cannot be assumed.
│
│ There are a number of possible causes of this - the most common are:
│   * The credentials used in order to assume the role are invalid
│   * The credentials do not have appropriate permission to assume the role
│   * The role ARN is not valid
│
│ Error: operation error STS: AssumeRole, https response error StatusCode: 403, RequestID: 896f3d54-01de-48e6-9abb-5181b3fa6cd7, api error AccessDenied: User:
│ arn:aws:iam::182399693862:user/SuperAdmin is not authorized to perform: sts:AssumeRole on resource: arn:aws:iam::216989127967:role/inno-plat-gbl-dev-terraform
```
`216989127967` refers to a plat account. The role in question exists, but only lists `core-identity` as a trusted entity, while this VPC peering request is coming from `core-network`. I’m struggling 1) to understand whether it would be considered bad practice to add this additional trusted entity between two non-identity accounts, and 2) to find the best way to handle this specifically for cross-account VPC peering (as a TGW alternative). It appears we need to explicitly create new roles in each of the requester/accepter accounts (it seems that’s not abstracted).
I believe this is a red herring, based on this message:
```
│ Error: operation error STS: AssumeRole, https response error StatusCode: 403, RequestID: 896f3d54-01de-48e6-9abb-5181b3fa6cd7, api error AccessDenied: User:
│ arn:aws:iam::182399693862:user/SuperAdmin is not authorized to perform: sts:AssumeRole on resource: arn:aws:iam::216989127967:role/inno-plat-gbl-dev-terraform
```
Your error is about your current user: it says that SuperAdmin cannot assume that Terraform role, which would be configured in `providers.tf`. We could debug that separately, but you shouldn’t need to use SuperAdmin anyway.
Instead, do you have your `inno-identity` profile configured?
The role that Terraform refers to as `accepter_aws_assume_role_arn` is the role Terraform assumes to finish the peering connection in the other account. As long as your current session can assume that role, you should be all set. VPC peering doesn’t use that role itself.
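For context, the accepter side of the component wires up a second AWS provider that assumes that role. A minimal sketch of the shape, assuming the role ARN is passed in via a variable (the variable name comes from this thread; everything else is illustrative):

```hcl
# Illustrative sketch only, not the component's exact code.
variable "accepter_aws_assume_role_arn" {
  type        = string
  description = "Role in the accepter account that Terraform assumes to accept the peering"
}

provider "aws" {
  alias = "accepter"

  assume_role {
    # This role lives in the accepter (plat) account. It's your current
    # session that must be allowed to assume it, not SuperAdmin.
    role_arn = var.accepter_aws_assume_role_arn
  }
}
```

The key point is that the trust policy on the accepter-account role only needs to trust whatever identity your session already holds.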
2024-12-05
Hey folks. We are trying to deploy an EC2 instance and get access to it, but it needs a key pair. We’re using this module: https://github.com/cloudposse/terraform-aws-components/tree/main/modules/ec2-instance
I don’t see an option or variable for that, though. Does this have to be done separately?
The default component (root module) doesn’t have an SSH key and instead uses the SSM agent to connect to the instance if necessary.
If you’d like to set up an SSH key with your instance, I’d recommend taking a look at the `bastion` component as a reference, or the example included with the module itself.
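If you do want a key pair managed alongside the instance, a hypothetical sketch with plain Terraform resources might look like this (the resource names, AMI ID, and key path are placeholders, not variables from the component):

```hcl
# Hypothetical sketch: the ec2-instance component has no key-pair variable,
# so this shows the underlying AWS resources instead.
resource "aws_key_pair" "example" {
  key_name   = "example-key"                  # illustrative name
  public_key = file("~/.ssh/id_ed25519.pub")  # path is an assumption
}

resource "aws_instance" "example" {
  ami           = "ami-0123456789abcdef0"     # placeholder AMI ID
  instance_type = "t3.micro"
  key_name      = aws_key_pair.example.key_name
}
```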
ty
What are these vars needed for?
```yaml
image_container: infrastructure:latest
image_repository: "111111111111.dkr.ecr.us-east-1.amazonaws.com/example/infrastructure"
```
that’s the image and repository to use as a container on the bastion here: https://github.com/cloudposse-terraform-components/aws-bastion/blob/main/src/templates/container.sh
```shell
#!/bin/bash
REGION=${ region }
REPOSITORY=${ image_repository }
IMAGE=$REPOSITORY/${ image_container }

aws ecr get-login-password --region $REGION | docker login --username AWS --password-stdin $REPOSITORY
docker pull $IMAGE
docker run --rm \
  -it $IMAGE bash -c "${ container_command }"
```
I don’t have an image for this. Can this be a public Docker image? What is usually used?
You could use your Geodesic image if you have one, push a public image to ECR (such as with a pull-through cache), or update the Terraform to change that behavior.
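For example, mirroring a public image into a private ECR repo by hand might look something like this (account ID, region, and repo name are placeholders matching the example vars above):

```shell
# Illustrative only: copy a public image into a private ECR repository.
ACCOUNT=111111111111
REGION=us-east-1
REPO=example/infrastructure

# Authenticate Docker against the private registry
aws ecr get-login-password --region "$REGION" |
  docker login --username AWS --password-stdin "$ACCOUNT.dkr.ecr.$REGION.amazonaws.com"

# Pull a public image, retag it into the private repo, and push
docker pull public.ecr.aws/amazonlinux/amazonlinux:2
docker tag public.ecr.aws/amazonlinux/amazonlinux:2 \
  "$ACCOUNT.dkr.ecr.$REGION.amazonaws.com/$REPO:latest"
docker push "$ACCOUNT.dkr.ecr.$REGION.amazonaws.com/$REPO:latest"
```

A pull-through cache rule avoids the manual retag/push, at the cost of some one-time ECR configuration.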
ty
np!
Can the bastion image be changed to something other than Amazon Linux 2? For example, an Oracle image? I’m unable to locate a variable for that…
I think I can change this filter:
```hcl
dynamic "filter" {
  for_each = {
    name = ["amzn2-ami-hvm-2.*-x86_64-ebs"]
  }
  content {
    name   = filter.key
    values = filter.value
  }
}
```
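A hypothetical version of that filter for a different image family might look like the following; the name pattern here is an assumption, so verify the publisher's actual AMI naming (and owner account) before relying on it:

```hcl
# Hypothetical: swap the Amazon Linux 2 pattern for another AMI family.
# "OL8.*-x86_64-HVM-*" is an assumed Oracle Linux 8 naming pattern;
# check the exact names in the EC2 AMI catalog for the image you want.
dynamic "filter" {
  for_each = {
    name = ["OL8.*-x86_64-HVM-*"]
  }
  content {
    name   = filter.key
    values = filter.value
  }
}
```

Note that a non-Amazon-Linux AMI may not ship the SSM agent or Docker, so the bastion's user data may need adjusting too.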
I was able to change that.
ty
yup! that’s the place to update
I tried to add the SSM resources to the ec2-instance module because we need a dedicated instance for a service, but I cannot connect to the instance via the console using SSM. idk if you have any hint
I would recommend checking the system logs for the instance. Make sure whatever AMI you’re using has SSM agent support, or if you install the SSM agent yourself, that the cloud-init script runs successfully.
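A few checks that often narrow this down (the instance ID below is a placeholder):

```shell
# On the instance (via EC2 serial console or user-data logs):

# Is the agent installed and running? (Amazon Linux 2 uses systemd)
sudo systemctl status amazon-ssm-agent

# The agent log usually names the exact failure (missing instance profile,
# no route to the ssm/ssmmessages/ec2messages endpoints, etc.)
sudo tail -n 50 /var/log/amazon-ssm-agent/amazon-ssm-agent.log

# From your workstation: does SSM consider the instance managed at all?
aws ssm describe-instance-information \
  --filters "Key=InstanceIds,Values=i-0123456789abcdef0"
```

If the instance doesn't appear in `describe-instance-information`, the problem is usually the IAM instance profile or network reachability to the SSM endpoints, not the console itself.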
I have this in the instance logs, but I still cannot connect via the console with SSM.
Huh I’m not sure! Perhaps something on this page might help https://docs.aws.amazon.com/systems-manager/latest/userguide/troubleshooting-ssm-agent.html
View SSM Agent log files and troubleshoot the agent.
If not, then I’d recommend trying to make it to one of the community workshop calls next week. We could take a look together and with a few others from cloud posse
2024-12-16
Hey team, I’m trying to get EFS and the storage class running on k8s, but I’m getting this timeout error. When I telnet to that address it is reachable, so the connection is good. Maybe it’s auth? Do I need to have the cluster config in my `.kube`?
```
Terraform used the selected providers to generate the following execution plan. Resource actions are indicated with the following symbols:
  + create

Terraform planned the following actions, but then encountered a problem:

  # kubernetes_storage_class_v1.ebs["gp3"] will be created
  + resource "kubernetes_storage_class_v1" "ebs" {
      + allow_volume_expansion = true
      + id                     = (known after apply)
      + parameters             = {
          + "csi.storage.k8s.io/fstype" = "ext4"
          + "encrypted"                 = "true"
          + "tagSpecification_1"        = "Environment=use1"
          + "tagSpecification_2"        = "Name=inno-core-use1-auto-eks-cluster"
          + "tagSpecification_3"        = "Namespace=inno"
          + "tagSpecification_4"        = "Stage=auto"
          + "tagSpecification_5"        = "Tenant=core"
          + "type"                      = "gp3"
        }
      + reclaim_policy         = "Delete"
      + storage_provisioner    = "ebs.csi.aws.com"
      + volume_binding_mode    = "WaitForFirstConsumer"

      + metadata {
          + annotations      = {
              + "storageclass.kubernetes.io/is-default-class" = "true"
            }
          + generation       = (known after apply)
          + name             = "gp3"
          + resource_version = (known after apply)
          + uid              = (known after apply)
        }
    }

  # kubernetes_storage_class_v1.ebs["io2"] will be created
  + resource "kubernetes_storage_class_v1" "ebs" {
      + allow_volume_expansion = true
      + id                     = (known after apply)
      + parameters             = {
          + "csi.storage.k8s.io/fstype" = "ext4"
          + "encrypted"                 = "true"
          + "iopsPerGB"                 = "10"
          + "tagSpecification_1"        = "Environment=use1"
          + "tagSpecification_2"        = "Name=inno-core-use1-auto-eks-cluster"
          + "tagSpecification_3"        = "Namespace=inno"
          + "tagSpecification_4"        = "Stage=auto"
          + "tagSpecification_5"        = "Tenant=core"
          + "type"                      = "io2"
        }
      + reclaim_policy         = "Delete"
      + storage_provisioner    = "ebs.csi.aws.com"
      + volume_binding_mode    = "WaitForFirstConsumer"

      + metadata {
          + annotations      = {
              + "storageclass.kubernetes.io/is-default-class" = "false"
            }
          + generation       = (known after apply)
          + name             = "io2"
          + resource_version = (known after apply)
          + uid              = (known after apply)
        }
    }

  # kubernetes_storage_class_v1.efs["efs"] will be created
  + resource "kubernetes_storage_class_v1" "efs" {
      + allow_volume_expansion = true
      + id                     = (known after apply)
      + parameters             = {
          + "basePath"         = "/efs_controller"
          + "directoryPerms"   = "700"
          + "fileSystemId"     = "fs-067c7e065d9487d35"
          + "provisioningMode" = "efs-ap"
        }
      + reclaim_policy         = "Delete"
      + storage_provisioner    = "efs.csi.aws.com"
      + volume_binding_mode    = "Immediate"

      + metadata {
          + annotations      = {
              + "storageclass.kubernetes.io/is-default-class" = "false"
            }
          + generation       = (known after apply)
          + name             = "efs"
          + resource_version = (known after apply)
          + uid              = (known after apply)
        }
    }

Plan: 3 to add, 0 to change, 0 to destroy.
╷
│ Error: Get "https://0652A3BDF1CD1452BD5E6A105A4FC989.gr7.us-east-1.eks.amazonaws.com/apis/storage.k8s.io/v1/storageclasses/gp2": net/http: TLS handshake timeout
```
This error almost always means you cannot access the cluster’s control plane. Are you connected to the private network by VPN or other means?
do i need to have the cluster config on my .kube
You shouldn’t need to do anything at all other than connect to the private network.
You should also check the cluster’s security group and make sure whatever means you have to connect is allowed
Here are a few tips on debugging connectivity: https://docs.cloudposse.com/layers/eks/faq/#common-connectivity-issues-and-solutions
Frequently asked questions about EKS with Cloud Posse’s reference architecture.
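Once you’re on the VPN, a quick sketch for regenerating the kubeconfig and testing control-plane access; the cluster name and region come from the plan output above, and the profile name is an assumption based on the earlier thread:

```shell
# Write/refresh the kubeconfig entry for the cluster
aws eks update-kubeconfig \
  --name inno-core-use1-auto-eks-cluster \
  --region us-east-1 \
  --profile inno-identity

# If this also hangs with a TLS/timeout error, the problem is network
# reachability (VPN route or cluster security group), not Terraform.
kubectl get storageclasses
```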
ty