#refarch (2024-12)
Cloud Posse Reference Architecture
2024-12-03
It looks like the order of events is:

- Review Design
- Prepare AWS Org
- Init TF backend
- Deploy Account Baseline
- Deploy Accounts
But the “Deploy Account Baseline” step (which precedes “Deploy Accounts”) starts with the text “Now that all accounts have been deployed”, and when we run it we see an error that the account-map does not exist, which seems to make sense since we are still working towards deploying accounts.
Based on this, should “Deploy Accounts” come before the baseline?
@seanlongnyc you’re right, this is a mistake in the sidebar. “Deploy Accounts” should come before “Deploy Account Baseline”. I’ll create a task to fix this.
2024-12-04
Hi folks. We’re trying to swap the transit gateway for VPC peering.
We pulled the vpc-peering component - [github.com/cloudposse/terraform-aws-components.git//modules/vpc-peering](https://github.com/cloudposse/terraform-aws-components.git//modules/vpc-peering)
We’re getting this error:
```
╷
│ Error: Cannot assume IAM Role
│
│   with module.vpc_peering.provider["registry.terraform.io/hashicorp/aws"].accepter,
│   on .terraform/modules/vpc_peering/accepter.tf line 2, in provider "aws":
│    2: provider "aws" {
│
│ IAM Role (arn:aws:iam::216989127967:role/inno-plat-gbl-dev-terraform) cannot be assumed.
│
│ There are a number of possible causes of this - the most common are:
│   * The credentials used in order to assume the role are invalid
│   * The credentials do not have appropriate permission to assume the role
│   * The role ARN is not valid
│
│ Error: operation error STS: AssumeRole, https response error StatusCode: 403, RequestID: 896f3d54-01de-48e6-9abb-5181b3fa6cd7, api error AccessDenied: User:
│ arn:aws:iam::182399693862:user/SuperAdmin is not authorized to perform: sts:AssumeRole on resource: arn:aws:iam::216989127967:role/inno-plat-gbl-dev-terraform
```
`216989127967` refers to a plat account. The role in question exists, but only lists `core-identity` as a trusted entity, while this VPC peering request is coming from `core-network`. I’m struggling 1) to understand whether it would be considered bad practice to add this additional trusted entity between two non-identity accounts, and 2) to find the best way to handle this specifically for cross-account VPC peering (as a TGW alternative). It appears we need to explicitly create new roles in each of the requester/accepter accounts (it seems that’s not abstracted).
I believe this is a red herring, based on this message:
```
│ Error: operation error STS: AssumeRole, https response error StatusCode: 403, RequestID: 896f3d54-01de-48e6-9abb-5181b3fa6cd7, api error AccessDenied: User:
│ arn:aws:iam::182399693862:user/SuperAdmin is not authorized to perform: sts:AssumeRole on resource: arn:aws:iam::216989127967:role/inno-plat-gbl-dev-terraform
```
Your error is about your current user: it says that SuperAdmin cannot assume that Terraform role, which would be configured in `providers.tf`. We could debug that separately, but you shouldn’t need to use SuperAdmin anyway.
Instead, do you have your `inno-identity` profile configured?
The role that Terraform refers to as `accepter_aws_assume_role_arn` is the role Terraform assumes to finish the peering connection in the other account. As long as your current session can assume that role, you should be all set. VPC peering doesn’t use that role itself.
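For context, the accepter side of the component wires up a second AWS provider that assumes that role. A minimal sketch of the shape, assuming the role ARN is passed in via a variable (the variable name comes from this thread; everything else is illustrative):

```hcl
# Illustrative sketch only, not the component's exact code.
variable "accepter_aws_assume_role_arn" {
  type        = string
  description = "Role in the accepter account that Terraform assumes to accept the peering"
}

provider "aws" {
  alias = "accepter"

  assume_role {
    # This role lives in the accepter (plat) account. It's your current
    # session that must be allowed to assume it, not SuperAdmin.
    role_arn = var.accepter_aws_assume_role_arn
  }
}
```

The key point is that the trust policy on the accepter-account role only needs to trust whatever identity your session already holds.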
2024-12-05
Hey folks. We are trying to deploy an EC2 instance and get access to it, but it needs a key pair. We’re using this module: https://github.com/cloudposse/terraform-aws-components/tree/main/modules/ec2-instance
I don’t see an option or variable for that, though. Does this have to be done separately?
The default component (root module) doesn’t have an SSH key and instead uses the SSM agent to connect to the instance if necessary.
If you’d like to set up an SSH key with your instance, I’d recommend taking a look at the `bastion` component as a reference, or the example included with the module itself.
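If you do want a key pair managed alongside the instance, a hypothetical sketch with plain Terraform resources might look like this (the resource names, AMI ID, and key path are placeholders, not variables from the component):

```hcl
# Hypothetical sketch: the ec2-instance component has no key-pair variable,
# so this shows the underlying AWS resources instead.
resource "aws_key_pair" "example" {
  key_name   = "example-key"                  # illustrative name
  public_key = file("~/.ssh/id_ed25519.pub")  # path is an assumption
}

resource "aws_instance" "example" {
  ami           = "ami-0123456789abcdef0"     # placeholder AMI ID
  instance_type = "t3.micro"
  key_name      = aws_key_pair.example.key_name
}
```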
ty
What are these vars needed for?
```yaml
image_container: infrastructure:latest
image_repository: "111111111111.dkr.ecr.us-east-1.amazonaws.com/example/infrastructure"
```
that’s the image and repository to use as a container on the bastion here: https://github.com/cloudposse-terraform-components/aws-bastion/blob/main/src/templates/container.sh
```shell
#!/bin/bash
REGION=${ region }
REPOSITORY=${ image_repository }
IMAGE=$REPOSITORY/${ image_container }

aws ecr get-login-password --region $REGION | docker login --username AWS --password-stdin $REPOSITORY
docker pull $IMAGE
docker run --rm \
  -it $IMAGE bash -c "${ container_command }"
```
I don’t have an image for this. Can this be a public Docker image? What is usually used?
You could use your Geodesic image if you have one, push a public image to ECR (such as with a pull-through cache), or update the Terraform to change that behavior.
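For example, mirroring a public image into a private ECR repo by hand might look something like this (account ID, region, and repo name are placeholders matching the example vars above):

```shell
# Illustrative only: copy a public image into a private ECR repository.
ACCOUNT=111111111111
REGION=us-east-1
REPO=example/infrastructure

# Authenticate Docker against the private registry
aws ecr get-login-password --region "$REGION" |
  docker login --username AWS --password-stdin "$ACCOUNT.dkr.ecr.$REGION.amazonaws.com"

# Pull a public image, retag it into the private repo, and push
docker pull public.ecr.aws/amazonlinux/amazonlinux:2
docker tag public.ecr.aws/amazonlinux/amazonlinux:2 \
  "$ACCOUNT.dkr.ecr.$REGION.amazonaws.com/$REPO:latest"
docker push "$ACCOUNT.dkr.ecr.$REGION.amazonaws.com/$REPO:latest"
```

A pull-through cache rule avoids the manual retag/push, at the cost of some one-time ECR configuration.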
ty
np!
Can the bastion image be changed to something other than Amazon Linux 2? For example, an Oracle image? I’m unable to locate a variable for that…
I think I can change this filter:
```hcl
dynamic "filter" {
  for_each = {
    name = ["amzn2-ami-hvm-2.*-x86_64-ebs"]
  }
  content {
    name   = filter.key
    values = filter.value
  }
}
```
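A hypothetical version of that filter for a different image family might look like the following; the name pattern here is an assumption, so verify the publisher's actual AMI naming (and owner account) before relying on it:

```hcl
# Hypothetical: swap the Amazon Linux 2 pattern for another AMI family.
# "OL8.*-x86_64-HVM-*" is an assumed Oracle Linux 8 naming pattern;
# check the exact names in the EC2 AMI catalog for the image you want.
dynamic "filter" {
  for_each = {
    name = ["OL8.*-x86_64-HVM-*"]
  }
  content {
    name   = filter.key
    values = filter.value
  }
}
```

Note that a non-Amazon-Linux AMI may not ship the SSM agent or Docker, so the bastion's user data may need adjusting too.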
I was able to change that.
ty
yup! that’s the place to update
I tried to add the SSM resources to the ec2-instance module because we need a dedicated instance for a service, but I cannot connect to the instance via the console using SSM. idk if you have any hint
I would recommend checking the system logs for the instance. Make sure whatever AMI you’re using has SSM agent support, or if you install the SSM agent yourself, that the cloud-init script runs successfully.
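A few checks that often narrow this down (the instance ID below is a placeholder):

```shell
# On the instance (via EC2 serial console or user-data logs):

# Is the agent installed and running? (Amazon Linux 2 uses systemd)
sudo systemctl status amazon-ssm-agent

# The agent log usually names the exact failure (missing instance profile,
# no route to the ssm/ssmmessages/ec2messages endpoints, etc.)
sudo tail -n 50 /var/log/amazon-ssm-agent/amazon-ssm-agent.log

# From your workstation: does SSM consider the instance managed at all?
aws ssm describe-instance-information \
  --filters "Key=InstanceIds,Values=i-0123456789abcdef0"
```

If the instance doesn't appear in `describe-instance-information`, the problem is usually the IAM instance profile or network reachability to the SSM endpoints, not the console itself.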
I have this in the instance logs, but I still cannot connect via the console with SSM.
Huh I’m not sure! Perhaps something on this page might help https://docs.aws.amazon.com/systems-manager/latest/userguide/troubleshooting-ssm-agent.html
View SSM Agent log files and troubleshoot the agent.
If not, then I’d recommend trying to make it to one of the community workshop calls next week. We could take a look together and with a few others from cloud posse
2024-12-16
Hey team, I’m trying to get EFS and the storage class running on k8s, but I’m getting this timeout error. When I telnet to that address it is reachable, so the connection is good. Maybe it’s auth? Do I need to have the cluster config in my `.kube`?
```
Terraform used the selected providers to generate the following execution plan. Resource actions are indicated with the following symbols:
  + create

Terraform planned the following actions, but then encountered a problem:

  # kubernetes_storage_class_v1.ebs["gp3"] will be created
  + resource "kubernetes_storage_class_v1" "ebs" {
      + allow_volume_expansion = true
      + id                     = (known after apply)
      + parameters             = {
          + "csi.storage.k8s.io/fstype" = "ext4"
          + "encrypted"                 = "true"
          + "tagSpecification_1"        = "Environment=use1"
          + "tagSpecification_2"        = "Name=inno-core-use1-auto-eks-cluster"
          + "tagSpecification_3"        = "Namespace=inno"
          + "tagSpecification_4"        = "Stage=auto"
          + "tagSpecification_5"        = "Tenant=core"
          + "type"                      = "gp3"
        }
      + reclaim_policy         = "Delete"
      + storage_provisioner    = "ebs.csi.aws.com"
      + volume_binding_mode    = "WaitForFirstConsumer"

      + metadata {
          + annotations      = {
              + "storageclass.kubernetes.io/is-default-class" = "true"
            }
          + generation       = (known after apply)
          + name             = "gp3"
          + resource_version = (known after apply)
          + uid              = (known after apply)
        }
    }

  # kubernetes_storage_class_v1.ebs["io2"] will be created
  + resource "kubernetes_storage_class_v1" "ebs" {
      + allow_volume_expansion = true
      + id                     = (known after apply)
      + parameters             = {
          + "csi.storage.k8s.io/fstype" = "ext4"
          + "encrypted"                 = "true"
          + "iopsPerGB"                 = "10"
          + "tagSpecification_1"        = "Environment=use1"
          + "tagSpecification_2"        = "Name=inno-core-use1-auto-eks-cluster"
          + "tagSpecification_3"        = "Namespace=inno"
          + "tagSpecification_4"        = "Stage=auto"
          + "tagSpecification_5"        = "Tenant=core"
          + "type"                      = "io2"
        }
      + reclaim_policy         = "Delete"
      + storage_provisioner    = "ebs.csi.aws.com"
      + volume_binding_mode    = "WaitForFirstConsumer"

      + metadata {
          + annotations      = {
              + "storageclass.kubernetes.io/is-default-class" = "false"
            }
          + generation       = (known after apply)
          + name             = "io2"
          + resource_version = (known after apply)
          + uid              = (known after apply)
        }
    }

  # kubernetes_storage_class_v1.efs["efs"] will be created
  + resource "kubernetes_storage_class_v1" "efs" {
      + allow_volume_expansion = true
      + id                     = (known after apply)
      + parameters             = {
          + "basePath"         = "/efs_controller"
          + "directoryPerms"   = "700"
          + "fileSystemId"     = "fs-067c7e065d9487d35"
          + "provisioningMode" = "efs-ap"
        }
      + reclaim_policy         = "Delete"
      + storage_provisioner    = "efs.csi.aws.com"
      + volume_binding_mode    = "Immediate"

      + metadata {
          + annotations      = {
              + "storageclass.kubernetes.io/is-default-class" = "false"
            }
          + generation       = (known after apply)
          + name             = "efs"
          + resource_version = (known after apply)
          + uid              = (known after apply)
        }
    }

Plan: 3 to add, 0 to change, 0 to destroy.
╷
│ Error: Get "https://0652A3BDF1CD1452BD5E6A105A4FC989.gr7.us-east-1.eks.amazonaws.com/apis/storage.k8s.io/v1/storageclasses/gp2": net/http: TLS handshake timeout
```
This error almost always means you cannot access the cluster’s control plane. Are you connected to the private network by VPN or other means?
do i need to have the cluster config on my .kube
You shouldn’t need to do anything at all other than connect to the private network.
You should also check the cluster’s security group and make sure whatever means you have to connect is allowed
Here are a few tips on debugging connectivity: https://docs.cloudposse.com/layers/eks/faq/#common-connectivity-issues-and-solutions
Frequently asked questions about EKS with Cloud Posse’s reference architecture.
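Once you’re on the VPN, a quick sketch for regenerating the kubeconfig and testing control-plane access; the cluster name and region come from the plan output above, and the profile name is an assumption based on the earlier thread:

```shell
# Write/refresh the kubeconfig entry for the cluster
aws eks update-kubeconfig \
  --name inno-core-use1-auto-eks-cluster \
  --region us-east-1 \
  --profile inno-identity

# If this also hangs with a TLS/timeout error, the problem is network
# reachability (VPN route or cluster security group), not Terraform.
kubectl get storageclasses
```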
ty