SweetOps #aws for September, 2024

Discussion related to Amazon Web Services (AWS)

Archive: https://archive.sweetops.com/aws/

2024-09-02

Emmanuel O

Hello, I’m currently facing an issue with an AWS load balancer. I have an ECS Fargate cluster with about five tasks. However, I noticed that these instances don’t scale past 10 users during a load test. Upon further debugging, I had to SSH into each of these instances and run htop to see the CPU and memory utilization of these five tasks. I noticed that one of these tasks had 100% CPU utilization and the rest had none. That one task’s CPU utilization gets very high, which makes the ECS task unhealthy and unable to receive more traffic. This image shows the ECS CPU utilization for one instance. How can I ensure the traffic is distributed evenly across all tasks in the ECS service? Upon checking my load balancer access logs, I also noticed that a lot of requests came from one IP address. I tried changing the load balancer routing algorithm to round robin, but it still doesn’t distribute traffic evenly across all my tasks. What can I do to ensure the scalability of my application? Has anyone faced this?

Gabriela Campana (Cloud Posse)

01:57:52 PM

@Jeremy White (Cloud Posse)

Michael Galey

06:56:58 PM

you likely have session stickiness enabled on the load balancer, which means each unique session goes to the same instance, to allow for things like safe deploys (once you hit app version 2.0, you no longer hit 1.0). Your load tester is 1 session, so it always hits 1 server. You could temporarily turn off session stickiness during the load test, or else you might have to look up how it determines that stickiness, and randomize that on the load tester side. AWS Load balancer sets a cookie for that stickiness, so you’d need to clear cookies. Cookie name is AWSALB, at least on mine

Emmanuel O

10:46:46 AM

Thanks @Michael Galey. Currently stickiness is turned off. The problem is that one ECS task takes 87% CPU while the remaining four tasks have 0% utilization, so this has an impact on the scalability of the system. Do you know how I can resolve this?

Jeremy White (Cloud Posse)

12:39:17 PM

usually I think about python applications as needing something to share requests. There are a few ways to do this, but a common couple to try first are gunicorn and uwsgi . Do you have any application that’s sharing the listener port with your server threads/PIDs?

Jeremy White (Cloud Posse)

12:39:36 PM

Here’s a site on uWSGI: https://uwsgi-docs.readthedocs.io/en/latest/WSGIquickstart.html

Jeremy White (Cloud Posse)

12:41:32 PM

if you are using one of those tool already, could you share your config? Doesn’t have to be all the gory details, but at least some notion of how it decides to spawn your application on a request

Michael Galey

02:38:43 PM

the above would be per server, he’s already at full utilization on one server. His load balancer is not balancing the load, at least not from a single source.

Michael Galey

02:42:40 PM

if you do 2-3 load tests of smaller size from a few diff ips, does it go to that number of ecs tasks? I’d suggest trying least-request if that’s an option for the load balancer, otherwise maybe just start simple, follow some tutorial for a basic load balancer + hello world, and compare the load balancer config / target group config against yours. I don’t see how it’s the app’s fault here, I think it’s the load balancer config + single origin ip

Jeremy White (Cloud Posse)

04:12:41 PM

I better understand now. I’m not sure what’s up, but are you using target groups? Do all the tasks show as healthy?
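For anyone debugging a similar case: one way to check whether the load balancer is really sending everything to one task is to tally the access logs by client and by target. Below is a minimal sketch, assuming Application Load Balancer access logs, where the fourth and fifth space-separated fields are `client:port` and `target:port`. The function name and the sample lines are made up for illustration.

```python
from collections import Counter

def tally_alb_log(lines):
    """Count requests per client IP and per target from ALB access log lines.

    Only the first five space-separated fields are used
    (type, time, elb, client:port, target:port), so the quoted
    request and user-agent fields later in the line are never parsed.
    """
    clients, targets = Counter(), Counter()
    for line in lines:
        fields = line.split(None, 5)
        if len(fields) < 5:
            continue  # skip blank or truncated lines
        client_ip = fields[3].rsplit(":", 1)[0]  # drop the client port
        target = fields[4]                       # "-" if no target was reached
        clients[client_ip] += 1
        targets[target] += 1
    return clients, targets

# Hypothetical sample entries, shortened after the user-agent field.
SAMPLE_LINES = [
    'http 2024-09-02T10:00:00.000000Z app/demo/abc 203.0.113.10:50000 10.0.1.5:8080 0.001 0.002 0.000 200 200 34 366 "GET http://example.com:80/ HTTP/1.1" "curl/8.0" - -',
    'http 2024-09-02T10:00:01.000000Z app/demo/abc 203.0.113.10:50001 10.0.1.5:8080 0.001 0.002 0.000 200 200 34 366 "GET http://example.com:80/ HTTP/1.1" "curl/8.0" - -',
    'http 2024-09-02T10:00:02.000000Z app/demo/abc 203.0.113.10:50002 10.0.2.7:8080 0.001 0.002 0.000 200 200 34 366 "GET http://example.com:80/ HTTP/1.1" "curl/8.0" - -',
]

clients, targets = tally_alb_log(SAMPLE_LINES)
print("requests per client:", clients.most_common())
print("requests per target:", targets.most_common())
```

If nearly all requests come from one client but are spread across targets, the load balancer is doing its job and the imbalance lies elsewhere. If one target receives nearly everything, the target group and routing settings are the place to look. Real access logs are delivered to S3 as gzip files, so they need to be decompressed before being passed in.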

2024-09-03

2024-09-04

2024-09-05

Veerapandian M

05:07:34 PM

Hi, Team. I am looking for help with Azure DevOps repository + AWS Amplify deployment.

bradym

06:04:34 PM

You’re more likely to get help if you ask questions.

Zing

01:37:45 AM

https://github.com/aws/containers-roadmap/issues/474 hey there, how are you all working around aws’ silly limitation on EKS access entries not supporting wildcards? it’s a nightmare for permission set arns, since they have that random string at the end of the permission set role

#474 [EKS] [request]: EKS authentication rolearn wildcard support aka improved support for AWS Identity Center SSO

Tell us about your request
Support basic glob wildcard rolearn matching for aws-auth configmap that controls iam role eks auth.

Which service(s) is this request for?
EKS

Tell us about the problem you’re trying to solve. What are you trying to do, and why is it hard?
Trying to avoid hardcoding lots of IAM role arns into the aws-auth configmap. It would be useful if basic glob wildcard matching worked in the rolearn field of each role mapping:

apiVersion: v1
kind: ConfigMap
metadata:
  name: aws-auth
  namespace: kube-system
data:
  mapRoles: |
    - groups: [AcmeCorp]
      rolearn: arn:aws:iam::111122223333:role/teams/*
      username: AcmeCorp

Are you currently working around this issue?
Individually specifying each rolearn and updating the configmap everytime these roles change:

apiVersion: v1
kind: ConfigMap
metadata:
  name: aws-auth
  namespace: kube-system
data:
  mapRoles: |
    - groups: [AcmeCorp]
      rolearn: arn:aws:iam::111122223333:role/teams/SomeTeam
      username: SomeTeam
    - groups: [AcmeCorp]
      rolearn: arn:aws:iam::111122223333:role/teams/AnotherTeam
      username: AnotherTeam

Additional context
I tried using a * on a working rolearn field and the role became unable to authenticate with the api server. EKS version (Im not sure what component handles this auth delegation, so I dont know of another relevant version to check for that):

$ kubectl version
Client Version: version.Info{Major:"1", Minor:"15", GitVersion:"v1.15.1", GitCommit:"4485c6f18cee9a5d3c3b4e523bd27972b1b53892", GitTreeState:"clean", BuildDate:"2019-07-18T14:25:20Z", GoVersion:"go1.12.7", Compiler:"gc", Platform:"darwin/amd64"}
Server Version: version.Info{Major:"1", Minor:"13+", GitVersion:"v1.13.10-eks-5ac0f1", GitCommit:"5ac0f1d9ab2c254ea2b0ce3534fd72932094c6e1", GitTreeState:"clean", BuildDate:"2019-08-20T22:39:46Z", GoVersion:"go1.11.13", Compiler:"gc", Platform:"linux/amd64"}

01:47:33 AM

Codify each arn in aws auth config

01:47:53 AM

Nowadays the aws-auth config file is deprecated

01:51:13 AM

Deprecated method https://docs.aws.amazon.com/eks/latest/userguide/auth-configmap.html

01:51:33 AM

Access entries is the new way https://docs.aws.amazon.com/eks/latest/userguide/access-entries.html

01:55:29 AM

Migrating https://docs.aws.amazon.com/eks/latest/userguide/grant-k8s-access.html#authentication-modes

02:00:41 AM

You can’t use wildcards/globs in aws auth config or in eks access entries

02:01:33 AM

One way to do it, if you’re using Terraform, whether with the aws-auth config or access entries: use the aws_iam_roles data source, specify a regex to retrieve all the matching ARNs, and then populate the ARNs in your ConfigMap or access entries

02:07:01 AM

https://registry.terraform.io/providers/hashicorp/aws/latest/docs/data-sources/iam_roles

data "aws_iam_roles" "default" {
  name_regex = ".*project.*"
}

02:08:48 AM

https://registry.terraform.io/providers/hashicorp/aws/latest/docs/resources/eks_access_entry

resource "aws_eks_access_entry" "default" {
  # "arns" is the set of role ARNs matched by the data source above
  for_each = toset(data.aws_iam_roles.default.arns)

  cluster_name      = aws_eks_cluster.default.name
  principal_arn     = each.value
  kubernetes_groups = ["group-1", "group-2"]
  type              = "STANDARD"
}

Zing

02:39:26 AM

yeah, i saw that workaround in the thread

Zing

02:39:38 AM

but it’s so hacky

Zing

02:40:02 AM

i also can see it breaking since we use terragrunt, and it’s hard to pass in data calls into module inputs

Zing

02:40:40 AM

so we’d have to just rely on “in-module” access entries, and ignore the terragrunt layer, i think (maybe not - haven’t thought abt it enough)

Erik Osterman (Cloud Posse)

07:04:48 PM

Yes, we’ve run into this as well. It’s one of the reasons we also implement the aws-teams and aws-team-roles architecture in our reference architecture and allow permission sets to assume them. This allows us to have consistent roles for both programmatic/machine access (e.g. GitHub OIDC) as well as for developers.

Erik Osterman (Cloud Posse)

07:05:06 PM

We talk more about our approach here https://docs.cloudposse.com/layers/identity/

Identity and Authentication | The Cloud Posse Reference Architecture

Setup fine-grained access control for an entire organization

Zing

01:04:32 AM

thanks! i’ve been considering going with this approach, but i’m hesitant because of the extra assume role hop (for human users)

Zing

01:04:55 AM

especially non-technical users

Erik Osterman (Cloud Posse)

02:38:08 PM

Yes, the extra assume role hop is really for technical users

Erik Osterman (Cloud Posse)

02:38:30 PM

But sounds like you’re using EKS. You have technical users.

Zing

04:10:53 AM

yeah, but we have a fair amount of non technical users (clickops in the console), and i’m not sure what we’d do for them. i think the extra assume role in the console would make dem go nuts but i guess a hybrid approach could work too…

Erik Osterman (Cloud Posse)

02:46:55 PM

@Jeremy G (Cloud Posse) might have some updated ideas on this we haven’t yet tried. He’s OoO though.

2024-09-06

2024-09-07

Mark

08:38:31 AM

Hey everyone, Recently we did a change on our ECS infrastructure. We’ve transitioned to using AWS service discovery and have configured our containers to use HTTPS on their hostnames. After resolving various issues with appsettings and Dockerfiles, the HTTPS port is now open. Previously, we used an ALB for each service in the ECS cluster. With the move to HTTPS and service discovery, we need to set HTTPS as the port for service health checks. The challenge we’re facing is that target groups don’t allow us to define a hostname for service discovery. You might wonder why we switched to HTTPS. The decision was driven by difficulties we encountered with service discovery, which we found were best addressed by using HTTPS. I’ve attached the task definition file for one of the services and the appsettings file. These should help illustrate the issue with the target group’s inability to accept a hostname. Just a note: I’m fairly new to DevOps—only been in this field for two months—and I’m really enjoying the learning process!

andrey.a.devyatkin

01:36:45 PM

this could be useful https://www.youtube.com/watch?v=z1WQ-YSAsVY

FivexL Live: Keeping your data secure in transit with ECS Service Connect | AWS | Terraform | 2024

andrey.a.devyatkin

01:37:35 PM

here is the text version with links https://fivexl.io/blog/ecs-service-connect-encryption/

Keeping your data secure in transit with ECS Service Connect

Deep-dive into AWS ECS Service Connect. How startup can enable encryption in transit with ECS Service Connect and ECS Fargate deployment

Fizz

11:37:59 PM

Why don’t you let the ECS service manage registration with the target group for you? It’s allowed to use both service discovery AND allow the ECS service to register task IPs with the target group

Mark

06:29:33 AM

@andrey.a.devyatkin Thank you so much for the video! @Fizz Yup, this is what I chose to go with. During the creation process, when defining the ECS service, I chose the load balancer and was then prompted to create the target group and add it to the listener. All the ports I use are HTTPS now and it’s working perfectly. I’m just adding a Route 53 entry and then we will go public!

Rishav

09:46:51 AM

This is super neat to see, and do share your experience during and after the process! I’m keen to implement ECS Service Connect within Fargate target groups using Terraform provisioning, as soon as I’m able to wrap my mind around it.

andrey.a.devyatkin

10:31:09 AM

@Rishav check out the blog post and video above - they go into the details of the ECS Service Connect implementation for ECS/Fargate

Veerapandian M

04:07:58 PM

Hi team, I am looking for help with the YAML pipeline for deploying a Next.js application from Azure DevOps to Azure Static Web Apps.

Hao Wang

01:05:48 AM

I worked on Azure for a while, hope you’ve worked it out already

Veerapandian M

09:17:08 AM

Thank you for your response; I have resolved the pipeline issues; however, the deployment is taking time; I am working on skipping a few items.

Hao Wang

02:49:41 PM

cool

Veerapandian M

02:50:03 PM

Hao Wang

02:59:35 PM

Hi there, how is it going?

2024-09-08

2024-09-09

Dexter Cariño

03:51:48 PM

Hello, how do I deploy Docker Compose on AWS Fargate? I searched a bit, but what I found is outdated/retired.

Darren Cunningham

04:03:31 PM

I think you’re looking for https://github.com/aws/amazon-ecs-cli

Dexter Cariño

06:41:01 AM

will check on this.

Dexter Cariño

06:41:03 AM

thank you

Erik Osterman (Cloud Posse)

03:02:47 PM

Yea, unfortunately they deprecated docker compose deployments to ECS in the docker-compose CLI

Erik Osterman (Cloud Posse)

03:02:53 PM

We were really bummed about that

Erik Osterman (Cloud Posse)

03:04:41 PM

wow, @Darren Cunningham i didn’t know about this new ECS cli

Erik Osterman (Cloud Posse)

03:05:37 PM

And this amazon-ecs-cli is different from the other ECS cli by AWS https://aws.github.io/copilot-cli/

AWS Copilot CLI

Develop, Release and Operate Container Apps on AWS.

Erik Osterman (Cloud Posse)

09:06:20 PM

And then Copilot bites the dust https://github.com/aws/copilot-cli/issues/5987#issuecomment-2494477701

Comment on #5987 Is copilot-cli still maintained?

Turns out there is a discussion about this aws/copilot-cli#5925. I have no idea when the repo doesn’t state the status in the readme

managedkaos

01:33:26 AM

Why can’t AWS keep an ECS CLI around? I’ve had better luck writing bash scripts that use the native AWS CLI

2024-09-10

2024-09-11

Zing

11:55:03 AM

https://github.com/aws/containers-roadmap/issues/2411

Can we get some traction on this

Support for custom eks access entry policies

2024-09-12

2024-09-16

03:12:04 PM

I came across this OWASP project recently that implements an open source version of AWS PrivateCA without the costs of PrivateCA

https://serverlessca.com/

Terraform module for serverless CA on AWS

Serverless CA in AWS with FIPS 140-2 level 3 CA key storage and cost typically under $5 per month

jose.amengual

04:24:41 PM

well that is a HUGE difference in price

kevcube

11:05:17 AM

i think more terraform needs to move toward this. fully packaged applications. as an infra guy, sure i can set up the VPC, ASG, ECS yada yada

but I think OSS devs/communities could benefit a ton from saying “just run this one auditable, fully configurable command in your AWS account and you get the application running.”

of course someone has to write that terraform, select plenty of opinionated defaults when doing it, but it’s far more collaboratively-approachable than a cloudformation template. also lends itself to cross-cloud translation.

Erik Osterman (Cloud Posse)

07:20:09 PM

(Discussed on office hours)

2024-09-17

2024-09-18

2024-09-23

2024-09-26

Sean Turner

02:20:58 PM

Going deep on renovate lately in a move from cluster-branch ArgoCD Applications to ApplicationSets…

AWS just released m8g instances. How do you all go about upgrading your Karpenter manifests to pull in the newest instance type? Do you declaratively express family + version (e.g. m8g)? Or perhaps just family (e.g. mg)?

Gabriela Campana (Cloud Posse)

05:30:47 PM

@Yonatan Koren @Jeremy G (Cloud Posse) @Jeremy White (Cloud Posse)

Jeremy G (Cloud Posse)

05:43:16 PM

@Sean for Karpenter, unless we have to conform to SCP restrictions, we generally limit by instance generation (Gt 5), architectures (amd64), and vCPUs (Gt 2, Lt 32) and

              - key: "karpenter.k8s.aws/instance-encryption-in-transit-supported"
                operator: "In"
                values: ["true"]
              # Requiring Nitro is redundant with Encryption in Transit, but we keep it for now.
              - key: "karpenter.k8s.aws/instance-hypervisor"
                operator: In
                values: ["nitro"]

then we get access to all the instances, and let Karpenter/AWS decide which is the best fit for our needs.
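To see why this approach needs no manifest change when a new generation ships, here is a rough Python sketch that mirrors the constraints described above (generation Gt 5, amd64, vCPUs Gt 2 and Lt 32). It is not Karpenter code. The regex is a simplification that handles common names such as `m7i.xlarge` or `m8g.xlarge` but not every instance family, and the function names and the `allowed_archs` parameter are made up for illustration.

```python
import re

# Simplified pattern: family letters, generation digits, optional attributes, size.
_TYPE_RE = re.compile(r"^(?P<family>[a-z]+)(?P<generation>\d+)(?P<attrs>[a-z0-9-]*)\.(?P<size>.+)$")

def instance_generation(instance_type):
    """Extract the generation number from a name like 'm8g.xlarge'."""
    match = _TYPE_RE.match(instance_type)
    if match is None:
        raise ValueError(f"unrecognized instance type name: {instance_type!r}")
    return int(match.group("generation"))

def is_allowed(instance_type, vcpus, arch, allowed_archs=("amd64",)):
    """Apply the same bounds as the requirements above (all bounds are exclusive)."""
    return (
        arch in allowed_archs
        and instance_generation(instance_type) > 5  # generation Gt 5
        and 2 < vcpus < 32                          # vCPUs Gt 2, Lt 32
    )

# The caller supplies vCPU count and architecture; nothing is looked up from AWS.
print(is_allowed("m7i.xlarge", 4, "amd64"))
print(is_allowed("m8g.xlarge", 4, "arm64"))
print(is_allowed("m8g.xlarge", 4, "arm64", allowed_archs=("amd64", "arm64")))
```

A lower bound on generation admits every later generation automatically. Note that m8g is a Graviton (arm64) family, so with an amd64-only architecture constraint it would only be picked up once arm64 is added to the allowed architectures.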

Sean Turner

05:43:36 PM

Great, tyvm!

2024-09-27

Adarsh

08:52:13 AM

Has anyone worked with the SRV record type? I have a private hosted zone with type A DNS records for my services deployed in ECS, used for inter-service communication. I had to change one service’s record type from A to SRV to expose one of its routes publicly via API Gateway. When I created the SRV record, it automatically created a type A record too, so now I have:

SRV record: svc1.accept.com
A record: 678521378612382091734.svc1.accept.com

Running dig on svc1.accept.com shows it pointing to 678521378612382091734.svc1.accept.com. The service is exposed through API Gateway, but the other services in the cluster are failing to connect to it. I tried replacing the URLs in the other services’ env files:

678521378612382091734.svc1.accept.com -> connection refused
svc1.accept.com -> cannot resolve
http://678521378612382091734.svc1.accept.com:8080 -> connection refused

I cannot change it back to an A record because API Gateway needs the SRV type only.

Scott Kaminski

09:02:03 PM

Is your VPC Private DNS setting correct? Can you manually verify the endpoints with dig

IE: dig 678521378612382091734.svc1.accept.com SRV
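As background for reading the dig output: an SRV answer carries a priority, weight, port, and target host, and a plain HTTP client does not follow it on its own, so callers need the target name and the port from the record. A minimal sketch that parses one answer line into a usable endpoint follows. The TTL, priority, and weight in the sample line are invented; only the hostnames and port 8080 come from the question above.

```python
from typing import NamedTuple

class SrvRecord(NamedTuple):
    name: str
    ttl: int
    priority: int
    weight: int
    port: int
    target: str

def parse_srv_answer(line):
    """Parse one line of a dig ANSWER section: name ttl IN SRV priority weight port target."""
    parts = line.split()
    if len(parts) != 8 or parts[2].upper() != "IN" or parts[3].upper() != "SRV":
        raise ValueError(f"not an SRV answer line: {line!r}")
    name, ttl, _cls, _rtype, priority, weight, port, target = parts
    return SrvRecord(name.rstrip("."), int(ttl), int(priority), int(weight), int(port), target.rstrip("."))

def endpoint(record, scheme="http"):
    """Build the URL a caller would actually connect to."""
    return f"{scheme}://{record.target}:{record.port}"

# Hypothetical answer line in the shape dig prints.
sample = "svc1.accept.com. 60 IN SRV 1 1 8080 678521378612382091734.svc1.accept.com."
record = parse_srv_answer(sample)
print(endpoint(record))
```

If the port in the SRV answer differs from the one the other services are calling, that alone would explain a connection refused error.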

2024-09-30

Shirisha Sudhakar Rao

12:22:11 PM

Is it possible to use Cloudposse’s VPC module to also create the database and intra subnets (similar to the terraform-aws-modules/vpc/aws component)?

Gabriela Campana (Cloud Posse)

05:08:08 PM

@Jeremy G (Cloud Posse)

Jeremy G (Cloud Posse)

05:45:52 PM

@Andriy Knysh (Cloud Posse)

Andriy Knysh (Cloud Posse)

05:53:09 PM

subnets are created by https://github.com/cloudposse/terraform-aws-dynamic-subnets

the component https://github.com/cloudposse/terraform-aws-components/tree/main/modules/vpc uses both the cloudposse/vpc/aws module to create a VPC, and the cloudposse/dynamic-subnets/aws module to create the subnets

see this example on how to create multiple (named) subnets per AZ https://github.com/cloudposse/terraform-aws-dynamic-subnets/tree/main/examples/multiple-subnets-per-az

Andriy Knysh (Cloud Posse)

05:53:16 PM

@Shirisha Sudhakar Rao

Jeremy G (Cloud Posse)

06:16:34 PM

So, to add to what Andriy explained, the VPC root module (which we call a “component”) and the dynamic-subnets module can create multiple named subnets. The current limitation on both is that there is only one flag for creating public subnets, so either all the subnets have both public and private allocations in each AZ or all the subnets are only private.

If you want to create some subnets that are both public and private, and some that are only private, you cannot easily use the VPC component because it assigns a CIDR range to the VPC and then divides it up among all the subnets it creates. You would use the component to create all the subnets that are both public and private and they would take up the entire primary CIDR block of the VPC. You would specify, to the VPC component, ipv4_additional_cidr_block_associations, and then separately use dynamic-subnets to allocate private-only subnets covering one of the additional CIDR blocks.

cloudposse/terraform-aws-dynamic-subnets

Terraform module for public and private subnets provisioning in existing VPC
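To illustrate the CIDR arithmetic behind that explanation, here is a small Python sketch using the standard `ipaddress` module. It is not the module’s actual allocation algorithm. The CIDR ranges, the subnet counts, and the `split_evenly` helper are all assumptions chosen so that the primary block is fully consumed.

```python
import ipaddress

def split_evenly(cidr, count):
    """Split a CIDR block into `count` equal subnets.

    Uses the smallest power-of-two split that yields at least `count`
    subnets and returns the first `count` of them.
    """
    network = ipaddress.ip_network(cidr)
    extra_bits = (count - 1).bit_length()  # e.g. 4 subnets -> 2 extra prefix bits
    return list(network.subnets(prefixlen_diff=extra_bits))[:count]

# Assumed layout: 2 AZs, each with one public and one private subnet,
# dividing the whole primary CIDR block of the VPC.
primary = split_evenly("10.0.0.0/16", 4)

# Assumed additional CIDR association, carved into private-only subnets.
private_only = split_evenly("10.1.0.0/16", 2)

for subnet in primary + private_only:
    print(subnet)
```

Because the four primary subnets cover the entire /16, there is no room left in it for private-only subnets, which is why a separate additional CIDR block is needed for them.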

Shirisha Sudhakar Rao

12:23:42 PM

@Jeremy G (Cloud Posse) @Andriy Knysh (Cloud Posse) Thank you. I was able to setup the subnets.