#aws (2024-09)
Discussion related to Amazon Web Services (AWS)
Archive: https://archive.sweetops.com/aws/
2024-09-02
Hello, I’m currently facing an issue with an AWS load balancer. I have an ECS Fargate cluster with about five tasks. However, I noticed that these tasks don’t scale past 10 users during a load test. Upon further debugging, I had to ssh into each of these instances and run htop to see the CPU and memory utilization of these five tasks. I noticed that one of these tasks had 100% CPU utilization and the rest had none. This drives CPU utilization very high, making the ECS task unhealthy and unable to receive more traffic. This image shows the ECS CPU utilization for one instance. How can I ensure traffic is evenly distributed to all tasks in the ECS service? Upon checking my load balancer access logs, I also noticed that a lot of requests came from one IP address. I tried changing the load balancer’s routing algorithm to round robin, but it still doesn’t distribute traffic evenly across all my tasks. What can I do to ensure scalability of my application? Has anyone faced this?
@Jeremy White (Cloud Posse)
you likely have session stickiness turned on at the load balancer, which means each unique session goes to the same instance, to allow for things like safe deploys (once you hit app version 2.0, you no longer hit 1.0). Your load tester is 1 session, so it always hits 1 server. You could temporarily turn off session stickiness during the load test, or else you might have to look up how it determines that stickiness, and randomize that on the load tester side. The AWS load balancer sets a cookie for that stickiness, so you’d need to clear cookies. The cookie name is AWSALB, at least on mine.
Thanks @Michael Galey. Currently stickiness is turned off. The problem is that one ECS task takes 87% CPU while the remaining four tasks have 0% utilization, so this has an impact on the scalability of the system. Do you know how I can resolve this?
usually I think about python applications as needing something to share requests. There are a few ways to do this, but a common couple to try first are gunicorn and uwsgi. Do you have any application that’s sharing the listener port with your server threads/PIDs?
Here’s a site on uWSGI: https://uwsgi-docs.readthedocs.io/en/latest/WSGIquickstart.html
if you are using one of those tools already, could you share your config? Doesn’t have to be all the gory details, but at least some notion of how it decides to spawn your application on a request
the above would be per server, he’s already at full utilization on one server. His load balancer is not balancing the load, at least not from a single source.
if you do 2-3 load tests of smaller size from a few diff ips, does it go to that number of ecs tasks? I’d suggest trying least-request if that’s an option for the load balancer, otherwise maybe just start simple, follow some tutorial for a basic load balancer + hello world, and compare the load balancer config / target group config against yours. I don’t see how it’s the app’s fault here, I think it’s the load balancer config + single origin ip
I better understand now. I’m not sure what’s up, but are you using target groups? Do all the tasks show as healthy?
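The two load balancer settings discussed in this thread (stickiness and the routing algorithm) are target group attributes on an ALB. A minimal sketch, assuming boto3 is installed and credentials are configured; the helper names `target_group_attributes` and `apply_attributes` are hypothetical, and the target group ARN is something you would supply yourself:

```python
from typing import Dict, List

# Routing algorithms this sketch accepts for an ALB target group.
ALLOWED_ALGORITHMS = {"round_robin", "least_outstanding_requests"}


def target_group_attributes(
    algorithm: str = "least_outstanding_requests",
    stickiness: bool = False,
) -> List[Dict[str, str]]:
    """Build the Attributes list for elbv2 modify_target_group_attributes."""
    if algorithm not in ALLOWED_ALGORITHMS:
        raise ValueError(f"unsupported algorithm: {algorithm}")
    return [
        {"Key": "stickiness.enabled", "Value": str(stickiness).lower()},
        {"Key": "load_balancing.algorithm.type", "Value": algorithm},
    ]


def apply_attributes(target_group_arn: str, attributes: List[Dict[str, str]]) -> None:
    """Send the attributes to AWS (requires boto3 and valid credentials)."""
    import boto3  # imported lazily so the builder above works without boto3

    client = boto3.client("elbv2")
    client.modify_target_group_attributes(
        TargetGroupArn=target_group_arn,
        Attributes=attributes,
    )
```

Calling `apply_attributes(arn, target_group_attributes())` would turn stickiness off and switch to least outstanding requests. With a load test coming from a single source, it is still worth repeating the test from a few different IPs, as suggested above.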
2024-09-03
2024-09-04
2024-09-05
Hi, Team. I am looking for help with Azure DevOps repository + AWS Amplify deployment.
You’re more likely to get help if you ask questions.
https://github.com/aws/containers-roadmap/issues/474 hey there, how are you all working around aws’ silly limitation on EKS access entries not supporting wildcards? it’s a nightmare for permission set arns, since they have that random string at the end of the permission set role
Tell us about your request
Support basic glob wildcard rolearn matching for aws-auth configmap that controls iam role eks auth.
Which service(s) is this request for?
EKS
Tell us about the problem you’re trying to solve. What are you trying to do, and why is it hard?
Trying to avoid hardcoding lots of IAM role arns into the aws-auth configmap. It would be useful if basic glob wildcard matching worked in the rolearn field of each role mapping:
apiVersion: v1
kind: ConfigMap
metadata:
  name: aws-auth
  namespace: kube-system
data:
  mapRoles: |
    - groups: [AcmeCorp]
      rolearn: arn:aws:iam::111122223333:role/teams/*
      username: AcmeCorp
Are you currently working around this issue?
Individually specifying each rolearn and updating the configmap every time these roles change:
apiVersion: v1
kind: ConfigMap
metadata:
  name: aws-auth
  namespace: kube-system
data:
  mapRoles: |
    - groups: [AcmeCorp]
      rolearn: arn:aws:iam::111122223333:role/teams/SomeTeam
      username: SomeTeam
    - groups: [AcmeCorp]
      rolearn: arn:aws:iam::111122223333:role/teams/AnotherTeam
      username: AnotherTeam
Additional context
I tried using a * on a working rolearn field and the role became unable to authenticate with the api server. EKS version (I’m not sure what component handles this auth delegation, so I don’t know of another relevant version to check for that):
$ kubectl version
Client Version: version.Info{Major:"1", Minor:"15", GitVersion:"v1.15.1", GitCommit:"4485c6f18cee9a5d3c3b4e523bd27972b1b53892", GitTreeState:"clean", BuildDate:"2019-07-18T14:25:20Z", GoVersion:"go1.12.7", Compiler:"gc", Platform:"darwin/amd64"}
Server Version: version.Info{Major:"1", Minor:"13+", GitVersion:"v1.13.10-eks-5ac0f1", GitCommit:"5ac0f1d9ab2c254ea2b0ce3534fd72932094c6e1", GitTreeState:"clean", BuildDate:"2019-08-20T22:39:46Z", GoVersion:"go1.11.13", Compiler:"gc", Platform:"linux/amd64"}
Codify each arn in aws auth config
Nowadays the aws-auth ConfigMap is deprecated
Deprecated method https://docs.aws.amazon.com/eks/latest/userguide/auth-configmap.html
Access entries is the new way https://docs.aws.amazon.com/eks/latest/userguide/access-entries.html
You can’t use wildcards/globs in aws auth config or in eks access entries
One way to do it, if you’re using Terraform, whether with the aws-auth ConfigMap or access entries: use the aws_iam_roles data source with a name regex to retrieve all the matching ARNs, and then populate those ARNs in your ConfigMap or access entries
https://registry.terraform.io/providers/hashicorp/aws/latest/docs/data-sources/iam_roles
data "aws_iam_roles" "default" {
  name_regex = ".*project.*"
}
https://registry.terraform.io/providers/hashicorp/aws/latest/docs/resources/eks_access_entry
resource "aws_eks_access_entry" "default" {
  # `arns` is the set of role ARNs returned by the data source
  for_each          = data.aws_iam_roles.default.arns
  cluster_name      = aws_eks_cluster.default.name
  principal_arn     = each.value
  kubernetes_groups = ["group-1", "group-2"]
  type              = "STANDARD"
}
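The same expansion can be sketched outside Terraform: list the role ARNs, match them against a glob, and emit one explicit entry per match, since neither aws-auth nor access entries accept the wildcard itself. This is only an illustration under assumptions; `expand_role_pattern` is a hypothetical helper, and in practice the ARN list would come from the IAM API rather than being hardcoded:

```python
from fnmatch import fnmatchcase
from typing import Dict, Iterable, List


def expand_role_pattern(
    pattern: str,
    role_arns: Iterable[str],
    groups: Iterable[str],
) -> List[Dict[str, object]]:
    """Turn a glob over role ARNs into one explicit access entry per match."""
    group_list = list(groups)
    return [
        {
            "principal_arn": arn,
            "kubernetes_groups": list(group_list),
            "type": "STANDARD",
        }
        # Sort so the generated entries are stable between runs.
        for arn in sorted(role_arns)
        if fnmatchcase(arn, pattern)
    ]


# Example input reusing the ARNs from the GitHub issue quoted earlier,
# plus one made-up role that should not match.
ROLE_ARNS = [
    "arn:aws:iam::111122223333:role/teams/SomeTeam",
    "arn:aws:iam::111122223333:role/teams/AnotherTeam",
    "arn:aws:iam::111122223333:role/other/Unrelated",
]

entries = expand_role_pattern(
    "arn:aws:iam::111122223333:role/teams/*", ROLE_ARNS, ["AcmeCorp"]
)
```

Whatever tool does the expansion has to be re-run whenever roles are added or removed, which is the same maintenance cost the Terraform data source hides behind a plan/apply.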
yeah, i saw that workaround in the thread
but it’s so hacky
i also can see it breaking since we use terragrunt, and it’s hard to pass in data calls into module inputs
so we’d have to just rely on “in-module” access entries, and ignore the terragrunt layer, i think (maybe not - haven’t thought abt it enough)
Yes, we’ve run into this as well. It’s one of the reasons we also implement the aws-teams and aws-team-roles architecture in our reference architecture and allow permission sets to assume them. This allows us to have consistent roles for both programmatic/machine access (e.g. GitHub OIDC) as well as for developers.
We talk more about our approach here https://docs.cloudposse.com/layers/identity/
Setup fine-grained access control for an entire organization
thanks! i’ve been considering going with this approach, but i’m hesitant because of the extra assume role hop (for human users)
especially non-technical users
Yes, the extra assume role hop is really for technical users
But sounds like you’re using EKS. You have technical users.
yeah, but we have a fair amount of non technical users (clickops in the console), and i’m not sure what we’d do for them. i think the extra assume role in the console would make dem go nuts but i guess a hybrid approach could work too…
@Jeremy G (Cloud Posse) might have some updated ideas on this we haven’t yet tried. He’s OoO though.
2024-09-06
2024-09-07
Hey everyone,
Recently we made a change to our ECS infrastructure. We’ve transitioned to using AWS service discovery and have configured our containers to use HTTPS on their hostnames. After resolving various issues with appsettings and Dockerfiles, the HTTPS port is now open.
Previously, we used an ALB for each service in the ECS cluster. With the move to HTTPS and service discovery, we need to set HTTPS as the port for service health checks. The challenge we’re facing is that target groups don’t allow us to define a hostname for service discovery.
You might wonder why we switched to HTTPS. The decision was driven by difficulties we encountered with service discovery, which we found were best addressed by using HTTPS.
I’ve attached the task definition file for one of the services and the appsettings file. These should help illustrate the issue with the target group’s inability to accept a hostname.
Just a note: I’m fairly new to DevOps—only been in this field for two months—and I’m really enjoying the learning process!
this could be useful https://www.youtube.com/watch?v=z1WQ-YSAsVY
here is the text version with links https://fivexl.io/blog/ecs-service-connect-encryption/
Deep-dive into AWS ECS Service Connect. How startup can enable encryption in transit with ECS Service Connect and ECS Fargate deployment
Why don’t you let the ECS service manage registration with the target group for you? It’s allowed to use both service discovery AND allow the ECS service to register task IPs with the target group
@andrey.a.devyatkin Thank you so much for the video! @Fizz Yup, this is what I chose to go with. During the creation process, when defining the ECS service, I chose the load balancer and was then prompted to create the target group and add it to the listener. All the ports I use are HTTPS now and it’s working perfectly. I’m just adding a Route 53 entry and then we will go public!
This is super neat to see, and do share your experience during and after the process! I’m keen to implement ECS Service Connect within Fargate target groups using Terraform provisioning, as soon as I’m able to wrap my mind around it.
@Rishav check out the blog post and video above. They go into the details of an ECS Service Connect implementation for ECS/Fargate
I am on a team looking for help with the YAML pipeline that deploys a Next.js application from Azure DevOps to the Azure static Apps service.
I worked on Azure for a while, hope you’ve worked it out already
Thank you for your response; I have resolved the pipeline issues; however, the deployment is taking time; I am working on skipping a few items.
cool
Hi
Hi there, how is it going?
2024-09-08
2024-09-09
Hello, how to deploy docker compose on aws fargate? I searched some but its outdated/retired.
I think you’re looking for https://github.com/aws/amazon-ecs-cli
will check on this.
Yea, unfortunately they deprecated docker compose deployments to ECS in the docker-compose CLI
We were really bummed about that
wow, @Darren Cunningham i didn’t know about this new ECS cli
And this amazon-ecs-cli is different from the other ECS CLI by AWS https://aws.github.io/copilot-cli/
Develop, Release and Operate Container Apps on AWS.
And then Copilot bites the dust https://github.com/aws/copilot-cli/issues/5987#issuecomment-2494477701
Turns out there is a discussion about this in aws/copilot-cli#5925. I had no idea, since the repo doesn’t state its status in the readme
Why can’t AWS keep an ECS CLI around? I’ve had better luck writing bash scripts that use the native AWS CLI
2024-09-10
2024-09-11
https://github.com/aws/containers-roadmap/issues/2411
Can we get some traction on this
Support for custom eks access entry policies
2024-09-12
2024-09-16
I came across this OWASP project recently that implements an open source version of AWS PrivateCA without the costs of PrivateCA
Serverless CA in AWS with FIPS 140-2 level 3 CA key storage and cost typically under $5 per month
well that is a HUGE difference in price
i think more terraform needs to move toward this. fully packaged applications. as an infra guy, sure i can set up the VPC, ASG, ECS yada yada
but I think OSS devs/communities could benefit a ton from saying “just run this one auditable, fully configurable command in your AWS account and you get the application running.”
of course someone has to write that terraform, select plenty of opinionated defaults when doing it, but it’s far more collaboratively-approachable than a cloudformation template. also lends itself to cross-cloud translation.
2024-09-17
2024-09-18
2024-09-23
2024-09-26
Going deep on renovate lately in a move from cluster-branch ArgoCD Applications to ApplicationSets…
AWS just released m8g instances. How do you all go about upgrading your Karpenter manifests to pull in the newest instance type? Do you declaratively express family + version (e.g. m8g)? Or perhaps just family (e.g. mg)?
@Yonatan Koren @Jeremy G (Cloud Posse) @Jeremy White (Cloud Posse)
@Sean for Karpenter, unless we have to conform to SCP restrictions, we generally limit by instance generation (Gt 5), architectures (amd64), and vCPUs (Gt 2, Lt 32) and
- key: "karpenter.k8s.aws/instance-encryption-in-transit-supported"
  operator: "In"
  values: ["true"]
# Requiring Nitro is redundant with Encryption in Transit, but we keep it for now.
- key: "karpenter.k8s.aws/instance-hypervisor"
  operator: In
  values: ["nitro"]
then we get access to all the instances, and let Karpenter/AWS decide which is the best fit for our needs.
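To see why this style picks up new generations without a manifest change, here is a rough sketch of how such requirements filter candidates. It is not Karpenter's code; the instance names and their label values are made up for illustration, and the label keys are assumed to match Karpenter's well-known labels in your version:

```python
from typing import Dict, List

# Requirements mirroring the constraints described above.
REQUIREMENTS: List[Dict[str, object]] = [
    {"key": "karpenter.k8s.aws/instance-generation", "operator": "Gt", "values": ["5"]},
    {"key": "kubernetes.io/arch", "operator": "In", "values": ["amd64"]},
    {"key": "karpenter.k8s.aws/instance-cpu", "operator": "Gt", "values": ["2"]},
    {"key": "karpenter.k8s.aws/instance-cpu", "operator": "Lt", "values": ["32"]},
    {"key": "karpenter.k8s.aws/instance-encryption-in-transit-supported",
     "operator": "In", "values": ["true"]},
    {"key": "karpenter.k8s.aws/instance-hypervisor", "operator": "In", "values": ["nitro"]},
]


def matches(labels: Dict[str, str], requirements: List[Dict[str, object]]) -> bool:
    """Return True when the labels satisfy every requirement."""
    for req in requirements:
        value = labels.get(str(req["key"]))
        values = list(req["values"])  # type: ignore[arg-type]
        if value is None:
            return False
        if req["operator"] == "In":
            ok = value in values
        elif req["operator"] == "Gt":
            ok = int(value) > int(values[0])
        elif req["operator"] == "Lt":
            ok = int(value) < int(values[0])
        else:
            raise ValueError(f"unsupported operator: {req['operator']}")
        if not ok:
            return False
    return True


def labels_for(generation: str, arch: str, cpu: str) -> Dict[str, str]:
    """Build a hypothetical label set for one candidate instance type."""
    return {
        "karpenter.k8s.aws/instance-generation": generation,
        "kubernetes.io/arch": arch,
        "karpenter.k8s.aws/instance-cpu": cpu,
        "karpenter.k8s.aws/instance-encryption-in-transit-supported": "true",
        "karpenter.k8s.aws/instance-hypervisor": "nitro",
    }


# Made-up candidates, not real AWS instance data.
CANDIDATES = {
    "example-gen5-amd64": labels_for("5", "amd64", "2"),
    "example-gen7-amd64": labels_for("7", "amd64", "4"),
    "example-gen8-arm64": labels_for("8", "arm64", "8"),
}

selected = [name for name, labels in CANDIDATES.items() if matches(labels, REQUIREMENTS)]
```

In this sketch a generation-8 candidate is only excluded because of the architecture requirement, so whether a new family is picked up depends on every constraint in the list, not just the generation bound.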
Great, tyvm!
2024-09-27
Has anyone worked with the SRV record type? I have a private hosted zone with type A DNS records for my services deployed in ECS, used for inter-service communication. I had to change one service’s record type from A to SRV to expose one of its routes publicly via API Gateway. When I created the SRV record, it automatically created a type A record too, so I now have an SRV record, svc1.accept.com, and an A record, 678521378612382091734.svc1.accept.com. Running dig on svc1.accept.com shows it pointing to 678521378612382091734.svc1.accept.com. The service is exposed through API Gateway, but the other services in the cluster are failing to connect to it. I tried replacing the URLs in the other services’ env files with:
678521378612382091734.svc1.accept.com -> connection refused
svc1.accept.com -> cannot resolve
http://678521378612382091734.svc1.accept.com:8080 -> connection refused
I cannot change it back to an A record because API Gateway needs the SRV type.
Is your VPC Private DNS setting correct? Can you manually verify the endpoints with dig?
e.g. dig 678521378612382091734.svc1.accept.com SRV
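For context on the failures above: an SRV answer carries a priority, weight, port, and target host rather than an address, so a client that only does an A lookup on the service name finds nothing to connect to. A minimal sketch of turning one SRV answer (as printed by dig) into a host:port endpoint; the priority and weight values in the example are assumptions, and `parse_srv` and `endpoint` are hypothetical helpers:

```python
from typing import NamedTuple


class SrvRecord(NamedTuple):
    priority: int
    weight: int
    port: int
    target: str


def parse_srv(rdata: str) -> SrvRecord:
    """Parse SRV record data in the form 'priority weight port target'."""
    priority, weight, port, target = rdata.split()
    # dig prints fully qualified names with a trailing dot; drop it.
    return SrvRecord(int(priority), int(weight), int(port), target.rstrip("."))


def endpoint(record: SrvRecord) -> str:
    """Return the host:port a client would actually connect to."""
    return f"{record.target}:{record.port}"


# Example answer section value; priority and weight are made up.
record = parse_srv("1 1 8080 678521378612382091734.svc1.accept.com.")
```

Comparing the port in the SRV answer with the port the container actually listens on is a quick check when the target resolves but the connection is refused.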
2024-09-30
Is it possible to use Cloudposse’s VPC module to also create the database and intra subnets (similar to the terraform-aws-modules/vpc/aws component)?
@Jeremy G (Cloud Posse)
subnets are created by https://github.com/cloudposse/terraform-aws-dynamic-subnets
the component https://github.com/cloudposse/terraform-aws-components/tree/main/modules/vpc uses both the cloudposse/vpc/aws module to create a VPC, and the cloudposse/dynamic-subnets/aws module to create the subnets
see this example on how to create multiple (named) subnets per AZ https://github.com/cloudposse/terraform-aws-dynamic-subnets/tree/main/examples/multiple-subnets-per-az
@Shirisha Sudhakar Rao
So, to add to what Andriy explained, the VPC root module (which we call a “component”) and the dynamic-subnets module can create multiple named subnets. The current limitation on both is that there is only one flag for creating public subnets, so either all the subnets have both public and private allocations in each AZ or all the subnets are only private.
If you want to create some subnets that are both public and private, and some that are only private, you cannot easily use the VPC component because it assigns a CIDR range to the VPC and then divides it up among all the subnets it creates. You would use the component to create all the subnets that are both public and private and they would take up the entire primary CIDR block of the VPC. You would specify, to the VPC component, ipv4_additional_cidr_block_associations, and then separately use dynamic-subnets to allocate private-only subnets covering one of the additional CIDR blocks.
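The CIDR arithmetic behind that layout can be sketched with Python's ipaddress module. This is not the allocation algorithm the dynamic-subnets module actually uses; the CIDR blocks, the three-AZ count, and the `split_cidr` helper are all assumptions for illustration:

```python
import ipaddress
from typing import List


def split_cidr(cidr: str, count: int) -> List[ipaddress.IPv4Network]:
    """Carve `count` equal-sized subnets out of an IPv4 CIDR block."""
    network = ipaddress.ip_network(cidr)
    # Smallest number of extra prefix bits that yields at least `count` subnets.
    extra_bits = max(count - 1, 0).bit_length()
    return list(network.subnets(prefixlen_diff=extra_bits))[:count]


# Primary block: a public and a private subnet in each of 3 AZs (6 subnets).
primary_subnets = split_cidr("10.0.0.0/16", 6)

# Additional block associated with the VPC: private-only subnets, one per AZ.
private_only_subnets = split_cidr("10.1.0.0/16", 3)
```

Keeping the private-only subnets in their own additional CIDR block is what lets the two allocations be planned independently without overlapping.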
Terraform module for public and private subnets provisioning in existing VPC
@Jeremy G (Cloud Posse) @Andriy Knysh (Cloud Posse) Thank you. I was able to set up the subnets.