#aws (2023-06)
Discussion related to Amazon Web Services (AWS)
Archive: https://archive.sweetops.com/aws/
2023-06-01
“A pull through cache is a way to cache images you use from an upstream repository. Container images are copied and kept up-to-date without giving you a direct dependency on the external registry. If the upstream registry or container image becomes unavailable, then your cached copy can still be used.” https://aws.amazon.com/blogs/containers/announcing-pull-through-cache-for-registry-k8s-io-in-amazon-elastic-container-registry/
Introduction Container images are stored in registries and pulled into environments where they run. There are many different types of registries from private, self-run registries to public, unauthenticated registries. The registry you use is a direct dependency that can have an impact on how fast you can scale, the security of the software you run, […]
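For reference, a cache rule like the one in the post is a one-liner to create; a hedged CLI sketch, assuming the registry.k8s.io upstream the post describes:
aws ecr create-pull-through-cache-rule \
  --ecr-repository-prefix k8s-io \
  --upstream-registry-url registry.k8s.io
# Images pulled via <account>.dkr.ecr.<region>.amazonaws.com/k8s-io/<image>
# are then cached in your private registry and kept up to date.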
2023-06-05
Hello! I’m trying to create a 301 redirect with S3 and CloudFront. The main idea is to redirect traffic from support.example.com to our Atlassian service desk portal URL. CloudFront’s behaviour is to redirect HTTP to HTTPS, and there’s a valid certificate from ACM added to CloudFront for *.example.com. However, I get this error from Chrome when trying to test the redirect: NET::ERR_CERT_COMMON_NAME_INVALID. Any ideas what could be wrong?
Check the name of the bucket. IIRC, the name of the bucket has to match the domain name pretty much exactly. It’s been a while and I haven’t looked at the docs yet, but that’s one thing to consider.
For those who are using AWS Identity Center (SSO)
• Are you using primarily permission sets for cross-account access?
• Are you using primarily self managed roles for cross-account access? Why do you use one over the other?
I hope I understand the question correctly, but I use permission sets for cross-account access that is focused around users and groups. When it comes to services and automation, I use self-managed roles.
With permission sets, it’s primarily a mix of inline policies and AWS managed policies. So far we haven’t really used the customer managed policies option of permission sets.
Yes, that’s what I am interested in. Do you run EKS clusters?
I do not, the org I’m working in is primarily ECS.
With permission sets, AWS sets up those SSO roles with random numbers in the ARNs across accounts. Do you use those anywhere to configure access, or allow something else to assume the role?
We use SSO roles with random numbers in terraform plans. The random numbers are appended to the role names, so role names can still be used by applying some basic regex
2023-06-08
2023-06-09
Hi. I want to import an existing ElastiCache cluster into terraform. Not sure what to do about the replication group. Should I create a replication group and import the cluster into it, or what?
Your best bet is to get the specific spec using describe_replication_group and then write the terraform resource to match. Then import it, yes. Do a terraform plan and if you note any deviations, then fix the resource definition to match.
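A hedged sketch of that flow (the replication group ID my-redis is a placeholder):
# Dump the live spec to mirror in HCL:
aws elasticache describe-replication-groups --replication-group-id my-redis
# Then import the resource and iterate until the plan is clean:
terraform import aws_elasticache_replication_group.this my-redis
terraform plan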
Thanks @jimp, let me do that
I described the cluster and got two clusters. Then I was trying to import both individually, but they did not succeed at the terraform plan stage.
I’ll now follow what you have suggested.
2023-06-12
Guys, how can I redeploy tasks after I push another Docker image to ECR?
When I press “Update service” and tick “force deploy”, it does not do anything.
So tired of going manually through deleting a service -> waiting until the deletion finishes (otherwise my tasks fail, probably due to the quota limit, so they have to finish deleting first) -> creating another service from scratch…
Are you updating the Task Definition before pressing “update service”?
to reference the new docker image
@Nat Williams no, I don’t. But I have set some tag like …us-east-1.amazonaws.com/test-classifier-fargate:arm-fp16
the tag is always the same
hmm. It sounds like the default ECS behaviour should be to check the repo for a newer version every time
you’re on ECS, right?
I just sort of assumed that from the “Update service” and “force deploy” verbiage
@Nat Williams ECS Fargate
normally you need to use a different image tag
Yeah, ideally I guess you’d be updating the task definition with a specific version each time
@Hao Wang why? :(
Is there any way to use the same tag without having to set up awful CodeDeploy?
Sure, just create a new revision of the Task Definition and update the service to use it
@Nat Williams so it’s either I set up CodeDeploy or manually create another task revision? (or maybe via CLI somehow so it just duplicates it)
Using different image tags is a best practice especially in prod env
I mean, you’re already manually forcing the deploy, so it’s not that big a change
@Hao Wang but i’d still need to update the task definition, right?
yeah, if using a different tag
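A hedged sketch of that flow via the CLI (cluster, service, and task definition names are placeholders):
# Register a new task definition revision pointing at the new image tag,
# then point the service at it:
aws ecs register-task-definition --cli-input-json file://taskdef.json
aws ecs update-service --cluster my-cluster --service my-service \
  --task-definition my-taskdef --force-new-deployment
# (omitting the revision number uses the latest ACTIVE revision)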
AWS support would know more
have run into some other issues and finally AWS support worked them out, it is a black box for us
ec2 hosted ecs might use the locally cached image of that tag to avoid the network cost. Fargate would pull fresh on a redeploy of the same tag. You might be able to disable image caching if using ec2, dunno how.
@Michael Galey not using ECS + EC2, but ECS + Fargate
then it’d work. Does your ECR repo have tag immutability checked? If so, your 2nd push wouldn’t overwrite the first one. Or there’s something else that’s more user-side: your force-deployed tasks failing health checks, for instance. So check the start date of the tasks, and view the logs of any recently stopped instances
but in general i’d highly recommend using unique ids for deploys to solve possible confusion issues like this
you could also pull the image locally and do docker inspect or connect to it to see if your recent change is there
@Michael Galey could you please tell me where i can find that immutability tag? :)
and how to use those unique ids :(
how do you build images? codepipeline?
@Michael Galey nope. I build locally, push locally and then Fargate takes my image from ECR and deploys
add something like this to your deploy script? It assumes the code is committed
COMMIT_HASH=$(git rev-parse --short HEAD)
IMAGE_TAG_VER=v-${COMMIT_HASH:=latest}
docker build ... -t <repo url>:$IMAGE_TAG_VER
docker push <repo url>:$IMAGE_TAG_VER
<deploy command> <repo_url>:$IMAGE_TAG_VER
@Michael Galey without <deploy command> <repo_url>:$IMAGE_TAG_VER
have no idea what the deploy command should be
whatever it is now, you’d just be looking for a parameter for the image, i haven’t used this stuff but quick googles show things like https://github.com/awslabs/fargatecli
CLI for AWS Fargate
@Michael Galey but if it doesn’t work in the UI, how can it even work in the CLI or in some third-party apps?
I tried deploying with AWS cli. Does not work
Something like
aws ecs update-service --cluster ${{ inputs.ecs_cluster_name }} --service ${{ inputs.ecs_service_name }} --force-new-deployment
did you check the actual tasks are successfully starting?
@Michael Galey they are up and running now
“Fargate does not cache images, and therefore the whole image is pulled from the registry when a task runs.”
everything works, LB handles requests to them just fine
did you inspect the latest image?
@Michael Galey locally?
pull the image from ecr, and see if your intended change is in there
and see if theres a date via docker inspect
it’s there
you’d have to clear your local cache
and the code isn’t working in the deployed version?
code is working. AWS is not redeploying on “force redeploy”
do the tasks start date line up with the redeploy command?
@Michael Galey Nope. Old date. They do not redeploy
oh ok, not a cache thing then, not sure
command looks right to me
@Michael Galey which command?
aws ecs update-service --cluster <<cluster-name>> --service <<service-name>> --force-new-deployment --region <<region>>
@Michael Galey i would rather make sure it works in UI, then try to get the command working
AWS support is horrible
support: All restrictions on your account have been lifted. me: What were the restrictions? support: https://i.imgur.com/TuC0If7.jpg me: You said “All restrictions on your account have been lifted.”. So what were the restrictions? support: https://i.imgur.com/imFbdjL.jpg
support: “I understand”
are they on crack?
oh it is common to have such restrictions, AWS should have sent some email to root account
@Hao Wang but they’re refusing to tell me the restrictions…
hard to know the details for this case which is not related to security
Any thoughts/opinions on orgformation as an alternative to Control Tower/Landing Zones?
great to know this project, just took a look, it is a CFN wrapper with JS/TS
This component is responsible for provisioning the full account hierarchy along with Organizational Units (OUs). It includes the ability to associate Service Control Policies (SCPs) to the Organization, each Organizational Unit and account.
2023-06-13
Yup. We are all waiting around and can’t do anything about it.
us-east-1 with Lambda and API Gateway both being down is gonna be a bad time for a large swath of AWS
CloudFormation and Lambda are affected. I wonder if this only impacts provisioning managed services
Realtime overview of issues and outages with all kinds of services. Having issues? We help you find out what is wrong.
2023-06-14
Has anyone implemented a TCP/UDP proxy for instances in AWS? Pure forwarding of ports to different instances, like port 1 -> instance 1, port 2 -> instance 2, etc.? I wonder if there is a container with nginx or something else that has this built in, to make it easier instead of cooking my own image. I could use NLBs, but NLBs are layer 4 and do not support SGs, so once it’s public I need to use NACLs to close off access, and I would like to avoid that
I use socat to make some private things available to different clouds and it’s worked well so far, been running 6 months or so with no issues.
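For the original port-per-instance ask, a hedged socat sketch (addresses and ports are placeholders; one process per mapping):
socat TCP-LISTEN:8001,fork,reuseaddr TCP:10.0.1.10:8001 &
socat TCP-LISTEN:8002,fork,reuseaddr TCP:10.0.1.11:8002 &
# UDP works the same way:
socat UDP-LISTEN:9001,fork,reuseaddr UDP:10.0.1.10:9001 &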
mmm I think I have the same issue as with the nginx container, that I can’t pass multiple lines to the entrypoint to create multiple configurations
use NLB. NACL management is less work than managing an entire proxy
that is true
2023-06-16
Is there any easy way to launch a Docker container on AWS from ECR without a complex cluster + task + service setup on ECS?
If there is such a complex setup just for playing around with one server, there is no point in ECR; better to set up EC2 manually…
Use the AWS App Runner service to go directly from an existing container image or a source code repository to a running web service in the AWS Cloud with CI/CD capability.
You can also deploy docker images to lambda
Take a look at copilot, it still starts ecs cluster and other resources but in a simpler way.
You no longer need to provision by yourself.
Lambda? I don’t want to configure api gateway so I could just make requests
@tommy thanks
@Fizz thanks
Lambda can be accessed via http now
Add a dedicated HTTP(S) endpoint to your Lambda function using a function URL.
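A hedged sketch of wiring that up from the CLI (the function name is a placeholder):
aws lambda create-function-url-config --function-name my-func --auth-type NONE
# A public, unauthenticated URL also needs a matching resource policy:
aws lambda add-permission --function-name my-func \
  --statement-id url-public --action lambda:InvokeFunctionUrl \
  --principal "*" --function-url-auth-type NONE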
@Fizz it’s still not the best option. Lambda imposes specific requirements on the Docker image, whereas EC2 does not
@Fizz App Runner costs $0.064 per hour? That’s crazy. EC2 costs around $10 a month, whereas this would cost almost $50.
@tommy
Copilot such an awful tool.
Takes a lot of time to deploy just one small Docker image.
It takes less time to deploy a Fargate task manually.
You could use AWS Lightsail for this. $7/month for the nano size (0.25 vCPU, 0.5 GB RAM).
@Stoor awful tool as well. Don’t remember why tho
@Stoor but thank you very much
Awful tool? What do you mean? It’s doing exactly what it needs to be doing.
this elasticache module doesn’t seem to conveniently handle global clusters https://github.com/cloudposse/terraform-aws-elasticache-redis - is that correct?
Terraform module to provision an ElastiCache Redis Cluster
right now I’m using this module to create 1 cluster then bring in the global cluster resources with ordinary aws terraform resources
Terraform module to provision an ElastiCache Redis Cluster
hi John, my guess is people do not use global clusters in Redis much, like it was with the RDS module a while ago
but that can be implemented
PRs are welcome
Just making sure I’m going down the right path with using the module and resources for the rest. The secondary inherits most of the primary’s settings so it basically works.
I think taking the approach of the RDS module and doing a for_each for the replication group, or different count logic, and adding
resource "aws_elasticache_global_replication_group" "example" {
  global_replication_group_id_suffix = "example"
  primary_replication_group_id       = aws_elasticache_replication_group.primary.id
}
should be sufficient
Does anybody have experience with AWS Lightsail?
Is there any way to make the launch script work? If I execute these commands in a shell script by running ./install.sh, I can then find that my packages are installed, whereas with this, when I SSH into an instance, there are no such packages…
How to select arm/x86 for a lightsail container service?
Do you know why my Lightsail deployment fails?
No launch command, environment variables, or open ports specified.
It does have CMD in my Dockerfile
One of the worst development experiences with Lightsail too
And why does a deployment on Lightsail take so much time? Crazy…
Deployment to Lightsail takes way more time than to Fargate…
Seeing this in Lightsail service deployment logs. Why? So for Fargate this Docker image is good and for Lightsail it’s not, correct?
It’s maybe because the image is built for ARM but it deploys on x86.
Amazing, what an awful UX, Amazon.
Lightsail tho
Blaming incorrect architecture on the platform is something though. Maybe try running a windows executable
2023-06-17
Yo. Please help me. I created a new task definition with .5 vCPU instead of .25 vCPU.
How can I update all the current tasks in a service to .5 vCPU?
I pressed “Update service” button, then checked “Force new deployment”, then pressed “Update” button, got “Service updated:” message and my tasks are still .25 vCPU. Why?
I checked the “Deployments” tab and the last deployment has been “In progress” for an eternity now…
It’s faster to remove a service and create a new service to update tasks… What the hell…
Guys, could you please recommend any Fargate autoscaling tutorials?
Don’t recommend AWS docs please.
I set it up and it does not work…
I’ve got many clients in the same situation, it is frustrating, let us calm down first. I think the cause is some basic thing being ignored, like the image architecture as Alex mentioned
Cause of what exactly? :)
cannot know the cause for now with the shared information, do you have the source code or a doc?
@Hao Wang source code of what?)
the source code or instructions you followed
I was frustrated when I started using Docker in 2014, version 0.9
You are at the psychological turning point of learning tech. Don’t give up, but try different platforms, AWS may not be a good fit for you
The other platforms will have similar issues though, since this seems not to be an AWS issue
So back to the first point, which is that some basic stuff was overlooked… Do you have a writeup or source code?
Please tell me how to test docker lambda locally.
I found this https://docs.aws.amazon.com/lambda/latest/dg/images-test.html
Ran a docker image like this docker run -p 9000:8080 ...
I got: 18 Jun 2023 01:34:24,542 [INFO] (rapid) exec '/var/runtime/bootstrap' (cwd=/var/task, handler=)
Went to http://127.0.0.1:9000/2023-06-18/functions/function/invocations , got nothing
figured it out. You have to use this exact URL:
http://localhost:9000/2015-03-31/functions/function/invocations
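i.e., with the runtime interface emulator running, an invocation looks like this (the payload is just an example):
curl -XPOST "http://localhost:9000/2015-03-31/functions/function/invocations" \
  -d '{"image_url": "https://example.com/img.png"}'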
Why the hell does the local Lambda have a different event structure from the pushed function’s event?
Here is my Lambda’s handler:
def handler(event, context):
    image_url = event.get("image_url")
    token = event.get("headers", {}).get("Authorization")
    return {
        "headers": {"Content-Type": "application/json"},
        "statusCode": "status_code",
        "body": image_url,
        "token": token,
        "event": event,
    }
Here is my Python request:
import requests

payload = {
    "image_url": "https://...",
    "return_version": True,
}
headers = {
    "Authorization": "tes",
}
res = requests.post("https://...", json=payload, headers=headers)
print(res.json())
Here is what I get in Python from the deployed Lambda invocation:
None
Here is what I get from the local Lambda invocation:
{'headers': {'Content-Type': 'application/json'}, 'statusCode': 'status_code', 'body': 'https://...', 'token': None, 'event': {'image_url': 'https://...', 'return_version': True}}
Here is what i get when i press “Test” in AWS console: https://i.imgur.com/Vmuy01T.png
Why the hell is it the same function, but with 3 different outputs? How am I supposed to test it locally if it gives me a different result in the deployed state?
And what is even this None???
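(A likely explanation for the three shapes: a Function URL wraps the request in an API Gateway v2-style event, while the local emulator and the console “Test” button pass your JSON through as the event itself. A hedged comparison; URLs and payloads are placeholders:)
# Direct invocation (local emulator or console "Test"): the JSON you
# POST *is* the event, so event.get("image_url") works.
curl -XPOST "http://localhost:9000/2015-03-31/functions/function/invocations" \
  -d '{"image_url": "https://example.com/img.png"}'
# Function URL invocation: the same JSON arrives as a string in
# event["body"], and header names are lowercased ("authorization"),
# so event.get("image_url") and .get("headers").get("Authorization")
# both come back empty.
curl -XPOST "https://<url-id>.lambda-url.<region>.on.aws/" \
  -H 'Authorization: tes' \
  -d '{"image_url": "https://example.com/img.png"}'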
woosha
I think you’re suffering from the same thing that I suffered from as I was getting started: slow down and approach the problem a little more pragmatically. If you’re attempting to implement something that has been done before, then if it’s not working the problem is not the language/framework/platform, it’s your understanding thereof. That’s not a judgement statement, but an encouragement to slow down and break the problem down into digestible bits.
I’m not trying to be crude/discouraging but just trying to help break things down in a meaningful way. If the technology was truly as lacking or ass backwards as you’re making it out to be, then AWS wouldn’t be nearly as successful as they are.
@Darren Cunningham don’t you agree that giving you different event shapes in dev and prod is such a stupid thing for Lambda to do?
Is there any way to get faster Lambda responses? My lambda takes too long to process requests compared to a regular EC2…
are you able to provide some more detail?
• “too long” — what’s the limit you want to stay under?
• what is the response time Lambda vs EC2?
• what runtime?
◦ what memory allocation are you giving it?
• what is the access pattern? meaning, how are you invoking the Lambda vs EC2
◦ is this a local invocation or remote?
• what does the lambda do? aka, does it access resources within the VPC? if so, are you deploying it into the VPC?
Generally it’s not surprising that an EC2 is going to process a single or a few hundred requests faster than a Lambda. Lambda shines in its ability to “scale infinitely” (besides rate limits, etc).
EC2 is like owning a car, you can get in it and go wherever you want, whenever you want, without waiting. But you own the maintenance thereof. Lambda is a taxi, it’s not always faster (sometimes it is) but it’s worry-free…outside of the cost, and the more you know how to work the system, the more value you get out of it.
sticking with my car analogy, you can get a 1000 taxis to show up at your door step a lot easier, faster and cheaper than if you tried to rent 1000 cars for a night
@Darren Cunningham seems like I figured it out. I had to increase the memory, which implicitly increases CPU and makes responses faster. But thank you (:
Generally I want to stay under 2s while not paying so much for a 2-CPU lambda
Lambda responds in 2s on 2048 memory.
EC2 responds in under 1s on 1 vCPU. But only on pure multi-threaded Flask without any WSGI thing like gunicorn (which makes responses longer)
Runtime: public.ecr.aws/lambda/python:3.10-arm64
I gave it 512 MB of memory. On 2048 it’s faster, but I don’t need that much memory, I only need more CPU
access pattern: Function URL
remote invocation
does not access resources. Makes AI CPU inference (yolov8 ONNX)
If your lambda is called infrequently, the increased latency is the result of slow lambda start-up, which is kinda expected for lambdas in general. You might want to play with provisioned concurrency and see if it helps for your case: https://aws.amazon.com/blogs/compute/new-for-aws-lambda-predictable-start-up-times-with-provisioned-concurrency/
Since the launch of AWS Lambda five years ago, thousands of customers such as iRobot, Fender, and Expedia have experienced the benefits of the serverless operational model. Being able to spend less time on managing scaling and availability, builders are increasingly using serverless for more sophisticated workloads with more exacting latency requirements. As customers have […]
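A hedged example of turning it on (name and numbers are placeholders; note it applies to a published version or alias, not $LATEST):
aws lambda put-provisioned-concurrency-config \
  --function-name my-func --qualifier prod \
  --provisioned-concurrent-executions 2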
It’s still too expensive, but thanks
2023-06-18
Guys, could you please recommend any good course video or article on auto-scaling Fargate for beginners?
Everything I read and watch so far gives me more questions than answers…
Isn’t it a case of updating the task definition to increase or decrease counts via CloudWatch alarms?
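For ECS/Fargate it is usually Application Auto Scaling driving the service’s DesiredCount off a CloudWatch metric; a hedged sketch (cluster and service names are placeholders):
aws application-autoscaling register-scalable-target \
  --service-namespace ecs \
  --resource-id service/my-cluster/my-service \
  --scalable-dimension ecs:service:DesiredCount \
  --min-capacity 1 --max-capacity 10
# Target tracking adds/removes tasks to hold average CPU near 70%:
aws application-autoscaling put-scaling-policy \
  --service-namespace ecs \
  --resource-id service/my-cluster/my-service \
  --scalable-dimension ecs:service:DesiredCount \
  --policy-name cpu70 --policy-type TargetTrackingScaling \
  --target-tracking-scaling-policy-configuration '{"TargetValue": 70.0, "PredefinedMetricSpecification": {"PredefinedMetricType": "ECSServiceAverageCPUUtilization"}}'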
@james192 don’t know. Which counts?
Could you please recommend any good tutorials/courses on deploying Docker images to EKS and setting up auto-scaling?
Again, so tired of watching videos and reading articles which give me more questions than answers…
to be frank, stopping learning AWS would be a good option for you for now. Picking it up again in a few months would be a good strategy
@Hao Wang why? I need to deploy my app to it now, not in a few months
2023-06-19
2023-06-20
Guys, do you know how to edit a cloudwatch alarm for EC2 auto-scaling?
Was unable to Google anything on it (maybe i Googled it wrong…)
I need it to scale up if CPU is over 70% for 30 seconds and scale down if CPU is under 70% for 30 seconds
To edit a CloudWatch alarm for EC2 auto-scaling, you can follow these steps:
1. Sign in to the AWS Management Console and open the Amazon CloudWatch service.
2. In the navigation pane, click on “Alarms” under the “CloudWatch” section.
3. Locate the alarm that is associated with your EC2 auto-scaling group and select it.
4. Click on the “Actions” dropdown menu and choose “Edit.”
5. In the “Create/Edit Alarm” wizard, you can modify the alarm configuration to match your requirements.
- Under the “Conditions” section, select the “Static” option for the “Threshold type.”
- For the “Whenever” condition, choose “Greater” and enter “70” in the text box.
- Set the “Period” to 30 seconds.
- Enable the “Consecutive periods” option and set it to “1.”
- Choose the appropriate “Statistic” (e.g., “Average” CPU utilization) and adjust the “Datapoints to alarm” if needed.
6. Under the “Actions” section, click on the “Add notification action” button if you want to receive notifications when the alarm state changes.
7. Optionally, you can configure auto-scaling actions when the alarm state is triggered.
- Click on the “Add Scaling Action” button.
- Choose the appropriate scaling policy for scaling up and scaling down.
- Configure the desired scaling adjustments, such as the number of instances to add or remove.
- Save the scaling actions.
8. Review your changes and click on the “Save” button to update the alarm.
The edited CloudWatch alarm will now trigger scaling actions for your EC2 auto-scaling group based on the specified CPU utilization thresholds and duration.
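For the specific 70%-for-30-seconds ask, a hedged CLI equivalent (the ASG name and policy ARN are placeholders; note that standard EC2 metrics are 1-minute resolution, so a true 30-second period would need high-resolution custom metrics):
aws cloudwatch put-metric-alarm \
  --alarm-name my-asg-cpu-high \
  --namespace AWS/EC2 --metric-name CPUUtilization \
  --dimensions Name=AutoScalingGroupName,Value=my-asg \
  --statistic Average --period 60 --evaluation-periods 1 \
  --threshold 70 --comparison-operator GreaterThanThreshold \
  --alarm-actions <scale-out-policy-arn>
# A mirror alarm with LessThanThreshold wired to a scale-in policy
# covers the scale-down side.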
did you read my message?
How do you add CloudFlare in front of an AWS Load Balancer? Spent an hour trying to figure out how to make it work, no luck.
Without NGINX forwarding of course, since it seems like a huge redundant overhead
Hell. I tried refreshing instances in auto-scaling group.
I thought that the logic for this is the following:
- Create a new instance
- Make sure that its port 80 is accessible
- Drop old instance, remove it from the auto-scaling group
But the logic is like this
- Remove old instance from the auto-scaling group
- Create a new instance
- Drop old instance
How do I make it work like it should (the 1st case)?
the purpose of an auto-scaling group is not to manipulate instances manually
if you need more instances launched, increase the min size
if you want to refresh all running instances (for any reason), there is a menu item in the AWS console “Refresh Instances” (if you click on it, the ASG will replace all old instances with new ones making sure the min desired size is always running)
This reminds me of the termination policy of AWS:
If you did not assign a specific termination policy to the group, Amazon EC2 Auto Scaling uses the default termination policy. It selects the Availability Zone with two instances, and terminates the instance that was launched from the oldest launch template or launch configuration. If the instances were launched from the same launch template or launch configuration, Amazon EC2 Auto Scaling selects the instance that is closest to the next billing hour and terminates it.
I haven’t used the policy for a while but still remember the pain lol
If you invest too much into ASG, you’ll miss the beauty of k8s
Learn k8s directly
@Andriy Knysh (Cloud Posse) not manipulate manually? But how do I update instances with new code? I have a launch script in the “user data” which clones a GitHub repo. How can I update all my instances automatically?
if you want to refresh all running instances (for any reason), there is a menu item in the AWS console “Refresh Instances”
I did exactly this, and accessing my load balancer was giving me HTTP errors for a while, meaning it dropped an existing instance from the group while another instance was not ready. Why?
@Hao Wang what’s the ASG?)
@Hao Wang does k8s launch Docker inside EC2 instances? Or does it create EC2 instances directly from a Docker image without the inner Docker overhead?
It sounds like you could learn a lot from some introduction tutorials to auto scaling. AWS have lots of good video introductions to resources. If you feel masochistic you can also try to read the docs
But how to update instances with new code? I have a launch script in the “user data” which clones github repo. How can i update all my instances automatically?
you use a Launch template, update any parameter in it, and the ASG will update all instances to use the new version of the Launch template
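A hedged CLI sketch of that loop (template ID, ASG name, and preferences are placeholders):
# Publish a new launch template version, e.g. with updated user data...
aws ec2 create-launch-template-version \
  --launch-template-id lt-0123456789abcdef0 \
  --source-version '$Latest' \
  --launch-template-data '{"UserData": "<base64-encoded script>"}'
# ...then roll the ASG onto it:
aws autoscaling start-instance-refresh \
  --auto-scaling-group-name my-asg \
  --preferences '{"MinHealthyPercentage": 90, "InstanceWarmup": 60}'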
if you know terraform, this is how it’s done https://github.com/cloudposse/terraform-aws-ec2-autoscale-group/blob/main/main.tf
resource "aws_launch_template" "default" {
  count = module.this.enabled ? 1 : 0

  name_prefix = format("%s%s", module.this.id, module.this.delimiter)

  dynamic "block_device_mappings" {
    for_each = var.block_device_mappings
    content {
      device_name  = lookup(block_device_mappings.value, "device_name", null)
      no_device    = lookup(block_device_mappings.value, "no_device", null)
      virtual_name = lookup(block_device_mappings.value, "virtual_name", null)

      dynamic "ebs" {
        for_each = lookup(block_device_mappings.value, "ebs", null) == null ? [] : ["ebs"]
        content {
          delete_on_termination = lookup(block_device_mappings.value.ebs, "delete_on_termination", null)
          encrypted             = lookup(block_device_mappings.value.ebs, "encrypted", null)
          iops                  = lookup(block_device_mappings.value.ebs, "iops", null)
          kms_key_id            = lookup(block_device_mappings.value.ebs, "kms_key_id", null)
          snapshot_id           = lookup(block_device_mappings.value.ebs, "snapshot_id", null)
          volume_size           = lookup(block_device_mappings.value.ebs, "volume_size", null)
          volume_type           = lookup(block_device_mappings.value.ebs, "volume_type", null)
        }
      }
    }
  }

  dynamic "credit_specification" {
    for_each = var.credit_specification != null ? [var.credit_specification] : []
    content {
      cpu_credits = lookup(credit_specification.value, "cpu_credits", null)
    }
  }

  disable_api_termination = var.disable_api_termination
  ebs_optimized           = var.ebs_optimized
  update_default_version  = var.update_default_version

  dynamic "elastic_gpu_specifications" {
    for_each = var.elastic_gpu_specifications != null ? [var.elastic_gpu_specifications] : []
    content {
      type = lookup(elastic_gpu_specifications.value, "type", null)
    }
  }

  image_id                             = var.image_id
  instance_initiated_shutdown_behavior = var.instance_initiated_shutdown_behavior

  dynamic "instance_market_options" {
    for_each = var.instance_market_options != null ? [var.instance_market_options] : []
    content {
      market_type = lookup(instance_market_options.value, "market_type", null)

      dynamic "spot_options" {
        for_each = (instance_market_options.value.spot_options != null ?
          [instance_market_options.value.spot_options] : [])
        content {
          block_duration_minutes         = lookup(spot_options.value, "block_duration_minutes", null)
          instance_interruption_behavior = lookup(spot_options.value, "instance_interruption_behavior", null)
          max_price                      = lookup(spot_options.value, "max_price", null)
          spot_instance_type             = lookup(spot_options.value, "spot_instance_type", null)
          valid_until                    = lookup(spot_options.value, "valid_until", null)
        }
      }
    }
  }

  instance_type = var.instance_type
  key_name      = var.key_name

  dynamic "placement" {
    for_each = var.placement != null ? [var.placement] : []
    content {
      affinity          = lookup(placement.value, "affinity", null)
      availability_zone = lookup(placement.value, "availability_zone", null)
      group_name        = lookup(placement.value, "group_name", null)
      host_id           = lookup(placement.value, "host_id", null)
      tenancy           = lookup(placement.value, "tenancy", null)
    }
  }

  user_data = var.user_data_base64

  dynamic "iam_instance_profile" {
    for_each = var.iam_instance_profile_name != "" ? [var.iam_instance_profile_name] : []
    content {
      name = iam_instance_profile.value
    }
  }

  monitoring {
    enabled = var.enable_monitoring
  }

  # https://github.com/terraform-providers/terraform-provider-aws/issues/4570
  network_interfaces {
    description                 = module.this.id
    device_index                = 0
    associate_public_ip_address = var.associate_public_ip_address
    delete_on_termination       = true
    security_groups             = var.security_group_ids
  }

  metadata_options {
    http_endpoint               = (var.metadata_http_endpoint_enabled) ? "enabled" : "disabled"
    http_put_response_hop_limit = var.metadata_http_put_response_hop_limit
    http_tokens                 = (var.metadata_http_tokens_required) ? "required" : "optional"
    http_protocol_ipv6          = (var.metadata_http_protocol_ipv6_enabled) ? "enabled" : "disabled"
    instance_metadata_tags      = (var.metadata_instance_metadata_tags_enabled) ? "enabled" : "disabled"
  }

  dynamic "tag_specifications" {
    for_each = var.tag_specifications_resource_types
    content {
      resource_type = tag_specifications.value
      tags          = module.this.tags
    }
  }

  tags = module.this.tags

  lifecycle {
    create_before_destroy = true
  }
}

locals {
  launch_template_block = {
    id      = one(aws_launch_template.default[*].id)
    version = var.launch_template_version != "" ? var.launch_template_version : one(aws_launch_template.default[*].latest_version)
  }
  launch_template = (
    var.mixed_instances_policy == null ? local.launch_template_block : null
  )
  mixed_instances_policy = (
    var.mixed_instances_policy == null ? null : {
      instances_distribution = var.mixed_instances_policy.instances_distribution
      launch_template        = local.launch_template_block
      override               = var.mixed_instances_policy.override
  })
  tags = { for key, value in module.this.tags : key => value if value != "" && value != null }
}

resource "aws_autoscaling_group" "default" {
  count = module.this.enabled ? 1 : 0

  name_prefix               = format("%s%s", module.this.id, module.this.delimiter)
  vpc_zone_identifier       = var.subnet_ids
  max_size                  = var.max_size
  min_size                  = var.min_size
  load_balancers            = var.load_balancers
  health_check_grace_period = var.health_check_grace_period
  health_check_type         = var.health_check_type
  min_elb_capacity          = var.min_elb_capacity
  wait_for_elb_capacity     = var.wait_for_elb_capacity
  target_group_arns         = var.target_group_arns
  default_cooldown          = var.default_cooldown
  force_delete              = var.force_delete
  termination_policies      = var.termination_policies
  suspended_processes       = var.suspended_processes
  placement_group           = var.placement_group
  enabled_metrics           = var.enabled_metrics
  metrics_granularity       = var.metrics_granularity
  wait_for_capacity_timeout = var.wait_for_capacity_timeout
  protect_from_scale_in     = var.protect_from_scale_in
  service_linked_role_arn   = var.service_linked_role_arn
  desired_capacity          = var.desired_capacity
  max_instance_lifetime     = var.max_instance_lifetime
  capacity_rebalance        = var.capacity_rebalance

  dynamic "instance_refresh" {
    for_each = (var.instance_refresh != null ? [var.instance_refresh] : [])
    content {
      strategy = instance_refresh.value.strategy
      dynamic "preferences" {
        for_each = (length(instance_refresh.value.preferences) > 0 ? [instance_refresh.value.preferences] : [])
        content {
          instance_warmup        = lookup(preferences.value, "instance_warmup", null)
          min_healthy_percentage = lookup(preferences.value, "min_healthy_percentage", null)
        }
      }
      triggers = instance_refresh.value.triggers
    }
  }

  dynamic "launch_template" {
    for_each = (local.launch_template != null ? [local.launch_template] : [])
    content {
      id      = local.launch_template_block.id
      version = local.launch_template_block.version
    }
  }

  dynamic "mixed_instances_policy" {
    for_each = (local.mixed_instances_policy != null ? [local.mixed_instances_policy] : [])
    content {
      dynamic "instances_distribution" {
        for_each = ( mixed_in…
@Oleh Kopyl ASG = Auto Scaling Group. Yeah, you can think of it as Docker in EC2 for now, but under the hood it is much more complex
Docker is just a middleman (runtime) which can be replaced by others, like containerd
@Alex Jurkiewicz docs? no, thank you very much.
If you could share a couple of good videos which actually helped you personally, i would appreciate it
@Andriy Knysh (Cloud Posse) so it does the update automatically upon each template version creation?
@Andriy Knysh (Cloud Posse) i don’t know Terraform. But should I?
@Hao Wang I mean that I don’t want to have Docker in Docker. So if messing around with the user data of a launch template is the only option, I’d better do that than pay for 1 CPU I don’t use (which would be used under the hood to have Docker inside Docker)
It is not DinD (Docker in Docker)
@Hao Wang k8s?
yeah
lol better to dig in more, back then I didn’t have ChatGPT…
@Hao Wang thanks :)
My pleasure, keep learning
@Oleh Kopyl If you just want to refresh the instances in the ASG , this could be helpful. https://docs.aws.amazon.com/autoscaling/ec2/userguide/asg-instance-refresh.html
if you are using terraform, you can add the setting below to your ASG configuration: https://registry.terraform.io/providers/hashicorp/aws/latest/docs/resources/autoscaling_group#instance_refresh
2023-06-22
I launch my EC2 instance with a launch script (user data). It ends with “python3 launch-app.py”. Where are its logs in Ubuntu? Meaning, where are the stderr and stdout?
can be found under /var/log
it should be managed by systemd, otherwise cloud-init will never finish running
I am not very sure, but I remember I had an Ubuntu server running with a similar setup to yours and was trying to find the stderr and stdout logs, and I ended up finding nothing until you write those logs to a file. This article says the same: https://askubuntu.com/questions/1030938/where-do-i-find-stderr-logs (command output, i.e. stdout and stderr, is not logged unless you write it to a file). This could be helpful, you can send your user data logs: https://repost.aws/knowledge-center/ec2-linux-log-user-data
Another option is to look at the cloudinit logs.
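On a stock Ubuntu AMI those live at fixed paths (the standard cloud-init locations):
sudo tail -f /var/log/cloud-init-output.log   # stdout/stderr of the user-data script
sudo less /var/log/cloud-init.log             # cloud-init's own run log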
@Hao Wang yeah, but there are multiple files there. So I was wondering which one has the logs I need…
I could as well check all the files, since there are not so many, and avoid posting here.
Thank you very much anyway
Is checking all the files fun? Lol starting with cloud-init*
if python3 launch-app.py runs, cloud-init will hang forever
@Hao Wang not fun, true
if python3 launch-app.py runs, cloud-init will hang forever
@Hao Wang what do you mean by “hang forever”?
yeah, forever
cloud init is not meant to run long-running services. It’s meant for doing EC2 first-run bootstrapping. It won’t even run on reboot. Your cloud-init script should create a systemd service file, enable the service, and start the service.
if you did run the app at the end of the launch script, then all the stdout/err will be in the /var/log/cloud* files.
cloud init is not meant to run long running services
Sure, but why can’t it?
@jonjitsu
cloud init is an agent which runs setup that eventually ends. It is not a service manager. For example, it won’t run on reboot, so if you try to run a service at the end of a cloud-init script it won’t survive a reboot. All linuxes come with service managers. Ubuntu’s is systemd, so in your cloud-init you can create the systemd service file and systemctl enable, then systemctl start it, and your service will be properly managed, including in the case of reboots.
I don’t know what happens when you try to run a never ending process at the end of a cloud init script or if that is even valid according to the cloud init specs.
@jonjitsu just yesterday I rewrote it to systemd, since I needed restarts when the app fails (OOM kill)
#!/bin/bash
git clone https://<token>@github.com/kopyl/crossdot-yolo.git && \
mv crossdot-yolo/onnx /home/ubuntu/onnx && \
rm -r crossdot-yolo && \
mv /home/ubuntu/onnx/* /home/ubuntu/ && \
rm -r /home/ubuntu/onnx/ && \
echo "wget https://bootstrap.pypa.io/get-pip.py && \
python3 get-pip.py && \
python3 -m pip install --ignore-installed flask && \
python3 -m pip install psycopg2-binary && \
python3 -m pip install onnxruntime==1.13.1 && \
python3 -m pip install opencv-python-headless==4.7.0.72 " >> i.sh && \
chmod 777 i.sh && \
./i.sh && \
echo "[Unit]
Description=Flask app
After=multi-user.target
[Service]
Environment=AWS_POSTGRES_DB_HOST=.
Environment=AWS_POSTGRES_DB_USER=.
Environment=AWS_POSTGRES_DB_PASSWORD=.
Environment=ONNX_MODEL_PATH=/home/ubuntu/model.onnx
Environment=PYTHONUNBUFFERED=1
Type=simple
Restart=always
ExecStart=/usr/bin/python3 /home/ubuntu/flask-server-postgres.py
[Install]
WantedBy=multi-user.target" >> /etc/systemd/system/crossdot-flask-inference.service && \
systemctl daemon-reload && \
systemctl enable crossdot-flask-inference.service && \
systemctl start crossdot-flask-inference.service && \
systemctl status crossdot-flask-inference.service
@jonjitsu
agent which runs setup which eventually ends
Do you have any proof? Not trying to be rude, just really want to read it from some official source
@jonjitsu by the way, is it going to start on its own after a reboot, or do you need some additional configs?
I don’t really have any proof. It might be out there though, you’ll have to check the docs.
It should start on its own after reboot because you ran systemctl enable ...
@jonjitsu was not able to find it in the docs. How can you know, then, if you have no proof?
2023-06-23
2023-06-26
AWS S3 Buckets never cease to amaze me with their peculiar nature. https://www.cloudyali.io/blogs/aws-s3-bucket-creation-date-discrepancy-in-master-and-other-regions
AWS S3 bucket creation date may be reported differently in different regions. To get the S3 bucket creation date correctly call list api in us-east-1.
Good post. Straightforward and brings receipts
2023-06-27
I have an issue with Fargate. It scales up fast (from 1 to 60 instances in 1 minute), but scales down too slowly (from 60 to 1 instance in 59 min, meaning it scales down 1 instance per minute).
Can I have more control over it? I need it to scale up in 1 minute and down in 1 minute too (from whatever amount of instances I have to whatever amount is needed at the moment, be it 1 or 30 or anything else)
if you change the service desired count to 1 you are saying it is taking 59 mins to remove all the extra containers?
I’m not changing it to 1
It changes to 1 on its own
It reduces the “desired” count by 1 per minute on its own
I need it to reduce to one in 1 minute, not in 59
why does it go from 1 to 60 back down to 1 in one minute? Is this a batch job or something? what mechanism are you using to scale it from 1 to 60?
Is it possible to have app restarts on Fargate? As if you launched it with systemd as a service…
I was getting 5xx errors from Fargate probably due to OOM kills. App restart would fix this crap….
when a container dies it’s not getting restarted by fargate?
@jonjitsu what do you mean by “dies” exactly?
@jonjitsu I was getting a 5xx HTTP error, so restarts obviously weren’t working
You can try the force deployment option in the ECS service.
@Oleh Kopyl how are you controlling the scaling? Scaling down should not take that long
With CloudWatch metrics
Does SageMaker Real-Time Inference scale up and down?
So I don’t pay for instances which are not used
Seems to be the case
Use this step-by-step, hands-on guide to learn how to deploy a trained machine learning model to a real-time inference endpoint.
Hey all.
I’m pondering over a project I’m working on at the moment and was hoping to get some advice or thoughts from other people.
I have to design the architecture for a 3-tier nodejs application which consists of a simple web front-end, an API component, and a database. My initial thoughts are to go serverless and deploy the web and API components on Lambda and try to keep things light and quick. I am concerned here, though, about the potential lack of flexibility with the front-end. I understand that you can have a Lambda function return HTML, but I don’t know how well it would work for the application’s progression in the future.
Alternatively, I can containerise both the web and API components and move them onto ECS which would cost more but allow for greater flexibility and if need be a migration to Kubernetes if required down the track.
Has anybody got any thoughts on this? Have you deployed front-end on Lambda and had it work well or poorly?
I think “light and quick” correlates more with using the technologies you know well, rather than using specific tools that have buzz
I think that’s some wisdom I needed to hear
Some say Fargate does restarts. But does it restart the whole image or just the app from CMD?
I was getting some 5xx errors from Fargate due to OOM kills. OOM kills are okay, but 5xx errors are not. With my EC2 instance (no Docker), systemd always restarts my Python app, and since I have a load balancer, I never get 5xx errors (the worst I can get is a .5s delay on a request).
Meaning that Fargate seemingly restarts the whole image instead of just my python app (the thing in CMD, like ["python", "main.py"]).
If that’s really the case, is there any way to force Fargate to just restart my app (the same way systemd does it)?
I was trying to get systemd to work in Docker but was getting this error: /bin/sh: 4: systemd: not found.
Even after I did apt update && apt install --reinstall systemd -y, I was still getting errors like this:
System has not been booted with systemd as init system (PID 1). Can't operate.
Failed to connect to bus: Host is down
I tried using service instead of systemctl, but it had errors too:
# service crossdot-flask-inference.service on
crossdot-flask-inference.service: unrecognized service
when a container fails healthcheck, the entire thing is replaced
containers aren’t meant to be touched beyond the initial process being started. If you want the app to automatically restart, add a supervisor.
Systemd is one approach, but it’s pretty heavyweight. You could also use a shell script which contains something like
while true; do python main.py ; done
What is supervisor? This thing? https://dev.to/rabeeaali/install-supervisor-on-aws-eb-with-laravel-5g8a
Is my python app going to be relaunched in case of an OOM kill if I launch it with the shell script?
when a container fails healthcheck, the entire thing is replaced
Damn.. Not something i need…
no, supervisor is an “init system”. Generally, containers run without one. Init systems are designed for long-running servers, but that’s not how containers work. As you see, they follow the philosophy “if it stops working, kill it and get a new one”
2023-06-28
aws serverless v1. how can I restore from a backup and not from a snapshot?
What do you mean by backup if not a snapshot?
In RDS I see snapshots and backups. I wanted to restore from a backup. Guess that is good only for point-in-time restore
@Balazs Varga do you still need to restore from backup? Maybe @Alex Jurkiewicz can help
yes, to a different location. I think I can use AWS Backup to copy the backups over to a different region and then restore from that backup
thanks
2023-06-29
Is anyone well versed in GitHub Actions and AWS? I’m having an issue deploying my container image to ECR, I keep getting Error: Not authorized to perform sts:AssumeRoleWithWebIdentity
We have significant experience, that sounds like there is an issue with your trust policy of the role
I gave the role admin access, I know it’s not good practice but I just wanted to make sure it was working
that’s not the trust policy
what’s the trust policy
{
  "Version": "2012-10-17",
  "Statement": [
    {
      "Effect": "Allow",
      "Principal": {
        "Federated": "arn:aws:iam::<account-id>:oidc-provider/token.actions.githubusercontent.com"
      },
      "Action": "sts:AssumeRoleWithWebIdentity",
      "Condition": {
        "StringLike": {
          "token.actions.githubusercontent.com:aud": "sts.amazonaws.com",
          "token.actions.githubusercontent.com:sub": "repo:dandiggas/firstwebapp"
        }
      }
    }
  ]
}
I think you are missing a *
repo:dandiggas/firstwebapp:*
ok ill try that now
Yeah still not working, keep getting this error message from https://github.com/Dandiggas/FirstWebApp/actions/runs/5412372323/jobs/9836381905:
Run aws-actions/configure-aws-credentials@v2
(node:1742) NOTE: We are formalizing our plans to enter AWS SDK for JavaScript (v2) into maintenance mode in 2023.
Please migrate your code to use AWS SDK for JavaScript (v3).
For more information, check the migration guide at https://a.co/7PzMCcy
(Use node --trace-warnings ... to show where the warning was created)
Error: Not authorized to perform sts:AssumeRoleWithWebIdentity
What happened when you tried removing the condition altogether?
It worked, that’s what happened lol
Thanks a lot! I’m quite new to this so making a lot of errors
so you need some sort of condition, but that means there is a mismatch between your condition and the reality
Yeah i just fixed it with the condition
what’s the new condition?
{
  "Version": "2012-10-17",
  "Statement": [
    {
      "Effect": "Allow",
      "Principal": {
        "Federated": "arn:aws:iam::<account-id>:oidc-provider/token.actions.githubusercontent.com"
      },
      "Action": "sts:AssumeRoleWithWebIdentity",
      "Condition": {
        "StringLike": {
          "token.actions.githubusercontent.com:aud": "sts.amazonaws.com",
          "token.actions.githubusercontent.com:sub": "repo:Dandiggas/FirstWebApp:*"
        }
      }
    }
  ]
}
Thanks Warren
no problem
Hey. Earlier this year we began using the terraform-aws-sso module to manage our human access to our AWS accounts. It works really well and has been a lifesaver, so upfront, thank you to everyone who added to it.
However I think I am missing something, as only recently did we have a need to make a new account assignment, and because I have a depends_on for my Okta module (to make sure the Okta groups are created before the account assignment is attempted), terraform is forcing a replacement of all account assignments despite the code only adding one.
Removing the depends_on fixes it in my plan, but I worry it will fail because it isn’t aware of the dependency on my Okta module.
I did some searching and I think that this PR addressed this issue already by adding a variable to handle the dependency issue.
The variable identitystore_group_depends_on description states the value should be “a list of parameters to use for data resources to depend on”.
I don’t understand what parameters it’s referring to? Is it a list of all the Okta groups I create?
I suggest cross posting link to this message in #terraform
Will do. Thanks Eric.
@Simon Weil you clearly know what you’re doing since it’s your PR and the code in your example is super clean. Any ideas on how to fix this?
2023-06-30
I have a few questions related to organizations:
• I know I need to select a management account, but with a delegated role can I have a user in a member account manage the organization?
• can I limit this delegated role to an OU?
• if I delete the management account, will it delete all the other AWS accounts in the organization?
@Ben Smith (Cloud Posse)
1. To do this you’d have to use AWS Organizations delegation to delegate organization management to a delegated role. So you could have your AWS Org delegated to an aws-team. By default your management account is the root account; it contains your state bucket and is where your SuperAdmin role deploys the accounts component, which creates the member accounts.
2. The AWS Organizations delegated role can be limited to an OU, meaning you just need to specify that in the role’s permissions.
3. I’ve never done this, as this is where we deploy our TF state and manage organization security policies. It appears from the AWS docs that it would delete the organization, resulting in your member accounts becoming standalone accounts.
Learn how to delete an AWS organization that you no longer need.