SweetOps #aws for July, 2023

Discussion related to Amazon Web Services (AWS)

Archive: https://archive.sweetops.com/aws/

2023-07-02

Adnan

06:58:36 AM

Anybody using GuardDuty? What do your bills look like?

Stoor

07:36:29 AM

It all depends on usage, honestly. It ranges from a few dollars to thousands depending on the account. As an example, in an account with around $7,000 of spend, around $150 of it is GuardDuty; the account hosts a web application.

2023-07-03

Thomas Poetke

12:29:02 PM

Hello, I have a question related to the different EC2 Terraform modules. Is it really necessary to have 3 separate modules for single instance, instance group and autoscaling? To me, it looks like the EC2 group module is just the EC2 module with a count. At the moment it looks hard to maintain all 3 modules.

Thomas Poetke

07:31:08 PM

Just asking because all three modules have similar merge requests. I already changed something in the single EC2 module which is still missing from the EC2 group module. It looks like it’s hard to fix the pipeline for the EC2 group module; there are a lot of stuck PRs there.

Alex Jurkiewicz

01:28:05 AM

Are you asking why CloudPosse has three? It’s probably a mix of two reasons:

Modules which have fewer knobs are easier to reason about
As new requirements came up, CloudPosse created new modules to meet new needs

Thomas Poetke

09:24:20 AM

@Alex Jurkiewicz the EC2 module with a count would be the same as https://github.com/cloudposse/terraform-aws-ec2-instance-group but with less to maintain. The instance group module is totally outdated at the moment.

cloudposse/terraform-aws-ec2-instance-group

Terraform Module for provisioning multiple general purpose EC2 hosts for stateful applications.

Alex Jurkiewicz

09:51:31 AM

If that’s true, there are probably many users of one or the other who don’t want to spend time migrating

2023-07-04

Balazs Varga

12:13:16 PM

In an ALB… do I need to enable preserve mode? I see it is disabled by default.

BATeller

01:39:19 PM

The preserve mode for AWS Application Load Balancer (ALB) is useful when it comes to managing client connections and maintaining “stickiness”. By enabling preserve mode, you can maintain the connection between a client and your application’s instances, even if the targets of the load balancer change.

This is especially beneficial in scenarios where you have applications that maintain stateful connections or utilize real-time protocols. For example, when you have WebSockets or long-polling connections, the connections between the client and the instances need to be maintained.

Normally, ALB will close connections to targets as they are deregistered, but in preserve mode, existing connections remain open and new requests from existing connections are sent to other targets. This can be useful in situations where you are updating or scaling down your backend services and want to minimize disruptions.

2023-07-05

Daniel Ade

06:10:28 PM

hey my ec2 instance has been stuck on this, anyone encountered this before

Hao Wang

06:34:07 PM

It is not stuck; you need to enter a username/password

Daniel Ade

08:10:31 PM

I made this on terraform with the same project we worked on lol, how do you put the username and password in?

Hao Wang

08:40:48 PM

An SSH key should be used instead of a username/password

Alex Jurkiewicz

11:27:04 PM

It looks like you’ve connected to the serial console. That’s not really a recommended access path. Use SSH or SSM Session Manager

2023-07-06

Balazs Varga

11:56:44 AM

Do you use spot instances? Do you know anything about pricing? Will it increase and reach on-demand pricing in the near future?

Erik Osterman (Cloud Posse)

07:55:27 PM

https://www.awsdocsgpt.com/

AWS Docs GPT

AI-powered Search and Chat for AWS Documentation

Alex Jurkiewicz

11:11:39 PM

This is much better than I assumed it would be. The answers are good for easy questions but quickly fail once you start asking anything which is harder

Erik Osterman (Cloud Posse)

11:54:09 PM

I tried only one test, which previously yielded a false answer (hallucination) on ChatGPT. My question was how to enable end-to-end TLS using an ALB to pods on Kubernetes. The wrong answer was terminating TLS on the ALB and using security groups to protect the pod. Using this new service I got the right answer, which involved a TLS sidecar.

Adnan

06:41:54 AM

I’m really a bad prompt engineer

Domagoj

07:57:07 AM

Having the same issues here; guess we destroyed it.

Erik Osterman (Cloud Posse)

03:11:50 PM

Heh, they must have run out of GPT credits

2023-07-07

2023-07-10

Balazs Varga

10:57:30 AM

Is there any issue in AWS Ohio?

Balazs Varga

11:17:21 AM

The problem is not there.

Sergei

06:31:33 PM

Hi, we have multiple AWS accounts across multiple regions with Transit Gateways used for connectivity. Most of the time it all works well. I am trying to troubleshoot a connectivity issue between two different EC2 instances placed in VPCs in the same account in different regions. Connectivity does not seem to work, but from what I can see all the route tables, TGW route tables, TGW-to-TGW peering and security group rules are in place. Does anyone know of any methods to troubleshoot this kind of problem?

Darren Cunningham

06:41:17 PM

https://docs.aws.amazon.com/vpc/latest/reachability/what-is-reachability-analyzer.html

What is Reachability Analyzer? - Amazon Virtual Private Cloud

Identify network connectivity issues in your virtual private cloud (VPC) using Reachability Analyzer.

Sergei

07:05:42 PM

Thank you for the suggestion! Reachability Analyzer is a great tool, but isn’t it limited to traffic inside a VPC only? I looked at it some time ago and it seemed quite limited. Good idea though, I will at least jog my memory and see what it can do for me in this scenario

Sergei

07:20:50 PM

From the Reachability Analyzer docs, under “Source and destination resources”:

The source and destination resources must be in the same Region.
The source and destination resources must be in the same VPC or in VPCs that are connected through a VPC peering connection or a transit gateway.
The source and destination resources can belong to different AWS accounts in the same organization from AWS Organizations.

Sounds promising

Sergei

07:53:19 PM

This does not seem to help in my case with cross-region traffic. I can enable organization-wide cross-account trust, but I don’t see an option for the Analyzer to run cross-region tests.

Alex Jurkiewicz

11:39:54 PM

Try asking AWS support. They can often do this sort of troubleshooting for you

Sergei

06:03:17 PM

Thank you everyone, I managed to get it resolved with a few rounds of Reachability Analyzer!
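
A rough sketch of what one round of Reachability Analyzer could look like when driven from boto3 instead of the console. The thread does not say how the rounds were split; since source and destination must be in the same Region, one plausible reading is a separate analysis per Region (for example from the instance to the transit gateway in that Region). The function names and the resource IDs you would pass in are hypothetical.

```python
import time


def summarize_analysis(analysis: dict) -> str:
    """Turn one NetworkInsightsAnalysis dict into a one-line verdict."""
    status = analysis.get("Status", "unknown")
    if status != "succeeded":
        return f"analysis {status}"
    if analysis.get("NetworkPathFound"):
        return "reachable"
    codes = [e.get("ExplanationCode", "?") for e in analysis.get("Explanations", [])]
    if codes:
        return "not reachable: " + ", ".join(codes)
    return "not reachable"


def analyze_path(source_id: str, destination_id: str, port: int, region: str) -> str:
    """Create a path, run one analysis in a single Region, and summarize it."""
    import boto3  # imported lazily so summarize_analysis works without boto3

    ec2 = boto3.client("ec2", region_name=region)
    path = ec2.create_network_insights_path(
        Source=source_id,
        Destination=destination_id,
        Protocol="tcp",
        DestinationPort=port,
    )["NetworkInsightsPath"]
    analysis = ec2.start_network_insights_analysis(
        NetworkInsightsPathId=path["NetworkInsightsPathId"]
    )["NetworkInsightsAnalysis"]

    # Poll until the analysis leaves the "running" state.
    while analysis["Status"] == "running":
        time.sleep(5)
        analysis = ec2.describe_network_insights_analyses(
            NetworkInsightsAnalysisIds=[analysis["NetworkInsightsAnalysisId"]]
        )["NetworkInsightsAnalyses"][0]
    return summarize_analysis(analysis)
```

When no path is found, the explanation codes in the result usually point at the component that blocks the traffic, which is the useful part for this kind of debugging.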

2023-07-11

Balazs Varga

10:32:45 AM

Is there a way to auto-sync snapshots across regions? Based on a tag maybe, or all new snapshots is fine as well.

Alex Jurkiewicz

12:04:12 PM

For what service? Check AWS Backup

Balazs Varga

12:16:09 PM

For Velero. Velero creates the snapshots for me and I would like to copy/sync them to a different region to be able to restore the backup in a different region on a different cluster… With AWS Backup I saw I can take the snapshots and copy them, but I don’t want AWS Backup to create the snapshots.

Mike Shade

01:23:13 PM

are you writing to s3? https://docs.aws.amazon.com/AmazonS3/latest/userguide/replication.html

Replicating objects - Amazon Simple Storage Service

Set up and configure replication to allow automatic, asynchronous copying of objects across Amazon S3 buckets.

Balazs Varga

02:09:39 PM

Yeah, I know this. It is good for objects… but I have snapshots in the “aws” S3 bucket, so not managed by me.

Alex Jurkiewicz

10:44:00 PM

What do you mean? If you’re using Velero, you can write to any S3 bucket of your choice?

I assume you’re talking about EKS btw, you still haven’t said

Balazs Varga

08:18:04 AM

Not EKS, a self-managed cluster with kops. OK, I can write to any S3 bucket, but can I set 2? So as long as my primary cluster is running, it should write to 2 locations… When the primary goes down, I can restore everything on the secondary in a different region.

Balazs Varga

08:18:29 AM

In case the AWS region where my primary cluster is running goes down, and I don’t want to run a multi-region cluster

Alex Jurkiewicz

09:50:11 AM

You don’t want to write to two buckets. Write to one and replicate the bucket to one in another region

Alex Jurkiewicz

09:50:52 AM

See the link Mike posted before
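
A minimal sketch of the "write to one bucket, replicate to another Region" setup, assuming boto3, an existing IAM role that S3 can assume for replication, and versioning already enabled on both buckets (replication requires it). The rule ID, function names, and any ARNs you pass in are placeholders. This only covers objects Velero writes to the bucket, not EBS snapshots taken through the cloud API.

```python
def build_replication_config(role_arn: str, dest_bucket_arn: str, prefix: str = "") -> dict:
    """Build the ReplicationConfiguration dict for s3.put_bucket_replication."""
    return {
        "Role": role_arn,
        "Rules": [
            {
                "ID": "velero-backups-to-second-region",  # hypothetical rule name
                "Priority": 1,
                "Status": "Enabled",
                # An empty prefix matches every object in the bucket.
                "Filter": {"Prefix": prefix},
                "DeleteMarkerReplication": {"Status": "Disabled"},
                "Destination": {"Bucket": dest_bucket_arn},
            }
        ],
    }


def enable_replication(source_bucket: str, role_arn: str, dest_bucket_arn: str) -> None:
    """Attach the replication rule to the source bucket."""
    import boto3  # imported lazily so the builder above works without boto3

    s3 = boto3.client("s3")
    s3.put_bucket_replication(
        Bucket=source_bucket,
        ReplicationConfiguration=build_replication_config(role_arn, dest_bucket_arn),
    )
```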

Balazs Varga

10:38:58 AM

Yeah, I saw that, but will that work for snapshots as well? I guess it only works if I use file-system backup (https://velero.io/docs/v1.11/file-system-backup/), which has a higher risk of inconsistency than CSI (https://velero.io/docs/v1.11/csi/) or the original cloud API call. Or CSI may be the option for me, if that can go to the bucket I choose. With the cloud API I can still copy the snapshot to another region, but for that I think I need a job or something else.
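
The "job or something else" for the cloud-API case could look roughly like this: a scheduled script that copies tagged EBS snapshots into a second Region with boto3. This is only a sketch. The tag key `velero.io/backup` is an assumption about how the snapshots are tagged, `copied-from` is a made-up tag used here to avoid copying twice, pagination is ignored, and Velero itself would not know about the copies.

```python
def snapshots_to_copy(source_snapshots: list, copied_source_ids: set) -> list:
    """Return IDs of completed source snapshots that have no copy yet."""
    return [
        snap["SnapshotId"]
        for snap in source_snapshots
        if snap.get("State") == "completed" and snap["SnapshotId"] not in copied_source_ids
    ]


def copy_tagged_snapshots(source_region: str, dest_region: str,
                          tag_key: str = "velero.io/backup") -> list:
    """Copy tagged snapshots to dest_region and return the new snapshot IDs."""
    import boto3  # imported lazily so the helper above works without boto3

    src = boto3.client("ec2", region_name=source_region)
    dst = boto3.client("ec2", region_name=dest_region)

    # Snapshots owned by this account in the source Region that carry the tag key.
    source = src.describe_snapshots(
        OwnerIds=["self"], Filters=[{"Name": "tag-key", "Values": [tag_key]}]
    )["Snapshots"]

    # Copies from earlier runs are tagged with the snapshot they came from.
    existing = dst.describe_snapshots(
        OwnerIds=["self"], Filters=[{"Name": "tag-key", "Values": ["copied-from"]}]
    )["Snapshots"]
    already = {
        tag["Value"]
        for snap in existing
        for tag in snap.get("Tags", [])
        if tag["Key"] == "copied-from"
    }

    new_ids = []
    for snapshot_id in snapshots_to_copy(source, already):
        # CopySnapshot is called in the destination Region.
        resp = dst.copy_snapshot(
            SourceRegion=source_region,
            SourceSnapshotId=snapshot_id,
            Description=f"Copy of {snapshot_id} from {source_region}",
            TagSpecifications=[{
                "ResourceType": "snapshot",
                "Tags": [{"Key": "copied-from", "Value": snapshot_id}],
            }],
        )
        new_ids.append(resp["SnapshotId"])
    return new_ids
```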

tommy

07:25:29 PM

S3 can be replicated to other region/account

Chris Wash

03:11:31 PM

Wondering if anyone has started w an approach for how to use and/or incorporate the Aurora Blue/Green deployments feature offered by AWS with existing Terraform projects that manage Aurora clusters?

New – Fully Managed Blue/Green Deployments in Amazon Aurora and Amazon RDS | Amazon Web Services

When updating databases, using a blue/green deployment technique is an appealing option for users to minimize risk and downtime. This method of making database updates requires two database environments—your current production environment, or blue environment, and a staging environment, or green environment. You must then keep these two environments in sync with each other so […]

Alex Jurkiewicz

10:46:35 PM

We are using blue-green heavily for upgrades, especially for 5.7 to 8 migrations. We don’t manage the process with Terraform, we use clickops. We lock our Terraform state, migrate by hand over a week or two, then perform state surgery to reconcile Terraform with reality.

I think Terraform is quite weak at managing complex project state transitions, so this is a pattern we use often for things like migrations, restore from snapshot, etc.

Joe Perez

06:13:47 PM

@Chris Wash any additional insight here from your experience?

2023-07-12

Balazs Varga

12:19:15 PM

Do you use Change Manager for managing your organization? E.g. putting an approval process on creating/deleting accounts and OUs, assigning SCPs, etc. Maybe for assigning the assume role to be able to log in to another account?

Gabriela Campana (Cloud Posse)

07:45:16 PM

@Jeremy G (Cloud Posse)

Jeremy G (Cloud Posse)

06:49:44 AM

@Erik Osterman (Cloud Posse)

Erik Osterman (Cloud Posse)

01:36:51 PM

Does GitOps + CODEOWNERS + Branch protections + IAC + PRs count for change management? Then that’s what we have in our refarch

Ibansal

02:10:39 PM

Hello all, I am using “aws ecs wait services-stable” in my CI/CD pipeline to wait for the ECS service to be healthy before marking the Jenkins job successful. By default this command waits up to 10 minutes, based on 2 parameters (“maxAttempts”: 40 and “delay”: 15 sec). Is there any way to increase this timeout? The AWS CLI command provides no direct option for it. If anyone has a suggestion, please let me know. References: https://github.com/boto/botocore/blob/b54ceeaca5d4c1316fae0a0496422572d1f6aba5/botocore/data/ecs/2014-11-13/waiters-2.json#L42-L72

    "ServicesStable": {
      "delay": 15,
      "operation": "DescribeServices",
      "maxAttempts": 40,
      "acceptors": [
        {
          "expected": "MISSING",
          "matcher": "pathAny",
          "state": "failure",
          "argument": "failures[].reason"
        },
        {
          "expected": "DRAINING",
          "matcher": "pathAny",
          "state": "failure",
          "argument": "services[].status"
        },
        {
          "expected": "INACTIVE",
          "matcher": "pathAny",
          "state": "failure",
          "argument": "services[].status"
        },
        {
          "expected": true,
          "matcher": "path",
          "state": "success",
          "argument": "services | [@[?length(deployments)!=`1`], @[?desiredCount!=runningCount]][] | length(@) == `0`"
        }
      ]
    },

Hao Wang

03:48:39 PM

It is possible to set a custom value by boto3, https://github.com/boto/botocore/blob/227c1cef8fefd2600a56dd3570a0d589def4bd52/botocore/docs/waiter.py#L142C5-L142C26

Hao Wang

03:50:24 PM

oh this is not for ECS service

Hao Wang

03:51:33 PM

as a workaround, may run services-stable multiple times in for loop
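
The loop workaround could be sketched like this in Python, assuming the `aws` CLI is on the PATH. The cluster and service names are placeholders. Note that the CLI waiter exits non-zero both when it runs out of attempts and when a service hits a failure state, and this sketch does not tell the two apart.

```python
import subprocess
from typing import Callable


def aws_wait_once(cluster: str, service: str) -> bool:
    """Run one `aws ecs wait services-stable`; True if it exited with 0."""
    result = subprocess.run(
        ["aws", "ecs", "wait", "services-stable",
         "--cluster", cluster, "--services", service],
        check=False,
    )
    return result.returncode == 0


def wait_with_retries(run_once: Callable[[], bool], rounds: int) -> bool:
    """Call run_once up to `rounds` times, stopping at the first success."""
    for _ in range(rounds):
        if run_once():
            return True
    return False


# Example (hypothetical names): three rounds give roughly 30 minutes in total.
# ok = wait_with_retries(lambda: aws_wait_once("my-cluster", "my-service"), rounds=3)
```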

Hao Wang

03:53:54 PM

from this PR, https://github.com/boto/botocore/pull/1267, awscli supports custom waiter configuration

#1267 Expose waiter configuration

Adding so we can properly address aws/aws-cli#2761

This will allow WaiterConfig to be passed to waiter.wait() calls allowing customers to adjust waiter’s maxAttempts and delay settings. This is based on how PaginationConfig works. The current keys are formatted to match the waiter models.

waiter.wait(WaiterConfig={
    'delay': 5,
    'maxAttempts': 5,
})
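
For the ECS case, a minimal boto3 sketch might look like the following. One caveat: the PR description above uses lowercase keys, while the released boto3 documentation lists the keys as `Delay` and `MaxAttempts`, so that spelling is used here. As far as I know, this is a boto3/botocore feature and `aws ecs wait` itself has no flag for it. The helper names and the cluster/service names are made up for illustration.

```python
import math


def waiter_config_for_timeout(timeout_seconds: int, delay_seconds: int = 15) -> dict:
    """Translate a total timeout into boto3 WaiterConfig keys."""
    if timeout_seconds <= 0 or delay_seconds <= 0:
        raise ValueError("timeout_seconds and delay_seconds must be positive")
    return {
        "Delay": delay_seconds,
        "MaxAttempts": math.ceil(timeout_seconds / delay_seconds),
    }


def wait_for_stable_services(cluster: str, services: list, timeout_seconds: int = 1800) -> None:
    """Block until the ECS services are stable or the waiter gives up."""
    import boto3  # imported lazily so the helper above works without boto3

    ecs = boto3.client("ecs")
    waiter = ecs.get_waiter("services_stable")
    # Raises botocore.exceptions.WaiterError on a failure state or timeout.
    waiter.wait(
        cluster=cluster,
        services=services,
        WaiterConfig=waiter_config_for_timeout(timeout_seconds),
    )


# Example (hypothetical names): wait up to 30 minutes instead of the default 10.
# wait_for_stable_services("my-cluster", ["my-service"], timeout_seconds=1800)
```

In a Jenkins job this would replace the CLI call with a short Python step; a raised `WaiterError` exits non-zero and fails the build.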

Ibansal

06:01:19 PM

Thanks @Hao Wang, I will try this solution

2023-07-13

Balazs Varga

10:18:17 AM

Another question: with create-db-cluster, can I specify a certificate to be used? Maybe one created by Amazon? If yes, then I could set a custom domain for RDS and then change the endpoint without changing the DSN.

Hao Wang

02:46:38 PM

Seems not; the cert is assigned by AWS and can be downloaded publicly

Hao Wang

02:51:19 PM

yeah, confirmed from a few sources

Hao Wang

02:51:37 PM

Only the AWS-provided certs for each region can be used

Balazs Varga

06:44:45 PM

ok thanks

Nishant Thorat

04:15:01 AM

http://cloudyali.io/blogs/cis-aws-foundations-benchmark-v20-securing-aws-cloud-resources

CIS AWS Foundations Benchmark v2.0 - Securing AWS cloud resources

Bring your AWS environment into compliance with CIS AWS Foundations Benchmark and prove it at any time with automated reports and dashboards.

2023-07-14

Balazs Varga

02:20:45 PM

About AWS Backup: how can I set an Aurora Serverless backup to run every hour? Can I somehow set the backup window to infinite?

BATeller

02:36:35 PM

From my understanding, with Aurora Serverless, snapshots are taken automatically every 5 minutes, but AWS does not allow customizing the snapshot frequency

Also the limit on infinite backups is because the backup window cannot overlap with the weekly maintenance window for the DB cluster

Not sure your specific use-case but could you not automate (through a cron or lambda) to trigger a manual snapshot on some set interval?
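
A sketch of what such a scheduled job (cron or Lambda) might do with boto3: take a manual cluster snapshot, wait for it, then copy it into a second Region. The function names and the identifier scheme are hypothetical. It assumes an unencrypted cluster in the standard `aws` partition; an encrypted snapshot would also need a `KmsKeyId` that is valid in the destination Region. Waiting inside a Lambda may run into the function timeout for larger clusters.

```python
from datetime import datetime, timezone


def snapshot_identifier(cluster_id: str, now: datetime) -> str:
    """Build an hourly snapshot name such as 'mydb-hourly-2023-07-14-14'."""
    return f"{cluster_id}-hourly-{now.strftime('%Y-%m-%d-%H')}"


def snapshot_and_copy(cluster_id: str, source_region: str,
                      dest_region: str, account_id: str) -> str:
    """Snapshot the cluster, then copy the snapshot to dest_region."""
    import boto3  # imported lazily so snapshot_identifier works without boto3

    name = snapshot_identifier(cluster_id, datetime.now(timezone.utc))
    src = boto3.client("rds", region_name=source_region)
    dst = boto3.client("rds", region_name=dest_region)

    src.create_db_cluster_snapshot(
        DBClusterSnapshotIdentifier=name,
        DBClusterIdentifier=cluster_id,
    )
    src.get_waiter("db_cluster_snapshot_available").wait(
        DBClusterSnapshotIdentifier=name
    )

    # Cross-Region copies are requested in the destination Region and
    # reference the source snapshot by ARN.
    source_arn = f"arn:aws:rds:{source_region}:{account_id}:cluster-snapshot:{name}"
    dst.copy_db_cluster_snapshot(
        SourceDBClusterSnapshotIdentifier=source_arn,
        TargetDBClusterSnapshotIdentifier=name,
        SourceRegion=source_region,
    )
    return name
```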

Balazs Varga

07:47:01 PM

As far as I know, yes, but from RDS I cannot copy the snapshots to a different region to be able to restore from that snapshot. I would like to use AWS Backup to have an hourly task that backs up the serverless DB and copies it to another region… Can this really not be done with AWS Backup?

Balazs Varga

07:49:16 PM

I only want to have a copy of the snapshot in a second region… and have this hourly, not daily.

Alex Atkinson

03:45:59 PM

I can’t spot the details on CloudFront object invalidation latency (cache flush) in the docs. Does anyone know what the performance profile of this action looks like? Back in the day I remember finding POPs with latency in the 10s-100s of seconds.

Alex Atkinson

03:48:35 PM

AH. Google can tell when I give up on it, because the next result is what I’m looking for. https://repost.aws/knowledge-center/cloudfront-serving-outdated-content-s3

Push updated Amazon S3 content from CloudFront

I’m using Amazon CloudFront to serve objects stored in Amazon Simple Storage Service (Amazon S3). I updated my objects in Amazon S3, but my CloudFront distribution is still serving the previous ver…

Alex Atkinson

03:49:30 PM

I’m sure that detail used to be in the CloudFront invalidation docs…

Alex Jurkiewicz

02:05:40 AM

10-100secs sounds like a reasonable p95 interval to me. I’ve seen invalidations take 10+mins in unusual circumstances.
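
One way to get your own numbers is to time an invalidation from boto3. This is only a sketch with made-up function names. The waiter polls on a fixed interval, so the measurement is coarse, and "Completed" is what the API reports rather than a direct check of every POP.

```python
import time
import uuid


def build_invalidation_batch(paths: list) -> dict:
    """Build the InvalidationBatch dict for cloudfront.create_invalidation."""
    return {
        "Paths": {"Quantity": len(paths), "Items": list(paths)},
        # CallerReference must be unique per invalidation request.
        "CallerReference": str(uuid.uuid4()),
    }


def time_invalidation(distribution_id: str, paths: list) -> float:
    """Create an invalidation and return the seconds until it completes."""
    import boto3  # imported lazily so the builder above works without boto3

    cf = boto3.client("cloudfront")
    started = time.monotonic()
    resp = cf.create_invalidation(
        DistributionId=distribution_id,
        InvalidationBatch=build_invalidation_batch(paths),
    )
    cf.get_waiter("invalidation_completed").wait(
        DistributionId=distribution_id,
        Id=resp["Invalidation"]["Id"],
    )
    return time.monotonic() - started
```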

Alex Atkinson

03:42:00 PM

Depends. Anywhere you’re caching api responses that’s a long time. Fastly’s ~150ms global purge on objects/surrogate key is what I’ve come to expect. Their VCL language offers a lot of capability over cf also. Cf is really basic. But sometimes that’s all you need.

Alex Jurkiewicz

10:52:51 PM

Sorry, what I meant by sounds reasonable is that it reflects reality for CF

Alex Atkinson

10:55:11 PM

It’s better than it used to be if I remember right. I think I saw it taking +/- 10m to flush. Eesh

2023-07-18

2023-07-19

forswearbeetle

12:48:58 PM

Hello, I created a YouTube video on how to create a real-world AWS architecture. Here is a short description of the video: https://youtube.com/shorts/7USwGdSFsfc and here is the link to the full video: https://youtu.be/VGwO7IYYPXE

Creating AWS Real World Architecture: Assurance Company #shorts

#shorts https://www.youtube.com/watch?v=VGwO7IYYPXE

Creating AWS Real World Architecture: Assurance Company

2023-07-20

2023-07-22

2023-07-23

2023-07-25

Adnan

06:04:14 AM

Hi all. Is there a way to find out which feature gates are enabled on EKS clusters?

As far as I was able to see, feature gates are not configurable with EKS. But I was wondering how to find out which of them are enabled. I found this in the docs

“The feature gates that control new features for both new and existing API operations are enabled by default.”

Does that mean that we can assume all feature gates that control new features are enabled? I know I just repeated what it says, but maybe someone can confirm or correct it.

Gabriela Campana (Cloud Posse)

06:08:56 PM

@Andriy Knysh (Cloud Posse) @Dan Miller (Cloud Posse)

2023-07-26

Nishant Thorat

02:41:50 PM

https://www.cloudyali.io/blogs/how-to-monitor-aws-iam-root-users-at-scale-best-practices

How to Monitor AWS IAM Root Users at Scale: Best Practices

Protect your AWS environment by monitoring IAM root users at scale. Discover the top best practices for IAM monitoring and avoid common security pitfalls.

andrey.a.devyatkin

04:29:18 PM

here is a good option https://github.com/fivexl/terraform-aws-cloudtrail-to-slack

fivexl/terraform-aws-cloudtrail-to-slack

Parse AWS CloudTrail events and send alerts to Slack for events that match pre-configured rules

2023-07-27

Abra

07:38:01 AM

Hi everyone, I have a CloudFormation infrastructure repo. What’s the best way to deliver (deploy/update/delete) the stack from the git repo to the AWS CloudFormation service?

Vladimir

12:26:48 PM

https://registry.terraform.io/providers/hashicorp/aws/latest/docs/resources/cloudformation_stack.html