#aws (2023-05)
Discussion related to Amazon Web Services (AWS)
Archive: https://archive.sweetops.com/aws/
2023-05-01
Hi guys, I just learned about CloudWatch. I found that it uses metrics like CPU usage, disk usage, etc., but I also noticed that EC2 displays many metrics in its Monitoring section.
Question: Do we use CloudWatch with EC2 even though EC2 already provides useful analytics/monitoring? If yes, please share the use cases.
the ec2 metrics are really cloudwatch metrics under the covers
Very true. Beyond what the EC2 console shows, CloudWatch makes more sense when you want to build your own dashboards across all your instances, accessible from one place, or use extra features such as CloudWatch alarms/events for actionable purposes.
2023-05-02
Hello all. In Aurora Serverless, I see my CPUCreditBalance dropped to 0 after a recovery triggered by AWS. Does credit accounting work the same way as for EC2 T instances? https://docs.aws.amazon.com/AWSEC2/latest/UserGuide/unlimited-mode-examples.html
The following examples explain credit use for instances that are configured as unlimited .
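For intuition, the credit model in the linked EC2 doc works like this: credits accrue at a fixed per-hour rate up to a cap, and are spent at 1 credit per vCPU-minute of full utilization; sustained spend above the earn rate pins the balance at 0. A rough simulation sketch (the numbers are illustrative EC2 rates; whether Aurora Serverless accounts identically is exactly the open question here):

```python
def credit_balance(earn_per_hr, max_credits, spend_per_hr, hours, start=0.0):
    """Simulate a T-instance-style CPU credit balance, hour by hour.

    earn_per_hr:  credits earned per hour (e.g. 12 for a t3.micro)
    max_credits:  balance cap (typically 24 hours of accrual)
    spend_per_hr: credits spent per hour (1 credit = 1 vCPU-minute at 100%)
    """
    balance = start
    for _ in range(hours):
        balance = min(balance + earn_per_hr, max_credits)  # accrue, capped
        balance = max(balance - spend_per_hr, 0.0)         # spend, floored at 0
    return balance

# An idle t3.micro-like instance fills its bucket in 24h...
print(credit_balance(12, 288, 0, 24))   # 288.0
# ...while sustained usage above the earn rate pins the balance at 0.
print(credit_balance(12, 288, 20, 24))  # 0.0
```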
Hi! I'm using this module https://registry.terraform.io/modules/cloudposse/ecs-alb-service-task/aws/0.68.0 (currently on version 0.66.2) and trying to update to the latest version, because every time I change an environment variable I have to delete the service and recreate it; it doesn't use the latest task definition (generated by CodePipeline) to create the new one. I tried using redeploy_on_apply but couldn't find any configuration that correctly picks up the latest. My configuration looks like this:
module "ecs_alb_service_task" {
  source  = "cloudposse/ecs-alb-service-task/aws"
  version = "0.66.2"

  namespace  = var.cluster_name
  stage      = module.global_settings.environment
  name       = local.project_name
  attributes = []

  container_definition_json = module.container_definition.sensitive_json_map_encoded_list

  # Load Balancer
  alb_security_group = var.security_group_id
  ecs_load_balancers = local.ecs_load_balancer_internal_config

  # Capacity Provider Strategy
  capacity_provider_strategies = var.capacity_provider_strategies

  desired_count                = 1
  ignore_changes_desired_count = true
  launch_type                  = module.global_settings.default_ecs_launch_type

  # VPC
  vpc_id           = var.vpc_id
  subnet_ids       = var.subnet_ids
  assign_public_ip = module.global_settings.default_assign_public_ip
  network_mode     = "awsvpc"

  ecs_cluster_arn    = var.cluster_arn
  security_group_ids = [var.security_group_id]

  ignore_changes_task_definition = true
  force_new_deployment           = true

  health_check_grace_period_seconds  = 200
  deployment_minimum_healthy_percent = module.global_settings.default_deployment_minimum_healthy_percent
  deployment_maximum_percent         = module.global_settings.default_deployment_maximum_percent
  deployment_controller_type         = module.global_settings.default_deployment_controller_type

  task_memory = local.task_memory
  task_cpu    = local.task_cpu

  ordered_placement_strategy = local.ordered_placement_strategy

  label_order    = local.label_order
  labels_as_tags = local.labels_as_tags
  propagate_tags = local.propagate_tags
  tags           = merge(var.tags, local.tags)

  # ECS Service
  task_exec_role_arn = [module.task_excecution_role.task_excecution_role_arn]
  task_role_arn      = [module.task_excecution_role.task_excecution_role_arn]

  depends_on = [
    module.alb_ingress
  ]
}
any suggestions?
You are basically saying ignore_changes_task_definition = true,
meaning: don't respect future updates. It should be false.
When I set that option to false, it tries to delete the service and recreate it with an older version of the task definition.
Then your problem is not the task definition, since it's supposed to use the latest version. It's somewhere else.
I don't see any usage of redeploy_on_apply, which is what fulfills that purpose.
I'm currently modifying my original module (on version 0.66.2) as I showed before. I guessed that redeploy_on_apply in the latest version of this module would pick up the latest version of the task definition and update it in the state file. I'm not sure what's wrong.
One thing is different from the other: redeploy_on_apply does not update the module version itself, it deploys a new task revision if one is detected in the cluster. Two different things.
No no, I know. But even if I upgrade the version and enable that option, the task definition is not the latest. The picture I sent before is from version 0.66.2 with ignore_changes_task_definition = false. The next pictures are from v0.68 with redeploy_on_apply = true and ignore_changes_task_definition = false.
I tried different combinations of ignore_changes_task_definition, force_new_deployment, and redeploy_on_apply, and none of them work.
If you change ‘ignore_changes_task_definition’ from true to false you should expect the service to be destroyed the first time due to the way the service is coded in the module. A second run should not require a destroy.
The reason it picks up an older version of your task definition is that that's all Terraform knows about. You have updated the task definition outside of Terraform, in CodePipeline.
If you are going to manage revisions in CodePipeline, you could pass the correct task definition family and revision in via the variable var.task_definition.
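To sketch that suggestion (hedged: the task_definition input and the data-source lookup are assumptions about your setup; check that the module version you're on actually exposes it):

```hcl
# Look up the latest revision of the family that CodePipeline updates.
data "aws_ecs_task_definition" "latest" {
  task_definition = local.project_name # assumed to be the task family name
}

module "ecs_alb_service_task" {
  # ...existing inputs as above...
  ignore_changes_task_definition = false

  # Hand the module the revision CodePipeline produced, so Terraform
  # converges on it instead of on its own stale revision.
  task_definition = "${data.aws_ecs_task_definition.latest.family}:${data.aws_ecs_task_definition.latest.revision}"
}
```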
2023-05-03
I have a customer with a huge Oracle database: 120 TB. The limit on RDS is 64 TB. Any suggestions?
Sharding or self hosting on ec2
Also talk to an AWS rep. You might get some free credits for moving such a large dataset. They would also give you some advice on your current issue
AWS will give your customer free credits and help cover the costs of converting off Oracle… talk to the TAM.
Otherwise ec2 self host or sharding
Is Aurora Serverless v1 HA-compatible?
Hi there, I wrote a blog post that y’all may be interested in. It discusses how to manage cross-account AWS IAM permissions for different teams with an open-source Python tool called IAMbic. Would love feedback!
https://www.noq.dev/blog/aws-permission-bouncers-letting-loose-in-dev-keeping-it-tight-in-prod
Ever had a slight configuration change take down production services? Wish you could give teams more AWS permissions in dev/test accounts, but less in production? Right sizing IAM policies for each team and account can be a tedious task, especially as your environment grows. In this post, we’ll explore how IAMbic brings order to multi-account AWS IAM chaos.
2023-05-04
For AWS Identity center, is there a way to see which accounts a group has access to via the cli? There’s no way in the console afaict.
Not as far as I know It’s such a missing feature
OK. Was just making sure I didn’t just miss it somehow.
2023-05-05
Does anyone know of any tools that will scan a set of AWS accounts for best practices? Any that are recommended? My company has a list of 40+ best practices that we’ve identified and I’m looking for solutions to quickly check these best practices against a set of accounts or AWS organization.
I haven’t used it myself yet, but I think https://github.com/cloud-custodian/cloud-custodian sounds like what you’re looking for.
Rules engine for cloud security, cost optimization, and governance, DSL in yaml for policies to query, filter, and take actions on resources
this is a nice project maintaining a list of various tools… https://github.com/toniblyx/my-arsenal-of-aws-security-tools
List of open source tools for AWS security: defensive, offensive, auditing, DFIR, etc.
if i were to start with just one tool for checking against “best practices”, it would probably be prowler https://github.com/prowler-cloud/prowler
Prowler is an Open Source Security tool for AWS, Azure and GCP to perform Cloud Security best practices assessments, audits, incident response, compliance, continuous monitoring, hardening and forensics readiness. It contains hundreds of controls covering CIS, PCI-DSS, ISO27001, GDPR, HIPAA, FFIEC, SOC2, AWS FTR, ENS and custom security frameworks.
ElectricEye is another great one… https://github.com/jonrau1/ElectricEye
ElectricEye is a multi-cloud, multi-SaaS Python CLI tool for Cloud Asset Management (CAM), Cloud Security Posture Management (CSPM), SaaS Security Posture Management (SSPM), and External Attack Surface Management (EASM) supporting 100s of services and evaluations to harden your public cloud & SaaS environments.
yeah custodian is a good one
the others are also interesting projects, thanks
Good stuff – Thank you folks.
If you are looking for a SaaS solution, I would go with Aqua Security… they bought a company called CloudSploit a few years ago, and they have a good level of reporting/remediation steps for issues that are detected.
2023-05-08
Just an FYI - if you plan to upgrade to the latest EBS CSI addon for EKS (v1.18.0-eksbuild.1) you may want to wait: https://github.com/kubernetes-sigs/aws-ebs-csi-driver/issues/1591
We use kube-prometheus-stack and have a CPUThrottling alarm going off from our upgrade.
/kind bug
What happened?
Upgraded EBS Addon in EKS and CPU usage of the node daemonsets spiked
What you expected to happen?
literally no change to happen
How to reproduce it (as minimally and precisely as possible)?
eksctl update addon --name aws-ebs-csi-driver --version latest \
--cluster ${CLUSTER_NAME} \
--service-account-role-arn arn:aws:iam::${IHSM_ARN}:role/AmazonEKS_EBS_CSI_DriverRole_${CLUSTER_NAME} \
--force
To roll back to a non spiking version:
eksctl update addon --name aws-ebs-csi-driver --version v1.17.0-eksbuild.1 \
--cluster ${CLUSTER_NAME} \
--service-account-role-arn arn:aws:iam::${IHSM_ARN}:role/AmazonEKS_EBS_CSI_DriverRole_${CLUSTER_NAME} \
--force
2023-05-10
We have clusters using spot instances, and we use cluster autoscaler. Sometimes we see 504s. I found a few issues on the autoscaler GitHub page. How can I avoid 504s when the autoscaler scales down instances?
Try karpenter? It drains first before removing nodes.
Yea, might help with Karpenter, but unless the services are deliberately removed from the ALB, you’ll still get 504s.
I am not sure if karpenter does that.
We solved it with annotations. Once a node is cordoned, it is removed from the LB; and if the cordon reason is rebalancing, then after being deleted from the cluster it waits another 90 seconds before termination. After this we don't see 504 errors in the LB logs.
@Balazs Varga just to clarify, you cordoned the nodes with an annotation similar to this? or some other annotation?
kubectl annotate node <node-name> eks.amazonaws.com/cordon=true
The AWS node termination handler cordoned the node automatically. We enabled the rebalance watch option in the past.
I am with my laptop, but later I will check the exact annotation
That would be great, as I think I am not aware of this option.
So on the svc we use the following; with this we can close open connections before termination.
and on node termination handler we use the following options:
• enableRebalanceDraining: true
• enableSpotInterruptionDraining: true
• enableScheduledEventDraining: true
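For reference, those flags correspond to values on the aws-node-termination-handler Helm chart; a minimal values sketch (chart defaults assumed for everything else):

```yaml
# values.yaml for the aws-node-termination-handler chart (sketch)
enableRebalanceDraining: true        # cordon + drain on EC2 rebalance recommendations
enableSpotInterruptionDraining: true # drain on the 2-minute spot interruption notice
enableScheduledEventDraining: true   # drain ahead of scheduled maintenance events
```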
Aha. Though please note the node termination handler is redundant with functionality in Karpenter now, and is no longer recommended to be deployed alongside it. That said, I don't know if those features are available in Karpenter. Thanks for sharing!
https://aws.github.io/aws-eks-best-practices/karpenter/#enable-interruption-handling-when-using-spot
Review Karpenter Frequently Asked Questions
@Dan Miller (Cloud Posse) heads up
I think we cannot use karpenter because of limitations with kops. we create our clusters using kops
Aha!
Yes, not compatible
Probably
maybe later .
@Jonathan Eunice
2023-05-11
Hi all! Do MemoryDB for Redis patch updates cause any downtime?
minimal but yes
2023-05-12
Hey everyone. Curious to know if anyone is using Turbonomic as your cloud financial/cost management tool, and how your experience compares to CloudHealth or Cloudability?
Turbonomic says it has automated execution actions for rightsizing instances. But how does it manage (or sync) the state files if the instances are managed through Terraform?
I would be extremely skeptical of any tool that does right sizing the “right way” in an IAC environment
There’s no one way to do it, and companies have all kinds of strategies.
Hi all
Is there a way to add files (2 files) to ECS Fargate containers? (We are using GitHub as the source; we are not able to add them to GitHub due to security reasons.)
I’d recommend the service to pull what it needs from either Secrets Manager or S3 depending on the size
How do we pull them via the task definition? Should we update the service via the task definition?
you mean, you want to add two files to a container without making a change to the image you’re pulling?
yup
might be able to do that with volume mounts then
Those 2 files are auth files (one is a private key, one a public key).
Just for adding 2 files, I'm not sure EFS mount points are required.
you’re fairly constrained as to what options you have if you’re unwilling to update the image
If I upload these 2 files to S3, how do I tell the task definition that these 2 files should end up on the xx/xxx path?
you would update the image that you’re using to have an entrypoint script that pulls the s3 files and then starts the application as it has before
not just update the task def
that works if you can use an environment variable rather than a file
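A sketch of that entrypoint approach (the bucket name and paths are placeholders, not from the thread; assumes the aws CLI is in the image and the task role allows s3:GetObject on the bucket):

```
# Dockerfile additions (sketch)
COPY entrypoint.sh /entrypoint.sh
ENTRYPOINT ["/entrypoint.sh"]
CMD ["/app/server"]
```

```
#!/bin/sh
# entrypoint.sh -- fetch the two auth files, then start the app unchanged.
set -e
aws s3 cp s3://my-config-bucket/auth/private.pem /app/auth/private.pem
aws s3 cp s3://my-config-bucket/auth/public.pem  /app/auth/public.pem
exec "$@"
```

The task definition itself doesn't change beyond pointing at the new image; the file paths live in the script.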
Does anyone have strong opinions on how to do AWS Lambda while also managing the infrastructure via Terraform? There are a bunch of options out there, but I’ve never personally seen an implementation that I liked. My team and I are working on how to do this better and are evaluating Serverless framework (CloudFormation ), AWS SAM (has TF support, but doesn’t look great), and classic “build our own”.
Would love to hear someone who has implemented a solution that doesn’t feel disjointed and has strong opinions from real experience!
What are the pain points you’ve had? I’ve enjoyed using Anton’s terraform-aws-lambda module
The reason behind it was the easy integration and deployment with GitHub Actions, and easier-to-understand template options.
This was only done for VERY basic Lambda deploys, which connect to infra created by TF (SQS, SNS, RDS, etc.). If someone was, for example, going to deploy a Lambda that required a ton of infra, then that was a good candidate to move to TF-only deploys, hybrid with GitHub Actions.
that is what we ended with and it works so far
Keep in mind that EVERYTHING else was created with TF (VPC, subnets, transit gateways, RDS, SQS, SNS, etc.).
Yeah, I go either for vanilla SAM or for Anton’s Serverless.tf modules (with or without SAM).
Vanilla CloudFormation is a bit too raw, but works if you’re a large company that wants to build something custom. Serverless Framework I would avoid because of their chaotic history and dubious ecosystem.
Everybody else is too new, too risky, or too limited.
Neither option is perfect, so expect some mild annoyances!
Usually:
• a few Lambdas => serverless.tf
• a bunch of Lambdas and complex serverless architectures => SAM
Personally, I prefer building out my stack in CDK, which bootstraps CloudFormation.
+1 to SAM. Stick with the first party tooling here
last team i was on that was using IaC and Lambda did a mix of Terraform and SAM.
That is, all the underlying plumbing was deployed using TF.
The devs would use the TF resources as inputs to their SAM deployments.
it worked pretty well.
AWESOME stuff – thanks folks! Really appreciating this community right now as I’ve talked or worked with a bunch of you already, so I trust your opinions. We’ll likely try out the TF + SAM route
I’ve used Terraform+Github Actions+ECR, I’ll be the first to say that the local dev experience isn’t great (aka slow), but it’s a simple setup where a dev pushes to their feature branch, an artifact is built by GHA and pushed to ECR, then a simple aws cli call to update the function code to the dev’s container tag. I’m looking to explore SAM+TF. when I first checked it out, I believe SAM could only use resources created by SAM and the available resource list was a bit small
2023-05-13
2023-05-14
2023-05-15
Hi All, I was wondering if anyone has experience with setting up Control Tower on an existing AWS account that's part of AWS Organizations. I want to separate our current environments into their own accounts and implement Terraform moving forward. I want to make sure I don't affect our current environment during this process. Any advice would be awesome. Thanks!
Wouldn’t recommend using Control Tower, CT is like a black box, hard for troubleshooting
Maybe it is just myself
@Hao Wang what solution do you recommend for multi account organizations? I am using control tower to provision new account under AWS organization, it really is not convenient but I don’t know other solutions.
Thanks
@aj_baller23 if your envs are under the root account, enrolling into control tower will not affect it. Then you can create sub accounts to host separate envs.
Cloud Posse doesn't use Control Tower either; they provision new accounts with their AWS components.
2023-05-16
The Control Tower question got me thinking about another thing I’ve been wondering about for some time:
For SMBs, how do you manage the root account credentials for a multi-account organization?
That is, given a single AWS account that will be used to spawn off sub accounts, how do you govern access to the root email address and the 2FA keys associated with the account?
I’m specifically looking at this from the perspective of a small business or sole proprietorship that needs to keep things secure but also ensure business continuity.
this becomes even trickier for remote teams. but here are the two practices that I’ve seen:
most secure practice - hardware tokens issued to the people who need access and delivered via signature confirmation snail mail. core member provisions the hardware token and dispatches them, token is removed from the root account if they leave the org.
a more manageable (IMO) practice - software 2FA (e.g. Vault or 1Password) in a shared vault, password is rotated if a core member leaves the org
This was also an interesting read….
https://docs.aws.amazon.com/organizations/latest/userguide/orgs_best-practices_mgmt-acct.html
Describes best practices related to the management account in AWS Organizations.
Email group specifically for the root user. Password in 1Password or similar. I set up yubikeys and mail them; only share the password on confirmation of receipt. (What Darren said.)
When someone leaves, deactivation is as simple as removing the yubikey MFA from AWS (now that they support multiple!). Avoids the need to rotate passwords. Keeps MFA as a piece of hardware which I like:
• I don’t have to think about some case where your password vault somehow gets compromised. (Hard, but not impossible.)
• Having to go find your physical root MFA key should hopefully make folks think a lot harder about what they need it for, vs. 1P or similar mindlessly autofilling a TOTP.
the multiple hardware keys is essential to this. that was the biggest blocker and why we used the 1P option for so long.
Second everyone else. We recommend a google group. We used 1P and shared TOTP, but now multiple hardware keys are available. And with 1P we use geofencing.
Also, we recommend resetting the master credentials on all member accounts and enabling MFA on all root credentials for member accounts.
can you share more information about the 1P geofencing? I wasn’t aware of that being an option and would like to explore that a bit more
Learn how to use Advanced Protection to restrict where your team can access 1Password.
thanks
2023-05-17
Anybody know how to purge-with-fire unattended-upgrades in cloud-init? Because that satan of a package is blocking cloud-init itself from installing packages. I tried throwing a systemctl stop/mask into a cloud-boothook section, but that just threw errors.
unattended-upgrades will be set up beforehand, and cloud-init should just run once? Not sure if I understand.
So: deploying a new Ubuntu box, specifying packages to install with cloud-init. It gets started, but then Ubuntu's unattended-upgrades script kicks off, which blocks cloud-init because it grabs the apt lock.
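One approach that has worked for me (hedged; I haven't verified it on every Ubuntu release): use bootcmd in cloud-config rather than a cloud-boothook. bootcmd runs before cloud-init's apt/package modules, so nothing has grabbed the dpkg lock yet, and you can neutralize both the service and the apt timers:

```
#cloud-config
bootcmd:
  # runs early on every boot, before the package modules touch apt
  - [systemctl, stop, unattended-upgrades.service]
  - [systemctl, mask, unattended-upgrades.service]
  - [systemctl, stop, apt-daily.timer, apt-daily-upgrade.timer]
  - [systemctl, mask, apt-daily.timer, apt-daily-upgrade.timer]
packages:
  - your-packages-here   # placeholder
```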
using cookup.ai to generate IAM policies….
Input: give permission to read from the ecr repo named container123456, and permission to start and stop ec2 instances in the us-east-1 region.
Result: Here is the AWS IAM Policy that corresponds to the plaintext description you provided:
{
"Version":"2012-10-17",
"Statement":[
{
"Effect":"Allow",
"Action":[
"ecr:GetAuthorizationToken",
"ecr:BatchCheckLayerAvailability",
"ecr:GetDownloadUrlForLayer",
"ecr:GetRepositoryPolicy",
"ecr:DescribeRepositories",
"ecr:ListImages",
"ecr:DescribeImages",
"ecr:BatchGetImage"
],
"Resource":[
"arn:aws:ecr:us-east-1:123456789012:repository/container123456"
]
},
{
"Effect":"Allow",
"Action":[
"ec2:StartInstances",
"ec2:StopInstances"
],
"Resource":"arn:aws:ec2:us-east-1:123456789012:instance/*"
}
]
}
good response but I had to format the code myself…
I’m using https://flosell.github.io/iam-policy-json-to-terraform/ to format the policy to tf hcl
This tool converts standard IAM policies in JSON format (like what you’d find in the AWS docs) into more terraform native aws_iam_policy_document data source code
I prefer keeping my IAM JSON as JSON and using jsonencode in TF. Makes it easier to bring existing policies in… or to update my policies in TF when people tweak them outside of IaC.
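For illustration, the jsonencode route on the generated policy above might look like this (resource and policy names are made up):

```hcl
resource "aws_iam_policy" "generated" {
  name = "ecr-read-ec2-start-stop"

  # jsonencode keeps the policy structurally close to its raw JSON form,
  # so out-of-band console edits are easy to diff and paste back in.
  policy = jsonencode({
    Version = "2012-10-17"
    Statement = [
      {
        Effect   = "Allow"
        Action   = ["ecr:ListImages", "ecr:DescribeImages", "ecr:BatchGetImage"]
        Resource = "arn:aws:ecr:us-east-1:123456789012:repository/container123456"
      }
    ]
  })
}
```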
But I guess using these two tools, you can describe a policy in plain text and end up with native TF.
sounds like the process needs a wrapper…
make sense
if I was being strict, GetRepositoryPolicy is probably not required
Actually, there are some other errors: GetAuthorizationToken requires permissions against the * resource, right?
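Right — ecr:GetAuthorizationToken does not support resource-level permissions, so it needs to be split out of the repository-scoped statement into its own:

```json
{
  "Effect": "Allow",
  "Action": "ecr:GetAuthorizationToken",
  "Resource": "*"
}
```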
Hi All, I wanted to start a discussion about our Kubernetes platform and how we handle services running on 100% spot instances. Specifically: a service runs on Kubernetes with two pods in production, each on a different spot node. If one node gets a spot interruption and its pod is rescheduled to another node, and that second node is interrupted at the same time so its pod is also rescheduled and still initializing, we hit an outage because both pods end up in an "initializing" state. Is anyone aware of how to handle this while running 100% on spot?
you can’t completely mitigate this risk if you run on 100% spot.
If the risk is unacceptable, you can:
- Run more pods on more instance types across more AZs to minimise the risk, or
- run some portion of your workload on on-demand/reserved instances
Or you could try spot ocean which will help manage the node lifecycle. You can treat spot nodes as if they are normal ones.
the vmware product? It can’t prevent spot interruptions. Sometimes many instances will be interrupted at once
It is a node pool scheduler, backed by spot, ondemand and reserved instances, but for k8s, it exposes as a scaling group like resource.
It will mix different kinds of instances to prevent the worst situation.
2023-05-18
2023-05-19
2023-05-22
Hi all. I'm just starting to use the Cloud Posse module for EKS clusters. Really liking it so far. Currently I have a Bitbucket pipeline that uses OIDC to assume a role in AWS to run the Terraform. That role has the administrator policy. I've enabled the aws-auth ConfigMap and put that role inside the ConfigMap, attached to the "cluster-admin" group, which I assume has full powers to update anything cluster-wide. So my Terraform looks like this:
map_additional_iam_roles = [
  {
    rolearn  = "arn:aws:iam::***:role/workers"
    username = "system:node:{{EC2PrivateDNSName}}"
    groups   = ["system:bootstrappers", "system:nodes"]
  },
  {
    rolearn  = data.aws_iam_role.AdminRole.arn
    username = "admin-user"
    groups   = ["cluster-admin"]
  },
  {
    rolearn  = aws_iam_role.infrastructure-management.arn
    username = "pipeline"
    groups   = ["cluster-admin"]
  }
]
This works well when I use data.aws_iam_role.AdminRole.arn to log in from my command line; these are temporary creds generated through AWS SSO. However, when I use aws_iam_role.infrastructure-management.arn it fails:
Planning failed. Terraform encountered an error while generating this plan.
╷
│ Error: configmaps "aws-auth" is forbidden: User "pipeline" cannot get resource "configmaps" in API group "" in the namespace "kube-system"
│
│ with module.workload_1.kubernetes_config_map.aws_auth[0],
│ on .terraform/modules/workload_1/auth.tf line 138, in resource "kubernetes_config_map" "aws_auth":
│ 138: resource "kubernetes_config_map" "aws_auth" {
│
╵
Ahh, I think I worked it out. I need to use "system:masters" as the group, because cluster-admin is a ClusterRoleBinding, not a group. The issue is I don't know how to work out what groups there actually are.
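For reference, the working entry then looks like the sketch below. (On the second point: Kubernetes groups aren't listable API objects; they only exist as subjects referenced by RoleBindings/ClusterRoleBindings, which is why finding "what groups exist" is awkward. system:masters is the built-in superuser group.)

```hcl
{
  rolearn  = aws_iam_role.infrastructure-management.arn
  username = "pipeline"
  groups   = ["system:masters"]
}
```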
@Dan Miller (Cloud Posse)
2023-05-23
Hi! I am having issues enrolling an AWS Account under control tower. I am receiving an error saying
AWS Control Tower cannot enroll the account. There's an error in the provisioned product in AWS Service Catalog: ProvisionedProduct with Name: null and Id: *********** doesn't exist
Is there any way to reconcile control tower and service catalog to create a new product when trying to enroll the account?
CT is hard to use; it's like a black box. My feeling is you may not have enough permissions, or the product doesn't exist.
this sounds like a great question for aws support
You’re in for a fun time
To be honest I don’t recommend it. If you’re not too deep already I’d back out.
The trouble you're experiencing now is not an isolated incident. This class of problems will be a continuous occurrence with CT.
@kallan.gerard Do you have any recommendations for an alternative to CT?
Hi @vicentemanzano6 I would probably just use terraform with the aws provider resources and the aws organization primitives
Then expand from there.
I typically use an admin/workload tf pattern for this sort of thing. I’m not sure what to call it.
One terraform config would be for the organization root account, and would contain things like the aws_organizations_account resources for your member accounts, and the initial seed config
Then each member account would have its own Terraform governance config directory, which you configure with a governance accountadmin role inside each account, and you import a common stack module (or modules) inside each config for any resources you want each account to have.
So like:
/admin
    # the config directory that runs in the org root account
    # provisions org member accounts, trust access for those accounts, whatever else you need, etc.
    main.tf
    …
/business-units
    …
    /sales
    /engineering
        /aws-account-analytics-dev
            main.tf   # calls the stack module
        /aws-account-analytics-dev
            main.tf
/modules
    /stack
        main.tf   # module you want in every account
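As a hedged sketch of what the org-root /admin config might contain (all names and the email are illustrative):

```hcl
resource "aws_organizations_account" "analytics_dev" {
  name  = "analytics-dev"
  email = "aws+analytics-dev@example.com"

  # Role that Terraform in the root account can assume into the new account
  role_name = "OrganizationAccountAccessRole"

  lifecycle {
    # Closing member accounts is slow and destructive; guard against it.
    prevent_destroy = true
  }
}
```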