#aws (2020-03)

aws Discussion related to Amazon Web Services (AWS)

Archive: https://archive.sweetops.com/aws/

2020-03-01

2020-03-02

Manuel Pirez avatar
Manuel Pirez

Hi, I would like to know if someone has an EKS cluster in any environment with only Fargate, implemented with the terraform-aws-eks-fargate-profile module. I have tried to deploy a cluster with this module and the pods always remain Pending. I even deployed with eksctl and compared the IAM permissions, VPC, tags, etc., and apparently it is the same on the AWS side, but with eksctl the pods work and with the Terraform modules they do not.

cloudposse/terraform-aws-eks-fargate-profile

Terraform module to provision an EKS Fargate Profile - cloudposse/terraform-aws-eks-fargate-profile

Andriy Knysh (Cloud Posse) avatar
Andriy Knysh (Cloud Posse)

before you deploy any k8s resource to the fargate nodes (into the namespace for which the fargate profile was created), the pods will stay in Pending state, since Fargate only provisions capacity for pods scheduled into that namespace
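
A quick way to verify this, sketched with kubectl (the namespace name `app` is a placeholder for whatever namespace the Fargate profile selects):

```shell
# Pods only get Fargate capacity when scheduled into a namespace matched
# by the Fargate profile's selector ("app" here is a placeholder).
fargate_smoke_test() {
  ns="$1"
  kubectl create deployment web --image=nginx --namespace "$ns"
  # The pod should move from Pending to Running on a fargate-* node.
  kubectl get pods --namespace "$ns" -o wide
}
# fargate_smoke_test app
```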

Andriy Knysh (Cloud Posse) avatar
Andriy Knysh (Cloud Posse)

@Manuel Pirez ^

Manuel Pirez avatar
Manuel Pirez

Thanks @Andriy Knysh (Cloud Posse)

2020-03-03

Karoline Pauls avatar
Karoline Pauls

So I’m trying to understand what the minimal permissions OpsWorks requires are - https://www.terraform.io/docs/providers/aws/r/opsworks_stack.html.

I’m supposed to give it a service role and give instances a default profile. The only thing I’m supposed to use OpsWorks for is users and SSH keys; I want the rest immutable.

Now, logically this would only require OpsWorks agents running on instances to be able to pull keys. (This is VERY simple on Google Cloud… but this is AWS…)

Can I give opsworks an empty policy and the instances - some policy that only allows it to pull keys somehow?

AWS: aws_opsworks_stack - Terraform by HashiCorp

Provides an OpsWorks stack resource.

Karoline Pauls avatar
Karoline Pauls

Does this mean I cannot use OpsWorks to manage SSH keys (and nothing more, ever) on my spot instances in ASGs?

Maciek Strömich avatar
Maciek Strömich

@Karoline Pauls why do you want to use ssh when you have system manager available?

Karoline Pauls avatar
Karoline Pauls

TBH I don’t want to have anything to do with OpsWorks and I don’t know what system manager is.

Karoline Pauls avatar
Karoline Pauls

I just want to get the SSH keys thing out of the way without doing anything stupid in IAM.

Karoline Pauls avatar
Karoline Pauls

I think i’ll just ignore what was done before and will use https://docs.aws.amazon.com/AWSEC2/latest/UserGuide/Connect-using-EC2-Instance-Connect.html

Connect to your Linux instances using EC2 Instance Connect.
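
A hedged sketch of how EC2 Instance Connect is used: push a short-lived public key (valid for about 60 seconds), then ssh as usual. The instance id, availability zone, and key path are placeholders.

```shell
# Push a temporary SSH public key via EC2 Instance Connect.
push_ssh_key() {
  aws ec2-instance-connect send-ssh-public-key \
    --instance-id "$1" \
    --availability-zone "$2" \
    --instance-os-user ec2-user \
    --ssh-public-key "file://$HOME/.ssh/id_rsa.pub"
}
# push_ssh_key i-0123456789abcdef0 eu-west-1a && ssh ec2-user@<public-dns>
```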

Karoline Pauls avatar
Karoline Pauls

Because I don’t have time for learning about yet-another-semiredundant-overengineered-feature-of-aws. Don’t want to sound dismissive but I want working infrastructure with few carefully chosen bits of AWS, not a menagerie of AWS services.

1
Karoline Pauls avatar
Karoline Pauls

Looks like AWS also doesn’t recommend OpsWorks for what I’m supposed to use it for because stack creation is taking 20 minutes already.

Nikola Velkovski avatar
Nikola Velkovski

I would avoid OpsWorks completely, I’ve had the chance of using it and I can only describe it as a mess.

Maciek Strömich avatar
Maciek Strömich

@Nikola Velkovski puppet 4 life! :D

Karoline Pauls avatar
Karoline Pauls

I liked Salt: you could render states with Jinja (unlike Ansible when I used it), and since Jinja can almost run like normal Python code and can definitely build data structures, for advanced uses you could use the json filter to output dynamically built code that would parse as YAML.

Of course, as someone who dabbled with Clojure, I find this whole “let’s render data with text interpolation” thing shameful to our industry. Any expression-oriented dynamically typed programming language with rich data literals may be used to render heterogeneous data.

Karoline Pauls avatar
Karoline Pauls
templatefile - Functions - Configuration Language - Terraform by HashiCorp

The templatefile function reads the file at the given path and renders its content as a template.

Karoline Pauls avatar
Karoline Pauls

it seems I managed to find the minimal set of permissions to register an instance; now I’m seeing it wants to install Ruby before proceeding (and gets a 403)… other than the 403, installing Ruby is exactly something that I want to happen on spot EC2 Kubernetes workers in an autoscaling group… where I want startup times to be as fast as possible…

maarten avatar
maarten

Hi, why do you need to install ruby ?

Karoline Pauls avatar
Karoline Pauls

I don’t, the OpsWorks agent does

joshmyers avatar
joshmyers

I’d have thought that would be all done in an omnibus package or similar

joshmyers avatar
joshmyers

Been many years since used Chef (never used opsworks)

maarten avatar
maarten

Opsworks with spot instances for k8s workers, am I reading this right ?

joshmyers avatar
joshmyers

Agree with others here, wouldn’t go near Opsworks especially for your use case. ^^

maarten avatar
maarten

You will lose so much time with OpsWorks, and it’s from a time when Docker wasn’t really there yet.

randomy avatar
randomy

Why are you using Opsworks when you clearly don’t want to?

Karoline Pauls avatar
Karoline Pauls

I agree with you all, I wouldn’t touch OpsWorks too

maarten avatar
maarten

but ?

Karoline Pauls avatar
Karoline Pauls

But someone decided to use OpsWorks to manage people’s SSH access while I was busy working in another team porting a particularly blobby bit of software to Python 3.

maarten avatar
maarten

Ah ok, but then it’s a good moment to communicate that to this person that this might not be the best way forward, copy paste the slack to this person when needed.

randomy avatar
randomy

and start again

Nikola Velkovski avatar
Nikola Velkovski

include me in the screenshot!

Nikola Velkovski avatar
Nikola Velkovski

OPSWORKS

maarten avatar
maarten

Also why is ssh access needed to begin with ?

Karoline Pauls avatar
Karoline Pauls

most of the time it’s unnecessary but sometimes it’s useful to SSH to strace something in a container (yes, there is probably a tool for that), or do other sorts of debugging

Karoline Pauls avatar
Karoline Pauls

I don’t really SSH there most of the time

maarten avatar
maarten

For executing inside the container kubectl exec can be used, ssh is not needed there.

Karoline Pauls avatar
Karoline Pauls

you cannot really strace this or other containers from there

Karoline Pauls avatar
Karoline Pauls

IIRC even if you run as the root, you cannot [normally?] strace from docker because that would circumvent security

Karoline Pauls avatar
Karoline Pauls

yeah, you need --cap-add sys_ptrace

maarten avatar
maarten

Ok but in that case you can also run telepresence from a host with trace capabilities and do it there. https://www.telepresence.io

Home - Telepresence

Telepresence: a local development environment for a remote Kubernetes cluster

maarten avatar
maarten

Might look complicated, but it’s far less complicated than maintaining Opsworks

Karoline Pauls avatar
Karoline Pauls

i currently can ssh, except it’s done by a 10 lines long script executed on instance init

Karoline Pauls avatar
Karoline Pauls

looks like it will remain so for now because I’ve got real work to do

Karoline Pauls avatar
Karoline Pauls

The following instructions explain how to connect to your Linux instance using EC2 Instance Connect.

randomy avatar
randomy

https://docs.aws.amazon.com/systems-manager/latest/userguide/ssm-agent.html is worth a look too. It can be used for SSH access (look up SSM Session Manager) and also to run commands on multiple instances and get back the results (look up SSM Run Commands)

Install SSM Agent on Amazon EC2 instances, and on-premises servers and virtual machines (VMs), to enable AWS Systems Manager to update, manage,and configure these resources.
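
A hedged sketch of the two SSM features mentioned above: an interactive shell via Session Manager, and a fleet-wide command via Run Command. The tag key/value and instance id are placeholders; this assumes the SSM agent is installed and the instance profile includes the AmazonSSMManagedInstanceCore policy.

```shell
# Interactive shell on one instance (needs the session-manager-plugin locally).
ssm_shell() {
  aws ssm start-session --target "$1"
}

# Run a command on every instance matching a tag and collect the results.
ssm_run_everywhere() {
  aws ssm send-command \
    --document-name AWS-RunShellScript \
    --targets "Key=tag:Role,Values=$1" \
    --parameters "commands=[\"$2\"]"
}
# ssm_shell i-0123456789abcdef0
# ssm_run_everywhere web uptime
```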

maarten avatar
maarten

here an example of a bastion using the ssm agent

https://github.com/Flaconi/terraform-aws-bastion-ssm-iam

Flaconi/terraform-aws-bastion-ssm-iam

AWS Bastion server which can reside in the private subnet utilizing Systems Manager Sessions - Flaconi/terraform-aws-bastion-ssm-iam

1
Karoline Pauls avatar
Karoline Pauls

looks better

Chris Fowles avatar
Chris Fowles

OpsWorks is a steaming pile of trash - I’ve had to do demos of it for AWS and even in that tightly constrained scenario it rarely worked

Chris Fowles avatar
Chris Fowles

SSM instance connect is actually worth a look though - the rest of the SSM space is pretty questionable as far as I’m concerned (except Param Store), but instance connect takes away a lot of pain pretty simply - it gives you IAM-controlled, audited connectivity from either console or CLI.

2020-03-04

Saichovsky avatar
Saichovsky

Anyone able to help me with a CloudFormation/LandingZone issue? (description in thread)

Saichovsky avatar
Saichovsky

I have two templates - one defining the execution role and the other the assumed role. I need to link these accounts in a trust relationship. How do I import the value of the execution role into the template for the assumed role? I had created an Outputs section in executionrole.template, hoping to use SSM, but I feel it will get exported to SSM in the master account while the value is needed in the child account. So I need to pass the execution role ARN from one account to a stack running in a different account.

RB avatar

has anyone got any direction on the tls issues surrounding bucket names with periods ?

the S3 deprecation plan https://aws.amazon.com/blogs/aws/amazon-s3-path-deprecation-plan-the-rest-of-the-story/
Bucket Names with Dots – It is important to note that bucket names with “.” characters are perfectly valid for website hosting and other use cases. However, there are some known issues with TLS and with SSL certificates. We are hard at work on a plan to support virtual-host requests to these buckets, and will share the details well ahead of September 30, 2020.
For example, we cannot use the <https://company.bucket-name.s3.amazonaws.com/mypath> URL if our bucket name contains a period. In this case the bucket name is company.bucket-name. If we use curl with the https link, it will fail unless we skip certificate validation using --insecure
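
The two addressing styles side by side, using the bucket name from the thread. The wildcard certificate `*.s3.amazonaws.com` only covers a single label, so the extra dot in a virtual-hosted URL breaks TLS validation, while the (deprecated) path-style URL keeps the hostname at exactly `s3.amazonaws.com` and validates:

```shell
bucket="company.bucket-name"
# Virtual-hosted-style: the dot in the bucket name adds a label the
# wildcard cert does not cover, so TLS validation fails.
virtual_hosted="https://${bucket}.s3.amazonaws.com/mypath"
# Path-style: the bucket moves into the path, the certificate matches.
path_style="https://s3.amazonaws.com/${bucket}/mypath"
echo "$virtual_hosted"
echo "$path_style"
# curl -I "$virtual_hosted"   # fails without --insecure
# curl -I "$path_style"       # validates
```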

loren avatar

don’t use periods in bucket names?

RB avatar

lol. let me just get in my time machine

loren avatar

have you tried using cloudfront as a workaround? you can use cloudfront to front an s3 bucket for https access

RB avatar

we haven’t. for now, we were waiting to hear back from amazon to see what they say while we use the old url

loren avatar

though the article says they are trying to come up with a solution for bucket names with periods…
Bucket Names with Dots – It is important to note that bucket names with “.” characters are perfectly valid for website hosting and other use cases. However, there are some known issues with TLS and with SSL certificates. We are hard at work on a plan to support virtual-host requests to these buckets, and will share the details well ahead of September 30, 2020.

Host a static website on Amazon S3 by configuring your bucket for website hosting and then uploading your content to the bucket.

RB avatar

they did say they would share the details well ahead of the sep 30, 2020 timeframe so looking for the official reply.

RB avatar

i quoted that same bit in the OP

loren avatar

reading is essential!

1
RB avatar

ive asked my TAM and am waiting to hear back

loren avatar

they’ve adjusted the plan once already. i’m half-expecting the date to push out. with these blog posts being a scare-tactic to get customers to come forward with issues and use cases, and start folks changing behavior earlier so there are fewer issues after the real, eventual change

loren avatar

my brain just exploded

RB avatar

we also depend heavily on other company’s s3 urls and they use periods in their buckets

RB avatar

now we have to nag them to get them to add an access point? ridiculous…

loren avatar

yep

loren avatar

well, it sounded from the blog that if the bucket already exists then you’ll be fine. just new buckets after 30 september get impacted
Revised Plan – Support for the path-style model continues for buckets created on or before September 30, 2020. Buckets created after that date must be referenced using the virtual-hosted model.

1
Tan Quach avatar
Tan Quach

Seems the version for this module went from 0.7.0 to 0.3.2 recently https://github.com/cloudposse/terraform-aws-s3-bucket/releases

Is that the correct next version?

cloudposse/terraform-aws-s3-bucket

Terraform module that creates an S3 bucket with an optional IAM user for external CI/CD systems - cloudposse/terraform-aws-s3-bucket

RB avatar

^ maybe in the #terraform channel


2
Erik Osterman (Cloud Posse) avatar
Erik Osterman (Cloud Posse)

We sometimes release a bug fix for earlier versions of the module for 0.11

omerfsen avatar
omerfsen

Issue is the version went backwards I think

Erik Osterman (Cloud Posse) avatar
Erik Osterman (Cloud Posse)

Also, one heads up on this module: make sure the lifecycle rules are right for you

Erik Osterman (Cloud Posse) avatar
Erik Osterman (Cloud Posse)

we’re going to disable them by default in an upcoming PR, but still expose the functionality

Tan Quach avatar
Tan Quach

(also not sure where to post that question )

2020-03-05

omerfsen avatar
omerfsen

Hello team. What do you suggest to fine-tune permissions on AWS EKS pods? I am planning to set up a multi-tenant AWS EKS and plan to use Namespaces for separation, but I need some advice. I have found https://aws.amazon.com/blogs/opensource/introducing-fine-grained-iam-roles-service-accounts/ . I know AWS EKS does not have built-in fine-grained separation of duties. So what would be your suggestion? I plan to hold each customer in a separate namespace and only allow a service account to access its specific namespace.

Introducing fine-grained IAM roles for service accounts | Amazon Web Services

Here at AWS we focus first and foremost on customer needs. In the context of access control in Amazon EKS, you asked in issue #23 of our public container roadmap for fine-grained IAM roles in EKS. To address this need, the community came up with a number of open source solutions, such as kube2iam, kiam, […]

omerfsen avatar
omerfsen

Also i am planning to use Fargate on this setup.

omerfsen avatar
omerfsen

The IAM roles for service accounts feature is available on new Amazon EKS Kubernetes version 1.14 clusters, and clusters that were updated to versions 1.14 or 1.13 on or after September 3rd, 2019. Existing clusters can update to version 1.13 or 1.14 to take advantage of this feature. For more information, see

Erik Osterman (Cloud Posse) avatar
Erik Osterman (Cloud Posse)

We use IAM roles for service accounts, and associate each service account with the pods that need it. E.g. we create an IAM role with the requisite polices for external-dns then associate that with a service account that we pass to the external-dns deployment/pods.

Erik Osterman (Cloud Posse) avatar
Erik Osterman (Cloud Posse)

We then do this for every other service we deploy

Erik Osterman (Cloud Posse) avatar
Erik Osterman (Cloud Posse)

like cert-manager
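
The mechanism described above can be sketched with kubectl (the role ARN and account id are placeholders): the EKS pod identity webhook injects AWS credentials into pods whose service account carries this annotation.

```shell
# Associate an IAM role with a Kubernetes service account (IRSA).
annotate_irsa() {
  kubectl annotate serviceaccount "$1" --namespace "$2" \
    "eks.amazonaws.com/role-arn=$3"
}
# annotate_irsa external-dns kube-system \
#   arn:aws:iam::111122223333:role/external-dns
```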

omerfsen avatar
omerfsen

I see, so you don’t use kiam or kube2iam but this newly announced feature of AWS EKS https://aws.amazon.com/blogs/opensource/introducing-fine-grained-iam-roles-service-accounts/

Introducing fine-grained IAM roles for service accounts | Amazon Web Services

Here at AWS we focus first and foremost on customer needs. In the context of access control in Amazon EKS, you asked in issue #23 of our public container roadmap for fine-grained IAM roles in EKS. To address this need, the community came up with a number of open source solutions, such as kube2iam, kiam, […]

Erik Osterman (Cloud Posse) avatar
Erik Osterman (Cloud Posse)

Yes, for EKS we use the new service accounts. For kops we still use kiam - haven’t explored it there yet.

Erik Osterman (Cloud Posse) avatar
Erik Osterman (Cloud Posse)

Service accounts are much better once you have it working.

kskewes avatar
kskewes

Yep to service accounts. For multi tenant if you need hard boundaries I’d consider a separate cluster. Otherwise namespace network policy, service accounts, pod security policy, OPA on admission, etc.

davidvasandani avatar
davidvasandani

When making requests to a private API GW through an internal NLB, the request is routed to the correct stage but the stage name is also included in the resourcePath, which breaks the request. This only happens when making the request through the NLB.

2020-03-06

Maciek Strömich avatar
Maciek Strömich

hey, does anyone have ready-to-use CloudWatch alarms for Kinesis Firehose in CloudFormation format, e.g. for DeliveryToS3.DataFreshness or ThrottledRecords, or even better for those math expressions specified in https://docs.aws.amazon.com/firehose/latest/dev/monitoring-with-cloudwatch-metrics.html#firehose-metric-dimensions? (my friday lazy ass is asking (-: )

Learn how to use CloudWatch Metrics to monitor delivery streams in Amazon Kinesis Data Firehose.

joshmyers avatar
joshmyers

Not for that exact use but math expressions in CL alarms aye…

Maciek Strömich avatar
Maciek Strömich

I wonder why they don’t include cw alarms in the docs of the resource.

Maciek Strömich avatar
Maciek Strömich

e.g. https://docs.aws.amazon.com/AWSCloudFormation/latest/UserGuide/aws-resource-kinesisfirehose-deliverystream.html has a complete example with bucket, roles, policies and deliverystream

Use the AWS CloudFormation AWS::DeliveryStream resource for KinesisFirehose.

Maciek Strömich avatar
Maciek Strömich

why they wouldn’t just add an example cw alarm in there?

Maciek Strömich avatar
Maciek Strömich

ok, so for data freshness it should alarm only if 900 secs mark is crossed because that’s the maximum firehose limit

Maciek Strömich avatar
Maciek Strömich

for anyone interested ;-))

  FirehoseDeliveryDataFreshness:
    Type: AWS::CloudWatch::Alarm
    Properties:
      AlarmDescription: Firehose Delivery Data Freshness Alarm
      AlarmName: FirehoseDeliveryDataFreshness
      ComparisonOperator: GreaterThanThreshold
      Period: 60
      EvaluationPeriods: 3
      Threshold: 150
      Statistic: Maximum
      MetricName: DeliveryToS3.DataFreshness
      Namespace: AWS/Firehose
      Dimensions:
        - Name: DeliveryStreamName
          Value: !Ref FirehoseDeliveryStream
      TreatMissingData: breaching
      AlarmActions:
        - !Ref 'AlarmTopicArn'
      InsufficientDataActions: 
        - !Ref 'AlarmTopicArn'
Maciek Strömich avatar
Maciek Strömich

my BufferingHints are set to 120 secs, which means that on average Kinesis will deliver on the 121st second.

Maciek Strömich avatar
Maciek Strömich

I gave it a little bit more room

2020-03-10

kj22594 avatar
kj22594

Hey all, I have two questions regarding AWS Parameter Store:

1) Does anyone know of a good way to find the last time a secret was accessed?
2) Is there a good way to find how many times a given parameter has been accessed in a given time frame (ideally 24 hours)?

tomv avatar

I think cloudtrail would be best suited to answer both questions

kj22594 avatar
kj22594

So i’ve been looking into using cloudtrail for the latter. For the former I was hopeful that I wouldn’t have to do something like query all cloudtrail logs for the last time GetParameter(s) was called on each secret
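
For the event-history case above, CloudTrail's lookup API can be queried directly instead of scanning raw logs; a hedged sketch (note the batch API is a separate event name, GetParameters, so both are worth checking):

```shell
# List the most recent CloudTrail events for a given event name,
# e.g. GetParameter calls against Parameter Store.
last_parameter_reads() {
  aws cloudtrail lookup-events \
    --lookup-attributes "AttributeKey=EventName,AttributeValue=$1" \
    --max-results 10 \
    --query 'Events[].{time:EventTime,user:Username}'
}
# last_parameter_reads GetParameter
# last_parameter_reads GetParameters
```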

Alex Siegman avatar
Alex Siegman

Anyone familiar with Firehose delivery to Redshift? From everything I can tell, it seems like your Redshift cluster needs to be publicly addressable (i.e. in a “public” subnet with an internet gateway) for Firehose to talk to it, complete with a /27 CIDR block for Firehose to allow through the SG. Seems just weird to me to keep data stores publicly accessible like that. Is there any configuration out there where it can stay in the private subnet but still get delivered to from Firehose?

Maciek Strömich avatar
Maciek Strömich

You can deliver to S3 and pull the data from there into redshift.

Alex Siegman avatar
Alex Siegman

Yeah, thought of that work around this morning before seeing the message. Seems to be the way to do it.

David Lozano avatar
David Lozano

Hi @Alex Siegman . I think that you can achieve this using aws vpc endpoint. With Redshift in the VPC, it should be able to communicate with Firehose through the endpoint without leaving the vpc(aws backbone). I haven’t used this solution but it may work for you.

https://docs.aws.amazon.com/vpc/latest/userguide/vpc-endpoints.html

Use a VPC endpoint to privately connect your VPC to other AWS services and endpoint services.

Alex Siegman avatar
Alex Siegman

my impression was that was for outbound connectivity to services, if redshift was “Reading from” firehose, not the public firehose service delivering to redshift. Could be wrong. Might try it in a test account

casey avatar

anyone have strong opinions on where ETL/ML infrastructure should be set up? I currently have Airflow in our staging k8s cluster but it’s kind of a pain to manage cross-account permissions:
• could move Airflow to our prod k8s (worried about doing this)
• could create a new k8s cluster in the prod account and move it there
• or could create a new AWS data account that has access to everything?
any insights/experiences are appreciated

Marcin Brański avatar
Marcin Brański

Have you tried/seen kubeflow?

casey avatar

@Marcin Brański does kubeflow do ETL?

casey avatar

but yeah for ML side at least it seems good, but havent tried

2020-03-11

Zachary Loeber avatar
Zachary Loeber

https://github.com/bottlerocket-os/bottlerocket -> AWS’s container OS looks interesting. Has an update strategy that reminds me of upgrading F5 LTMs….

bottlerocket-os/bottlerocket

An operating system designed for hosting containers - bottlerocket-os/bottlerocket

2020-03-12

Riak avatar

Hello, I’m looking for modules that deploy EKS with an S3 backend + DynamoDB for lock state

2020-03-13

Maciek Strömich avatar
Maciek Strömich
Building faster, lower cost, better APIs – HTTP APIs now generally available | Amazon Web Services

In July 2015, AWS announced Amazon API Gateway. This enabled developers to build secure, scalable APIs quickly in front of a variety of different types of architectures. Since then, the API Gateway team continues to build new features and services for customers. Figure 1: API Gateway feature highlights timeline In early 2019, the team evaluated […]

2020-03-16

Karthik Sadhasivam avatar
Karthik Sadhasivam

Hi Guys, I am new to this channel and trying to get some advice on rolling updates of EC2 instances in ASGs. I am trying to use this module https://registry.terraform.io/modules/cloudposse/ec2-autoscale-group/aws/0.4.0 and seeing that every time I update the userdata or instance type it just creates a new version of the launch template but doesn’t do any rolling update on the ASG. Are there any workarounds available, as discussed in https://github.com/hashicorp/terraform/issues/1552?

aws: Allow rolling updates for ASGs · Issue #1552 · hashicorp/terraform

Once #1109 is fixed, I&#39;d like to be able to use Terraform to actually roll out the updated launch configuration and do it carefully. Whoever decides to not roll out the update and change the LC…

Erik Osterman (Cloud Posse) avatar
Erik Osterman (Cloud Posse)

Please avoid cross posting, but if you do, link to your original post. #terraform is the correct channel

aws: Allow rolling updates for ASGs · Issue #1552 · hashicorp/terraform

Once #1109 is fixed, I&#39;d like to be able to use Terraform to actually roll out the updated launch configuration and do it carefully. Whoever decides to not roll out the update and change the LC…

Karthik Sadhasivam avatar
Karthik Sadhasivam

got it

Karthik Sadhasivam avatar
Karthik Sadhasivam

thanks

2020-03-18

Santiago Campuzano avatar
Santiago Campuzano

Morning everyone! Quick VPC/EC2 question: when you change the route table for a subnet, is it required to restart/launch new EC2 instances on that subnet to reflect the change? Is it immediate?

androogle avatar
androogle

no ec2 instance restart should be required, it should be near immediate

2
androogle avatar
androogle

I guess there are some exceptions, but still not on the EC2 side. Like if you’re using BGP on a VPN and the routes are being propagated from your edge-device, then that device may need to push updates?

androogle avatar
androogle

To be honest I’m not super clear on those VPN / Gateway scenarios. But if you’re just talking about entirely within the context of a single VPC, you should be good

Santiago Campuzano avatar
Santiago Campuzano

Thanks a lot for the clarification @androogle !

1

2020-03-20

David avatar

I’m trying to sign a request on an ECS task to sts using my Instance profile IAM Role. When I check the IAM Request headers on my requests, it looks like the session token for the instance role is not being included.

Is there a way to get the current session token on the task?

Marcin Brański avatar
Marcin Brański

Wouldn’t you prefer to use task role instead of instance role?

2020-03-21

2020-03-23

RB avatar

How do people here lower replication lag in Aurora replicas? I’m looking at write operations and see a direct correlation with spikes in replication lag, which is expected, but the spikes are well over 100 ms, reaching around 1 sec

androogle avatar
androogle

Its been a while since I’ve messed with perf tuning Aurora but I would look at:

• instance classes you’re using since that directly impacts network throughput

• provisioned iops

• memory buffers (in cases of mysql / pgsql)

• any development usage of the replica (data-warehouse / etl / jobs)

1
1
maarten avatar
maarten

Isn’t aurora replication done at I/O layer, have you contacted support ?

RB avatar

we haven’t contacted support because no degradation has been seen. this was just unnerving to see from our developers and i was wondering if any other people have noticed issues and successfully lowered replication lag

RB avatar

according to aws, it looks like replication lag is correlated well with write operations and we can see that side by side in the metrics

RB avatar

for now, we’re raising our alert from 0.2 sec to 1.5 sec since with 1 sec lag we’re not seeing issues
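
A hedged sketch of the adjusted alert: AuroraReplicaLag is reported in milliseconds under the AWS/RDS namespace, so a 1.5 s alert is a threshold of 1500. The cluster identifier and SNS topic ARN are placeholders.

```shell
# Create/update a CloudWatch alarm on Aurora replica lag at 1.5 s.
set_replica_lag_alarm() {
  aws cloudwatch put-metric-alarm \
    --alarm-name aurora-replica-lag \
    --namespace AWS/RDS \
    --metric-name AuroraReplicaLag \
    --dimensions "Name=DBClusterIdentifier,Value=$1" \
    --statistic Maximum \
    --period 60 \
    --evaluation-periods 3 \
    --threshold 1500 \
    --comparison-operator GreaterThanThreshold \
    --alarm-actions "$2"
}
# set_replica_lag_alarm my-cluster arn:aws:sns:us-east-1:111122223333:alerts
```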

Meb avatar

We removed READ replicas due to the lag on my project. The lag can mount to a few seconds on heavy write operations.

1
androogle avatar
androogle

@Meb so you didn’t need high availability?

1
Meb avatar

We were having data state inconsistencies that blew up our workflow, so we ended up removing the replica and upsizing the main cluster

Meb avatar

I mean for READ replicas

Meb avatar

It was a pain, the read replicas in Aurora… too much lag

2020-03-24

Santiago Campuzano avatar
Santiago Campuzano

Hello People !! I have a simple problem and I would like to know if anyone here has worked around it. I need to know, from within an EC2 instance, what Target Group the instance is attached to, and then be able to de-register from it…

androogle avatar
androogle

I could be wrong but I don’t think you’ll find anything “standard” in this use-case for the following reasons:

• it predicates the instance is using a role (which isn’t required in general for launching an instance)

• the roles policy permissions would be pretty use-case specific

• sadly there’s no meta-data available for assigned tg’s (https://docs.aws.amazon.com/AWSEC2/latest/UserGuide/instancedata-data-categories.html)

Instance Metadata Categories - Amazon Elastic Compute Cloud

The following table lists the categories of instance metadata.

androogle avatar
androogle

It might help if you provided more context around the use-case. I’d wonder why not conditionally assign the instance in whatever process is doing that

Santiago Campuzano avatar
Santiago Campuzano

Thanks for the reply @androogle ! . We have an instance role with the proper permissions for querying the Load Balancers and Target Groups… and registering and de registering from them

androogle avatar
androogle

what is the event that you’d like to trigger the removing of the instance from the target group?

Santiago Campuzano avatar
Santiago Campuzano

We have a super big application with around 20 different Target Groups on Different AWS zones… so I want to create a small Python/bash script that would be executed by the Java JVM whenever there is a critical condition, so the instance can de register itself from the TG

Santiago Campuzano avatar
Santiago Campuzano

So that script needs to know what is the TG the instance is attached to

Santiago Campuzano avatar
Santiago Campuzano

Using the AWS CLI

Santiago Campuzano avatar
Santiago Campuzano
aws elbv2 deregister-targets \
    --target-group-arn arn:aws:elasticloadbalancing:us-west-2:123456789012:targetgroup/my-targets/73e2d6bc24d8a067 \
    --targets Id=i-1234567890abcdef0
androogle avatar
androogle

got it. I’m a python person so I’d probably do it with a mixture of metadata to get the instance-id and boto3 to get the target-group

Santiago Campuzano avatar
Santiago Campuzano

Yep… the Instance ID is a piece of cake … getting the Target Group is more challenging … ’cause we have 20+ Target Groups

Santiago Campuzano avatar
Santiago Campuzano

I mean.. it’s doable in either Python or Bash… I just wanted to avoid reinventing the wheel.. in case someone has already created it

androogle avatar
androogle

yeah I follow. I was just saying because of those initial bullet-points it’d be hard for anyone to make a standardized approach to that. Its a pretty unique use-case

Santiago Campuzano avatar
Santiago Campuzano

Yep… seems like

androogle avatar
androogle

Out of curiosity, what happens with the instances that are failing?

Santiago Campuzano avatar
Santiago Campuzano

Java internal issues …. OutOfMemoryExceptions

androogle avatar
androogle

like is it possible to configure the java app to return an http code that would cause the health-check to fail and remove it from the TG ?

Santiago Campuzano avatar
Santiago Campuzano

and other conditions …. so we want to take the instance out of the load balancer so we can perform a post-mortem

Santiago Campuzano avatar
Santiago Campuzano

Otherwise the instance is terminated by the ASG

androogle avatar
androogle

ahhh I gotcha

Santiago Campuzano avatar
Santiago Campuzano

Anyways … @androogle thanks for much for your interest and for the tips !

1
androogle avatar
androogle

Good luck

Santiago Campuzano avatar
Santiago Campuzano

TY … I will post the resulting script..

Santiago Campuzano avatar
Santiago Campuzano

I just don’t want to reinvent the wheel in case someone has a script for this

Santiago Campuzano avatar
Santiago Campuzano

Thanks a lot !

2020-03-25

Pierre Humberdroz avatar
Pierre Humberdroz

Hey,

can I somehow see which instances are out of stock for a specific aws region? we have been unable to launch c5.xlarge in eu-central-1a a couple of times this week and I am trying to find a source that will tell me when something is unavailable

androogle avatar
androogle

do you mean spot or just in-general you couldn’t launch a c5.xlarge ?

Pierre Humberdroz avatar
Pierre Humberdroz

in general

Pierre Humberdroz avatar
Pierre Humberdroz
1
androogle avatar
androogle

I’ve never run into that. hm I wonder if there’s anything in the API

androogle avatar
androogle
DescribeInstanceTypeOfferings - Amazon Elastic Compute Cloud

Returns a list of all instance types offered. The results can be filtered by location (Region or Availability Zone). If no location is specified, the instance types offered in the current Region are returned.

androogle avatar
androogle

maybe that?

androogle avatar
androogle

hmm. is it presently happening? I don’t know if that metric is live or “general” availability

aws ec2 describe-instance-type-offerings  --region=eu-central-1 | grep -B1 -A3 c5.xlarge
        {
            "InstanceType": "c5.xlarge",
            "LocationType": "region",
            "Location": "eu-central-1"
        },
Pierre Humberdroz avatar
Pierre Humberdroz

no, it happened mostly in the mornings over the last two days

androogle avatar
androogle

but that still doesn’t narrow it down to zone

androogle avatar
androogle

I’ve had issues where I can get certain instance types in us-east-1a but not like us-east-1e

androogle avatar
androogle

next time it happens you could try: aws ec2 describe-instance-type-offerings --region=eu-central-1 --location-type availability-zone | grep -B1 -A3 c5.xlarge

androogle avatar
androogle

if it doesn’t return anything for the zone it’s erroring on, then you know you can use that call
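A boto3 version of that check might look like the following (hypothetical sketch; the region, instance type, and expected AZ names are placeholders). Note that describe-instance-type-offerings tells you whether a type is *offered* in an AZ at all, not live capacity — InsufficientInstanceCapacity errors can still happen for an offered type.

```python
"""Which AZs in a region offer a given instance type? Hypothetical boto3 sketch."""


def missing_zones(expected_zones, offered_zones):
    """Return the expected AZs where the instance type is NOT offered."""
    return sorted(set(expected_zones) - set(offered_zones))


def offered_azs(region, instance_type):
    import boto3  # AWS SDK; needs credentials at runtime

    ec2 = boto3.client("ec2", region_name=region)
    resp = ec2.describe_instance_type_offerings(
        LocationType="availability-zone",
        Filters=[{"Name": "instance-type", "Values": [instance_type]}],
    )
    # With LocationType=availability-zone, Location is the AZ name.
    return [o["Location"] for o in resp["InstanceTypeOfferings"]]


if __name__ == "__main__":
    offered = offered_azs("eu-central-1", "c5.xlarge")
    print(missing_zones(["eu-central-1a", "eu-central-1b", "eu-central-1c"], offered))
```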

Pierre Humberdroz avatar
Pierre Humberdroz

I will take a look thanks @androogle

1
Tyrone Meijn avatar
Tyrone Meijn

Hey guys! I have an ECS service running my api in containers and today I noticed that the running containers stopped after 24 hours. It happened on staging as well as on production today, after exactly 24 hours, with the events separated exactly by the time the deployment from staging to production took. Is this expected behavior?

androogle avatar
androogle

are these running as a service or a 1 off task?

Tyrone Meijn avatar
Tyrone Meijn

This means they are running as a service right?

androogle avatar
androogle

yup!

androogle avatar
androogle

have you looked at the service / task logs when they stop?

androogle avatar
androogle

also when you look at the stopped tasks, does it give a reason?

Tyrone Meijn avatar
Tyrone Meijn

This is the event log in the service itself. To me it reads like ECS itself decided to drain connections and stop the containers, rather than an unexpected error occurring in the containers.

androogle avatar
androogle

is the service mapped to a target-group?

androogle avatar
androogle

if the target in the target-group fails health-checks, it will de-register the task

androogle avatar
androogle

Also I’d check if the service is logging to cloudwatch (or anywhere) and check the logs of the api itself

androogle avatar
androogle

but to your question, no that’s not normal / expected

androogle avatar
androogle

happy to help debug it with you

Tyrone Meijn avatar
Tyrone Meijn


is the service mapped to a target-group?
Yes, indeed!
if the target in the target-group fails health-checks, it will de-register the task
I see in the ALB logs that there were HTTP 503 codes during the same time!

Tyrone Meijn avatar
Tyrone Meijn


happy to help debug it with you
Thank you, really appreciate that! For your information, I’m quite junior in Ops, but still the most experienced in our company.

androogle avatar
androogle

cool, np at all What I would try to do if I were in that position is find the api logs and correlate them to the 503 response codes if you can

androogle avatar
androogle

see if its an app issue, environment issue or combo of both

androogle avatar
androogle

if the services have been setup to use the aws logger, you may find logs in cloudwatch that are for that api service

Tyrone Meijn avatar
Tyrone Meijn

Thanks I found the problem: from the api we ping the postgres database to see if it’s still available. The client from node-postgres had an unhandled error, leading to a crash. Seems like the database drops connections active longer than 24 hours, but I’ll have to investigate more.

Tyrone Meijn avatar
Tyrone Meijn
androogle avatar
androogle

yeah postgres is probably doing a scheduled backup or maintenance

androogle avatar
androogle

if you’re using a HA setup of postgres (RDS with replicas) I’d ping the cluster url, not the individual instance

androogle avatar
androogle

hopefully soon AWS RDS Proxy (assuming you’re using RDS) will become GA and make issues like this less prevalent

Tyrone Meijn avatar
Tyrone Meijn

We are using Aurora Serverless and I think that one is already using a proxy. However, in the api, I make a connection with the database and perform a simple query (SELECT true).

androogle avatar
androogle

do you know if you’re using the cluster-url vs the ro endpoint?

androogle avatar
androogle

that may make a difference

androogle avatar
androogle

when maintenance period happens or a scale-up, the resource for the replicas may change, but to be honest I don’t have a lot of experience with Aurora Serverless

androogle avatar
androogle

hm ok yeah that looks like a cluster-url format

androogle avatar
androogle

since this is a staging environment, do you leverage Aurora for a cost savings measure and have it turn-off after a certain period?

androogle avatar
androogle

though if the app is constantly doing a select 1 it should reset the countdown timer to shutdown, so that’s probably not it. hmm

Tyrone Meijn avatar
Tyrone Meijn

Hmmm, it happened exactly 24 hours after container startup, in different envs (staging and prod).

Tyrone Meijn avatar
Tyrone Meijn

hehe, yeah indeed.

androogle avatar
androogle

I guess I’d poke around in Cloudwatch Metrics for that Aurora instance. Look at connection count and other things to see if it drops all connections. See if the Aurora instance has events in its event log about restarting or applying queued maintenance

androogle avatar
androogle

has this been recurring or just a one time thing per environment?

Tyrone Meijn avatar
Tyrone Meijn


has this been recurring or just a one time thing per environment?
We deployed yesterday, looking in the logs now to see if it happened somewhere before the deployment.

I’d like to thank you for your patience and suggestions in the meantime, appreciate it a lot

androogle avatar
androogle

np One thing I’d say to keep in mind, recently AWS updated their RDS CA certificate for secure connections to RDS

androogle avatar
androogle

and gave time for everyone to switch over; it’s possible that time has passed and anyone who hadn’t switched to using the new CA had it “forced” on them

androogle avatar
androogle

causing a restart

androogle avatar
androogle

hm that was supposed to be March 5th, so maybe it wasn’t that

Tyrone Meijn avatar
Tyrone Meijn

I found one entry on staging having the unhandled error, but it didn’t down all the tasks it seems.

androogle avatar
androogle

for the target-group this service is registered in. Does the health-check allow for multiple failures before considered unhealthy?

androogle avatar
androogle

or just a single one? Technically the cluster-url should be 100% available but it’s hard to calculate what possible edge-cases there could be

Tyrone Meijn avatar
Tyrone Meijn

Not much, but it should be enough for a recovery.

androogle avatar
androogle

hmm ok so if this were on the Aurora side, it’d have to be unavailable for 30+ seconds to trigger this

Tyrone Meijn avatar
Tyrone Meijn

Uhhmm, our app is not robust in that regard: one of those unhandled errors causes the container to crash, so when the connection gets terminated, it’s unhandled, causing a container crash.

androogle avatar
androogle

ah ok. so if the db was unavailable at the time of check the process exits.

androogle avatar
androogle

I’ve had similar issues, in a lot of cases we used supervisord in our containers

androogle avatar
androogle

and that would be the “watchdog” of the process. but we also had APM and other reporting so we knew when the process crashed

Tyrone Meijn avatar
Tyrone Meijn
SELECT pg_terminate_backend(pid) FROM
pg_stat_activity WHERE application_name = 'api-health-check';

Just ran this, killing all connections from the api and they all crashed.

androogle avatar
androogle

ouch

androogle avatar
androogle

gotcha so error handling and retries might be a good takeaway
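That takeaway can be as small as wrapping the health-check ping in a retry with backoff, so one dropped connection doesn’t kill the whole process. A hypothetical sketch (here `ping_fn` stands in for whatever runs the `SELECT true` query; in the actual Node app the equivalent fix is attaching an `'error'` handler to the node-postgres client/pool so a terminated backend doesn’t surface as an unhandled event):

```python
"""Retry a DB health-check ping instead of crashing on the first failure.

Hypothetical sketch: ping_fn stands in for whatever executes the
'SELECT true' query against Postgres.
"""
import time


def ping_with_retry(ping_fn, attempts=3, base_delay=0.5, sleep=time.sleep):
    """Call ping_fn up to `attempts` times with exponential backoff.

    Returns True if any attempt succeeds, False if every attempt raises.
    """
    for i in range(attempts):
        try:
            ping_fn()
            return True
        except Exception:
            if i < attempts - 1:
                sleep(base_delay * (2 ** i))  # 0.5s, 1s, 2s, ...
    return False
```

Only if all attempts fail should the container report unhealthy (and let ECS/the target group handle replacement), rather than exiting on the first terminated connection.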

Tyrone Meijn avatar
Tyrone Meijn

haha yep Still wondering why it happened at the 24-hour mark though; could be connection cleanup or something like that…

androogle avatar
androogle

I would look at the Logs & Events Tab in the Aurora Serverless Instance details

androogle avatar
androogle

and see if there were any maintenance or restart events

Tyrone Meijn avatar
Tyrone Meijn

Cannot see anything there happening around that time. Also nothing weird in the postgres logs

1
androogle avatar
androogle

I find APMs priceless in triaging most issues like this. Depending on your company’s appetite there are some reasonable options out there, ranging from really cheap DIY to arm-and-leg hosted in cloud

androogle avatar
androogle

NewRelic / DataDog are pretty common

androogle avatar
androogle

I’m a fan of Elastic APM as its simple but effective

Tyrone Meijn avatar
Tyrone Meijn

Thanks for the suggestions, unfortunately I’m not given the time (yet…..) to look into those solutions..

androogle avatar
androogle

np, been there. Good luck Feel free to reach out

Tyrone Meijn avatar
Tyrone Meijn

Thanks again! Will keep that in mind

randomy avatar
randomy

No, that doesn’t sound right. I’ve just checked and have Fargate tasks running since January.

2020-03-27

Igor avatar

Is chamber able to read secrets that it didn’t write? I have set the KMS alias, and I can list the keys, but they are showing up with Version 0, and I am unable to read them.

Igor avatar

Looks like it’s related to this issue: https://github.com/segmentio/chamber/issues/251

Erik Osterman (Cloud Posse) avatar
Erik Osterman (Cloud Posse)

Adding @discourse_forum bot

discourse_forum avatar
discourse_forum
10:01:28 PM

@discourse_forum has joined the channel

2020-03-28

Zach avatar

With RDS IAM authorizations for database users, is it possible to have ‘real users’ (ie, Tom Jones) who log into AWS via Okta SAML and assume a common role (“Developer”) still have unique RDS IAM authorizations? ie TomJones has okta username tJones@myco and assumes the Developer role, I still want him to use a database IAM user tJones, and not have access to the ‘bSmith’ database IAM user that maps to user Bob Smith.

Zach avatar

Answering my own question, we found an Okta blog post that showed us how to pass additional Principal Tags through the SAML response that AWS can use, which allows us to get the exact username from Okta.

kenny avatar

So that means you don’t have to create a database user, and instead use the real user who signed in using Okta SAML? Using a tag, are you able to control which user should have access to which database?

Zach avatar

No, you still have to create the user on the DB, and grant it the rds_iam flag so that it can use the IAM auth. What I was then seeing was that our ‘userid’ comes in as <roleid>:<oktaname>, which was super user-unfriendly as a login name. Eventually we found a blog from Okta that said we could enable a beta feature on our account that would let us pass extra SAML tags into AWS. Note though - this required a change to ALL trust policies for the SAML; it actually broke some of our logins for a while

kenny avatar

thanks for sharing it!

2020-03-30

SlackBot avatar
SlackBot
10:38:39 PM

This message was deleted.

2020-03-31

setheryops avatar
setheryops

Anyone have thoughts on only using Workmail to forward mail to another address? Im moving my domain away from GoDaddy to R53. I currently have my email setup to forward to my gmail address and then when I send from there gmail is set to make it look like im sending from my domain address. Can I use Workmail as a “proxy” the same way and have it dump all my email to my gmail address? My other option is to just go with Gsuite basic plan and basically pay $6 a month to do the same thing.

androogle avatar
androogle

FWIW, I had a similar scenario and I accomplished it with SES forwarder

androogle avatar
androogle

it required a lambda and s3 and some policies but it works (for the most part)

androogle avatar
androogle

there’s the caveat that gmail won’t show attached .eml’s inline

androogle avatar
androogle

but thats more of a gmail issue

androogle avatar
androogle

so you have to download it and open it

androogle avatar
androogle
Forward Incoming Email to an External Destination | Amazon Web Servicesattachment image

Note: This post was written by Vesselin Tzvetkov, an AWS Senior Security Architect, and by Rostislav Markov, an AWS Senior Engagement Manager. Amazon SES has included support for incoming email for several years now. However, some customers have told us that they need a solution for forwarding inbound emails to domains that aren’t managed by […]

androogle avatar
androogle

I went that route since workmail is a lot more expensive

setheryops avatar
setheryops

cool…that was my other concern…thanks

1
setheryops avatar
setheryops

ill check that out

randomy avatar
randomy

That blog post has comments where people complain about it, and link to a github project with a lot of issues. I ended up making an SES Lambda forwarder in Python that has seen heavy use and works for the most part. I can try to open source it if there’s interest. (it’s a Terraform module, it also uses Step Functions for retries)
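The fiddly part of any SES forwarder is the header rewriting: SES will only send from a verified identity, so the From header has to be replaced and Return-Path/DKIM headers dropped. A hypothetical sketch of just that step (the surrounding Lambda would read the raw message from S3 and hand the result to `send_raw_email`; addresses are placeholders):

```python
"""Rewrite headers on a raw inbound email so SES can re-send it.

Hypothetical sketch of the header-rewriting step of an SES forwarder.
"""
import email


def rewrite_for_forwarding(raw_message, from_address, to_address):
    """Return raw message bytes with headers rewritten for forwarding."""
    msg = email.message_from_bytes(raw_message)
    original_from = msg.get("From", "")
    # Drop headers that would fail verification or confuse delivery.
    for header in ("Return-Path", "DKIM-Signature", "From", "To", "Reply-To"):
        del msg[header]
    msg["From"] = from_address       # must be an SES-verified identity
    msg["Reply-To"] = original_from  # replies go back to the real sender
    msg["To"] = to_address
    return msg.as_bytes()
```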

1
randomy avatar
randomy

“heavy use” = forwarding 10-15k emails per month

joshmyers avatar
joshmyers

I do this, I think ?

Mike Martin avatar
Mike Martin

Anyone have any ideas on how to setup any type of Directory Service (ie. AWS Managed Microsoft AD) in the Ningxia (China) region WHILE YOUR SOURCE DIRECTORY LIVES IN VIRGINIA. The documentation I’m reading is pretty much saying you need your source directory in China as well. I haven’t been able to Google this…

1
androogle avatar
androogle

does the Ningxia region have cognito? Maybe setup an identity pool and point to the AD in VA as the User source?

androogle avatar
androogle

more of an SSO than directory replication

Mike Martin avatar
Mike Martin

Hmmm let me dig into that - I appreciate the quick response!

androogle avatar
androogle
Setup AWS Cognito User Pool with an Azure AD identity provider to perform single sign-on (SSO)…attachment image

This a step-by-step tutorial of how to set up an AWS Cognito User Pool with an Azure AD identity provider and perform single sign-on (SSO)…

androogle avatar
androogle

with exception of AD being in AWS instead of Azure

Mike Martin avatar
Mike Martin

I do NOT believe that AWS Workspaces supports Cognito as an IdP. You need to back the “workspaces” instances with a “Directory Service”.

androogle avatar
androogle

ahh ok workspaces. gotcha

Mike Martin avatar
Mike Martin

Yeah - here are my options. -AWS Managed Microsoft AD -Simple AD -AD Connector

I’m looking to back one of these directories with an LDAP server in Virginia if possible. I think I need to open a public IP of sorts…or create a VPN between the two, but that sounds illegal…

androogle avatar
androogle

I’m not familiar with the restrictions of that region, does it not allow things like VPN / VPC peering etc?

androogle avatar
androogle

I don’t even see a Ningxia region in my console

Mike Martin avatar
Mike Martin

Oh, the AWS China partition (Ningxia and Beijing) is completely separate from the AWS that you’re (probably) currently using. It’s a completely different login and console with limited services.

androogle avatar
androogle

ah ok. Well routing issues aside, I suppose you could setup something like FreeIPA in China region

androogle avatar
androogle

and have that be a peer / slave to your VA region, assuming you can solve the connectivity issue

androogle avatar
androogle

I think you could use AD Connector w/that? I’m not sure

Mike Martin avatar
Mike Martin

I’ve never used FreeIPA - I am trying to determine if that would be better than just spinning up another Windows based domain controller in the China AWS region, instead of learning and testing this new tech (FreeIPA).

androogle avatar
androogle

yeah just regular windows might be better if you have the experience there. FreeIPA is great but there’s a curve. I wasn’t sure what the availability was for MS licensing on that region

Mike Martin avatar
Mike Martin

Oh goodness…if they don’t have MS licensing then we’ve got bigger problems lol

1
1
Matt Gowie avatar
Matt Gowie

Hey folks — I’ve got a fun client problem I’m trying to solve. Looking for some input if anybody has a good idea…

I built the environment like so:
3 domains => CloudFront distribution => ALB => ECS Service
This works great, the ECS Service webserver handles providing different views of the application depending on the domain name via the Host header. That in combination with a couple ALB Listener Rules does the trick for what each of those domains are expected to show.

One of the domains however is the “Admin CMS” domain. I just found out that the client would like to IP Whitelist this domain so only a couple VPNs can access it to help further lock down access to the Admin CMS. This is tricky of course since I cannot use standard security control mechanisms (the ALB security group or VPC Network ACLs) since ALL domains are routed through the same ALB, which means I cannot apply a security group rule to only one domain.

So I’m looking for a workaround… Maybe through an additional ALB Listener Rule? Has anyone used Listener Rules to block / allow CIDRs? Is that a thing? Looked into this shortly yesterday and I’m about to start researching more now, but figured I’d ask here in case somebody goes: “Oh yeah of course, do X”.

I could obviously spin up another ALB, point the domain at that, and then use that ALB’s security group to control access. But unfortunately that increases cost and complexity of the system which I’m trying to avoid.

Any suggestions?

androogle avatar
androogle

When you say IP Whitelist, does that mean outbound from the client -> ALB ?

androogle avatar
androogle

or inversely whitelist the clients coming over a VPN to the ALB

androogle avatar
androogle

I think you might be able to do this with a WAF header based rule, using the host header?

Matt Gowie avatar
Matt Gowie

I’m saying whitelisting client IPs so they’re the only ones who can access that domain.

Matt Gowie avatar
Matt Gowie

WAF header rule is interesting. I already have a WAF ACL setup against the CF Distribution for the top 10… Will add that into my research

androogle avatar
androogle

Yeah you could do a webacl at the ALB level, set it as a regular rule with two statements. The first to match the host-header, and the second to match the IP
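Roughly, that rule is a WAFv2 AndStatement combining a Host-header match with an IP-set check. A sketch of the statement structure as it would be passed in `Rules` to boto3’s `wafv2` `create_web_acl`/`update_web_acl` (hypothetical names and placeholder ARN; this variant blocks admin-host requests that are NOT in an allow IP set):

```python
"""WAFv2 rule sketch: block the admin host unless the source IP is allowed.

Hypothetical sketch of one Rules entry for create_web_acl/update_web_acl;
the host name and IP set ARN are placeholders.
"""


def admin_host_rule(host, allow_ip_set_arn, priority=0):
    """Block requests whose Host header matches `host` unless the
    source IP is in the referenced allow IP set."""
    return {
        "Name": "restrict-admin-host",
        "Priority": priority,
        "Action": {"Block": {}},
        "VisibilityConfig": {
            "SampledRequestsEnabled": True,
            "CloudWatchMetricsEnabled": True,
            "MetricName": "restrictAdminHost",
        },
        "Statement": {
            "AndStatement": {
                "Statements": [
                    {   # requests for the admin domain...
                        "ByteMatchStatement": {
                            "SearchString": host.encode(),
                            "FieldToMatch": {"SingleHeader": {"Name": "host"}},
                            "TextTransformations": [
                                {"Priority": 0, "Type": "LOWERCASE"}
                            ],
                            "PositionalConstraint": "EXACTLY",
                        }
                    },
                    {   # ...that do NOT come from the whitelisted CIDRs
                        "NotStatement": {
                            "Statement": {
                                "IPSetReferenceStatement": {
                                    "ARN": allow_ip_set_arn
                                }
                            }
                        }
                    },
                ]
            }
        },
    }
```

All other hosts fall through this rule untouched, so the web ACL’s default action can stay Allow.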

androogle avatar
androogle

whitelist by default and then blacklist if not matching that pattern maybe

androogle avatar
androogle

or vice versa

Matt Gowie avatar
Matt Gowie

I’ll check that out. Haha it’s funny that using the WAF for typical firewall usage didn’t come to me.

1
Matt Gowie avatar
Matt Gowie

@androogle Was able to get that working via WAF rules. Thanks for the idea man!

androogle avatar
androogle

Np!

niek avatar

Hi, just wondering, does anyone have some Terraform modules to set up an EKS cluster with on-demand (only when needed) GPU scaling for Kubeflow?

kskewes avatar
kskewes

Cluster autoscaler works great. You could set it up with a GPU node pool
