SweetOps #aws for August, 2019

Discussion related to Amazon Web Services (AWS)

Archive: https://archive.sweetops.com/aws/

2019-08-01

2019-08-07

btai

07:34:56 PM

anyone ever turn on aws s3 transfer acceleration?

btai

07:35:06 PM

and verify that its worth it?

btai

09:16:02 PM

just uploaded a 300MB file from Los Angeles to an s3 bucket in Mumbai region. 1.7 minutes with transfer acceleration disabled, and 27 seconds with it enabled. so about 3.8x faster
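The timings in this message can be checked with a few lines of arithmetic. This is only a sketch based on the single upload reported above (300 MB, 1.7 minutes vs 27 seconds), so the numbers describe that one run, not transfer acceleration in general.

```python
# Timings reported in the message above (one upload, Los Angeles -> Mumbai).
file_mb = 300
baseline_s = 1.7 * 60    # 1.7 minutes without transfer acceleration
accelerated_s = 27       # seconds with transfer acceleration enabled

# How many times faster the accelerated upload was.
speedup = baseline_s / accelerated_s

# The same result expressed two other ways.
throughput_gain_pct = (speedup - 1) * 100               # extra MB/s relative to baseline
time_saved_pct = (1 - accelerated_s / baseline_s) * 100  # wall-clock time saved

print(f"baseline throughput:    {file_mb / baseline_s:.1f} MB/s")
print(f"accelerated throughput: {file_mb / accelerated_s:.1f} MB/s")
print(f"speedup:                {speedup:.2f}x")
print(f"throughput gain:        {throughput_gain_pct:.0f}%")
print(f"time saved:             {time_saved_pct:.0f}%")
```

The ratio of the two times is roughly 3.8, which corresponds to a throughput gain of roughly 280% and a time saving of roughly 74%.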

2019-08-08

ennio.trojani

11:39:45 AM

hi, I have a question about AWS CodePipeline + Jenkins. does anyone have experience with this?

Andy

12:50:27 PM

Have some basic idea. What is the question?

Leo Starcevic

12:34:39 PM

https://sol.gfxile.net/dontask.html

2019-08-12

joshmyers

12:55:52 PM

Anyone had issues with Firehose > Elasticsearch 6.5? The ES cluster returned a JsonParseException: "Ensure that the data being put is valid."

maarten

01:05:51 PM

@Maciek Strömich?

Maciek Strömich

01:07:40 PM

nope. we’re at es5 still for our logging.

joshmyers

01:10:13 PM

@Maciek Strömich Are you Firehose > Lambda processor > ES ?

Maciek Strömich

01:11:21 PM

nope. I’m emulating logstash structure in the logs and passing it directly via firehose to es

joshmyers

01:12:20 PM

Is this data from CloudWatch Logs?

Maciek Strömich

01:13:49 PM

nope. we dropped cwl support because it was a pain to send it to es via firehose

joshmyers

01:14:57 PM

hmm, OK thx

Maciek Strömich

01:39:38 PM

we’re not going to contribute back to rsyslog but we created our solution based on https://github.com/rsyslog/rsyslog/blob/master/plugins/external/solr/rsyslog_solr.py, but instead of working directly with es we push everything to firehose using boto3 with the same structure as our app logs. way cheaper compared to cwl as well.

rsyslog/rsyslog

a Rocket-fast SYStem for LOG processing. Contribute to rsyslog/rsyslog development by creating an account on GitHub.
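A minimal sketch of the approach described above: build a logstash-shaped JSON event and hand it to Firehose with boto3. The field names (`@timestamp`, `@version`, `host`, `level`, `message`) and the stream name `app-logs` are assumptions for illustration; the actual structure used in the thread is not shown.

```python
import json
import socket
from datetime import datetime, timezone


def build_record(message: str, level: str = "INFO") -> dict:
    """Build one Firehose record whose payload looks like a logstash event.

    The field names are an assumed logstash-style layout, not the
    structure used by the people in the thread.
    """
    event = {
        "@timestamp": datetime.now(timezone.utc).isoformat(),
        "@version": "1",
        "host": socket.gethostname(),
        "level": level,
        "message": message,
    }
    # Firehose expects the payload as bytes under the "Data" key.
    return {"Data": json.dumps(event).encode("utf-8")}


def send(message: str, stream_name: str = "app-logs") -> None:
    """Send one event to a delivery stream (hypothetical stream name)."""
    import boto3  # imported lazily so build_record works without AWS access

    client = boto3.client("firehose")
    client.put_record(DeliveryStreamName=stream_name, Record=build_record(message))


if __name__ == "__main__":
    print(build_record("service started")["Data"].decode("utf-8"))
```

Serializing each event with `json.dumps` before sending keeps every record a single valid JSON document, which is what the Elasticsearch destination has to parse.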

Sharanya

05:07:32 PM

Hey people, looking for a terraform template for vpc peering (0.12 syntax). any help please?

jose.amengual

06:42:40 PM

did you look at the cloudposse modules ?

Sharanya

03:22:51 PM

yes

2019-08-13

sarkis

07:41:19 PM

Anyone running AWS Client VPN here? We’re having issues just starting an endpoint even – stuck in Associating/pending state for hours

Ruan Arcega

07:43:52 PM

i am using this tool in my aws environment https://pritunl.com

Enterprise VPN Server

Free open source enterprise distributed VPN server. Virtualize your private networks across datacenters and provide simple remote access in minutes.

sarkis

06:07:48 PM

Thanks for the rec - I do have some pritunl experience and it was way smoother of an experience than AWS Client VPN has been - going to propose that

Blaise Pabon

09:42:04 PM

I’m new to AWS… and I make a lot of mistakes running Terraform, so I end up with errors like:

aws_s3_bucket.build_cache: Error creating S3 bucket: BucketAlreadyOwnedByYou: Your previous request to create the named bucket succeeded and you already own it.
	status code: 409, request id: 54C0B6BA

Blaise Pabon

09:42:32 PM

is there a switch like -p

Blaise Pabon

09:42:46 PM

that will back off if it already exists.

Andriy Knysh (Cloud Posse)

09:44:11 PM

If the bucket is already in AWS but not in the state file, use terraform import

Blaise Pabon

10:30:15 PM

It seems that I cannot import the resource, but it also says the resource is not created because it already exists.

Andriy Knysh (Cloud Posse)

10:39:00 PM

That guid is not a resource id

Andriy Knysh (Cloud Posse)

10:39:33 PM

It’s a request id from api call

Andriy Knysh (Cloud Posse)

10:40:11 PM

Go to AWS console and find the resource id

Blaise Pabon

09:44:29 PM

oh!?

Blaise Pabon

09:44:39 PM

wow

Andriy Knysh (Cloud Posse)

09:44:51 PM

If the bucket is in the state file but not in AWS for any reason, use terraform state rm

Blaise Pabon

09:46:00 PM

I think I remember reading about that in…. nowhere ! How very cool.

Blaise Pabon

09:47:38 PM

so I suppose that terraform state rm is less medieval than my rm -rf *tfstate*?

Vitaliy Lobachev

08:02:31 AM

you don’t need to delete the whole state, you can delete only the s3 bucket: terraform state rm aws_s3_bucket.build_cache

Blaise Pabon

09:49:22 PM

oh sorry I understand now

Andriy Knysh (Cloud Posse)

09:49:49 PM

yea :slightly_smiling_face: because of rm -rf *tfstate* you see the error you see

Blaise Pabon

09:53:03 PM

the fruits of rm -rf *tfstate*

2019-08-14

2019-08-15

viliam.pucik

01:20:14 PM

Hello, what is the main benefit of shortening SOA TTL to 60 secs? I noticed that in your best practices docs.

Erik Osterman (Cloud Posse)

04:50:17 PM

so in highly elastic environments which are changing or subject to change at any time, a long TTL is a sure fire way to “force” an outage.

Erik Osterman (Cloud Posse)

04:50:40 PM

perhaps the most important TTL is that of the SOA record. by default it’s something like 15m.

Erik Osterman (Cloud Posse)

04:51:43 PM

the SOA (start of authority) works a little bit like a “404” page for DNS (metaphor). when a client requests a DNS record for something and nothing is found, the response will be negatively cached for the duration of the SOA TTL.

Erik Osterman (Cloud Posse)

04:53:31 PM

so if your app looks up a DNS record (e.g. for service discovery) and it’s not found, it will cache that for 15m. Suppose after 1m that service is now online. Your app will still cache that failure for 14m causing a prolonged outage.

Erik Osterman (Cloud Posse)

04:53:59 PM

a DNS lookup on every request will add up, especially in busy apps. a DNS lookup every 60 seconds is a rounding error.

2019-08-16

Nelson Jeppesen

09:34:53 PM

Interesting, I thought negative ttl was the last value in the data of the SOA. Are you saying negative ttl is reflected by the SOA ttl directly?

dig abc.com soa +short
ns-318.awsdns-39.com. awsdns-hostmaster.amazon.com. 1 7200 900 1209600 86400

Nelson Jeppesen

09:35:20 PM

in this example, i thought 86400 was the negative ttl, but that’s not the TTL of the SOA itself

Nelson Jeppesen

09:35:27 PM

unless I’m mixed up

Nelson Jeppesen

04:04:09 AM

Just looked it up, negative ttl is the lower of either the TTL of the SOA _OR_ the last value, 86400 in the above example

Nelson Jeppesen

04:04:40 AM

TLDR; lazy me dropped the TTL of the SOA to 60s; thanks!
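The rule Nelson describes (negative TTL is the lower of the SOA record's own TTL and the last field of its data) can be sketched as a small function. The SOA data is the `dig` output quoted above; the TTL values passed in are assumptions, since `+short` does not print the record's TTL.

```python
def negative_cache_ttl(soa_ttl: int, soa_rdata: str) -> int:
    """Return how long a "not found" answer may be cached, in seconds.

    Takes the lower of the SOA record's own TTL and the MINIMUM field,
    which is the last value in the SOA data.
    """
    # SOA data layout: mname rname serial refresh retry expire minimum
    fields = soa_rdata.split()
    if len(fields) != 7:
        raise ValueError("expected 7 fields in SOA data")
    minimum = int(fields[6])
    return min(soa_ttl, minimum)


# SOA data from the dig example in the thread.
rdata = "ns-318.awsdns-39.com. awsdns-hostmaster.amazon.com. 1 7200 900 1209600 86400"

print(negative_cache_ttl(900, rdata))  # SOA TTL of 15 minutes -> 900
print(negative_cache_ttl(60, rdata))   # SOA TTL lowered to 60s -> 60
```

With the last field at 86400, the SOA TTL is the smaller value in both cases, which is why lowering the SOA TTL to 60 seconds is enough to shorten negative caching.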

2019-08-20

Erik Osterman (Cloud Posse)

04:04:53 PM

thanks @Nelson Jeppesen for the added context

2019-08-22

davidvasandani

09:40:44 AM

https://aws.amazon.com/blogs/aws/amazon-forecast-now-generally-available/

Amazon Forecast – Now Generally Available | Amazon Web Services

Getting accurate time series forecasts from historical data is not an easy task. Last year at re:Invent we introduced Amazon Forecast, a fully managed service that requires no experience in machine learning to deliver highly accurate forecasts. I’m excited to share that Amazon Forecast is generally available today! With Amazon Forecast, there are no servers to provision. You only need to provide […]

Daniel Minella

01:41:12 PM

Is there a better way to update an ecs task with only one container? I’m receiving this error: The closest matching (container-instance 5df0ce11-3243-47f7-b18e-2cfc28397f11) is already using a port required by your task

maarten

01:56:00 PM

@Daniel Minella if you use the host port 0 in your task definition, ECS will use dynamic port allocation, which works well together with an ALB

Daniel Minella

01:57:03 PM

How will ECS handle that? Does it understand that traffic from the LB at port 8080 has to be forwarded to any container inside the cluster? On that port?

maarten

01:58:20 PM

ECS will manage that under the hood for you. https://aws.amazon.com/premiumsupport/knowledge-center/dynamic-port-mapping-ecs/
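As a rough illustration of the host port 0 suggestion, the container definition below shows where the setting goes. It is a sketch only: the container name, image and memory value are hypothetical, and it assumes the bridge network mode on EC2 container instances, where dynamic host ports apply.

```python
import json


def container_definition(name: str, image: str, container_port: int) -> dict:
    """Container definition that asks ECS for a dynamic host port.

    hostPort 0 lets ECS pick a free port on the container instance, so
    several copies of the task can share one instance without the
    "already using a port" conflict.
    """
    return {
        "name": name,
        "image": image,
        "memory": 512,  # hypothetical value
        "essential": True,
        "portMappings": [
            {"containerPort": container_port, "hostPort": 0, "protocol": "tcp"}
        ],
    }


# Hypothetical names; 8080 is the port mentioned in the thread.
definition = container_definition("web", "example/web:latest", 8080)
print(json.dumps(definition, indent=2))
```

A dict like this would go in the `containerDefinitions` list of a task definition. The load balancer's target group then tracks whichever host port each task was given.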

Daniel Minella

02:14:29 PM

Thanks!

Daniel Minella

08:46:46 PM

We made it! Thank you again!

Alejandro Rivera

06:22:21 PM

Hi, I have multiple eks clusters across multiple accounts and I would like to give access to all of them to an S3 bucket in one of the accounts using the IAM profile of the instance nodes, but can’t seem to get it right, any tips on how to get this working?

Alex Siegman

07:21:29 PM

You need two pieces to this:

On the bucket, you need a bucket policy that grants permissions such as s3:GetObject and lists the source roles in the Principal section
On the roles that need access to that bucket, you then have to give the permissions for s3 against that resource

Alex Siegman

07:22:55 PM

I do this all the time. The specifics with EKS I can’t help with, but I’d imagine the cluster members have a role they use…

Good example doc here:

https://aws.amazon.com/premiumsupport/knowledge-center/cross-account-access-s3/
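The two pieces Alex lists can be sketched as two policy documents. The bucket name, account IDs and role names below are hypothetical placeholders, and the action list is limited to `s3:GetObject` as in the message; a real setup may need more actions.

```python
import json
from typing import Dict, List


def bucket_policy(bucket: str, role_arns: List[str]) -> Dict:
    """Piece 1: policy attached to the bucket, naming the source roles."""
    return {
        "Version": "2012-10-17",
        "Statement": [
            {
                "Sid": "AllowCrossAccountRead",
                "Effect": "Allow",
                "Principal": {"AWS": role_arns},
                "Action": ["s3:GetObject"],
                "Resource": f"arn:aws:s3:::{bucket}/*",
            }
        ],
    }


def role_policy(bucket: str) -> Dict:
    """Piece 2: policy attached to each role that needs access."""
    return {
        "Version": "2012-10-17",
        "Statement": [
            {
                "Effect": "Allow",
                "Action": ["s3:GetObject"],
                "Resource": f"arn:aws:s3:::{bucket}/*",
            }
        ],
    }


# Hypothetical node instance roles in two other accounts.
roles = [
    "arn:aws:iam::111111111111:role/eks-node-role",
    "arn:aws:iam::222222222222:role/eks-node-role",
]
print(json.dumps(bucket_policy("shared-artifacts", roles), indent=2))
print(json.dumps(role_policy("shared-artifacts"), indent=2))
```

Both sides have to allow the action: the bucket policy in the bucket's account, and the identity policy on the role in each cluster's account.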

Alejandro Rivera

07:46:07 PM

Nice, thanks for the help @Alex Siegman!

Daniel Minella

09:25:11 PM

How can I run this in a task definition: docker run -d --name sentry-cron -e SENTRY_SECRET_KEY='<secret-key>' --link sentry-postgres:postgres --link sentry-redis:redis sentry run cron? My concern is about run cron. Is it a command? Something like entrypoint: sh with run cron as the command?

Alex Siegman

09:46:58 PM

The run cron would be a command. It would be passed to whatever entrypoint script is defined in the Dockerfile

Alex Siegman

09:47:03 PM

Also, probably a better question for #docker

Daniel Minella

02:24:33 AM

Thank you! I’ll try

Daniel Minella

04:06:39 AM

run, cron as the command works for me
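A sketch of how the `docker run` line from the question might map onto a container definition, with `run` and `cron` as separate command elements as reported above. The memory value is a made-up placeholder, and `links` assumes the linked containers live in the same task definition using the bridge network mode.

```python
import json
import shlex

# Everything after the image name in the docker run line becomes the command.
command = shlex.split("run cron")

sentry_cron = {
    "name": "sentry-cron",
    "image": "sentry",
    "memory": 512,  # hypothetical value, not from the thread
    "essential": True,
    # Passed as arguments to the image's entrypoint, like `docker run sentry run cron`.
    "command": command,
    "environment": [
        {"name": "SENTRY_SECRET_KEY", "value": "<secret-key>"},
    ],
    # Equivalent of the two --link flags.
    "links": ["sentry-postgres:postgres", "sentry-redis:redis"],
}

print(json.dumps(sentry_cron, indent=2))
```

The entrypoint is left unset here so the one defined in the image is used, which matches the advice that `run cron` passes through the image's entrypoint script.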

Daniel Minella

04:06:44 AM

Thank you

2019-08-23

oscar

09:04:47 AM

What’s your go-to way of providing external devs/contractors (outside of your corporate AD) access to your AWS accounts? IAM users on Bastion? Cognito?

Samuli

09:20:28 AM

What kind of access do you have in mind? Access to accounts or access to resources (ec2?) on accounts?

oscar

09:41:19 AM

Console & CLI access.

I imagine it would be something like:

Give [solution] access to consultant
Consultant uses [solution] to gain access to either console or gain temporary access id/key pair
Consultant can then use console or CLI

oscar

09:42:05 AM

Although we only wish to give them explicit access to our Bastion/Security account, and they then use the credentials above to sts:AssumeRole into sub-accounts

Samuli

09:50:40 AM

Isn’t IAM sufficient for that? I would personally go with it but can’t say I’m an expert on the subject

jmccollum

03:35:23 PM

As a consultant, it depends on the client. Most of the time we get an IAM user in a sharedservices account, then assume roles cross account. Others will give us an AD account, then SAML / SSO to an AWS role

oscar

03:40:17 PM

Yeah, it seems that giving consultants limited users on our AD is the favoured approach. Our tech services are looking into it now. It just doesn’t seem like something that should be managed by Terraform!

jmccollum

03:43:05 PM

Could build the roles out that they would assume at least. For our managed services side, for some clients the client (or us) creates a role in each account that trusts one of our aws accounts and a specific role in that account. Then we can manage the users who have access to the client’s AWS account without needing to bother them.
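The arrangement described above, a role in each client account that trusts one specific role in the consultant's account, comes down to a trust policy on the client-side role. The account ID and role name in this sketch are hypothetical.

```python
import json


def trust_policy(trusted_role_arn: str) -> dict:
    """Trust policy for a role that one specific external role may assume."""
    return {
        "Version": "2012-10-17",
        "Statement": [
            {
                "Effect": "Allow",
                "Principal": {"AWS": trusted_role_arn},
                "Action": "sts:AssumeRole",
            }
        ],
    }


# Hypothetical role in the consultant's own AWS account.
consultant_role = "arn:aws:iam::111111111111:role/managed-services-ops"
print(json.dumps(trust_policy(consultant_role), indent=2))
```

Because the trust points at a role rather than at individual users, the consultant side can add or remove people on that role without any change in the client's accounts.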

Erik Osterman (Cloud Posse)

04:47:57 PM

I think it depends on what they are hired to do for the company.

Erik Osterman (Cloud Posse)

04:52:14 PM

Think about this from the company perspective: they want to eliminate risk, liability, exposure, and embarrassment, while at the same time accelerating development and maintaining knowledge transfer.

Erik Osterman (Cloud Posse)

04:53:09 PM

Think about this from the perspective of the developer. They want to operate as unencumbered as possible. They want to quickly prove their worth and get more work.

Erik Osterman (Cloud Posse)

04:49:40 PM

It goes without saying that IAM roles assumed into accounts is one of the mechanisms that will be used.

Erik Osterman (Cloud Posse)

04:50:14 PM

If the contractor was hired to oversee the uptime of production systems, I find it hard to justify anything other than administrator-level roles in the accounts they are responsible for.

Erik Osterman (Cloud Posse)

04:50:35 PM

If trust is an issue, then don’t hire.

Erik Osterman (Cloud Posse)

04:50:52 PM

If the contractor is hired to build out some form of automation, then there should be a sandbox account.

Erik Osterman (Cloud Posse)

04:51:24 PM

The deliverable should include “infrastructure as code” or other kinds of automation scripts.

Erik Osterman (Cloud Posse)

04:54:18 PM

I’ll address the latter. Give them a sandbox with administrator level access. They can do everything/anything (within reason) in this account. It can even be a sandbox account specifically for contractors.

Erik Osterman (Cloud Posse)

04:54:38 PM

They’ll check their work into source control with documentation on how to use it.

Erik Osterman (Cloud Posse)

04:54:58 PM

The company is ultimately responsible for operating it and “owning it”, so this forces knowledge transfer.

Erik Osterman (Cloud Posse)

04:55:15 PM

The company and its staff must know how to successfully deploy and operate the deliverable.

Erik Osterman (Cloud Posse)

04:55:47 PM

Ideally, you’ve rolled out a GitOps continuous delivery style platform for infrastructure automation.

Erik Osterman (Cloud Posse)

04:56:28 PM

The developer can now open PRs against those environments (without affecting them). The pending changes can be viewed by anyone.

Erik Osterman (Cloud Posse)

04:56:38 PM

Once approved, those changes are applied -> rolled out.

Erik Osterman (Cloud Posse)

04:57:31 PM

Regardless of this being a contractor or employee, etc - this is a great workflow. You can radically reduce the number of people who need access at all to AWS and instead focus on git-driven operations with total visibility and oversight.

oscar

11:14:06 PM

Exactly the answer I anticipated from you, Erik. Glad I remembered well

2019-08-25

Maciek Strömich

01:53:03 PM

anyone else experienced rds choking around 1h ago?

Maciek Strömich

01:54:06 PM

we found our pgsql rds instance stopped resolving hostnames

2019-08-25 13:03:47 UTC::[unknown]@[unknown]:[29469]:WARNING: pg_getnameinfo_all() failed: Temporary failure in name resolution

ramping up db connections and killing our application between 1420 CET

Maciek Strömich

01:54:39 PM

I wonder whether it was RDS general or only our cluster

2019-08-26

Igor

03:24:50 PM

Is it possible to disable root login on AWS accounts that are connected to an Organization?

Alex Siegman

03:27:35 PM

I don’t think it is, which is why it’s very important to secure that root account if you created the account programmatically - anyone with access to the email could take over the account

Alex Siegman

03:27:51 PM

If it’s one you joined that used to be an individual account, I’d hope that access is already secure

2019-08-27

nutellinoit

03:20:54 PM

Aurora postgres db seems down on eu-west-1 region

joshmyers

03:24:19 PM

oof if so

nutellinoit

03:52:30 PM

back up

nutellinoit

03:52:34 PM

14 minutes down

Brij S

04:19:26 PM

Hey all, looking for some opinions on how to go about creating VPC’s in a new aws account of mine. I recently setup an ECS cluster with fargate using the ‘get started’ feature in the console and it did a lot of the heavy lifting for me. however I’m trying to automate some of this using Terraform. So I’ll need to create some VPCs for the ECS cluster. What is the most simple, secure setup? One public subnet, private subnet, place the cluster in the private subnet with an ALB in the public subnet?

Maciek Strömich

04:43:39 PM

set it up in a way that you can easily change it to multi-az (one subnet per az for every type of subnet - public, private, db). it doesn’t mean you will use all of them but if the requirements change you will have them already available

Brij S

04:44:11 PM

can you give more detail?

Maciek Strömich

04:46:20 PM

I’ve a vpc with a CIDR of 10.0.0.0/8

Maciek Strömich

04:46:48 PM

and then every subnet in every availability zone uses a /24 from that CIDR

Maciek Strömich

04:47:56 PM

i’ve a total of 8 subnets - public and private for every availability zone

Maciek Strömich

04:48:33 PM

public have outgoing traffic routed via nat gateway

Maciek Strömich

04:48:47 PM

private have only routing for the 10.0.0.0/8

Maciek Strömich

04:49:16 PM

that makes most sense for my cluster
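The layout described above (one /24 per tier per availability zone) can be planned with Python's `ipaddress` module before writing any Terraform. This sketch assumes a 10.0.0.0/16 block, since a VPC CIDR can be no larger than /16, and uses placeholder AZ labels `a` to `d` to get the eight subnets mentioned.

```python
import ipaddress
from typing import Dict, Sequence, Tuple


def plan_subnets(
    vpc_cidr: str,
    azs: Sequence[str],
    tiers: Sequence[str] = ("public", "private"),
) -> Dict[Tuple[str, str], str]:
    """Assign one /24 from the VPC block to every (tier, AZ) pair."""
    blocks = ipaddress.ip_network(vpc_cidr).subnets(new_prefix=24)
    plan = {}
    for tier in tiers:
        for az in azs:
            # Hand out consecutive /24 blocks in a fixed order.
            plan[(tier, az)] = str(next(blocks))
    return plan


# Assumed VPC block and placeholder AZ labels.
plan = plan_subnets("10.0.0.0/16", ["a", "b", "c", "d"])
for (tier, az), cidr in plan.items():
    print(f"{tier:8} {az}  {cidr}")
```

Adding a third tier such as `db` only changes the `tiers` argument, which fits the advice to lay out subnets for every AZ up front even if they are not all used yet.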

Brij S

04:19:43 PM

can provide more info if needed, but really just looking to get some general guidance on VPC setup

Samuli

05:39:13 AM

See this module. It does the setup the way Maciek describes. https://github.com/terraform-aws-modules/terraform-aws-vpc

terraform-aws-modules/terraform-aws-vpc

Terraform module which creates VPC resources on AWS - terraform-aws-modules/terraform-aws-vpc

2019-08-28

Sharanya

08:01:03 PM

Did anyone come across NPM memory issues?

Erik Osterman (Cloud Posse)

08:06:20 PM

Perhaps share some more details of what you are seeing?

Sharanya

08:08:40 PM

Upgrade Node and NPM on CI/CD server. Observe the npm memory issue.

Sharanya

08:08:57 PM

I’m new to node… so I just want to know where I can check the memory issues

Andriy Knysh (Cloud Posse)

08:09:47 PM

https://sweetops.slack.com/archives/CB3579ZM3/p1567022939006800?thread_ts=1567022458.005500&cid=CB3579ZM3

i suppose you need to upgrade nodejs and npm to the latest versions, then monitor the build server on CI/CD for memory consumption when it builds the node project with npm

2019-08-30

Maciek Strömich

12:40:25 PM

Apparently gp2 EBS docs aren’t as precise as one would think.

Patient: 100GiB gp2 EBS volume in multi-az RDS cluster

I’m running an update/delete process over a few million rows on one of our mysql clusters, and based on the docs we should be able to burst over the base performance of 300 IOPS (3 IOPS per GiB) for about 20 minutes. Apparently in multi-az environments base performance is doubled, and depleting the credits gathered since late yesterday evening took over 2h of bursting at an average of 1500 IOPS.
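For reference, the documented gp2 burst model can be worked through for this volume. The sketch assumes a full credit bucket of 5.4 million I/O credits and a constant I/O rate. The 600 IOPS baseline in the last line is the doubling hypothesis from the message, not a documented figure.

```python
GP2_CREDIT_BUCKET = 5_400_000  # I/O credits in a full gp2 burst bucket
GP2_IOPS_PER_GIB = 3           # baseline IOPS earned per GiB
GP2_MIN_BASELINE = 100         # floor for very small volumes


def baseline_iops(size_gib: int) -> int:
    """Baseline IOPS for a gp2 volume of the given size."""
    return max(GP2_MIN_BASELINE, GP2_IOPS_PER_GIB * size_gib)


def burst_minutes(size_gib: int, driven_iops: int, baseline: int = None) -> float:
    """Minutes until a full credit bucket is empty at a constant I/O rate.

    Credits drain at (driven IOPS - baseline IOPS) per second.
    """
    base = baseline_iops(size_gib) if baseline is None else baseline
    if driven_iops <= base:
        return float("inf")  # at or below baseline, credits never run out
    return GP2_CREDIT_BUCKET / (driven_iops - base) / 60


print(baseline_iops(100))                      # baseline for the 100 GiB volume
print(burst_minutes(100, 3000))                # full burst at 3000 IOPS
print(burst_minutes(100, 1500))                # the ~1500 IOPS average from the message
print(burst_minutes(100, 1500, baseline=600))  # same, with the doubled baseline hypothesis
```

Under these assumptions a sustained 1500 IOPS lasts about 75 minutes at the 300 IOPS baseline and about 100 minutes at 600. Since the message reports an average rather than a constant rate, the observed duration can still differ from either figure.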

Maciek Strömich

12:46:16 PM

the spike visible around 8PM yesterday was a test performed on ~200k rows

Maciek Strömich

12:48:35 PM

for the sake of data completeness this graph comes from db.m5.large cluster