#aws (2021-01)
Discussion related to Amazon Web Services (AWS)
Archive: https://archive.sweetops.com/aws/
2021-01-22
2021-01-21
2021-01-20

Hello everyone, I am trying to save the output of a Terraformed EKS cluster into a JSON file

do you mean to save the Terraform outputs in JSON format from a root module that provisions an EKS cluster?
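if that's the goal, terraform output -json from that root module gets you there; a rough Python wrapper (just a sketch, not from the thread; the output file name is a placeholder):

# Sketch: capture `terraform output -json` from the root module that provisions the
# EKS cluster and write just the output values to a JSON file.
import json
import subprocess

raw = subprocess.run(
    ["terraform", "output", "-json"],
    capture_output=True, check=True, text=True,
).stdout
outputs = json.loads(raw)  # dict keyed by output name; each entry has a "value" field
with open("eks-outputs.json", "w") as f:
    json.dump({k: v["value"] for k, v in outputs.items()}, f, indent=2)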

asking this here because we didn't get to it in office hours: any tips for improving global S3 upload speed (think India, Hong Kong, etc.)? what other optimizations could I possibly make after turning on S3 transfer acceleration and using multipart uploads?

is this from an app or scripting? are you parallelizing the upload processes? is that possible in your app/script? not sure if you are talking about 1 large file or many smaller ones. multipart + multiple upload workers might help in some scenarios
https://pypi.org/project/s3-parallel-put/ or https://netdevops.me/2018/uploading-multiple-files-to-aws-s3-in-parallel/
Have you ever tried to upload thousands of small/medium files to AWS S3? If you have, you might also have noticed ridiculously slow upload speeds when the upload was triggered through the AWS Management Console. Recently I tried to upload 4k HTML files and was immediately discouraged by the progress reported by the AWS Console upload manager. It was something close to 0.5% per 10s. Clearly, the choke point was the network (as usual, brothers!). Come here, Google, we need to find a better way to handle this kind of upload.
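on the "multiple upload workers" idea, a rough sketch of uploading many smaller files in parallel with boto3 (bucket name and local directory are placeholders, not from the thread):

# Sketch: fan out uploads of many small files across a thread pool.
import concurrent.futures
import pathlib

import boto3

s3 = boto3.client("s3")  # boto3 clients are safe to share across threads

def upload(path: pathlib.Path) -> None:
    s3.upload_file(str(path), "my-bucket", f"uploads/{path.name}")

files = [p for p in pathlib.Path("./to-upload").iterdir() if p.is_file()]
with concurrent.futures.ThreadPoolExecutor(max_workers=16) as pool:
    list(pool.map(upload, files))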

from a web app

not exactly large files. a 20 MB file can take minutes to upload for users in Bangalore.

from my experience, custom Python boto-based code for downloading was never even close to the speed of the CLI's aws s3 cp ..., so maybe try uploading with the AWS CLI as a test first?

multipart uploads and transfer acceleration are the easy ones, which you already have turned on. the only other things I can think of: (1) uploading to a regional S3 bucket and replicating the files back to the central/main bucket, or (2) having another intermediary service that first consumes the upload and then puts it where you want it, e.g. write your own service that runs on some regional infra that's closer to the end user (EC2/Fargate), or something like https://www.dataexpedition.com/clouddat/aws/. in my head the upload times would appear faster to the end user, but the overall processing time of getting the file where it needs to go might be longer if there are more steps in the process to get the files to the workers that will actually do something with the file.

Also, did you play around with the chunk size? https://boto3.amazonaws.com/v1/documentation/api/latest/reference/customizations/s3.html - multipart_chunksize – The partition size of each part for a multipart transfer.
if you have a 20 MB file, you'll only get 3 chunks with the default of 8 MB. if you go down to the 5 MB minimum, maybe you'll get another chunk or two uploading in parallel.
looks like you can play around with these S3 settings in the AWS CLI config, if you want to try it out before modifying the app:
[profile development]
aws_access_key_id=foo
aws_secret_access_key=bar
s3 =
  max_concurrent_requests = 20
  max_queue_size = 10000
  multipart_threshold = 64MB
  multipart_chunksize = 16MB
  max_bandwidth = 50MB/s
  use_accelerate_endpoint = true
  addressing_style = path
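the same knobs are exposed in boto3's TransferConfig if you'd rather experiment from Python before touching the app (rough sketch; bucket and file names are placeholders):

# Sketch: transfer acceleration + smaller multipart chunks + more workers for a single upload.
import boto3
from botocore.config import Config
from boto3.s3.transfer import TransferConfig

s3 = boto3.client(
    "s3",
    config=Config(s3={"use_accelerate_endpoint": True}),  # assumes acceleration is enabled on the bucket
)
transfer_config = TransferConfig(
    multipart_threshold=8 * 1024 * 1024,   # start multipart at 8 MB
    multipart_chunksize=5 * 1024 * 1024,   # 5 MB parts (the minimum) => more parallel parts for a ~20 MB file
    max_concurrency=20,                    # number of upload threads
)
s3.upload_file("local-file.bin", "my-bucket", "uploads/local-file.bin", Config=transfer_config)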

yeah thanks @Jonathan Le, I'll play around with the chunk size default first. Are you also solving this problem? How are you actually testing that it's improving speed? To test, we RDP into an Azure VM (in Asia or Europe) and try uploading from the VM. Unfortunately even testing that route hasn't been giving us "realistic" timings (much faster) compared to our actual clients in those areas. My guess is that Azure servers might still have preferential routing to AWS servers compared to someone with their ISP at home.

Most of the time, upload speed is more to do with the client networks than yours. If you have access at that end, then tcp window scaling and QOS can help, depending on the specifics of their setups.

@btai I had to deal with an analogous issue a couple of years ago (before transfer acceleration even), though it wasn't the same issue. my problem was needing to optimize a data pipeline replicating terabytes of data across regions for processing on a routine basis.
testing and getting realistic results will be hard, esp. with what @ brought up. running an end-user test on optimized cloud networks won't be 100% realistic... my guess is that you'll need to get friendly with a couple of end users who can do a small number of localized before-and-after tests and take some measurements. probably not the best, but sometimes you work with the hand you're dealt.

Yeah, I’ve reached out to some of our global customers that we have a good relationship with and so I’ll do that and try to get numbers as a pulse check. I didn’t think so, but I was curious if anyone else had other clever ways of testing.

Oh. Hop on fiverr.com or something like that. Create some disposable 1-day test accounts. Maybe there are some affordable software testers in the regions you need, on home-based internet and devices.

@Jonathan Le that's a great idea! I'm gonna run it by our engineering leadership

You can thank a TikToker, home gym deadlifts, and TGIF for that idea!
2021-01-19

Morning everyone! Any experience migrating EBS volumes from gp2 to gp3? I have a large Kafka cluster with huge (1.5 TB) EBS volumes attached to every single broker

Be sure to read up on gp3 before you migrate.

While they offer higher potential IOPS for lower capacity volumes, at a slightly lower cost, the tradeoff is higher latency.

Yes @Ives Stoddard I’ve been reading a lot about gp3

How much is the difference in latency?


Just recently, I found out that AWS has introduced a new type of Elastic Block Storage called gp3 in addition to the popular gp2 volume…

Hmmmm… thanks for the info !!!

for most workloads, this likely isn’t an issue.

not all that glitters is gold

It’s Kafka…. I need to read

but if you’re pushing around millions / billions of files, that latency can add up.

Kafka is I/O sensitive

as with all things at this layer, it’s usually best to do some load testing with your applications and workflows.

different patterns and workloads can have very different requirements.

Absolutely right… well… what I like about AWS is that I can go back to gp2, if needed

are you using self-managed kafka, or one of the managed services?

self-managed Kafka

with the newer volume types, you may be able to switch on the fly. if for some reason that ends up being a problem, you can always swap them via LVM volume replacement.
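the switch itself is just a volume modification; rough boto3 sketch (the volume ID is a placeholder):

# Sketch: change an attached volume to gp3 in place and poll until the modification settles.
import time
import boto3

ec2 = boto3.client("ec2")
volume_id = "vol-0123456789abcdef0"
ec2.modify_volume(VolumeId=volume_id, VolumeType="gp3")

while True:
    mod = ec2.describe_volumes_modifications(VolumeIds=[volume_id])["VolumesModifications"][0]
    if mod["ModificationState"] in ("optimizing", "completed"):  # volume stays usable while optimizing
        break
    time.sleep(30)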

if you have multiple nodes, you can try running on gp3 for a while and see if there is a measurable impact to performance.

Right… we have like 50 brokers … so I will try on some of them ….

depending on your write / read throughput, and application sensitivity to latency, gp3 might work just fine.


latency of 1-2 ms on the tail end of a request is negligible, whereas stacking latency for rsync of 10 million files would be a different story.

for example, an additional 2 ms delay on a 10 million file rsync could add up to about 5.5 hours of latency in a single process (10,000,000 × 2 ms = 20,000 s).

whereas a 2 ms delay on an asynchronous event fired from a front-end user request wouldn't be perceptible to a user.

(or closer to 4ms for write + read delay, in the event of semi-synchronous events, like thumbnail generation on upload, or comment publishing, etc.)

Wow… I really appreciate all your recommendations on this matter ….

I will proceed with caution

one other area to be mindful of is local EC2 root volumes. if you leverage swap at all, that latency might slow down memory. likely not an issue for Kafka, as your JVM is likely configured with a fixed heap (Xmx).

Right…. with Kafka we avoid using SWAP at all ….

We have instances with lots of RAM so, that should not be an issue

in those cases, consider ephemeral instance storage for swap.


good luck.

Thanks @Ives Stoddard !!!! Have a great day !
2021-01-18

Can I run a command on AWS Fargate when a container goes into deactivation? https://docs.aws.amazon.com/AmazonECS/latest/developerguide/task-lifecycle.html
When a task is started, either manually or as part of a service, it can pass through several states before it finishes on its own or is stopped manually. Some tasks are meant to run as batch jobs that naturally progress through from PENDING to RUNNING

Is it possible to add this to the task definition?

Hi, I have a query about rsync. I have an EC2 server (master server) with 3 different app folders, 2 GB each. I want to clone and maintain these folders across multiple EC2 instances, let's say 5. If I set up an rsync cron job for every hour, does it bottleneck my local bandwidth? And does it make my CPU utilization high?


can't, it has Python code, so the whole process becomes slow

well, Python loads the .pyc files into memory once, so it should not be slow

the data for whatever you are running can be on local disk, but the code can be in EFS

so my framework is a bit old; when I update something it checks all the files and requirements, which makes the whole process slow

use CodeDeploy in that case if you are deploying code and such

but rsync is network/IO dependent, not CPU dependent

I am using EFS for now; my site goes down for 20 minutes when I update. I want to explore other options

CodeDeploy is an AWS product that does code deploys

you could use EFS or a ramdisk to load your code if it is that IO dependent

yes, but my framework does all the checks when we update the code

then check out CodeDeploy, it is like a fancy rsync

where EFS struggles is the latency on reading lots of small files. any forking processes which have to reload those files will suffer. anything on the front-end will likely see long request delays when loading those files for a new process (recycling a process handling requests).

as an alternative, you should also investigate the use of containers. the layers in the container are stored as larger files, so deployments from a container would be less of a concern if pulling the images locally to the ec2 instance and running them.

if you don’t need to manage that process yourself, look at ECS, EKS, or Fargate for a managed container cluster framework.

you would integrate the pushing and pulling of images with either your CI (push early) or CD pipeline (either early or on-demand).

if you don’t want to go the route of containers, you might also consider distributing files via tarball instead. in the ruby world, one might use something like capistrano to orchestrate the process.

some ideas to get you thinking about it…

https://blog.nuventure.in/2020/02/19/auto-deploying-a-django-app-using-capistrano-with-gitlab-ci-cd/

Yes, you read that right. We are using Capistrano - a very popular application deployment tool that is written in Ruby to deploy a Python Django app. In

Capistrano style deployments with fabric. Contribute to dlapiduz/fabistrano development by creating an account on GitHub.

i would seriously consider containers though. while there's a little extra overhead in getting up to speed, deployments and testing across your dev >> test >> staging >> production pipeline are a lot easier when they're all guaranteed to run the same container image / code.

yes, small files in EFS are not the best (NFSv4), but how big is your deployment? how big could the code base be? at any rate, the shared volume can be used to share the code and then copy it locally to disk or a ramdisk, etc.

I used capistrano for years, CodeDeploy is basically capistrano

@PePe: EFS also has additional write latency (multiple-AZ synchronous write), so things like rsync can take a lot longer than self-managed NFS.

I understand, but these techniques have been used for 20+ years... this is nothing new; Capistrano must be 10+ years old. do not get me wrong, containers are the way to go I think, but if you really need to share code as "here is the artifact, now deploy it", there is nothing too bad about NFS

you can download the artifact from S3 too (which will be faster)
2021-01-17
2021-01-16

hey all, are there modules (or some best-practice docs) for creating CloudWatch alarms (use case: alarm on CPU/disk space for Apache Kafka (MSK))?

Hey, you can use https://github.com/terraform-aws-modules/terraform-aws-cloudwatch/blob/v1.3.0/examples/multiple-lambda-metric-alarm/main.tf, just update the namespace and dimensions to match MSK; read more here: https://docs.aws.amazon.com/msk/latest/developerguide/monitoring.html
Terraform module which creates Cloudwatch resources on AWS - terraform-aws-modules/terraform-aws-cloudwatch
Learn how to monitor your Amazon MSK cluster.
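if you want to sanity-check the MSK namespace/dimensions outside Terraform first, a quick boto3 sketch (cluster name, broker ID, threshold, and SNS topic are placeholders):

# Sketch: per-broker CPU alarm on an MSK cluster using the AWS/Kafka namespace.
import boto3

cloudwatch = boto3.client("cloudwatch")
cloudwatch.put_metric_alarm(
    AlarmName="msk-broker-1-cpu-high",
    Namespace="AWS/Kafka",
    MetricName="CpuUser",
    Dimensions=[
        {"Name": "Cluster Name", "Value": "my-msk-cluster"},
        {"Name": "Broker ID", "Value": "1"},
    ],
    Statistic="Average",
    Period=300,
    EvaluationPeriods=3,
    Threshold=80.0,
    ComparisonOperator="GreaterThanThreshold",
    AlarmActions=["arn:aws:sns:us-east-1:123456789012:alerts"],  # placeholder SNS topic
)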

2021-01-15

FYI for the containers people: the last run of COM-401 “Scaling containers on AWS” is in 15-ish minutes at https://virtual.awsevents.com/media/0_6ekffvm8 There are some massive Fargate improvements that got no other announcements as far as I know

Seems quite interesting - wondering if this relates to what hashicorp has done with their nomad testing on top of AWS

morning everyone! Quick question: is there any performance overhead/impact when live-migrating an EBS volume from gp2 to gp3?

Hi. I have a question about CloudFront. My application is deployed on Heroku and I am using the Heroku endpoint in CloudFront. It works fine, but when I try to open a page by specifying a path in the URL, the CloudFront URL is redirected to the Heroku endpoint. For example, http://myherokuendpoint is the application link in Heroku and d829203example.cloudfront.net is my CloudFront address to access my app. When I try to access d829203example.cloudfront.net/admin it changes the address to http://myherokuendpoint/admin. I tried adding origins but it did not work.
If I attach an ALB link to the CloudFront distribution it works fine. Is there a way I can make it work with the Heroku link?

Question about egress filtering -
Ideally you don't want processes to have the ability to reach out to the internet, with the exception of specific cases like calling other AWS services, downloading yum updates, contacting other services like Trend Micro SaaS, etc. Some AWS services support VPC endpoints, but last I checked this only worked for some services and generally only within the same region. IP filtering seems solid but it would be a huge pain to set up and maintain. DNS blocking would seem to be easier to maintain but would not prevent connections that don't require DNS.
Anyway, are there best practices/recommendations for setting up egress filtering? Are there other options?
Thanks!
Tim

What’s your budget? We see well funded enterprises use Next Generation Firewalls for this (CHKP, PANW, FTNT, etc)

Another option is the new AWS Network Firewall

Error: Error creating aggregator: OrganizationAccessDeniedException: This action can only be performed if you are a registered delegated administrator for AWS Config with permissions to call ListDelegatedAdministrators API.
Anyone had this before? The account I'm executing in is actually delegated as such and can call ListDelegatedAdministrators successfully.

Hey All! I'm working on editing a Cloud Posse module for our use, but I'm having a weird issue. I think I added everything I need and got all the variables in correctly, but now I'm getting a super generic error and I'm unclear on how to troubleshoot it. It's unfortunately not telling me at all what's wrong with the construction, and I'd love to know how to troubleshoot from here:

2021/01/15 1505 [TRACE] vertex “module.cdn.aws_cloudfront_origin_access_identity.default”: visit complete 2021/01/15 1505 [TRACE] vertex “module.cdn.aws_cloudfront_origin_access_identity.default (expand)”: dynamic subgraph completed successfully 2021/01/15 1505 [TRACE] vertex “module.cdn.aws_cloudfront_origin_access_identity.default (expand)”: visit complete 2021/01/15 1505 [TRACE] Re-validating config for “module.cdn.aws_cloudfront_distribution.default[0]” 2021/01/15 1505 [TRACE] GRPCProvider: ValidateResourceTypeConfig 2021-01-15T1505.440-0800 [INFO] plugin.terraform-provider-aws_v3.24.1_x5: 2021/01/15 1505 [WARN] Truncating attribute path of 0 diagnostics for TypeSet: timestamp=2021-01-15T1505.440-0800 2021-01-15T1505.440-0800 [INFO] plugin.terraform-provider-aws_v3.24.1_x5: 2021/01/15 1505 [WARN] Truncating attribute path of 0 diagnostics for TypeSet: timestamp=2021-01-15T1505.440-0800 2021-01-15T1505.440-0800 [INFO] plugin.terraform-provider-aws_v3.24.1_x5: 2021/01/15 1505 [WARN] Truncating attribute path of 0 diagnostics for TypeSet: timestamp=2021-01-15T1505.440-0800 2021-01-15T1505.440-0800 [INFO] plugin.terraform-provider-aws_v3.24.1_x5: 2021/01/15 1505 [WARN] Truncating attribute path of 0 diagnostics for TypeSet: timestamp=2021-01-15T1505.440-0800 2021-01-15T1505.440-0800 [INFO] plugin.terraform-provider-aws_v3.24.1_x5: 2021/01/15 1505 [WARN] Truncating attribute path of 0 diagnostics for TypeSet: timestamp=2021-01-15T1505.440-0800 2021-01-15T1505.440-0800 [INFO] plugin.terraform-provider-aws_v3.24.1_x5: 2021/01/15 1505 [WARN] Truncating attribute path of 0 diagnostics for TypeSet: timestamp=2021-01-15T1505.440-0800 2021-01-15T1505.440-0800 [INFO] plugin.terraform-provider-aws_v3.24.1_x5: 2021/01/15 1505 [WARN] Truncating attribute path of 10 diagnostics for TypeSet: timestamp=2021-01-15T1505.440-0800 2021-01-15T1505.440-0800 [INFO] plugin.terraform-provider-aws_v3.24.1_x5: 2021/01/15 1505 [WARN] Truncating attribute path of 0 diagnostics for TypeSet: timestamp=2021-01-15T1505.440-0800 2021-01-15T1505.440-0800 [INFO] plugin.terraform-provider-aws_v3.24.1_x5: 2021/01/15 1505 [WARN] Truncating attribute path of 0 diagnostics for TypeSet: timestamp=2021-01-15T1505.440-0800 2021-01-15T1505.440-0800 [INFO] plugin.terraform-provider-aws_v3.24.1_x5: 2021/01/15 1505 [WARN] Truncating attribute path of 0 diagnostics for TypeSet: timestamp=2021-01-15T1505.440-0800 2021-01-15T1505.440-0800 [INFO] plugin.terraform-provider-aws_v3.24.1_x5: 2021/01/15 1505 [WARN] Truncating attribute path of 0 diagnostics for TypeSet: timestamp=2021-01-15T1505.440-0800 2021-01-15T1505.440-0800 [INFO] plugin.terraform-provider-aws_v3.24.1_x5: 2021/01/15 1505 [WARN] Truncating attribute path of 0 diagnostics for TypeSet: timestamp=2021-01-15T1505.440-0800 2021-01-15T1505.440-0800 [INFO] plugin.terraform-provider-aws_v3.24.1_x5: 2021/01/15 1505 [WARN] Truncating attribute path of 0 diagnostics for TypeSet: timestamp=2021-01-15T1505.440-0800 2021-01-15T1505.440-0800 [INFO] plugin.terraform-provider-aws_v3.24.1_x5: 2021/01/15 1505 [WARN] Truncating attribute path of 0 diagnostics for TypeSet: timestamp=2021-01-15T1505.440-0800 2021-01-15T1505.440-0800 [INFO] plugin.terraform-provider-aws_v3.24.1_x5: 2021/01/15 1505 [WARN] Truncating attribute path of 0 diagnostics for TypeSet: timestamp=2021-01-15T1505.440-0800 2021-01-15T1505.441-0800 [INFO] plugin.terraform-provider-aws_v3.24.1_x5: 2021/01/15 1505 [WARN] Truncating attribute path of 0 diagnostics for TypeSet: timestamp=2021-01-15T1505.441-0800 
2021-01-15T1505.441-0800 [INFO] plugin.terraform-provider-aws_v3.24.1_x5: 2021/01/15 1505 [WARN] Truncating attribute path of 0 diagnostics for TypeSet: timestamp=2021-01-15T1505.441-0800 2021-01-15T1505.441-0800 [INFO] plugin.terraform-provider-aws_v3.24.1_x5: 2021/01/15 1505 [WARN] Truncating attribute path of 0 diagnostics for TypeSet: timestamp=2021-01-15T1505.441-0800 2021-01-15T1505.441-0800 [INFO] plugin.terraform-provider-aws_v3.24.1_x5: 2021/01/15 1505 [WARN] Truncating attribute path of 0 diagnostics for TypeSet: timestamp=2021-01-15T1505.441-0800 2021-01-15T1505.441-0800 [INFO] plugin.terraform-provider-aws_v3.24.1_x5: 2021/01/15 1505 [WARN] Truncating attribute path of 0 diagnostics for TypeSet: timestamp=2021-01-15T1505.441-0800 2021-01-15T1505.441-0800 [INFO] plugin.terraform-provider-aws_v3.24.1_x5: 2021/01/15 1505 [WARN] Truncating attribute path of 0 diagnostics for TypeSet: timestamp=2021-01-15T1505.441-0800 2021-01-15T1505.441-0800 [INFO] plugin.terraform-provider-aws_v3.24.1_x5: 2021/01/15 1505 [WARN] Truncating attribute path of 0 diagnostics for TypeSet: timestamp=2021-01-15T1505.441-0800 2021-01-15T1505.441-0800 [INFO] plugin.terraform-provider-aws_v3.24.1_x5: 2021/01/15 1505 [WARN] Truncating attribute path of 0 diagnostics for TypeSet: timestamp=2021-01-15T1505.441-0800 2021-01-15T1505.441-0800 [INFO] plugin.terraform-provider-aws_v3.24.1_x5: 2021/01/15 1505 [WARN] Truncating attribute path of 0 diagnostics for TypeSet: timestamp=2021-01-15T1505.441-0800 2021-01-15T1505.441-0800 [INFO] plugin.terraform-provider-aws_v3.24.1_x5: 2021/01/15 1505 [WARN] Truncating attribute path of 0 diagnostics for TypeSet: timestamp=2021-01-15T1505.441-0800 2021-01-15T1505.441-0800 [INFO] plugin.terraform-provider-aws_v3.24.1_x5: 2021/01/15 1505 [WARN] Truncating attribute path of 0 diagnostics for TypeSet: timestamp=2021-01-15T1505.441-0800 2021-01-15T1505.441-0800 [INFO] plugin.terraform-provider-aws_v3.24.1_x5: 2021/01/15 1505 [WARN] Truncating attribute path of 0 diagnostics for TypeSet: timestamp=2021-01-15T1505.441-0800 2021-01-15T1505.441-0800 [INFO] plugin.terraform-provider-aws_v3.24.1_x5: 2021/01/15 1505 [WARN] Truncating attribute path of 0 diagnostics for TypeSet: timestamp=2021-01-15T1505.441-0800 2021/01/15 1505 [TRACE] vertex “module.cdn.aws_cloudfront_distribution.default[0]”: visit complete 2021/01/15 1505 [TRACE] vertex “module.cdn.aws_cloudfront_distribution.default”: dynamic subgraph encountered errors 2021/01/15 1505 [TRACE] vertex “module.cdn.aws_cloudfront_distribution.default”: visit complete 2021/01/15 1505 [TRACE] vertex “module.cdn.aws_cloudfront_distribution.default (expand)”: dynamic subgraph encountered errors 2021/01/15 1505 [TRACE] vertex “module.cdn.aws_cloudfront_distribution.default (expand)”: visit complete 2021/01/15 1505 [TRACE] dag/walk: upstream of “provider["registry.terraform.io/hashicorp/aws"] (close)” errored, so skipping 2021/01/15 1505 [TRACE] dag/walk: upstream of “module.cdn (close)” errored, so skipping 2021/01/15 1505 [TRACE] dag/walk: upstream of “meta.count-boundary (EachMode fixup)” errored, so skipping 2021/01/15 1505 [TRACE] dag/walk: upstream of “root” errored, so skipping 2021/01/15 1505 [INFO] backend/local: plan operation completed
Error: Required attribute is not set
on ../fork/terraform-aws-cloudfront-cdn/main.tf line 30, in resource “aws_cloudfront_distribution” “default”: 30: resource “aws_cloudfront_distribution” “default” {
Error: Required attribute is not set
on ../fork/terraform-aws-cloudfront-cdn/main.tf line 30, in resource “aws_cloudfront_distribution” “default”: 30: resource “aws_cloudfront_distribution” “default” {
Error: Required attribute is not set
on ../fork/terraform-aws-cloudfront-cdn/main.tf line 30, in resource “aws_cloudfront_distribution” “default”: 30: resource “aws_cloudfront_distribution” “default” {
Error: Required attribute is not set
on ../fork/terraform-aws-cloudfront-cdn/main.tf line 30, in resource “aws_cloudfront_distribution” “default”: 30: resource “aws_cloudfront_distribution” “default” {
Error: Required attribute is not set
on ../fork/terraform-aws-cloudfront-cdn/main.tf line 30, in resource “aws_cloudfront_distribution” “default”: 30: resource “aws_cloudfront_distribution” “default” {
Error: Required attribute is not set
on ../fork/terraform-aws-cloudfront-cdn/main.tf line 30, in resource “aws_cloudfront_distribution” “default”: 30: resource “aws_cloudfront_distribution” “default” {
Error: Required attribute is not set
on ../fork/terraform-aws-cloudfront-cdn/main.tf line 30, in resource “aws_cloudfront_distribution” “default”: 30: resource “aws_cloudfront_distribution” “default” {
Error: Required attribute is not set
on ../fork/terraform-aws-cloudfront-cdn/main.tf line 30, in resource “aws_cloudfront_distribution” “default”: 30: resource “aws_cloudfront_distribution” “default” {
Error: Required attribute is not set
on ../fork/terraform-aws-cloudfront-cdn/main.tf line 30, in resource “aws_cloudfront_distribution” “default”: 30: resource “aws_cloudfront_distribution” “default” {
2021/01/15 1505 [TRACE] statemgr.Filesystem: removing lock metadata file .terraform.tfstate.lock.info Error: Required attribute is not set
on ../fork/terraform-aws-cloudfront-cdn/main.tf line 30, in resource “aws_cloudfront_distribution” “default”: 2021/01/15 1505 [TRACE] statemgr.Filesystem: unlocking terraform.tfstate using fcntl flock 30: resource “aws_cloudfront_distribution” “default” {
2021-01-15T1505.461-0800 [WARN] plugin.stdio: received EOF, stopping recv loop: err=”rpc error: code = Unavailable desc = transport is closing” 2021-01-15T1505.465-0800 [DEBUG] plugin: plugin process exited: path=.terraform/providers/registry.terraform.io/hashicorp/aws/3.24.1/darwin_amd64/terraform-provider-aws_v3.24.1_x5 pid=99698 2021-01-15T1505.465-0800 [DEBUG] plugin: plugin exited

Do you have a link to the fork and the parameters of the call to it?

No, I hadn’t cleaned it up adequately yet

Was still just trying to get it to work

Is there any way to have it actually tell me what attribute isn’t set?

Not that I’m aware of

Hrm, thanks @

@ I’ve never seen error messages like that ^

are you using terraform?

in general, all Cloud Posse modules require at least one of the following inputs: namespace, environment, stage, name

we use https://github.com/cloudposse/terraform-null-label to uniquely and consistently name all the resources
Terraform Module to define a consistent naming convention by (namespace, stage, name, [attributes]) - cloudposse/terraform-null-label

so the ID of a resource will have the format like: namespace-environment-stage-name

you can skip any of the parameters, but at least one is required

Thanks @Andriy Knysh (Cloud Posse)! I figured it out…. It was a canadianism: I used the word “behaviour” instead of “behavior” in a couple of places deep in the module, and that caused it to error out like that. LOL

I now have a working prototype at least, I’ll see how it goes from here!

ok, I figured it out

is a VPC name unique across regions in the same account?
e.g. can I have a dev VPC in Ireland and another dev VPC in Singapore within the same account?

The vpc name is just a tag, it’s not unique in any way. Can be the same name in the same account, in the same region. There’s no restriction on the name
2021-01-14

If anyone’s using AWS AppMesh here - do you know if there’s a way for the virtual gateway to NOT overwrite the host? we’re using the host in order to do some analytics in the container.. might also be similar for Istio users.

am I blind or is there really no way to get container details for a running task in the new ECS portal?

Going to check it out right now. What details are missing?

all the details that you could access from the collapse menus in the current version – not the best example since it’s an x-ray container, but you’ll get the point

Hah! That’s pretty bad. I’m seeing the same issue in my UI

new or old UI?

new

I left feedback, first time I’ve had to do that because of a UI change

I wonder if it’s because this information is a part of the task definition and they’re trying to not have it in multiple places?

but that’s how my team knows how to find the CloudWatch Log Stream associated with that task..

I do not like the new one

I can appreciate that they’re trying to streamline information, but they dropped the important bits

ok, glad I'm not the only one. Maybe there will be enough reactions to get them to fix it before it becomes the default view

the new UI also removed the links to the monitoring/logs
2021-01-13

I'm having a bit of a wood-for-trees moment with ACM Private CA - it's reasonably straightforward to set up a root and subordinate CA in a single AWS account and RAM-share that out to other accounts that need certificates (although it seems necessary to share both the root and the subordinate for the subordinate to appear in the ACM PCA console in the target account). However, the best practice (https://docs.aws.amazon.com/acm-pca/latest/userguide/ca-best-practices.html) recommends having the root alone in its own account, and subordinate(s) in another account. My problem is that the process of signing the subordinate CA manually when the root CA is in another account is really not clear. The docs cover the case of both in the same account, or using an external CA. Anyone done this before?
Learn the best ways to use ACM Private CA.

Has anyone ever seen the AWS ElasticSearch Service take over an hour to create a domain (i.e. Domain is in “Loading” state and nothing is accessible)? I’ve seen long update / creation times from this service before… but this seems absurd.

Never timed it, but the long creation/update times are something I ran into in the past too.

I remember ES domain maintenance activities taking a very long time, especially if I turned on automated backups

Does anyone need to periodically restart the AWS SSM agent?

no sir

Nope.

nvm, found out it was an old ami from 2018. upgrading the ami seemed to fix the issue.

¯\_(ツ)_/¯

[thread] Troubleshooting — EC2 instance for Windows isn't working with ECS tasks, SSM Session Manager, and RDP reporting that the User Profile Service can't start

The source for this AMI is Packer-built. It's an EBS-based image, encrypted by default, 50 GB.
I notice the launch configuration for this has no detail on the volume; I believe it just uses the defaults from the image.
In the console I see this means the EBS volume is not optimized and not encrypted.
Is this a false track to go down? Pretty sure the instance has permissions for KMS + full SSM and more. I want to know if there are any other ideas before I call it a night.
2021-01-12

Anyone using AWS App Mesh? Thoughts?

I think that it’s a limited implementation of envoy but it’s serving my needs which are also limited currently

I did a POC

only works on awsvpc mode with ECS

so that was a show stopper for me

but I moved it to awsvpc and then I had issues with service discovery; the docs are far from great if you do not run on Fargate
2021-01-11

i'm seeing some funky issues with gRPC and NLBs when rolling pods in EKS. anyone got experience in this area?

We get 500s when rolling ingress-nginx, as the pod process terminates before the NLB health check times out. There's some work upstream around 1.19 but we are still on 1.15... :(
2021-01-08

Hey I want to work with a subdomain on CloudFlare but unfortunately it does not support working with subdomains. What are my options? Will Route53/Cloudfront be useful?

Guys, I've been using AWS SSM Parameter Store for storing credentials like RDS database credentials so that I can access them via API or CLI in my pipelines or infrastructure code. I am thinking of putting my AWS credentials in SSM Parameter Store because my Rails application (which is deployed in ECS via Terraform) demands AWS keys for accessing an S3 bucket. Should I put AWS credentials in SSM? I just feel that it is not the right way to deal with this problem.

https://docs.aws.amazon.com/AmazonECS/latest/developerguide/task-iam-roles.html
The key points are:
With IAM roles for Amazon ECS tasks, you can specify an IAM role that can be used by the containers in a task.
Instead of creating and distributing your AWS credentials to the containers or using the EC2 instance's role, you can associate an IAM role with an ECS task definition or the RunTask API operation.
So you need to:
- Create IAM role for your ECS task, grant this role required S3 permissions
- Update your task definition to use this IAM role
- Make sure that your Ruby app uses the AWS SDK's automatic credential configuration
With IAM roles for Amazon ECS tasks, you can specify an IAM role that can be used by the containers in a task. Applications must sign their AWS API requests with AWS credentials, and this feature provides a strategy for managing credentials for your applications to use, similar to the way that Amazon EC2 instance profiles provide credentials to EC2 instances. Instead of creating and distributing your AWS credentials to the containers or using the EC2 instance’s role, you can associate an IAM role with an ECS task definition or
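same idea in Python just for illustration (the Ruby SDK resolves credentials the same way): with a Task Role attached, nothing is hard-coded, the SDK picks the credentials up from the container credentials endpoint. The bucket name is a placeholder.

# Sketch: no access key / secret key anywhere; boto3 falls back to the task role
# exposed via the container credentials endpoint (169.254.170.2).
import boto3

s3 = boto3.client("s3")
for obj in s3.list_objects_v2(Bucket="my-app-bucket").get("Contents", []):
    print(obj["Key"])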

Nice. I did not know this option was available. Thanks.

When I am using this task execution role, will this allow me to run AWS CLI commands? For example, will I be able to run aws s3 ls inside the container running my Ruby app?

- You're mixing up the Task Execution Role (https://docs.aws.amazon.com/AmazonECS/latest/developerguide/task_execution_IAM_role.html) and the Task Role. The Task Execution Role is used by the container agents; the Task Role is used by the containers in your task. You grant the Task Execution Role permissions to access your secrets, pull images from ECR, write logs, etc. And you grant the Task Role the permissions your applications need (e.g. s3:ListBucket etc.).
- The AWS CLI uses the AWS SDK under the hood and can also auto-discover the credentials (https://docs.aws.amazon.com/cli/latest/topic/config-vars.html#using-aws-iam-roles). No need to hard-code them.
The task execution role grants the Amazon ECS container and Fargate agents permission to make AWS API calls on your behalf. The task execution IAM role is required depending on the requirements of your task. You can have multiple task execution roles for different purposes and services associated with your account.

In my code I created one role and passed the same role ARN for both Task Role and Task Execution Role. Thanks for clarifying I did not know the difference.

I logged into the ECS instance and then into my task (docker exec -it <containerid> bash) and tried curl 169.254.170.2$AWS_CONTAINER_CREDENTIALS_RELATIVE_URI. It returned the same role ARN which is passed in the task-role parameter in my infra. So I thought aws s3 ls should work, but it returned an aws command not found error.

Well obviously your container image should have aws cli installed if you want to use it inside the container.

Yes, I thought the ECS AMI would have it installed. My issue is resolved. Thanks

2021-01-07

Hi, I am trying to find a tool/vault to manage my passwords (a tool like Roboform or LastPass). I am looking for an open-source tool like this which I can configure on my Linux EC2 instance and access via a UI. Is there any tool like this?


Yes, I explored:
- BitWarden
- Passbolt
- HyperVault
What do you think, is it a good option to self-host the service? I mean we'll probably need to provision an RDS database along with the EC2 instance. I have a feeling that the cost may increase. Password managers like Roboform cost around 3 USD per user per month. So I am a little confused about what to use: self-hosted, or buy a plan for my organization?

it depends on your use case. how many people are going to use it? I have only used BW for personal usage so I cannot really speak enterprise-wise

Around 20 to 50 users

Hashicorp Vault = opensource + UI + production grade solution

That’s also one option. I’ll explore more @ Thanks.
2021-01-06

bump, going to tackle this in the next couple of days, so I would love to know your experience

https://docs.aws.amazon.com/systems-manager/latest/userguide/distributor.html
this? never heard of it but sounds interesting
Create, manage, and deploy software packages on Systems Manager managed instances.

I have a system in place with Choco + auto builds, but I'm trying to eliminate using multiple toolchains for each. If I can wrap it all up into a single package I'll probably have better luck maintaining it

Having an internal debate and I’m curious what your guys’ thoughts are: How many people do you think there are (worldwide) who are using Terraform with AWS today? Please include your rationale for your answer.
Some stats:
• The AWS Provider github repo has 5k stars and 2k contributors.
• The AWS Provider has been downloaded 216.2M times.
• This channel has 2,221 members.

engineers who know and actively write HCL or total number of developers that inherently are “using” Terraform?

because I'd wager that those two numbers are dramatically different

Oh great question! Let’s focus on the first one - those actually writing HCL. It can then be a proxy to the second one.

I’d guesstimate that it’s somewhere around 250:1 in regards to users:contributors – so half-million? (my initial 500:1 sounded silly after I thought about it)

how many people is also different from how many infrastructures are managed. we're ~8 in my company but we manage dozens of different stacks

8 what - HCL developers, total developers, total people?

8 people who write HCL every day

for 150 developers

So I would count 8 users in this case. This is a theoretical question, but you can then derive a bunch of different things from it - like how many companies use AWS+TF, how many developers in the world indirectly use AWS+TF (like the 150 developers you mentioned), etc.
Once we can answer these questions, we can compare AWS+TF’s usage with that of other open source technologies and see what level of adoption it has.

I think I've already answered a survey with that kind of question

Where?

I’m guessing the number is at least 100,000 but no more than a 1,000,000 just for everyday folks that use Terraform w/ AWS. I think you could easily cross the 2-3 mil threshold if you include folks who do simple POCs and make minor contributions to custom modules. I don’t know how to make an accurate guesstimate further than this. Is this how many people have used it over the lifetime of the products in question, or people that have used it within the last week, month, year?

according to the Google Machine there were 21m software developers in the world in 2016 – if you estimate 24m today… I'd be shocked to learn that more than 2% of engineers write HCL actively – so I'm sticking with my 500k number

@ can’t remember sorry

At @Issif’s company, 5% of developers write HCL, but maybe they’re a more advanced company. Larger companies, with thousands of developers, probably have an even lower percentage of developers who write HCL. I wonder if 2% is too high even.
@MattyB good point, the focus is on “active” users, so let’s say they use it on a regular basis (weekly).

Another vote for ~100k Terraform developers in the whole world. So perhaps 70k Terraform AWS devs

@ what’s your rationale for that number?

I estimate 1% of software developers write Terraform, plus Darren’s number of 20m software devs globally

So I tried validating that 1% or 2%. My closest approach is to look at GitHub repos stats: https://github.com/oprogramador/github-languages#all-active
If you focus only on repos that have been active for the year before the report ran (given HCL is new and growing quickly), you’ll see that far far less than 1% of all repos have HCL.
Can this be a reliable proxy for determining the percentage of developers who work with HCL?

I don’t know. Multiple repos can be owned by a single developer. Maybe if you could extrapolate people who have contributed to all repos you could get a minimum number?

@ are you saying 100k terraform AWS developers or just terraform? If that’s just terraform then there should be far fewer terraform AWS since it’s used for a ton of providers

@ that’s clever and on reflection yes I think 1% estimate was way too high.

Continuing along the GitHub line of thinking, GitHub reported 40M users a year ago. Let’s say it’s closer to 45M now. 1% would be 450,000. As I said, I think it’s far less than 1%. At 0.25% for example, it would support the 100,000 number Alex used.
I took the table from the link above and summed the total number of repos, it’s 18.5m in total (yes, I know many repos can have multiple languages, but trying to get a broad stroke). Of those, 0.21% have HCL.
So 0.21% of 45M is 94,500, close to the 100,000 number.

I think a lot of github metrics can underweight big “old” languages like Java and C++, and overweight modern github-first tools like Terraform. So IMO that 0.21% might still be an overestimate

I also think counting 100% of GitHub users as people who are actively writing code is a large assumption

Good point @Darren Cunningham. I wonder if there's a way to answer what % of GitHub users are coders, and what % of coders have GitHub accounts.

% are abandoned accounts

GitHub claiming 40m users is a publicity stunt IMO

How do you define abandoned accounts?

My Digital Ocean account that I haven’t logged into since 2010

technically I’m “a user” of their platform

LOL

how many people have just created a new account rather than updated their email in GitHub?

how many have 2 accounts…I know I do

So do I actually. I also work with a guy that has 3 or 4.

using number of accounts seems like you’re creating a new complexity factor that’s way out of control

Yeah, I agree. So if we go back to the 20m or so developers, and 0.21%, we’re looking at 42k HCL devs

Hi All - any help greatly appreciated (wasn’t sure which channel was best for this one)
https://sweetops.slack.com/archives/CB6GHNLG0/p1609975378080100
Hi All - first post so excuse if silly :)
Wondering what's the best practice for creating Kafka topics post-cluster-creation using the Cloud Posse MSK module?
AWS doesn't appear to support anything directly on MSK and even references the Apache shell scripts (here: https://docs.aws.amazon.com/msk/latest/developerguide/msk-configuration-properties.html#msk-topic-confinguration, which points to https://kafka.apache.org/documentation/#topicconfigs).
If it's really CLI-only, is it possible to run a template file after the MSK cluster is created to run the shell scripts? e.g.
$ bin/kafka-topics.sh --bootstrap-server localhost:9092 --create --topic my-topic --partitions 1 \
--replication-factor 1 --config max.message.bytes=64000 --config flush.messages=1
Thanks for any help
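one alternative to shelling out to kafka-topics.sh (e.g. from a local-exec provisioner) is a small script using kafka-python's admin client, run from somewhere with network access to the brokers. Rough sketch; the bootstrap broker address and topic settings are placeholders, and it assumes a plaintext listener:

# Sketch: create the topic via the Kafka admin API instead of the shell scripts.
from kafka.admin import KafkaAdminClient, NewTopic  # pip install kafka-python

admin = KafkaAdminClient(bootstrap_servers="b-1.my-cluster.abc123.kafka.us-east-1.amazonaws.com:9092")
admin.create_topics([
    NewTopic(
        name="my-topic",
        num_partitions=1,
        replication_factor=1,
        topic_configs={"max.message.bytes": "64000", "flush.messages": "1"},
    )
])
admin.close()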
2021-01-05

In https://github.com/cloudposse/reference-architectures#3-delegate-dns, can someone explain: "An available domain we can use for DNS-based service discovery (e.g. ourcompany.co). This domain must not be in use elsewhere, as the master account will need to be the authoritative name server (SOA)."
[WIP] Get up and running quickly with one of our reference architecture using our fully automated cold-start process. - cloudposse/reference-architectures

does anyone have a "best practice" for where to keep the S3 bucket for an account's access logs: in the account itself, or in the security account and replicate?

I pretty much asked about this a few weeks ago…
https://sweetops.slack.com/archives/CCT1E7JJY/p1608486517167200
I’m thinking about creating an account dedicated to storage within my AWS Org –
My initial plan is to consolidate the S3 bucket sprawl I have between accounts into a single bucket within that account, giving my Org access to write and each account access to read its files using a prefix condition on the bucket policy.
Bucket sprawl for service logs: VPC Flow, S3 Access, CloudFront logs, etc. – application buckets would remain in their accounts… not looking to fix that sprawl.
Any words of wisdom or horror stories that I should be aware of before I embark on my quest?

the responses weren’t threaded though

I loved the term "best practice" before; now I hate it. There is no best practice, meaning one way to do a thing; there are patterns, which means multiple valid ways to do it. Every use case is unique, so the answer depends on your use case:
• single AWS account - use the same account.
• multiple AWS accounts - create the access-log S3 bucket in the logging account/security account/audit account and send data there.
In both cases set up retention settings and archiving.

@ I completely agree with you, I hate the term as well; I just wondered what people do and why

Anyone use AWS Distributor for packages? I prefer Choco and DSC, but I need to create a Datadog package and want to cover Linux + Windows. I'd like to know if various distros can be handled easily, configuration etc., and overall whether I'd hit any problems using it or if it's smooth sailing.
Otherwise I have to do a mix of Ansible + DSC and more, and it's unlikely others will be comfortable with that.
In addition, while I'm a fan of Ansible, I primarily use AWS SSM to manage a mix of Windows and Linux instances, and at this time AWS SSM will only run playbooks for Linux.

We use cloudsmith

For the packages themselves
2021-01-04

there's no db Slack channel, so I'm asking here since I'm using RDS (and they're deprecating support for my Postgres version). Anyone that's done the Postgres 9 -> Postgres 10/11 migration have any gotchas we should be concerned about when doing it?

Ah geez — When are they deprecating 9 exactly?

we did a migration from 10 to 11 and it was fine, but that all depends on whether you are using version-specific features; from what I have seen this happens more on the MySQL side than Postgres

they usually do not have breaking changes

not helpful as to the “how”, but as to the “why”…when deciding to upgrade PostgreSQL this is a great resource: https://why-upgrade.depesz.com/show?from=9.6.20&to=12.4&keywords=




Have you ever wondered what it would feel like to change planes while it is in flight? I for sure do not want to be on that flight, but…

this is the guts of it. we ended up enhancing this process using scripts

but it’s all predicated on “can you allow downtime.. or not?”

9.5 is being deprecated right now. There's a couple of weeks remaining before AWS marks it EOL

We’re on 9.5 — Feb 16 @Matt Gowie
Upgrade your RDS for PostgreSQL 9.5 databases before Feb 16, 2021
The RDS for PostgreSQL 9.5 end-of-life date is approaching. Your database is using a version that must be upgraded to 12 or higher as soon as possible. We plan to automatically upgrade RDS for PostgreSQL 9.5 databases to 12 starting February 16, 2021 00:00:01 AM UTC.

Ah 9.5 — Thanks man

@rms1000watt how much resource pressure happened (read iops, throughput, etc) on the source DB during the 4 days of synchronization?

it wasn't bad, only because we overprovisioned to be careful of this situation

like we increased IOPS, CPU, etc

(to be fair, the most kudos need to be given to @Ronak for the first migration, and these subsequent two migrations. He’s the real mastermind.)

but there were a few tests beforehand to measure the effects of a migration on a DB of our size

(like stand up a second DB from a snapshot with the same resources and try the migration from that, just to test how it’d behave)

do you mind giving the details of your db size?

24xlarge .. waaaaaay overprovisioned

is that a typo?

jeez

didn't even know 24xlarges exist


so our 2xlarge will migrate in a much shorter amount of time


yea, we had like.. 3TB?

We can afford downtime on the weekends as long as we put together some language for our customers. I’m tempted to spin up a second RDS instance w/ snapshot, upgrade the db to pg11 and do the cutover at the route53 level.

Did you guys run into any backwards incompatible changes at the application level?


but if you can afford downtime, don’t even mess with pglogical

for the DBs we could afford downtime we just did:
• RDS upgrade from 9.6 -> 11.8
• Create Aurora Read Replicas from the 11.8
• Promote the Aurora Read Replica
• Cut over the application configs to the Aurora endpoint

actually we went to latest 11.9, but it’s the same process
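for reference, the in-place major-version bump in the first step is just a modify-db-instance with AllowMajorVersionUpgrade (expect downtime); rough boto3 sketch, the instance identifier and target engine version are placeholders:

# Sketch: in-place major version upgrade of an RDS PostgreSQL instance.
import boto3

rds = boto3.client("rds")
rds.modify_db_instance(
    DBInstanceIdentifier="my-postgres",
    EngineVersion="11.9",
    AllowMajorVersionUpgrade=True,
    ApplyImmediately=True,  # otherwise the upgrade waits for the next maintenance window
)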

awesome. your two-way replication solution is definitely a thing of beauty. but I do think I can afford to spin up a second RDS instance over the weekend, upgraded to pg11, and do the cutover at Route53. my solution probably won't warrant a cool Medium blog post about it though.
