#aws (2022-08)
Discussion related to Amazon Web Services (AWS)
Archive: https://archive.sweetops.com/aws/
2022-08-02

I have an unusual situation with a client. They manage many remote sites and have physical devices (up to 20) at each location. Each device needs to send metrics to CloudWatch and upload files to S3, and they currently use static AWS credentials (~/.aws/credentials). I would like to move them to IAM Roles Anywhere to use temporary credentials. The ask is: if a device gets compromised, how can we disable access to AWS from that particular device? I was thinking of using an IAM role per device, however they are expecting to have ~~~k devices online by the end of the year. I’d use Terraform to manage the roles and AWS Organizations to use multiple accounts, since there’s a 5k IAM role quota per account. Does this sound manageable? Or is there a better approach?

I’m thinking a role per device is not the best approach. Maybe you can have fewer roles that are based on the device class/location/etc. That way you can have one role for “Ohio” devices and another for “Utah” devices (just giving examples). The roles would be tightly locked down to only allow writes to CloudWatch and S3 based on their class/location/etc. I would also limit access to CloudWatch and S3 by the class as well…no need to give any more permissions if you are worried about compromise of the device.

If we used a role per location like “Ohio” for example and a single device is compromised how would we just deny access to that device instead of all of Ohio since they’ll be using the same role?

Isn’t this what you want? https://docs.aws.amazon.com/IAM/latest/UserGuide/id_roles_use_revoke-sessions.html
Immediately revoke permissions from a console session or a role whose credentials have been compromised or are suspected of being compromised.

I guess I’m not sure what sort of compromise you’re trying to mitigate.
Indeed, if you had long-lived AWS creds sitting on disk for each device and someone gets those, you have to reset everything everywhere.
With IAM creds they expire regularly and are regenerated. So if someone grabs those creds (and the associated session token) they are only good for a short time.
However, if someone is camped on your device, yes, they could likely use the role creds.
Is it the case that someone could sit on your device undetected? How would they connect? If it’s via SSH, have you considered removing SSH access altogether and using an SSM agent?
I have lots of questions!


If these are IOT devices (vs server devices) perhaps there are managed IOT services that you can use in AWS for the same purpose (CloudWatch logging and uploads).

The idea is to brick them, not necessarily to brick the OS but the running application

got it

Does each device have its own log group?


@Joe Niland - Yes they do. The cloudwatch agent is sending logs to their respective log groups

@Erik Osterman (Cloud Posse) - Thanks for posting the link. It looks like the only option is to revoke ALL sessions for a particular role. I’m looking to revoke or deny a device session without affecting the rest of the devices.

I’m wondering if the IoT services have any facility for helping with that. I’m thinking of the TF plans you’ll have to run to manage all those roles and it’s giving me chills
But seriously, at that scale, I’m thinking there should be a better approach than having to manage 10k roles across multiple accounts

@Patrick McDonald and they each have their own IAM user right?

I’ve just recently taken a look at IAM Roles Anywhere… From what I’ve understood so far, you need a CA to serve as trust anchor. In my case I’m using Hashicorp Vault for my POC testing. If you issued unique certificates for each device under the CA, then if one were to become compromised you should be able to revoke the certificate of that device. The CRL URL should be part of the CA certificate and I would assume it would be queried to check if the certificate was valid, besides just being signed by the CA. This would allow using the same role if the devices didn’t require unique IAM permissions, while still having unique device authentication.
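
To make the idea above concrete, here is a minimal Python sketch of "one shared role, one certificate per device, revoke by certificate". It only models the decision logic: `DeviceCert`, `TrustAnchor`, the serial numbers and the role name are all hypothetical, and nothing here calls IAM Roles Anywhere or Vault. How the real service consumes revocation data (for example whether a CRL has to be imported rather than fetched from the URL in the certificate) is an open assumption to verify against the AWS docs.

```python
from dataclasses import dataclass, field


@dataclass(frozen=True)
class DeviceCert:
    """Hypothetical stand-in for a per-device X.509 certificate."""
    serial: str  # unique per device
    issuer: str  # name of the CA that signed it
    role: str    # all devices may share the same role


@dataclass
class TrustAnchor:
    """Hypothetical model of a CA plus its list of revoked serials."""
    name: str
    revoked_serials: set = field(default_factory=set)

    def revoke(self, serial: str) -> None:
        # Revoking one serial affects only the device holding that certificate.
        self.revoked_serials.add(serial)

    def may_assume_role(self, cert: DeviceCert) -> bool:
        # Accept only certificates issued by this CA that are not revoked.
        return cert.issuer == self.name and cert.serial not in self.revoked_serials


ca = TrustAnchor(name="devices-ca")
ohio_1 = DeviceCert(serial="01", issuer="devices-ca", role="device-writer")
ohio_2 = DeviceCert(serial="02", issuer="devices-ca", role="device-writer")

ca.revoke(ohio_1.serial)  # the compromised device
print(ca.may_assume_role(ohio_1))  # False
print(ca.may_assume_role(ohio_2))  # True
```

The point of the sketch is that revocation is keyed on the certificate, not the role, so the other devices sharing `device-writer` keep working.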

Jeremy, that’s an interesting approach which makes total sense. I’m going to look into that! Thank you.

IAM isn’t really designed to be supported at scale in an account for your users like that. The solution that we usually help our customers implement is using our platform to generate private/public keys per device, and then verifying those keys on your service side. Exposing all of these clients direct access to your cloudwatch and S3, is a huge risk. Throwing a CF + lambda@edge plus service client authentication goes a long way. If you really want to allow direct access, because you consider the huge risk to be worth the small decrease in cost, you could proxy the requests through CF/APIGW directly to the service API.
Alternatively, you could take the private/public key signed JWT that you have (either custom built or using our platform) and use cognito identity pool to vend temporary AWS tokens.

Another question i have is the network access. That is, is the device using a private network when in service? That could be another way to lock down access by limiting access to devices on the private network. That assumes if the device is compromised/stolen and then connects to a different network, access to AWS resources would be blocked.

check out today’s office hours recording for some good suggestions

is the recording already published?

@Patrick McDonald https://www.youtube.com/watch?v=zANiSr_PzcQ&ab_channel=CloudPosse
2022-08-03
2022-08-04

Hey everyone! Does anyone have an example of a CloudWatch Agent config. with a multi_line_start_pattern and/or filter.expression that includes RE2 escaped characters? I’m having a devil of a time getting this to work as expected. I’ve tried single escapes (e.g. \] for the literal ] character) and double-escapes (e.g. \\] for the same literal) and neither seems to be working right. For what it’s worth, the configuration is deployed with Ansible, so the inputs are YAML and converted to JSON on the fly when the values are interpolated in the template file.
For example, any filters that I specify get translated as follows (snippet from a Jinja2 template file for the agent config):
"filters": {{ __log_file.filters | to_json }},
The templating is working as-expected; I’m just not sure the final syntax is right. I’ve been checking against the RE2 syntax but I can’t find a good example with escaped characters. Thanks for any help!
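
For illustration, a small Python sketch of how the escaping layers stack up when a regex ends up inside JSON. It uses Python's `json` and `re` modules rather than the agent or RE2 itself, and the pattern and log line are made up, so treat it as a way to reason about the escapes, not as a verified agent config.

```python
import json
import re

# The regex as the engine should finally see it:
# a literal "[", an ISO date, then a literal "]".
pattern = r"^\[\d{4}-\d{2}-\d{2}\]"

# Serializing to JSON doubles every backslash, because "\" is
# JSON's own escape character.
config_json = json.dumps({"multi_line_start_pattern": pattern})
print(config_json)

# Parsing the JSON gives back the single-backslash regex.
decoded = json.loads(config_json)["multi_line_start_pattern"]
assert decoded == pattern

# Made-up log line to check the decoded pattern against.
sample_line = "[2022-08-04] service started"
print(bool(re.match(decoded, sample_line)))
```

If to_json serializes the same way as `json.dumps` (an assumption here), the YAML value would hold the single-backslash form and the rendered JSON file would show \\] for a literal ].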
2022-08-08

Hey everyone! Is it possible to attach multiple load balancers to ECS service like ALB for internal user and NLB for external use ?

Yes

I have added both the NLB and the ALB, but when I add the NLB target group to the ECS service it says InvalidParameterException: loadBalancerName and targetGroupArn cannot both be specified. You must specify either a loadBalancerName or a targetGroupArn.

i am using cloudposse ecs-alb-service-task and ecs-container-definition

You have to set the load balancer name to null and give it multiple target groups that are attached as listener rules to the same load balancer
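
As a rough sketch of what that ends up looking like at the ECS API level: each entry in the service's `loadBalancers` list carries a `targetGroupArn`, `containerName` and `containerPort`, with no `loadBalancerName`. The helper name, container name, ports and ARNs below are placeholders, and how the Cloud Posse module maps its inputs onto this list is not shown here.

```python
def build_load_balancers(container_name, port_to_target_group):
    """Build a loadBalancers list for an ECS service.

    Each entry uses targetGroupArn only; loadBalancerName is left out,
    since the API rejects entries that specify both.
    """
    return [
        {
            "targetGroupArn": target_group_arn,
            "containerName": container_name,
            "containerPort": port,
        }
        for port, target_group_arn in sorted(port_to_target_group.items())
    ]


# Placeholder ARNs and names, not real resources.
load_balancers = build_load_balancers(
    "app",
    {
        8080: "arn:aws:elasticloadbalancing:us-east-1:111111111111:targetgroup/internal-alb/abc",
        9090: "arn:aws:elasticloadbalancing:us-east-1:111111111111:targetgroup/external-nlb/def",
    },
)
for entry in load_balancers:
    print(entry)
```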

@RB Thanks man I did that it worked thanks but the target container are not getting registered. Registration and deregistration in loop

I’d look into why that’s happening. Are the health checks failing?

yes, I think it’s the health checks: I see 401 in the logs

Sharing an approach I used for a similar situation with ECS serving two LBs (internal ALB and external NLB)…
Instead of having two sets of target groups, I configured the ECS to use the target group associated with the internal ALB. That way, deployments are only updated in one place.
Then I created an externally available NLB that uses the internal ALB as its target.
This has been working great.

But in my infra the container has two ports one for internal use and one for external which will be connected to NLB and internal to alb so two separate target groups are needed

That’s interesting! I’m curious why you would need two ports on the container if its providing the same service internally or externally… does the app/service/container do any sort of processing based on where the client is connecting from?
In my experience, the only need for internal vs external access is the DNS. On VPN, clients use an internal address. On the internet, they use the external address. Once they hit the application, the app forwards it to an IAM service for authentication, then they get redirected back to the app.
If possible, is there a way for you to only use one target group? Asking because, if I recall correctly, I started to go down the multiple target group route and it looked like I would need two deployments of the same application, one for the internal ALB TGs and one for what would have been an external ALB TG.
If all else fails, and you have to keep the application configured as is (with two ports), it might just be easier to consider it as two separate services in the same cluster. One service for the internal ALB with its own TG and another service for the external ALB and TG (I would def use ALB with this approach and not NLB). When you deploy, just deploy to both services at the same time.

hello all, we are using Aurora Serverless v1 and a few of the clusters have a strange issue. Sometimes a cluster drops all connections. In the error log around that time I see the db restarted, but I did not see anything before that. Any idea?

Sounds like the db is stopping, as it does. You can try v2

yes, i see it is stopping and starting, but why only on few clusters and not on all with same usage?

it started few days ago and happens in every 4 hours

I cannot try the v2 because of the price and it is production traffic. Is there any zero downtime upgrade ?
2022-08-09
2022-08-10

Hi All, anyone who got experience with https://vantage.sh/ or other cost analyser?

Vantage is a self-service cloud cost platform that gives developers the tools they need to analyze, report on and optimize AWS and GCP costs.

A friend of mine is the CEO of this early-stage startup and asked that I share it around, in case anyone finds it interesting: https://www.usage.ai/

We are testing v2 aurora… is it possible that v2 is slower than v1 on the same size?

Need some help with kinesis agent not spamming to kinesis… i can see stuff happening in /var/log/aws-kinesis-agent/aws-kinesis-agent.log to the right kinesis, but i just dont see any blip from kinesis monitoring, anything else im missing?
2022-08-11

Not sure where else to post this so I’ll try it here.
I have a bucket with CORS configuration allowing only certain domains as origins.
I am trying to use the Cloudflare image resizing /cdn-cgi/image proxy but I get CORS errors when accessing something like:
<https://www.example.com/cdn-cgi/image/https://my-bucket.s3-eu-central-1.amazonaws.com/images/myimage.jpg>
It works when I allow all origins on my bucket but I don’t want to do that.
Anybody had experience with something like this?

Have you tried adding www.example.com as an allowed origin for the S3 bucket?

“I have a bucket with CORS configuration allowing only certain domains as origins.” Already mentioned that

example.com is one of the certain domains

Maybe open a support ticket with Cloudflare? This seems like a fairly common task, surprised there’s not more documentation (after doing a quick google)

Yes, I am also thinking it might be something simple in the end. There is this https://developers.cloudflare.com/images/image-resizing/control-origin-access/ I hope the solution will not turn out to require Cloudflare Workers

Choose between Cloudflare Images and Cloudflare Image Resizing, two products tailored to your different needs.

You can use Cloudflare Image Resizing with either a pre-defined URL format or, for advanced use cases, with Cloudflare Workers.
Which of those resizing options are you using?

https://<ZONE>/cdn-cgi/image/<OPTIONS>/<SOURCE-IMAGE>

as mentioned in the original question
<https://www.example.com/cdn-cgi/image/https://my-bucket.s3-eu-central-1.amazonaws.com/images/myimage.jpg>

Looking for something to run custom devops scripts from. What do people tend to use ATM? Lambda/Rundeck/Airflow/ArgoWorkflows? We use AWS with EKS so a k8s solution is an option.

We use Lambdas, we use SSM Docs, we use terraform local-exec, we use kubernetes jobs. So, my answer would be “it depends”

can you elaborate on what the script will do ? that will make answering your question easier

I just downloaded the terraform codebuild project but make keeps erroring out

You need to share the exact output in order for anyone to be able to help

makes sense @Erik Osterman (Cloud Posse)


So we assume you have curl installed

ah

im running on windows and installed a vm of debian

ill install that now

any other packages?

make: *** /mnt/c/Users/S903935/OneDrive: Is a directory. Stop

im running make Makefile @Erik Osterman (Cloud Posse)

is there a guide somewhere @Erik Osterman (Cloud Posse).

I often use make to compile programs. But, sometimes, only with some packages, when the directory contains a space, it says: No such file or directory Example: If I run make in the directory /home/

it seems like i am running into this

im just moving my github directory, that will solve this

i want to make sure I understand everything correctly, this is a template which will allow me to automate my codebuild projects, correct?

Terraform Module to easily leverage AWS CodeBuild for Continuous Integration

please message me with a reply as I am going to head to bed and will check back when i wake up

i dont want the conversation to be buried
2022-08-12

hey is anyone around?

i was able to migrate the github project into a place where there are no spaces, now i am getting curl could not resolve host cloudposse.tools when i try make Makefile

is there some other guides with more details in it?

i would like to know where exactly you are supposed to customize the variables as well as which ones you shouldnt customize

anyone around?

I guess not

hmm, it is friday

i really wanted to give the build a shot but its hard without detailed info

ive been trying to find an easy way to build out our codebuild projects for our ci/cd pipeline

it is a bit of a pain to create projects for each micro-service manually

any help would be much appreciated

You are using github, just use GH actions. Codebuild is almost never the answer

i rely heavily on the buildspec part

deploy out to my eks cluster

works really well

The cloudposse.tools domain resolves for me

i downloaded the file just as it was and tried compiling


There is no command that I’m familiar with that is make Makefile

What problem are you trying to solve?

Are you trying to consume the terraform module?

how do you run it?

i thought it needed to be compiled and that was what the Makefile was

hence why i ran that

I would follow the readme. There should be a usage section

thats the problem, i did

its not clear

You create a new directory, copy and paste the terraform code from the usage section, then run terraform init and terraform plan

That’s it :)

ok, let me try that

Here’s a good blog post on using modules in general
https://spacelift.io/blog/what-are-terraform-modules-and-how-do-they-work

Terraform modules are a way of extending your present Terraform configuration with already existing parts of reusable code, to reduce the amount of code you have to develop for similar infrastructure components. Others would say that the module definition is a single or many .tf files stacked together in their own directory. Both sides would be right.


havent really used terraform

read some about it

but this is the first time i am actually trying to use it

i usually stick to cloudformation

Let’s move this over to the #terraform channel

sure
2022-08-15

I need to understand the Lambda workflow. Our code is on S3: does Lambda fetch the code from S3 every time it executes, or is it cached somewhere?

The code is stored in Lambda, not S3. S3 is just used as a mechanism to upload large code bundles to AWS, and then it is copied from S3 into the Lambda environment.

This is evident as you can edit the code in the Lambda console, using the editor in the web UI, and then if you checked some previously uploaded code in S3, the changes will only exist in the Lambda environment

Thank you so much

Code Source

Let’s not ping random people. If you are stuck with something specific, post a code example with the error you are getting. It’s the quickest way to get help.

?

Any one please

ok @Warren Parad

Hi all!
Does anyone here have any experience with docker compose - ecs integration? I am trying to make service discovery work but the cloudmap namespace can not be resolved inside the ecs container. Any help is appreciated!
2022-08-16

I am currently having problems pulling aws docker images from their public registries:
$ docker pull 602401143452.dkr.ecr.us-west-2.amazonaws.com/amazon/aws-load-balancer-controller:v2.4.3
Error response from daemon: Head "<https://602401143452.dkr.ecr.us-west-2.amazonaws.com/v2/amazon/aws-load-balancer-controller/manifests/v2.4.3>": no basic auth credentials
$ docker pull 602401143452.dkr.ecr.eu-west-1.amazonaws.com/amazon/aws-load-balancer-controller:v2.4.3
Error response from daemon: Head "<https://602401143452.dkr.ecr.eu-west-1.amazonaws.com/v2/amazon/aws-load-balancer-controller/manifests/v2.4.3>": no basic auth credentials
Can anyone relate? From inside or outside the AWS network, with or without docker-login credentials, still same error

Hi all! Found a bug? What can I do? Please help me. Thanks. Murphy.

Submit a pull request to fix it

Yes, I will, if I cannot get an answer. And I’d rather it was just a configuration error.

use the old provider

and file the issue

The aws version is 4.25.0.


Hi everyone. I’m looking for a way to set log retention with aws_glue_job, but the result doesn’t seem to work as it suggests that it should in the documentation. It just occurred to me, as I’m writing the question, that I’ve been trying to specify the logging parameters as NonOverridableArguments, so I’m going to look into not doing that, but I was curious if anyone knew of a resource that had a clear explanation of how the different options of logging behaved ( including the unrelated security_configuration object that also includes settings for logs, and appears to mandate the name of the log group. )

Ran into an issue with the most recent changes to the Cloudposse SFTP Transfer module. The latest PR adds functionality to set home directories via the sftp_user input variable (very helpful). I noticed if users are created with the restricted_home variable set to true, and then later that variable is set to false; the in-place updates to users fail with the following errors:
Error: error updating Transfer User (redacted): InvalidParameter: 1 validation error(s) found.
│ - minimum field size of 1, UpdateUserInput.HomeDirectoryMappings.
│
│
│ with module.transfer-sftp.aws_transfer_user.default["redacted"],
│ on .terraform/modules/transfer-sftp/main.tf line 53, in resource "aws_transfer_user" "default":
│ 53: resource "aws_transfer_user" "default" {
│
╵
╷
│ Error: error updating Transfer User (redacted): InvalidParameter: 1 validation error(s) found.
│ - minimum field size of 1, UpdateUserInput.HomeDirectoryMappings.
│
│
│ with module.transfer-sftp.aws_transfer_user.default["redacted"],
│ on .terraform/modules/transfer-sftp/main.tf line 53, in resource "aws_transfer_user" "default":
│ 53: resource "aws_transfer_user" "default" {
│
╵

I was able to get around this issue by deleting all of the users, and re-creating them with the restricted_home variable set to false. Should the restricted_home variable be set per user as an input?

@Chandler Forrest please create an issue with a minimal reproducible example and we can then see what the error is. The additional inputs of the sftp_users variable should be optional, so if they are not, then this may be a bug, but it’s hard to debug without all the necessary context

Can we get s3 bucket access logs to cloudwatch logs ?

Yes. But your question is too generic, please be more specific

The bucket access logs need to sync with cloudwatch logs

this logs @Alex Jurkiewicz

S3 access logs can only be written to another S3 bucket. But you can process log files as they come in and write to cloudwatch logs using custom compute. For example, a Lambda function. However, I’m not sure why you would need this. What’s wrong with having the access logs in S3?

just for info i have config on same bucket

we have some files on S3 and we need to know the access count for each file
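
A per-file access count can be computed straight from the access log files by whatever compute reads them (for example the Lambda mentioned above). Below is a minimal Python sketch, assuming the standard S3 server access log layout where the leading fields are bucket owner, bucket, time, remote IP, requester, request ID, operation and key. The sample records are made up and shortened, and the function name is hypothetical.

```python
import re
from collections import Counter

# Leading fields of an S3 server access log record:
# owner bucket [time] remote_ip requester request_id operation key ...
LOG_PREFIX = re.compile(
    r'^(?P<owner>\S+) (?P<bucket>\S+) \[(?P<time>[^\]]+)\] (?P<ip>\S+) '
    r'(?P<requester>\S+) (?P<request_id>\S+) (?P<operation>\S+) (?P<key>\S+) '
)


def count_object_reads(lines):
    """Count REST.GET.OBJECT records per object key."""
    counts = Counter()
    for line in lines:
        match = LOG_PREFIX.match(line)
        if match and match.group("operation") == "REST.GET.OBJECT":
            counts[match.group("key")] += 1
    return counts


# Made-up sample records (IDs shortened for readability).
sample = [
    'owner1 my-bucket [16/Aug/2022:10:00:00 +0000] 192.0.2.3 - REQ1 REST.GET.OBJECT images/a.jpg "GET /images/a.jpg HTTP/1.1" 200 - 512 512 7 6 "-" "curl/7.79" -',
    'owner1 my-bucket [16/Aug/2022:10:00:05 +0000] 192.0.2.3 - REQ2 REST.GET.OBJECT images/a.jpg "GET /images/a.jpg HTTP/1.1" 200 - 512 512 7 6 "-" "curl/7.79" -',
    'owner1 my-bucket [16/Aug/2022:10:00:09 +0000] 192.0.2.4 - REQ3 REST.PUT.OBJECT images/b.jpg "PUT /images/b.jpg HTTP/1.1" 200 - - 2048 20 - "-" "curl/7.79" -',
]
counts = count_object_reads(sample)
print(counts)
```

Only GET requests for objects are counted here; which operations should count as an "access" is a choice for your use case.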
2022-08-17

Hello, team!
2022-08-19

Hey all, possibly-stupid question: can AWS Support plans be set up with Terraform? Or is there some other way of sensibly managing support plans across a nontrivial org?

Neither the mainline AWS nor the AWSCC provider seems to have a “support” related resource, so I think that’s a no.

Support plans can only be changed by the root account creds -> which aren’t often used with IAC.

In AWS org, you only need to turn on support at the org root. Not each individual account

This can only be done by submitting a support ticket under Account management > General Account Question ;
and more specifically you need to specify each sub account you want to enable under the root, as the plan is technically billed per account
https://docs.aws.amazon.com/awsaccountbilling/latest/aboutv2/consolidatedbilling-support.html

Thanks all
2022-08-20
2022-08-22
2022-08-23

hello!
has anyone ever deployed an SSR nextjs app on AWS? I am experimenting with Amplify and want to ask what are some problems you all had with Amplify hosting. Thanks!
2022-08-24

Hello team, while trying to use a newer version of the Terraform module cloudposse/terraform-aws-emr-cluster I came across an issue, and the GitHub issue template suggests asking for support here first.
While trying to update from version tag 0.23.0 to the latest version tag 1.0.0, I get the following error during terraform plan:
Error: Unsupported block type
on ../.tfmodules/terraform-aws-emr-cluster/main.tf line 523, in resource "aws_emr_cluster" "default":
523: dynamic "auto_termination_policy" {
Blocks of type "auto_termination_policy" are not expected here.
Any suggestion how to proceed or to open issue on github instead?


Make sure you’re using the latest aws provider version

I see it in latest 3.x provider version too so you must be using an older version of the aws provider

What confused me when I was looking at it earlier is that there is no reference in my code to auto_termination_policy which is marked as optional
Also currently at 3.63.0
AWS provider and requirement in our docs says requirement is aws >= 3.5.0

Perhaps the requirement needs to be bumped

for now, you can remove your pin, or remove the .terraform.lock.hcl file locally and run another terraform init to see if it will resolve the issue

thanks for suggestions

did it work?

sorry but would not be able to provide you with answer since I am not trying it atm

no worries, let us know when you can

Tailscale - I have been using tailscale to connect to my private resources and it has been working great! However, I ran into a big problem recently. I got some contractors and shared my node with them. I just found out that it doesn’t share the Subnet relayed traffic (reference: https://tailscale.com/kb/1084/sharing/#sharing--subnets-subnet-routers) Anyone have alternatives to tailscale that will help solve my problem?
I did think of creating a Gmail Account for them with the same domain as me but that would be an extra cost for each contractor just to have access to Tailscale.
Learn how to give another Tailscale user access to a private device within your network, without exposing it publicly.

Looking for that same underlying technology like tailscale if possible. I love this tool and would love to recommend it to more people but this is a big annoyance

Lots of competitors in the market — What I’d suggest looking into is these tools in the following order: StrongDM >= Teleport > CloudFlare Access > HashiCorp Boundary > AWS Client VPN. Search for any of those tools in this Slack and you’ll get plenty of hits. BeyondCorp is one of the most talked about topics.
Tailscale is the best though and for a very good price. I would suggest just accepting the Gmail account cost — You’ll save more of your own money + time not switching away to reduce a hundred or two hundred in yearly spend?

Thank you Matt! I actually found a good workaround bc I really wanted to stick with tailscale. Since they are developer contractors, I was able to use GitHub as the Identity Provider.
Also a nice note to everyone here, tailscale does support several OpenID Connect identity providers. Found here: https://tailscale.com/kb/1119/sso-saml-oidc/
This will need to be a new task for me to provide a better identity provider
Tailscale allows WireGuard® to support several OpenID Connect (OIDC) identity providers, including Google, Azure AD, Okta, and others. Almost everyone can use one of the included providers.

That works too. And definitely best to stick with tailscale — it’s a great tool.

Anyone use AWS code artifact? love it? hate it? We are considering it for pip. CI would push (already uses OIDC with circle CI), and devs would pull for docker container based environments

Might be a good question for #office-hours

Thought the same thing. I had missed it when I posted this
2022-08-25
2022-08-28

Hey guys, do AWS subnets operate at network layer 2?

what do you mean “operate at”?

It’s probably most correct to say subnets operate at layer 3, since they primarily group traffic by IP. But “operate at” is a very broad phrase…
A subnet is a range of IP addresses in your VPC. You can launch AWS resources into a specified subnet. Use a public subnet for resources that must be connected to the internet, and a private subnet for resources that won’t be connected to the internet.
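
To illustrate the "grouping by IP" point, here is a short Python sketch using the standard `ipaddress` module. The CIDR blocks and the address are example values only, not taken from any real VPC.

```python
import ipaddress

# Example CIDR blocks, chosen for illustration only.
public_subnet = ipaddress.ip_network("10.0.1.0/24")
private_subnet = ipaddress.ip_network("10.0.2.0/24")

address = ipaddress.ip_address("10.0.2.17")

# Membership is decided purely by the IP address (layer 3).
print(address in public_subnet)      # False
print(address in private_subnet)     # True
print(private_subnet.num_addresses)  # 256
```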

yep its layer 3

thanks Alex
2022-08-29
2022-08-30
2022-08-31

Let’s say I have a S3 bucket called “abc” and a folder in it like “abc/tmp”. I already restricted IAM policy to restrict access to only PutObject in “abc/tmp” and “abc/tmp/*”. But somehow the api I created can still upload files to a random folder like “abc/tmp99999”… Any other restrictions I have to set??

How certain are you the restriction is for abc/tmp/* and not abc/tmp*?
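
The difference between the two patterns can be sketched in Python. This uses `fnmatchcase` from the standard library as an approximation of how a `*` wildcard in a policy resource matches any run of characters, including `/`. It is not the IAM evaluation engine, and the object keys are made up.

```python
from fnmatch import fnmatchcase

# Made-up object keys, written as bucket/key for readability.
keys = ["abc/tmp/report.csv", "abc/tmp99999/report.csv"]

results = {}
for resource in ("abc/tmp/*", "abc/tmp*"):
    for key in keys:
        # "*" matches any characters, so "abc/tmp*" also covers "abc/tmp99999/...".
        results[(resource, key)] = fnmatchcase(key, resource)
        print(f"{resource!r} matches {key!r}: {results[(resource, key)]}")
```

With the trailing slash, only keys under the tmp/ prefix match; without it, any key that merely starts with "tmp" matches too.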

Pretty straightforward like Effect: Allow, Action: s3:PutObject, Resource: arn:aws:s3:::abc/tmp, arn:aws:s3:::abc/tmp/*

That policy is on the iam user/role/group? Is there more than one policy, maybe allowing more s3 actions than you thought? Also, what is the s3 bucket policy? Does it have an allow statement that might match?

Hi, Anyone aware of avoiding bot attacks ? my project has been attacked by bots, my infra is aws and cloudflare.

You might need to be a little more specific as to the type of attack

My developer said it is just a bot attack, I don’t know any more details.

As a preliminary step, check the security groups to verify whether any protocol is open to access from anywhere [ 0.0.0.0/0 ]
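
A minimal Python sketch of that check, assuming the security groups are available as dictionaries shaped like the `SecurityGroups` list in an EC2 DescribeSecurityGroups response. The function name, group IDs and rules below are made up; fetching the real data (for example with boto3) is left out.

```python
def open_to_world(security_groups):
    """Return (group_id, protocol, from_port, to_port) for ingress rules open to anywhere."""
    findings = []
    for group in security_groups:
        for rule in group.get("IpPermissions", []):
            cidrs = [r.get("CidrIp") for r in rule.get("IpRanges", [])]
            cidrs += [r.get("CidrIpv6") for r in rule.get("Ipv6Ranges", [])]
            if "0.0.0.0/0" in cidrs or "::/0" in cidrs:
                findings.append(
                    (group["GroupId"], rule.get("IpProtocol"), rule.get("FromPort"), rule.get("ToPort"))
                )
    return findings


# Made-up sample data for illustration.
sample_groups = [
    {
        "GroupId": "sg-0aaa",
        "IpPermissions": [
            {"IpProtocol": "tcp", "FromPort": 22, "ToPort": 22, "IpRanges": [{"CidrIp": "0.0.0.0/0"}]},
            {"IpProtocol": "tcp", "FromPort": 443, "ToPort": 443, "IpRanges": [{"CidrIp": "10.0.0.0/16"}]},
        ],
    },
    {
        "GroupId": "sg-0bbb",
        "IpPermissions": [
            {"IpProtocol": "tcp", "FromPort": 5432, "ToPort": 5432, "IpRanges": [{"CidrIp": "10.0.2.0/24"}]},
        ],
    },
]
findings = open_to_world(sample_groups)
print(findings)
```

Rules open to the world on ports like 80/443 may be intentional behind Cloudflare; the list is a starting point for review, not a verdict.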

No @sai kumar


@Max Thank you.