#aws (2022-08)

aws Discussion related to Amazon Web Services (AWS)

Archive: https://archive.sweetops.com/aws/

2022-08-02

Patrick McDonald

I have an unusual situation with a client. They manage many remote sites and have physical devices (up to 20) at each location. Each device needs to send metrics to CloudWatch and upload files to S3, and they currently use static AWS credentials (~/.aws/credentials). I would like to move them to IAM Roles Anywhere to use temporary credentials. The ask is: if a device gets compromised, how can we disable access to AWS from that particular device? I was thinking of using an IAM role per device; however, they are expecting to have ~10k devices online by the end of the year. I’d use Terraform to manage the roles and AWS Organizations to use multiple accounts, since there’s a 5k IAM role quota per account. Does this sound manageable, or is there a better approach?

managedkaos

I’m thinking a role per device is not the best approach. Maybe you can have fewer roles that are based on the device class/location/etc. That way you can have one role for “Ohio” devices and another for “Utah” devices (just giving examples). The roles would be tightly locked down to only allow writes to CloudWatch and S3 based on their class/location/etc. I would also limit access to CloudWatch and S3 by the class as well…no need to give any more permissions if you are worried about compromise of the device.

Patrick McDonald

If we used a role per location, like “Ohio” for example, and a single device is compromised, how would we deny access to just that device instead of all of Ohio, since they’ll be using the same role?

Erik Osterman (Cloud Posse)
Revoking IAM role temporary security credentials - AWS Identity and Access Management

Immediately revoke permissions from a console session or a role whose credentials have been compromised or are suspected of being compromised.

managedkaos

I guess I’m not sure what sort of compromise you’re trying to mitigate.

Indeed, if you had long-lived AWS creds sitting on disk for each device and someone gets those, you have to reset everything everywhere.

With IAM creds they expire regularly and are regenerated. So if someone grabs those creds (and the associated session token) they are only good for a short time.

However, if someone is camped on your device, yes, they could likely use the role creds.

Is it the case that someone could sit on your device undetected? How would they connect? If it’s via SSH, have you considered removing SSH access altogether and using an SSM agent?

I have lots of questions!

Patrick McDonald

These devices sit on utility poles and have been stolen in the past.

managedkaos

If these are IOT devices (vs server devices) perhaps there are managed IOT services that you can use in AWS for the same purpose (CloudWatch logging and uploads).

Patrick McDonald

The idea is to brick them, not necessarily to brick the OS but the running application

managedkaos

got it

Joe Niland

Does each device have its own log group?

Erik Osterman (Cloud Posse)

This is a good one for #office-hours

Patrick McDonald

@Joe Niland - Yes they do. The cloudwatch agent is sending logs to their respective log groups

Patrick McDonald

@Erik Osterman (Cloud Posse) - Thanks for posting the link. It looks like the only option is to revoke ALL sessions for a particular role. I’m looking to revoke or deny a device session without affecting the rest of the devices.
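For reference, the revocation described in the linked AWS page works by attaching an inline deny policy to the role, keyed on when the session token was issued. That is why it hits every session of the role rather than one device. A minimal sketch of that policy document in Python follows; the function name and the example cutoff time are illustrative, not from the thread.

```python
import json
from datetime import datetime, timezone


def revoke_older_sessions_policy(cutoff):
    """Build a deny-all policy for sessions issued before `cutoff` (an aware datetime)."""
    # aws:TokenIssueTime is compared as an ISO 8601 UTC timestamp.
    issued_before = cutoff.astimezone(timezone.utc).strftime("%Y-%m-%dT%H:%M:%SZ")
    return {
        "Version": "2012-10-17",
        "Statement": [
            {
                "Effect": "Deny",
                "Action": "*",
                "Resource": "*",
                "Condition": {
                    "DateLessThan": {"aws:TokenIssueTime": issued_before}
                },
            }
        ],
    }


# Example cutoff (hypothetical): everything issued before 18:00 UTC is denied.
example_policy = revoke_older_sessions_policy(
    datetime(2022, 8, 2, 18, 0, 0, tzinfo=timezone.utc)
)
print(json.dumps(example_policy, indent=2))
```

Because the condition only looks at issue time, any device sharing the role is cut off until it obtains fresh credentials, which matches the limitation Patrick describes.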

managedkaos

I’m wondering if the IoT services have any facility for helping with that. I’m thinking of the TF plans you’ll have to run to manage all those roles and it’s giving me chills

But seriously, at that scale, I’m thinking there should be a better approach than having to manage 10k roles across multiple accounts

Joe Niland

@Patrick McDonald and they each have their own IAM user right?

Jeremy (UnderGrid Network Services)

I’ve just recently taken a look at IAM Roles Anywhere… From what I’ve understood so far, you need a CA to serve as trust anchor. In my case I’m using HashiCorp Vault for my POC testing. If you issued unique certificates for each device under the CA, then if one were to become compromised you should be able to revoke the certificate of that device. The CRL URL should be part of the CA certificate and I would assume it would be queried to check if the certificate was valid besides just being signed by the CA. This would allow you to use the same role if the devices didn’t require unique IAM permissions but still have unique device authentication.
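One way to picture "same role, unique device identity" is a shared permissions policy that keys on the certificate subject. The sketch below assumes that IAM Roles Anywhere exposes the certificate's common name as the session tag `aws:PrincipalTag/x509Subject/CN`; the bucket name, prefix layout, and device names are hypothetical. It is also worth verifying the CRL assumption above: as far as I know, Roles Anywhere expects the CRL to be imported into the service rather than fetched from the URL in the certificate.

```python
def per_device_policy(bucket, revoked_cns=()):
    """Shared role policy: each device writes under its own prefix; revoked CNs are denied."""
    statements = [
        {
            "Sid": "WriteOwnPrefixOnly",
            "Effect": "Allow",
            "Action": "s3:PutObject",
            # The policy variable is resolved by IAM at request time, per session.
            "Resource": f"arn:aws:s3:::{bucket}/devices/${{aws:PrincipalTag/x509Subject/CN}}/*",
        }
    ]
    revoked = sorted(set(revoked_cns))
    if revoked:
        statements.append(
            {
                "Sid": "DenyRevokedDevices",
                "Effect": "Deny",
                "Action": "*",
                "Resource": "*",
                "Condition": {
                    "StringEquals": {"aws:PrincipalTag/x509Subject/CN": revoked}
                },
            }
        )
    return {"Version": "2012-10-17", "Statement": statements}


# Hypothetical bucket and device name, for illustration only.
policy = per_device_policy("example-device-uploads", revoked_cns=["pole-0042"])
```

A deny list in the policy is only a stopgap with its own size limits; revoking the certificate itself remains the per-device control Jeremy describes.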

Patrick McDonald

Jeremy, that’s an interesting approach which makes total sense. I’m going to look into that! Thank you.

Warren Parad

IAM isn’t really designed to be supported at scale in an account for your users like that. The solution that we usually help our customers implement is using our platform to generate private/public keys per device, and then verifying those keys on your service side. Giving all of these clients direct access to your CloudWatch and S3 is a huge risk. Throwing a CF + Lambda@Edge plus service client authentication goes a long way. If you really want to allow direct access, because you consider the huge risk to be worth the small decrease in cost, you could proxy the requests through CF/APIGW directly to the service API.

Alternatively, you could take the private/public key signed JWT that you have (either custom built or using our platform) and use cognito identity pool to vend temporary AWS tokens.

managedkaos

Another question I have is about network access. That is, is the device using a private network when in service? That could be another way to lock down access: limit access to devices on the private network. That assumes that if the device is compromised/stolen and then connects to a different network, access to AWS resources would be blocked.
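A network restriction like this is commonly expressed as a deny statement on the `aws:SourceIp` condition key. The sketch below builds such a policy; the CIDR ranges are placeholder documentation ranges, and it assumes the devices reach AWS from known public egress addresses (requests arriving through a VPC endpoint do not carry `aws:SourceIp`, so that setup would need a different condition).

```python
def deny_outside_networks_policy(allowed_cidrs):
    """Deny every request that does not originate from one of `allowed_cidrs`."""
    return {
        "Version": "2012-10-17",
        "Statement": [
            {
                "Sid": "DenyOutsideKnownNetworks",
                "Effect": "Deny",
                "Action": "*",
                "Resource": "*",
                "Condition": {
                    "NotIpAddress": {"aws:SourceIp": list(allowed_cidrs)}
                },
            }
        ],
    }


# Placeholder site egress ranges (RFC 5737 documentation addresses).
site_policy = deny_outside_networks_policy(["203.0.113.0/24", "198.51.100.0/24"])
```

A stolen device that comes back online from another network would then fail every call, even while its credentials are still valid.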

Erik Osterman (Cloud Posse)

check out today’s office hours recording for some good suggestions

Patrick McDonald

is the recording already published?

2022-08-03

2022-08-04

Jonathan Backhaus

Hey everyone! Does anyone have an example of a CloudWatch Agent config. with a multi_line_start_pattern and/or filter.expression that includes RE2 escaped characters? I’m having a devil of a time getting this to work as expected. I’ve tried single escapes (e.g. \] for the literal ] character) and double-escapes (e.g. \\] for the same literal) and neither seems to be working right. For what it’s worth, the configuration is deployed with Ansible, so the inputs are YAML and converted to JSON on the fly when the values are interpolated in the template file.

For example, any filters that I specify get translated as follows (snippet from a Jinja2 template file for the agent config):

  "filters": {{ __log_file.filters | to_json }},

The templating is working as-expected; I’m just not sure the final syntax is right. I’ve been checking against the RE2 syntax but I can’t find a good example with escaped characters. Thanks for any help!
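One way to sanity-check the escaping is to look at what a JSON serializer does with a single-escaped pattern. The sketch below uses Python's `json.dumps` as a stand-in for the `to_json` filter and Python's `re` as a stand-in for RE2 (the two agree on this simple pattern); the file path, log group name, and patterns are made up for illustration.

```python
import json
import re

# In memory (after YAML parsing) the regex should contain ONE backslash per escape.
# In YAML that means writing it unquoted or single-quoted, e.g. '\[DEBUG\]'.
log_file = {
    "file_path": "/var/log/app.log",        # hypothetical path
    "log_group_name": "app",                # hypothetical log group
    "multi_line_start_pattern": r"^\[\d{4}-\d{2}-\d{2}",
    "filters": [{"type": "exclude", "expression": r"\[DEBUG\]"}],
}

# JSON serialization doubles every backslash, so the rendered config shows \\[ and \\].
rendered_filters = json.dumps(log_file["filters"])
print(rendered_filters)

# Whatever parses the JSON gets the single-backslash regex back.
round_tripped = json.loads(rendered_filters)[0]["expression"]
matches_literal_brackets = re.search(round_tripped, "2022-08-04 [DEBUG] cache miss") is not None
```

If that holds for the agent as well, a doubled backslash in the rendered JSON is expected, and the source YAML should carry only a single backslash. A double-quoted YAML string is a likely culprit to check, since YAML processes backslash escapes there.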

2022-08-08

Adarsh Hiwrale

Hey everyone! Is it possible to attach multiple load balancers to an ECS service, like an ALB for internal use and an NLB for external use?

Alex Jurkiewicz

Yes

Adarsh Hiwrale

I have added both the NLB and the ALB, but when I add the NLB target group to the ECS service it says InvalidParameterException: loadBalancerName and targetGroupArn cannot both be specified. You must specify either a loadBalancerName or a targetGroupArn.

Adarsh Hiwrale

I am using the Cloud Posse ecs-alb-service-task and ecs-container-definition modules

RB (Ronak) (Cloud Posse)

You have to set the load balancer name to null and give it multiple target groups that are attached as listener rules to the same load balancer
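At the ECS API level this corresponds to a `loadBalancers` list in which every entry names a target group and no entry sets `loadBalancerName`. A minimal sketch of that structure, as it would be passed to boto3's `ecs.create_service(loadBalancers=...)`, follows; the ARNs, container name, and ports are placeholders.

```python
def ecs_load_balancers(target_groups):
    """Build the ECS `loadBalancers` list from (target_group_arn, container_name, port) tuples.

    Only targetGroupArn is set on each entry; loadBalancerName is deliberately
    omitted, since the API rejects entries that specify both.
    """
    return [
        {"targetGroupArn": arn, "containerName": name, "containerPort": port}
        for arn, name, port in target_groups
    ]


# Placeholder ARNs: one target group behind the internal ALB, one behind the external NLB.
load_balancers = ecs_load_balancers(
    [
        ("arn:aws:elasticloadbalancing:us-east-1:111122223333:targetgroup/internal-alb/abc123", "app", 8080),
        ("arn:aws:elasticloadbalancing:us-east-1:111122223333:targetgroup/external-nlb/def456", "app", 9090),
    ]
)
```

Each entry can point at a different container port, which covers a container that exposes one port for internal traffic and another for external traffic.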

Adarsh Hiwrale

@RB (Ronak) (Cloud Posse) Thanks man, I did that and it worked, but the target containers are not getting registered. Registration and deregistration happen in a loop

RB (Ronak) (Cloud Posse)

I’d look into why that’s happening. Are the health checks failing?

Adarsh Hiwrale

Yes, I think it’s the health checks; I see 401s in the logs

managedkaos

Sharing an approach I used for a similar situation with ECS serving two LBs (internal ALB and external NLB)…

Instead of having two sets of target groups, I configured the ECS to use the target group associated with the internal ALB. That way, deployments are only updated in one place.

Then I created an externally available NLB that uses the internal ALB as its target.

This has been working great.
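For anyone wanting to reproduce this layout, the NLB side needs a target group of type `alb` using the TCP protocol, with the ALB's ARN registered as the target. The sketch below only builds the keyword arguments for boto3's `elbv2.create_target_group` and `elbv2.register_targets`; the names, VPC ID, ARNs, and port are placeholders.

```python
def alb_target_group_kwargs(name, vpc_id, port=443):
    """Arguments for elbv2.create_target_group: an NLB target group that fronts an ALB."""
    return {
        "Name": name,
        "Protocol": "TCP",     # ALB-type target groups use TCP
        "Port": port,
        "VpcId": vpc_id,
        "TargetType": "alb",
    }


def register_alb_kwargs(target_group_arn, alb_arn, port=443):
    """Arguments for elbv2.register_targets: the ALB itself is the single target."""
    return {
        "TargetGroupArn": target_group_arn,
        "Targets": [{"Id": alb_arn, "Port": port}],
    }


# Placeholder identifiers for illustration.
tg_kwargs = alb_target_group_kwargs("external-to-internal-alb", "vpc-0123456789abcdef0")
reg_kwargs = register_alb_kwargs(
    "arn:aws:elasticloadbalancing:us-east-1:111122223333:targetgroup/external-to-internal-alb/abc123",
    "arn:aws:elasticloadbalancing:us-east-1:111122223333:loadbalancer/app/internal-alb/def456",
)
```

With this arrangement the ECS service keeps a single target group (the ALB's), so deployments only touch one registration path.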

Adarsh Hiwrale

But in my infra the container has two ports, one for internal use and one for external. The external one will be connected to the NLB and the internal one to the ALB, so two separate target groups are needed

managedkaos

That’s interesting! I’m curious why you would need two ports on the container if it’s providing the same service internally or externally… does the app/service/container do any sort of processing based on where the client is connecting from?

In my experience, the only difference between internal and external access is the DNS. On VPN, clients use an internal address. On the internet, they use the external address. Once they hit the application, the app forwards them to an IAM service for authentication, then they get redirected back to the app.

If possible, is there a way for you to only use one target group? Asking because, if I recall correctly, I started to go down the multiple target group route and it looked like I would need two deployments of the same application, one for the internal ALB TGs and one for what would have been an external ALB TG.

If all else fails, and you have to keep the application configured as is (with two ports), it might just be easier to consider it as two separate services in the same cluster: one service for the internal ALB with its own TG and another service for the external ALB and TG (I would def use an ALB with this approach and not an NLB). When you deploy, just deploy to both services at the same time.

Balazs Varga

Hello all, we are using Aurora Serverless v1 and a few of the clusters have a strange issue. Sometimes a cluster drops all connections. In the error log around that time I see the DB restarted, but I did not see anything before that. Any idea?

Alex Jurkiewicz

Sounds like the db is stopping, as it does. You can try v2

Balazs Varga

Yes, I see it is stopping and starting, but why only on a few clusters and not on all of them, given the same usage?

Balazs Varga

It started a few days ago and happens every 4 hours

Balazs Varga

I cannot try v2 because of the price, and it is production traffic. Is there any zero-downtime upgrade?

2022-08-09
