#aws (2021-02)
Discussion related to Amazon Web Services (AWS)
Archive: https://archive.sweetops.com/aws/
2021-02-01

How do you configure auditing / access logging in your AWS accounts? Do you have a separate organization just for storing the audit logs across environments (dev, staging, ...)?

I’m trying to see if there’s any general best-practice regarding the whole auditing part of a fully-cloud architecture

And any other comments regarding that phase / recommendations would be very beneficial as well

Same org, dedicated audit account. All dev/stage/prod/shared-services accounts send logs (CloudTrail, access logs, anything with logs) to the audit account, and only the sec/audit team has access to that account. You enable GuardDuty and Security Hub there for centralized security/compliance dashboards.
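For the CloudTrail piece, a rough CLI sketch of what that can look like (names are placeholders, and it assumes the org-audit-logs bucket already exists in the audit account with a bucket policy that lets cloudtrail.amazonaws.com write to it):
# Run from the management account; organization trails must be created there.
aws cloudtrail create-trail \
  --name org-audit-trail \
  --s3-bucket-name org-audit-logs \
  --is-organization-trail \
  --is-multi-region-trail
aws cloudtrail start-logging --name org-audit-trail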

Do you create the compliance dashboards using CloudWatch?

sechub


any chance there’s some reference guide regarding all the auditing aspects? I can read on each of them separately but it feels that there might be some overlaps between them

I think you can start by sending CloudTrail logs and enabling GuardDuty and Security Hub. Also check https://aws.amazon.com/blogs/aws/aws-audit-manager-simplifies-audit-preparation/, which is a new service

Gathering evidence in a timely manner to support an audit can be a significant challenge due to manual, error-prone, and sometimes, distributed processes. If your business is subject to compliance requirements, preparing for an audit can cause significant lost productivity and disruption as a result. You might also have trouble applying traditional audit practices, which […]
2021-02-02

Hi all, does anyone know of a way to compare a CloudWatch metric between two different dates? e.g. comparing RequestCount from an ALB so we can say “Oh we’re 10% busier compared to the previous Tuesday”. I know we can open two separate windows and tile them side by side, but would be nicer to have a more precise method.

What about using the AWS CLI and/or API to grab the metrics?

Yup, that’s doable. I was hoping there was a way to overlay the graphs somehow
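One way to get a precise comparison is to pull the same metric for both windows with the CLI and diff the numbers; a rough sketch (the load balancer dimension value and the dates are placeholders):
# RequestCount summed over this Tuesday vs the previous Tuesday.
for day in 2021-02-02 2021-01-26; do
  aws cloudwatch get-metric-statistics \
    --namespace AWS/ApplicationELB \
    --metric-name RequestCount \
    --dimensions Name=LoadBalancer,Value=app/my-alb/50dc6c495c0c9188 \
    --start-time "${day}T00:00:00Z" --end-time "${day}T23:59:59Z" \
    --period 86400 --statistics Sum \
    --query 'Datapoints[0].Sum' --output text
done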

Hey all! I’m trying to geo-replicate my RDS backups, but I’m not too sure which would be the best way to do that. I managed to automatically geo-replicate the RDS Snapshots taken every day by exporting them to S3 and then setting up S3 replication to another region, but it does not seem possible to restore a database from those exported snapshots since they are in the parquet format. Anyone implemented such a solution in here?

did you check AWS Backup?


Oh, very nice!

Will look into it, thanks

but it does more than the old S3 backup replication approach

I’ll have to check their pricing, might be worth it if it’s less overhead to manage

Has anyone hit the issue of “Note that you must create the Lambda function from the same account as the container registry in Amazon ECR.” when using lambda container images? https://docs.aws.amazon.com/lambda/latest/dg/images-create.html. This seems like a rough limitation when your ECR is in a separate account. Has anyone heard of any progress of this changing or a possible workaround?
Create a container image for a Lambda function by using an AWS provided base image or an alternative base image.

It’s being actively worked on and is expected “soon” according to AWS Support

The new AWS ECS UI appears to be missing service logs. Why does AWS keep on doing this with their UI improvements?

I have no clue, I switched to the old UI

I’d assume it’s because the majority of people don’t leave feedback


I’ve seen them respond to quite a bit of feedback over time… but I agree, some of the new UIs are a regression

feature regression but the UI is overall better. I guess they hope for better long term results

I agree that the direction is good, but regression of features that makes it harder to get to the information we need to be successful is a problem

Holy shit Bezos is stepping down, @ajassy to be new Amazon CEO in Q3.
2021-02-03

does anyone use aws-vault here? We’re using it, and it seems that actions performed by an aws-vault-authenticated user are logged with a session ID instead of its username. Anyone familiar with this?

Can you paste your aws-vault config, while masking values of course

yes -
[profile root]
region=us-east-1
[profile dev]
region=us-east-1
role_arn=arn:aws:iam::yyyyyy:role/OrganizationAccountAccessRole
source_profile=root

I’m using the root credentials in order to access all other accounts (dev, staging, etc.) in the organization

hmm, that uses a role, not a user or group. Here is my config for aws-vault with SSO (permission set on SSO):
[default]
output=yaml-stream
cli_pager=

[profile XYZ-master]
region=eu-west-1

[profile XYZ-shared]
sso_start_url=https://XYZ.awsapps.com/start
sso_region=eu-west-1
sso_account_id=123456789012
sso_role_name=SuperAdmins
region=eu-west-1

[profile XYZ-dev]
sso_start_url=https://XYZ.awsapps.com/start
sso_region=eu-west-1
sso_account_id=123456789012
sso_role_name=SuperAdmins
region=eu-west-1

[profile XYZ-prod]
sso_start_url=https://XYZ.awsapps.com/start
sso_region=eu-west-1
sso_account_id=123456789012
sso_role_name=SuperAdmins
region=eu-west-1

This allows us to use SSO as a single source of truth for user access, and usernames appear in CloudTrail logs


Any suggestions for debugging Route53 DNS latency issues? I have a DataDog TCP synthetics test for an NLB TCP service in a client’s production account. It tests that the TCP connection takes less than 500ms. This works 95% of the time, but every other day or so the client has the monitor alerting because the DNS lookup takes 450ms+, which combined with the latency from the NLB / service triggers the alert. This seems a bit off to me and I’m wondering what tools I can use to debug that.
The root domain is delegated from another account:
- The TTL of the NS record in the delegated account is 172800 seconds.
- The TTL of the NS record in the primary account that points $delegated.$base_host to the delegated hosted zone NS record has a TTL of 60 seconds. Any tools / thoughts / suggestions on this problem?
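One way to rule Route53 in or out is to query the delegated zone's authoritative name servers directly in a loop and log the reported query time; a rough sketch (domain and name server are placeholders):
# Bypass local resolver caches and record what the authoritative server itself reports.
for i in $(seq 1 100); do
  date +%T
  dig +noall +stats myservice.example.com @ns-123.awsdns-45.com | grep 'Query time'
  sleep 5
done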

We ran into a similar issue (also via DataDog). So we set it to alert only if it fails in a few consecutive attempts.

Yeah we are rolling out that same change… but this is making me wonder if client connections are occasionally experiencing long DNS lookup times as well, which is the concerning bit.

There are many possible causes here… could be Route53, could be the DNS servers located closer to DataDog, etc. You can use tools like this: https://www.dnsperf.com/dns-speed-benchmark

That will help isolate if it’s a DataDog thing or a DNS thing.

Hm for some reason that tool doesn’t seem to work with my client’s domain. Odd.

Yeah, it seems to only work with "www." hostnames. Maybe if you go premium?

Oh, it also works with the top level, like so: https://www.dnsperf.com/dns-speed-benchmark?id=1okq398kkppk1cu

Haha it looks like a great tool — shame that it has that limitation.

I’m running into the same issue with RabbitMQ behind an NLB - Route53 is delegated in the dev account; looks like a common issue

most of the time we lose the first TCP packet we send to the NLB

so in the sender we add a check for the NLB and then send the message

We had the same problem so we did the same as @Yoni Leitersdorf (Indeni Cloudrail)

This is very similar to the new approach I’m seeing to memory leaks in containers: Just kill the container every 10 minutes (instead of finding out why it’s happening…)

Datadog edge synthetics servers are not so reliable

we get a lot of false positives from different locations

We just shared a self-case study of our own journey to securing our cloud environment. The conclusion of it can be summed up as:
• Shift left is far far better than just CSPM on live cloud environments. That is, if you’re actually interested in fixing things and not just visibility.
• The relationship between developers and security doesn’t have to be a standoff. With the right process, personnel approach and tools you can actually have your developers fix issues (and not chase them for months on a single ticket…). Would love to hear other people’s experience.
Case study: https://indeni.com/blog/indeni-cloudrail-case-study-eating-dogfood-and-enjoying-it/

Hi all - I am running an app on Fargate and have the log configuration set to logDriver: splunk to ingest logs to Splunk. I set the Docker container log path, which contains multiple log files, but as of now it is only ingesting the container's default stdout/stderr logs to Splunk. Can someone help me figure out how to ingest all the log files in the logs directory to Splunk?
Thank you very much in advance.
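The splunk log driver only ships what the container writes to stdout/stderr, so one common workaround (an untested sketch, not the only option) is to share the log directory with a sidecar that tails the files; a task definition fragment along these lines, with names/paths assumed and the Splunk driver options omitted:
{
  "volumes": [{ "name": "app-logs" }],
  "containerDefinitions": [
    {
      "name": "app",
      "mountPoints": [{ "sourceVolume": "app-logs", "containerPath": "/var/log/app" }]
    },
    {
      "name": "log-tailer",
      "image": "busybox",
      "command": ["sh", "-c", "tail -n +1 -f /var/log/app/*.log"],
      "mountPoints": [{ "sourceVolume": "app-logs", "containerPath": "/var/log/app", "readOnly": true }],
      "logConfiguration": { "logDriver": "splunk" }
    }
  ]
}
A FireLens (Fluent Bit) sidecar reading the same shared volume is the heavier but more flexible version of the same idea.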

How are you folks handling Lambda & ECS (Fargate) CI/CD alongside long-lived infrastructure? We’re trying to standardize on tools like AWS SAM, OpenFaaS, Serverless, etc., rather than rolling our own for handling Lambda / API Gateway. I noticed AWS Copilot and the ECS CLI, but I'm not sure what else is out there short of going to EKS, unless I’m misunderstanding the services. Pretty much just preparing for a lot of on-prem -> cloud migration.

FYI we’re planning on using Terraform for our long lived infrastructure.

Some of the teams here are starting to use the Serverless Framework, which does a lot of the heavy lifting around zipping/uploading the code. It also lets you set up a lot of the Lambda’s integrations

Sorry to interject here after two weeks, but did any of you go with the Terraform + Docker approach for AWS Lambda deployments?

Talking about this new feature: https://aws.amazon.com/blogs/aws/new-for-aws-lambda-container-image-support/

February 9, 2021: Post updated with the current regional availability of container image support for AWS Lambda. With AWS Lambda, you upload your code and run it without thinking about servers. Many customers enjoy the way this works, but if you’ve invested in container tooling for your development workflows, it’s not easy to use the […]

Yeah, that’s available via AWS SAM which I used for a POC. I thought it was neat to use but haven’t spent a lot of time with it.
2021-02-04
2021-02-05

Anyone have experience with AWS Control Tower? Would love to hear how it is in practice

Hi everyone! I have created an EKS cluster with the terraform_aws_eks module and the cluster was created with a particular access key and secret key. On a client machine, I cannot use that access key but have to use another set of accesskeys and then assume a role using the aws sts command. After assuming the role, I have “admin access”. When I then call kubectl get pods, I do not have access. I thought I could solve this by including this bit in the cluster creation:
map_roles = [
  {
    rolearn  = "arn:aws:iam::<account-id>:role/my-role"
    username = "my-role"
    groups   = ["system:masters"]
  }
]
where rolearn is the role that I assumed… but when executing kubectl get pods, I still have no access. Could someone point me to a solution ?

if you are using this module https://github.com/cloudposse/terraform-aws-eks-cluster

then the variable to add additional roles is:
variable "map_additional_iam_roles" {
  description = "Additional IAM roles to add to `config-map-aws-auth` ConfigMap"
  type = list(object({
    rolearn  = string
    username = string
    groups   = list(string)
  }))
  default = []
}

@Thomas Hoefkens ^

Hi @Andriy Knysh (Cloud Posse), thanks - that is exactly what I used if you look at the question I posted… is it maybe because of the groups? Are there any default groups that I can assign to the role such as the one the cluster owner AWS access key may have?

I used this:
map_roles = [
  {
    rolearn  = "arn:aws:iam::<account-id>:role/my-role"
    username = "my-role"
    groups   = ["system:masters"]
  }
]

This is the setup:
- The EKS cluster is created with access key 1
- I can only access the cluster using access key 2 plus an assume-role call that turns me into the identity arn:aws:sts::<account-id>:assumed-role/my-assumed-role/my-session-name
- I would now like to run kubectl get pods

does the username have to exist as an iam user?

(nit: use code blocks)

@Thomas Hoefkens here is what we used

# EKS IAM Authentication settings
# By default, you can authenticate to EKS cluster only by assuming the role that created the cluster (e.g. `xxxx-dev-terraform`).
# After the Auth Config Map is applied, the other IAM roles in `map_additional_iam_roles` will be able to authenticate.
# Add other IAM roles and map them to Kubernetes groups here
# <https://kubernetes.io/docs/reference/access-authn-authz/rbac/>
map_additional_iam_roles = [
  {
    rolearn  = "arn:aws:iam::xxxxxxxxxxxx:role/xxx-dev-admin"
    username = "dev-admin"
    groups   = ["system:masters"]
  },
  {
    rolearn  = "arn:aws:iam::xxxxxxxxxxxx:role/xxx-master-admin"
    username = "master-admin"
    groups   = ["system:masters"]
  },
  {
    rolearn  = "arn:aws:iam::xxxxxxxxxxxx:role/xxx-dev-read-only"
    username = "dev-read-only"
    groups   = ["system:authenticated"]
  },
  {
    rolearn  = "arn:aws:iam::xxxxxxxxxxxx:role/xxx-master-read-only"
    username = "master-read-only"
    groups   = ["system:authenticated"]
  }
]

username = "dev-admin" is not related to IAM

my question about the variable map_additional_iam_roles was whether you actually assigned your map to this variable when you instantiate the module


you can do it step by step:

- Assume the role that created the cluster

- Get kubeconfig from the cluster by, for example, executing these commands:

aws --profile <AWS PROFILE WITH THE ROLE ARN THAT CREATED THE CLUSTER> eks update-kubeconfig --name=my-eks-cluster --region=<MY REGION> --kubeconfig=/dev/shm/my-kubecfg
chmod 600 /dev/shm/my-kubecfg
export KUBECONFIG=/dev/shm/my-kubecfg
kubectl get pods --all-namespaces
kubectl get nodes

where --profile <AWS PROFILE WITH THE ROLE ARN THAT CREATED THE CLUSTER> is the AWS profile in ~/.aws/config for the IAM role that created the cluster

if the above works, then you can add additional IAM roles to the config map to allow them to access the cluster

module "eks" {
source = ""
........
map_additional_iam_roles = [
{
rolearn = "arn:aws:iam::844857508710:role/my-role"
username = "my-role"
groups = ["system:masters"]
}
]

then create an AWS profile with that role

and try to execute the same commands with the new profile
aws --profile <MY AWS PROFILE> eks update-kubeconfig --name=my-eks-cluster --region=<MY REGION> --kubeconfig=/dev/shm/my-kubecfg
chmod 600 /dev/shm/my-kubecfg
export KUBECONFIG=/dev/shm/my-kubecfg
kubectl get pods --all-namespaces
kubectl get nodes

@Andriy Knysh (Cloud Posse) thanks a lot for that, I will try that tomorrow and update the thread

Hi @Andriy Knysh (Cloud Posse), I have tested this: I assume a role and create the cluster - then I can update kubeconfig and actually see the pods. In the AWS console, my logged-on user assumes another role which I have added in map_additional_iam_roles. However, when looking at EKS in the console now, it says “your current user or role does not have access to Kubernetes objects on this cluster”. Perhaps this is only a console issue..?

after you update kubeconfig and it’s on your computer, look inside - you should see all other IAM roles that can access the cluster listed there (and only those, plus the role that created the cluster, will be able to access the cluster)

anyone else having issues with their ECS/Fargate deployments today? I know AWS was forcing deployment updates today

I just had a bunch of ECS deployments issues

task in Pending state for ever

same

ohhhhhh

First it couldn’t pull a container from docker hub

do you have a capacity provider?

no

ok, we thought was a problem with the capacity provider

this ecs task has been running for 6 months with no issue

we just changed instance types and then it died

we have done that before and never had an issue

we were on Fargate platform version 1.3.0 and when I redeployed I started getting:
“CannotPullContainerError: Error response from daemon: Get https://registry-1.docker.io/v2/: net/http: request canceled while waiting for connection (Client.Timeout exceeded while awaiting headers)Read only root file systemfalse”

we are on ECS+EC2

well, turns out one of the teams deleted all the NACLs

ohhhhhhhh


ON FRIDAY!!!!

yeah, it took me a while to track it down - went through their Terraform repo and saw they added aws_default_network_acl to their Terraform without defining rules
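For reference on why that bites: aws_default_network_acl adopts the VPC's default NACL, and with no ingress/egress blocks it strips every rule, which denies all traffic. A minimal sketch that manages it while keeping the usual allow-all defaults (resource names are placeholders):
resource "aws_default_network_acl" "default" {
  default_network_acl_id = aws_vpc.main.default_network_acl_id

  # Re-declare the default allow-all rules; omitting these removes ALL rules.
  ingress {
    protocol   = "-1"
    rule_no    = 100
    action     = "allow"
    cidr_block = "0.0.0.0/0"
    from_port  = 0
    to_port    = 0
  }

  egress {
    protocol   = "-1"
    rule_no    = 100
    action     = "allow"
    cidr_block = "0.0.0.0/0"
    from_port  = 0
    to_port    = 0
  }
}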

… almost as much fun as someone manually adding a vpc endpoint but not configuring it properly

ohhhh hey

that may explain why we saw a once-daily task suddenly kick off mid-afternoon
2021-02-06

This could be handy for generating minimal IAM policies… https://github.com/iann0036/iamlive
Generate a basic IAM policy from AWS client-side monitoring (CSM) - iann0036/iamlive

Also this one! Generates IAM from your application code https://github.com/iann0036/iamfast
IAM policy generation from application code. Contribute to iann0036/iamfast development by creating an account on GitHub.

That is cool, would love to see the resulting policy incorporated into the docs of each terraform resource and data source
2021-02-07
2021-02-08

Alfred workflow to Open AWS in your browser via aws-vault - kangaechu/aws-vault-alfred-workflow

has anyone figured out how to run aws-vault --server in the background here?

been doing it in a detached screen session so far

Hi guys! Any recommendations for a VPN server? We want SSO with Google; OpenVPN only accepts Google Workspace with the LDAP plan, which we don’t want to pay for. We also want to have groups of users and, depending on the group, allow users to reach a given destination IP range or not.

check out tailscale? uses wireguard, very actively developed…

Pritunl

(Also search slack - some recent discussions on this)

I 2nd Pritunl


+1 for Pritunl

two options:
• HashiCorp boundary
• AWS Client VPN (tested with TLS auth, AWS SSO auth, or Okta auth) - my preferred

fwiw, Pritunl will probably work with DocumentDB (as an alternative to Mongo)

+1 for Tailscale - have had good experiences. Tried to get Boundary running, but it felt a bit cumbersome. OpenVPN can have custom authorization via plugins / scripts - but be careful, it has a serious impact on the forwarding engine (single blocking thread)

Anyone get AWS Network Firewall to Prod yet? We’re thinking about trying it out for East<->West traffic filtering. Wondering if anyone has an impression of how reliable it is during spikey/scale out traffic events. Also, since it’s a newish service, if you find it pretty solid or still kinda buggy.

Anyone have experience with IoT Core? I’m trying to understand why some architectures use Kinesis Firehose to send data to an S3 bucket.. I’m able to do the same with IoT Core rules. Thanks

It might be because they want to send the data on to Redshift - Firehose supports this by dumping first to S3 and then loading into Redshift.

Hi everyone! I have a strange issue and wonder whether any of you have encountered it or managed to solve it.. I deploy an EKS cluster with fargate profiles using terraform, and this works perfectly the first time round. Then I issue a TF destroy and all resources are gone, so far so good. Now, when again applying the TF scripts, with the same cluster name, the creation gets stuck on creating fargate profiles.. as if something is hindering AWS from recreating the same fargate profile names (which have been correctly deleted by TF): module.eks.module.fargate.aws_eks_fargate_profile.this[“default”]: Still creating… [44m50s elapsed] Is this is a bug or is there a workaround for this? Often I can see that the Profile got created for the cluster, yet TF is somehow not “seeing” that the creation is complete…
2021-02-09

Nice. If you attempt to create a CloudWatch dashboard with a hex colour specified with fewer than six hex characters (e.g. #ddd), you get an error like:
The dashboard body is invalid, there are 1 validation errors: [ { "dataPath": "/widgets/0/properties/metrics/0", "message": "Should NOT have more than 4 items" } ]
Clear as mud.
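For anyone hitting the same thing: the color has to be a full six-digit hex in the metric's render options. A minimal widget sketch (metric and names are placeholders):
{
  "widgets": [
    {
      "type": "metric",
      "properties": {
        "region": "us-east-1",
        "metrics": [
          ["AWS/ApplicationELB", "RequestCount", "LoadBalancer", "app/my-alb/50dc6c495c0c9188", { "color": "#dddddd" }]
        ]
      }
    }
  ]
}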
2021-02-10

Question on Kinesis: if data ordering is critical, can we only have one lambda consuming the data - or is there an elegant way of writing the data in the correct order to Dynamo for example..?

yes. read the docs on kinesis shards and data partitioning

I read the docs… just wanted to find out whether e.g. the lambda event trigger will assure that there will be no parallel processing of a single shard…

basically, does AWS make sure that when a Lambda is triggered for a shard this will always be a single Lambda invocation… but as I understand it, a batch of records will always be passed into the Lambda.. @Alex Jurkiewicz just wanted to make sure no 2 batches ever get fed to 2 different Lambda instances, so that records being written can't overtake each other…

yes, the docs are a little ambiguous. We have a similar requirement and confirmed with AWS support that things work the expected way. I think there is some risk when resharding a stream but not in normal operation
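To spell out the mechanics: ordering is only guaranteed per shard, and all records with the same partition key land on the same shard, so the producer side matters too. A rough boto3 sketch (stream and field names are assumptions):
import json
import boto3

kinesis = boto3.client("kinesis")

def publish(event):
    # Records sharing a partition key go to the same shard; by default Lambda
    # processes each shard with a single concurrent invocation, preserving order per key.
    kinesis.put_record(
        StreamName="orders-stream",
        PartitionKey=event["order_id"],
        Data=json.dumps(event).encode("utf-8"),
    )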

ok, great, thanks for the info

Hi, I have an issue - when I’m doing ‘rolling deployment’ - changing instance AMI in my ECS cluster, which consists of 1 instance. Old instance is set to draining, but the new instance doesn’t run the task until the old instance dies. Any tips?



Has anyone here tried to create a nested CloudFormation stack using the StackSet resource? All works well until I try to retrieve the output values for the nested CloudFormation stack that “lives” in the target account.
I was following the docs and also found a blog explaining it. But at the end of the blog he says the following:
“outputs of the stacks created using StackSets are not easily accessible. If you want to reference something from the sub-template, the only way is to synthesize the resource names / ARNs. This is not always possible, e.g. with generated ARNs such as ACM certificates”
Does anyone know what he means? In my case it's just an EC2 instance. All help is appreciated. Thank you

it means you can predict the ARN in some cases if it depends on data you have provided. For example if you create an IAM Role with a hardcoded name, the ARN is predictable. But if you are looking at a resource name where the ARN is not predictable, you are SoL

However, it is possible to read the stack set directly, as if it were a normal stack. You just need access to the target account and knowledge of the stack set name. Then just inspect the stack’s outputs as per normal. The passage above refers to the fact the outputs aren’t consolidated up at the master stackset level automatically
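A sketch of that, assuming you can assume a role in the target account (stack set name and account ID are placeholders):
# From the StackSet admin account: find the stack instance's stack ID.
aws cloudformation list-stack-instances \
  --stack-set-name my-stackset \
  --query 'Summaries[?Account==`111111111111`].StackId'

# From the target account (after assuming a role there): read its outputs.
aws cloudformation describe-stacks \
  --stack-name <stack-id-from-above> \
  --query 'Stacks[0].Outputs'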

anyone here have experience wiring up SES with IAM? Looks like it’s a bit different to everything else
edit: never mind, it’s normal, but there’s an extra thing that I was conflating with IAM
2021-02-11

Hi, does anyone know if it is possible to use a bastion server to access containers in AWS Fargate, or the best approach to establish secure connections to Fargate containers? I have found some examples of people building SSH containers and exposing them through a public IP; however, I don’t like the idea of having developers using a private key to SSH into those containers. I would appreciate any hint.

Is the Fargate task running in ECS or in EKS? Haven’t tried it myself, but in EKS it might be possible to use kubectl exec for Fargate tasks - couldn’t find a result in a quick search.
https://docs.aws.amazon.com/toolkit-for-jetbrains/latest/userguide/ecs-debug.html seems quite interesting - so it must be possible to connect somehow to Fargate tasks
Describes how to use the AWS Toolkit for JetBrains to debug code in an Amazon Elastic Container Service (Amazon ECS) cluster in an AWS account.

If normal SSH is not an option - how about the reverse principle? Running a sidecar container that connects to a service where you can tunnel to it - I’ve seen somewhere that someone used AWS SSM (with custom instance registration) for having a “serverless bastion”. Can’t remember though if he only used it to tunnel into the network, or if he used it to access aspects of the Fargate task

Hi @Patrick Jahns, the Fargate task is running in ECS - I will have a look on what you’ve suggested and maybe try to run the SSM agent as a sidecar or as a Fargate task. Thanks for the hint!
2021-02-12

Guys, how do I install AWS CLI v2 if I am using a base image derived from Alpine Linux? (inside GitLab CI, but the CI tool doesn’t matter that much)

Take a look at https://github.com/aws/aws-cli/issues/4685
docker run -it –rm docker:latest sh wget "https://d1vvhvl2y92vvt.cloudfront.net/awscli-exe-linux-x86_64.zip" -O "awscliv2.zip" # curl not installed unzip awscliv2.zip ./aws/ins…

Hopefully that helps; I use https://github.com/aws/aws-cli/issues/4685#issuecomment-700923581 - looks like a user made more improvements, I haven't personally tested those
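For reference, the gist of the approach in that issue is roughly this (untested sketch; the v2 installer is built against glibc, so on Alpine you typically also need the glibc compatibility packages the issue comments walk through):
apk add --no-cache curl unzip
curl -sSL "https://awscli.amazonaws.com/awscli-exe-linux-x86_64.zip" -o awscliv2.zip
unzip -q awscliv2.zip
./aws/install
aws --version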

Thanks! @Tom Dugan

Hi, anyone have experience achieving https://docs.aws.amazon.com/apigateway/latest/developerguide/getting-started-client-side-ssl-authentication.html but with the HTTP API instead of the REST API? https://docs.aws.amazon.com/apigateway/latest/developerguide/http-api.html Thanks!
Learn how to enable backend SSL authentication of an API using the API Gateway console.

I’m using ALB - if I don’t specify target groups, but set healthchecks to EC2 & ELB, then do ELB healthchecks work at all? Thanks!
2021-02-16

Hi all, I'm just going through this link: https://docs.aws.amazon.com/general/latest/gr/signing_aws_api_requests.html
My question is: how do I check what version of Signature I am using? Are there any clues in the AWS console?
Learn how to sign AWS API requests.

Putting this here since I’m using RDS (Postgres) and there’s no database channel. I upgraded my production database from Postgres 9.5.22 to 12.4 over the weekend as well as upgrading the instance type from r4.2xl to r5.2xl. In this upgrade, my read IOPS went from 12,000 -> 2,000 and I cannot figure out why. I understand that both the instance type upgrade and the Postgres version upgrade introduce performance gains just by merely performing the upgrade, but I wasn’t expecting this big of an improvement. No changes application-wise and I haven’t seen any issues w/ our applications in regards to query data. Obviously I’d be happy with this much of an improvement if it’s expected, but I’m a bit wary as well.

thats a huge upgrade to do in one shot o.O

we went from 11 -> 12 in prod a week ago and all hell broke loose. I’d highly recommend running an ANALYZE on all your indexes, we had a lot of problems until we did that

@Zach oof thats a bit worrisome, we ran the upgrades for all our environments over the span of 2 months (we have many - qa, demo, trial, and 6 or so other environments) where either internal users or external users (customers) are accessing our many sites there and never got any complaints. We have 2 production RDS servers, one which we upgraded 2 weekends ago and one which we upgraded this past weekend.

What problems did you have?

horrible performance on indexed operations

we initially just ran reindexing, but then later found some documentation saying that computed indices during a major version upgrade really need to have ANALYZE run

we caught this issue when I finally found that RDS does a pg_upgrade, which doesn’t catch all the recommended actions from Postgres on major version upgrades. They actually recommend doing a pg_dump and restore… obviously not so great on RDS, but we didn’t realize initially that RDS didn’t conform to that

Interesting - other than the re-indexing and analyze and the performance hits, you didn’t run into any other issues? My worry is more so with data loss or data corruption. We did plan on doing the reindexing as mentioned here and will definitely run an ANALYZE as well now: https://info.crunchydata.com/blog/just-upgrade-how-postgresql-12-can-improve-your-performance

Just upgrade and take advantage of performance improvements in PostgreSQL 12.

that was the only issue, and we had problems with just about every database on 3 different RDS until we did the analyze

queries were taking forever to complete

hmm, we have roughly 1000 customers (big and small) and have not heard any complaints in regards to application performance. we migrated roughly 500 of them over 10 days ago.

well thats good then. You seem to be seeing the perf increases we had hoped we’d see from pg12

I’ve sanity checked many of their sites but nonetheless I’m still EXTREMELY anxious about it.

was the re-indexing resource intensive? I know pg12 allows for concurrent re-indexing now which is nice

yes big cpu/load spikes even with the concurrent

also make sure you are nowhere near low storage, because it has to make copies of the indices

thanks for answering my many questions @Zach. how much storage are you using and what instance type?

we actually are WAY overprovisioned on storage as we had to scale up to handle upwards of 12,000 iops. We’re currently using 500GB out of 4TB of storage.

we’re pretty small, these are like m5.2xl with 500gb

we tend to be cpu bound more than memory or iops for some reason

does ANALYZE do anything in regards to improving performance?

Allows the query planner to be smarter?
ANALYZE collects statistics about the contents of tables in the database, and stores the results in the pg_statistic system catalog. Subsequently, the query planner uses these statistics to help determine the most efficient execution plans for queries.
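In practice that boils down to something like this, run per database (REINDEX ... CONCURRENTLY needs PostgreSQL 12+; the index name is a placeholder):
-- Refresh planner statistics for every table in the current database.
ANALYZE VERBOSE;

-- Rebuild a suspect index without taking an exclusive lock.
REINDEX INDEX CONCURRENTLY my_schema.my_index;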

if the indices are using ‘computed’ values, it regenerates those

our problem seemed to be upgrade + reindex leaving lots of stale values in the indexes, so the planner decided to just ignore it

I’m not a DB guy so thats just what I learned on the fly as we put the fire out

neither am I. I’m wondering if your issue was specific to indices with computed values and whether my team doesn't use computed values

I’m asking as we speak

I think our issue was computed values + really bloated partition tables

how did you verify analyze + reindex worked? I assumed since it was a fire, just by having the application re-run its queries?

db load shot back down

it was operating at like 40-50x load for the # of vcpus

hmm. my db load is low

yeah, your usage seems very different

or alternatively, my teams write bad sql

yeah knock on wood but I might be lucky with this one. I’m definitely going to keep digging though. much appreciated @Zach

for the record, our cpu load is hanging around 20-40%

vcpu sessions in perf insights between 1~2 out of 8vcpus

sorry, last question @Zach, if you don't mind: roughly how many databases are you running on your server? I'm curious because we have many databases on a single server, so my index tables could be small compared to yours even though we run basically the same size DB server?

~10 services on one, which is where the fire came from when some of the unanalyzed indexes caused everything else to slam to a halt

the others are 1-2 services per rds

Are you saying performance is the same, but read iops reduced by 80%+? Or your DB is 80%+ slower?

our performance looks roughly the same, the applications are functioning the same but my read iops have dropped by 80%

two variables that have changed (pg 9.5.22 -> 12.4 and r4 -> r5)

that’s very cool. I was about to deploy some new clusters, will make sure devs use 12 instead of 11

are you using more memory? is more data being cached?

I went from r4.2xlarge -> r5.2xlarge so roughly 3gb more memory.

I am a bit worried by what @Zach mentioned; I am very wary that those 2 things could introduce that big of a performance boost. All the sanity checks I’ve done look fine though.

looking forward to hearing the final conclusion

its been over 14 days, still nothing has come up from this upgrade in regards to performance or data issues. We went from peaks of 12k iops to <1k iops. Soo much healthier. We’re just way overprovisioned on storage space now because of the iops balance being tied to storage (pretty cheap so not doing anything about it right now)

I don’t know why the drop in iops was so drastic, but I assume it was fixed somewhere between postgres 9.5 and postgres 12

you could artificially limit the amount of memory postgres uses to see if that increases iops again
2021-02-17

Do any folks here install any open source or paid product for server hardening / security monitoring software on their EC2 instances? I have a client going through PCI / SOC2 and there have been requests from the auditing team to validate that we have security monitoring tooling installed on all servers. The client primarily runs applications on EKS Fargate so we hand-wave around this for the most part, but there are a couple EC2 instances in each environment account to support a bastion and system resources running on EKS Node Groups. All are using the base, up-to-date Amazon Linux 2 AMIs.
Looking for any recommendations around something simple to satisfy this requirement. Also, I’d be completely happy to hear that the community consensus is “Hey you should just hand-wave around that”.

ClamAV

Yeah, that’s the one tool that I’ve seen used in this space. Question for you: Do you actually find it useful that you install that or is it just a checked box?

I guess it goes along with the saying “it is better to have it and not need it, than need it and not have it”

there are a bunch of enterprises where this type of software is mandatory; we installed some at EA that were baked into the images

EA as in Electronic Arts? Is that where you work Pepe?


the only commercial platform i’d come close to recommending is trend micro’s deep security - but i’d only recommend that if something like ClamAV doesn’t tick the boxes required - and that recommendation is quite a few years old now

I’m either going to go with ClamAV, OR I have a spike to investigate AWS Inspector, which has an “Inspector Agent” model you can run on your instances - I'm now interested in that as well.

Thanks for weighing in gents.

I use inspector, you can then do auto remediation and such based off findings using ssm

Inspector is an aws thing?

Yea

I have a lambda that auto triggers an inspector assessment on all newly created ec2s and if they don’t meet a certain threshold it gets terminated or marked for termination as well as sns notifications etc


we investigated a few vendors (Palo Alto Twistlock, Aqua Security, StackRox) and went with StackRox, as it was the cheapest of the three yet still checked all our boxes: container security monitoring against certain benchmarks (CIS, SOC2, HIPAA, etc.), Kubernetes security best-practice checks on running containers, and intrusion detection. None of the vendors in this space are cheap though.

How is everyone handling EC2 ASG lifecycle hooks with SQS or SNS where you want to run a script on the instance before termination? E.g.: ASG Lifecycle Hook -> SNS/SQS -> consumer on the instance
We were going to avoid SQS and cron'd AWS CLI calls, but Spinnaker requires a notification ARN and role ARN when it creates ASGs. It creates a unique ASG per deploy (version number suffix) and destroys the old ASG. Or some variation across multiple stages.
Facilitating graceful shutdowns in AWS autoscaling groups - scopely/shudder

Turns out Spinnaker only supports SNS lifecycle hooks, so we're going down that path with lifecycled. :)
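For anyone searching later, a rough Terraform sketch of a termination lifecycle hook publishing to SNS (names and referenced resources are placeholders; the on-instance consumer such as lifecycled then completes the hook):
resource "aws_autoscaling_lifecycle_hook" "drain_on_terminate" {
  name                   = "drain-on-terminate"
  autoscaling_group_name = aws_autoscaling_group.app.name
  lifecycle_transition   = "autoscaling:EC2_INSTANCE_TERMINATING"
  default_result         = "CONTINUE"
  heartbeat_timeout      = 300

  # SNS topic the hook publishes to, plus a role the ASG can use to publish.
  notification_target_arn = aws_sns_topic.lifecycle.arn
  role_arn                = aws_iam_role.asg_lifecycle.arn
}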
2021-02-18

Hey, anyone using Eventbridge with ECS tasks here? I’m trying to figure out a way to pass the event details to ecs task but can’t find anything particular in the documentation.
https://docs.aws.amazon.com/eventbridge/latest/userguide/eventbridge-tutorial-ecs.html https://docs.aws.amazon.com/AWSCloudFormation/latest/UserGuide/aws-properties-events-rule-ecsparameters.html
Learn how to use Amazon EventBridge to run an Amazon ECS task whenever a file is uploaded to a certain Amazon S3 bucket.
Use the AWS CloudFormation AWS::Rule.EcsParameters resource for Events.
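One pattern that may fit (hedging here; not tested against this exact tutorial) is an input transformer on the rule target, so pieces of the event land in the task as container overrides. A rough CLI sketch with the rule name, ARNs, container name, and event paths all placeholders, and the placeholder quoting in the template possibly needing adjustment:
aws events put-targets --rule s3-upload-rule --targets '[{
  "Id": "run-ecs-task",
  "Arn": "arn:aws:ecs:us-east-1:111111111111:cluster/my-cluster",
  "RoleArn": "arn:aws:iam::111111111111:role/eventbridge-ecs-role",
  "EcsParameters": {
    "TaskDefinitionArn": "arn:aws:ecs:us-east-1:111111111111:task-definition/my-task:1",
    "TaskCount": 1
  },
  "InputTransformer": {
    "InputPathsMap": {
      "bucket": "$.detail.requestParameters.bucketName",
      "key": "$.detail.requestParameters.key"
    },
    "InputTemplate": "{\"containerOverrides\": [{\"name\": \"my-container\", \"environment\": [{\"name\": \"S3_BUCKET\", \"value\": \"<bucket>\"}, {\"name\": \"S3_KEY\", \"value\": \"<key>\"}]}]}"
  }
}]'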
2021-02-19

Hi all,
I'm using the AWS SES service and generated my SMTP credentials using the SES console. Sending email works well (I'm using boto3 for that), but after some time I received a mail saying this:
Hello, If you have already migrated your credentials from Signature Version 2 to Signature Version 4, you can ignore this communication. We have observed Signature Version 2 requests (on an Amazon SES SMTP endpoint) originating from your account over the last week. Please note that Amazon Simple Email Service (SES) is working on an infrastructure upgrade with improved security controls. As a result, Signature Version 2 is being deprecated in favor of Signature Version 4 which offers enhanced security for authentication and authorization of Amazon SES customers by using a signing key instead of your secret access key……..
Now I generated new keys and replaced the old keys, but I'm still getting an error saying this:
The Canonical String for this request should have been ’POST /
content-type:application/x-www-form-urlencoded; charset=utf-8 host//email.us-east-1.amazonaws.com) x-amz-date:20210219T081335Z
content-type;host;x-amz-date
The String-to-Sign should have been ’AWS4-HMAC-SHA256 20210219T081335Z 20210219/us-east-1/ses/aws4_request
Anyone have any idea what extra steps I have to take?

just guessing…any chance you just need to update to a newer version of boto3?

thank you @Darren Cunningham - just updated boto3, still getting the same error

AWS SSO Q: Has anybody found a way to ‘mass-sign-in’ to multiple AWS SSO accounts? Our org has dozens of AWS accounts, and I wanted to login or have profiles for many of them. Right now, I have to pre-setup a profile for EACH account:
~/.aws/config
[profile account1.readonly]
sso_start_url = <https://mycompanysso.awsapps.com/start/#/>
sso_region = us-west-2
sso_account_id = 1111111111
sso_role_name = AWSReadOnly
region = us-east-1
[profile account2.readonly]
sso_start_url = <https://mycompanysso.awsapps.com/start/#/>
sso_region = us-west-2
sso_account_id = 2222222222
sso_role_name = AWSReadOnly
region = us-east-1
...
..then sign on to EVERY profile manually:
aws sso login --profile account1.readonly
(opens by browser, I have to enter the 8-character code)
aws sso login --profile account2.readonly
# repeat for DOZENS of accounts!!!
There has to be a better way.

that does suck. though I thought you only had to login to a given SSO instance (e.g. the start_url) once per session, then every SSO profile that used that same endpoint would be available… no? i.e…
aws sso login --profile account1.readonly
aws s3 ls --profile account2.readonly

… I just tested it out, and YES you’re right!! I just have to go through the whole rigamarole once, then I point to the profiles. The work is setting up the ~/.aws/config file with all hundred account/role combinations I have. And my SSO login should work for the default 12 hours.

whew! it is a lot of setup for all the profiles, but at least once that’s done, the login is only needed once per session!

there’s some discussion about simplifying the sso config using “defaults” or some kind of “base” profile…
Currently, aws sso login operates on a particular profile, even requiring that sso_account_id and sso_role_name be present in the profile even though it does not use them, only fetching the token (…
At the moment, I can put sso_start_url and sso_region under [default] so that configuring a new profile doesn't require as much typing, but the tool still adds those values, along with region a…

If you have a standard set of roles in each account, then you can easily write a script to generate the aws config . That’s what I did
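That script can be as simple as a loop that emits a profile stanza per account; a minimal sketch assuming a fixed role name and a hand-maintained account list:
#!/usr/bin/env bash
# Append one SSO profile per account to ~/.aws/config.
accounts="account1:111111111111 account2:222222222222 account3:333333333333"

for entry in $accounts; do
  name=${entry%%:*}
  id=${entry##*:}
  cat <<EOF
[profile ${name}.readonly]
sso_start_url = https://mycompanysso.awsapps.com/start/#/
sso_region = us-west-2
sso_account_id = ${id}
sso_role_name = AWSReadOnly
region = us-east-1

EOF
done >> ~/.aws/config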
2021-02-21

does anyone know how to obtain the correct account ID for the AWS CNI docker image when switching regions?
2021-02-22

I’m just curious to see what others are using to manage/track the lifecycle of their AWS resources/assets? Any CMDB or other type of single pane of glass inventory solution?

Morning Denis, what do you mean by AWS resources/assets ? Accounts ?

sorry @Santiago Campuzano, I was referring to any type of AWS resource, especially EC2 and RDS instances for example. One of the use cases is to track the lifecycle of an instance, for audit purposes for example. Back in the day, when everything was done manually, someone would create a VM and then create an asset record in a CMDB tool. When the VM was deleted, the record would have to be deleted as well.

The most easy and powerful way of doing that is to use proper tagging. We have 4-5 tags on average per resource.

If you properly tag your resources you can then use 3rd-party or in-house tools to inventory and lifecycle your resources

I’ve used ServiceNow in the past, where our pipelines would call the ServiceNow API to create/update/delete records. But I’m trying to stay away from ServiceNow, but I’m also trying to cover scenarios where we don’t necessarily have full control of the deployment process. For example, a sandbox environment where users may create their instances manually.

To give you an example, we have: Billing, Owner, CreationTime, Team, etc

I completely agree with that, Tag is certainly key to the success here. But you mentioned 3rd party inventory tools and this is what I’m interested on. Any recommendations on that?

TBH, we have worked with in-house custom solutions… Python applications creating/updating Google Spreadsheets

yeah, that’s always an option. However, I’m trying to stay away from in-house solutions, given that we are a really small team and the overhead to maintain such a system can be significant, right?

Yep, you’re right. In our case we need pretty basic information about the resources; our account is not that large (~2,000 EC2 instances)

well, that’s a pretty decent size. We are probably half of that size today, but my biggest problem is the size of the team. If I can find a SaaS solution with reasonable cost, I’d definitely go for that

You’re right… if the price is reasonable, go for it !

And let us know what your decision was about it

sounds good. Thanks for the feedback btw. Let’s see if anyone else has had experience with any other tools, it’d be nice to hear what others are doing in this space.

have you tried starting with AWS Config? It has an inventory feature for AWS resources

I’ve looked into AWS config in the past, but I don’t think it had support for multi-account/Organization at the time. I’ll have another look, thanks !

multi-account is a bit harder, no matter what solution you go with. but it does now at least have organizations-level support, if all your accounts are at least part of one aws organization

and if not you can still use the config aggregator, but managing it and the account setup more directly

yeah, I think they are moving towards that, by adding support for most services at the org-level.

unfortunately i need multi-org management

I’ll give that a go and see how that plays out, thanks again. At this point we are only managing a single org

yeah, that’s a bigger problem for sure and I know I’ll have that same problem with multi-org and multi-vendor in the near future

if nothing else, it may at least give you one integration point to point some other tool at

agreed. Our Security Team is using Prisma Cloud, which has inventory capabilities. But it’s not always easy to convince security teams to give us API access, etc

I came across this project yesterday, but I haven’t had a chance to give it a go just yet. I thought I’d still share it here, in case anyone is interested: https://github.com/turnerlabs/antiope
AWS Inventory and Compliance Framework. Contribute to turnerlabs/antiope development by creating an account on GitHub.

Have you looked at CloudAware? I know some F500 companies that use it as their Cloud CMDB for the big 3 providers

Thanks @Sean Holmes ! I haven’t heard of them before, but will certainly have a look now. Thanks for sharing!

How can I set up SSM so it doesn’t start overwriting the current prompt with new characters after a certain number of characters?
2021-02-23

Hi all, sorry for asking again,
I followed this code example (https://docs.aws.amazon.com/ses/latest/DeveloperGuide/examples-send-raw-using-sdk.html) to send SES sendRawEmail.
It worked perfectly fine with the SMTP credentials I created about 6 months ago,
but when I use the same code with the newly created SMTP credentials, I get the following error:
The request signature we calculated does not match the signature you provided. Check your AWS Secret Access Key and signing method. Consult the service documentation for details.
The Canonical String for this request should have been
'POST
/
content-type:application/x-www-form-urlencoded; charset=utf-8
host:email.us-east-1.amazonaws.com...........................................
I believe the old credentials I used were SigV2, and with the new creds, which use SigV4, I'm getting the above error. Please let me know if anyone knows how to solve this, or any link or blog which shows how to use SigV4 creds to send email using Python.
Thanks in advance
Provides a code example for sending email using the AWS SDK for Python (Boto).
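For what it's worth, a minimal boto3 sketch of the API path; the SDK handles SigV4 itself, and it expects regular IAM access keys, not the SMTP credentials (those only work on the SMTP endpoint). Addresses are placeholders:
import boto3
from email.mime.text import MIMEText

# boto3 signs SES API calls with SigV4 automatically; credentials come from the
# usual chain (env vars, ~/.aws/credentials, instance role).
ses = boto3.client("ses", region_name="us-east-1")

msg = MIMEText("Hello from SES")
msg["Subject"] = "Test"
msg["From"] = "sender@example.com"
msg["To"] = "recipient@example.com"

ses.send_raw_email(
    Source="sender@example.com",
    Destinations=["recipient@example.com"],
    RawMessage={"Data": msg.as_string()},
)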

People in Multi-Account organizations (really, all of us at this point): how have you managed access to SSM Explorer and AWS Trusted Advisor reports for security/best-practice flagging? We find that running an org-wide trusted advisor report requires us to be in the Management/master account, and that’s extremely locked down. Can you delegate this stuff down to a sub-account?

Does anyone have a recommended way of monitoring SAML cert expiry in AWS?

maybe you can use a CloudWatch Synthetics canary?
2021-02-24

Can someone confirm that this will be possible?
VPC A (10.10.0.0/16) <---peering---> VPC B (172.31.254.0/24)
Third-party private CIDR: 10.115.0.0/24
VPC B has routing and an IPsec connection to the third-party private CIDR network. Can VPC A route traffic to this third-party private CIDR through VPC B?

This seems possible. No overlap of CIDRs. Did you try?

We did 1:1 NAT on VPC B side, because you cannot route traffic to different network CIDRs than the peered one

when we add 10.115.0.0/24 to VPC A route table to go thru peering nothing happened

I think through peering you can only push traffic destined for the peered VPC B CIDR

Oh that makes sense. Peering would forward traffic that it knows the other side is publishing. What I don’t fully understand is how the other side has a subnet with a CIDR that is not included in the VPC’s CIDR

There is custom side to side VPN solution.

Ahhh OK

Hi All,
we have a requirement where I have to use an S3 bucket as a Debian repository (clients access the server for the repository).
The options we found are:
1. Use the S3 bucket for static website hosting, but the problem with this option is that it can only be used over HTTP
2. The other option is to use CloudFront, which can be used for HTTPS
Our problem is that with both options the S3 bucket will be public, and we don't want everyone to access our repository.
We tried API Gateway mutual TLS (https://aws.amazon.com/blogs/compute/introducing-mutual-tls-authentication-for-amazon-api-gateway/) but it is not working for us. We also followed this link (https://www.rapyder.com/blogs/static-website-authentication-using-lambda/), but there authentication happens only through the browser, and we need something like CLI auth.
Is there any way or any method to add authentication for the Debian repository or S3 bucket, so that only authorized systems can download the packages?
Thanks in advance

Mutual TLS (mTLS) for API Gateway is generally available today at no additional cost. It’s available in all AWS commercial Regions, AWS GovCloud (US) Regions, and China Regions. It supports configuration via the API Gateway console, AWS CLI, SDKs, and AWS CloudFormation.

Some websites require basic common authentication to protect private data. If website is running on the server, its not much difficult to add authentication. But when it comes to serverless like S3, creating an authentication

2. The other option is to use CloudFront, which can be used for HTTPS
our problem is that with both options the S3 bucket will be public; we don't want everyone to access our repository.
You can configure an S3 bucket without public access to be the origin for a CloudFront distribution, using a CloudFront origin access identity. It sounds like that is what you want to do.


You could also use a service like https://cloudsmith.com and we’ll take care of it for you. Including handling authentication, and things like per-customer access keys for private distribution. See the cloudsmith channel for some more information (we power hosting for Cloud Posse behind the scenes). Happy to help with questions.
Cloudsmith provides public and private repository hosting, for ultra-fast and secure delivery of your packages. FREE 14 day Trial. Sign up Today


thanks guys

Ya, we decided to stop managing our own package repositories at cloudposse and moved to cloudsmith

Gonna chime in and say this is something you don't want to roll on your own if it's not your core business. Too much work. Use a service or at least something flexible.
Ran across this as a random resource that's GitHub-driven. Not saying use it… but just sharing as I found it interesting.

Use GitHub releases as a Debian/Ubuntu apt repository - rpatterson/github-apt-repos

Does anyone know if you can put a CloudFront distribution in front of Cognito yourself to raise the minimum TLS to 1.2? AWS Support says to use an ALB but that doesn’t really sit right with me.. Thoughts?

If you want to make it available via custom domain you can migrate to fips endpoint which by the end of the month will support only tls1.2+ (https://aws.amazon.com/security/security-bulletins/AWS-2020-001/). Bear in mind that the standard endpoint will still be available as you can’t disable it and it will expose tls/1.0 and tls/1.1 on the cognito domain.
AFAIR you can put your own cloudfront distribution (with no caching) in front of cognito. The default custom domain setup is actually a cloudfront distribution but it’s not exposing tls configuration.

FIPS is only for US + Canada

Trying to run trusted advisor recommendations from CLI. is there a way to get trusted-advisor-checks against a single resource (ie. an RDS instance)? I seem to be able to pull all trust-advisor-recommendations for the account but theres no filter for resource-id (or ARN).
aws support describe-trusted-advisor-checks \
--profile ${account_profile} \
--language en \
| jq -r ".checks[] | \"$account_profile,\(.id),\(.name),\(.category),\(.metadata)\" "
## sample output:
account123456,nNauJivDiT,Amazon RDS Security Group Access Risk,security,["Region","RDS Security Group Name","Ingress Rule","Status","Reason"]
^^^ there’s a check-id but nothing else tying the check to a resource.

Did you manage to resolve this ?

My call with aws support seemed to find that there’s no way to find trusted-advisor-recommendations per resource, only by the trusted-advisor-check. Ultimately, I’m trying to setup a database health/scorecard. So presumably, each DB would have either all green TA checks (100% happy), or some yellow/red results against it (meaning <100%, or lower score).
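In case it helps, per-resource data does come back from the check result call, so something like this can feed a scorecard (check ID taken from the earlier output; filtering on the DB identifier is an assumption about what appears in the metadata):
aws support describe-trusted-advisor-check-result \
  --profile ${account_profile} \
  --check-id nNauJivDiT \
  --language en \
  | jq -r '.result.flaggedResources[] | [.status, (.metadata | join(","))] | @tsv' \
  | grep my-db-identifier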
2021-02-25
2021-02-27
2021-02-28

What is the latest and greatest for ECS blue/green deployments in AWS? I have used CodeDeploy with multiple ASGs before and looked at App Mesh too, but I wonder what is the easiest these days? (weighted TGs?)

weighted TGs work well but I haven’t done it specifically with ECS. Also curious as I’ll be exploring this in the next month and would like a solution without CodeDeploy as well

I imagine you’d want to just create a new taskset with your staged code and gradually shift over using weighted TGs. https://docs.aws.amazon.com/AmazonECS/latest/developerguide/service-create-loadbalancer-bluegreen.html
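For the weighted-TG route, the listener side is just a weighted forward action; a rough Terraform sketch (resource names are placeholders), with each ECS service registered against its own target group and the weights shifted as confidence grows:
resource "aws_lb_listener_rule" "blue_green" {
  listener_arn = aws_lb_listener.https.arn
  priority     = 10

  action {
    type = "forward"

    forward {
      target_group {
        arn    = aws_lb_target_group.blue.arn
        weight = 90
      }
      target_group {
        arn    = aws_lb_target_group.green.arn
        weight = 10
      }
    }
  }

  condition {
    path_pattern {
      values = ["/*"]
    }
  }
}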