#aws (2024-02)

aws Discussion related to Amazon Web Services (AWS)

Archive: https://archive.sweetops.com/aws/

2024-02-01

Mahesh avatar

Hi all, I am trying to deploy an EC2 instance using https://registry.terraform.io/modules/cloudposse/ec2-instance/aws/latest, providing an existing VPC ID and subnet ID via tfvars.

I’m getting the error below:

│ Error: creating EC2 Instance: Unsupported: The requested configuration is currently not supported. Please check the documentation for supported configurations. │ status code: 400, request id: d2477b03-6c8a-4c94-b149-3f6a3f5e976

I tried changing the instance type and adding the AZ and tenancy, but it still fails.

Joe Niland avatar
Joe Niland

You’d need to share your specific inputs

Mahesh avatar

I don’t see in the documentation which inputs are mandatory. My tfvars has the below:

vpc_id                      = "vpc-000xxxxxx"
subnet                      = "subnet-xxxxxx"
instance_type               = "t2.medium"
ssh_key_pair                = "xxxx"
region                      = "ap-south-1"
associate_public_ip_address = true
availability_zone           = "ap-south-1a"
security_groups             = ["sg-vvvvvvvv"]
monitoring                  = false

Joe Niland avatar
Joe Niland

It would be referring to this documentation: https://docs.aws.amazon.com/AWSEC2/latest/APIReference/API_RunInstances.html

What AMI are you using?

btw, t2.medium is an older instance type. You may want to use a newer gen.

RunInstances - Amazon Elastic Compute Cloud

Launches the specified number of instances using an AMI for which you have permissions.

Mahesh avatar

Using a data source for the AMI: ami-00f994c719fc62e9e

Mahesh avatar

AMI ID exists in the region

Joe Niland avatar
Joe Niland

Can you show the actual module block with all inputs, or ideally the entire file?

Mahesh avatar

tfvars

vpc_id                      = "vpc-000xxxxxx"
subnet                      = "subnet-xxxxxx"
instance_type               = "t2.medium"
ssh_key_pair                = "xxxx"
region                      = "ap-south-1"
associate_public_ip_address = true
availability_zone           = "ap-south-1a"
security_groups             = ["sg-vvvvvvvv"]
monitoring                  = false

main.tf

module "instance" {
  source = "cloudposse/ec2-instance/aws"
  # Cloud Posse recommends pinning every module to a specific version
  # version = "x.x.x"

  ssh_key_pair    = var.ssh_key_pair
  instance_type   = var.instance_type
  vpc_id          = var.vpc_id
  security_groups = var.security_groups
  subnet          = var.subnet
  name            = "ec2"
  namespace       = "eg"
  stage           = "dev"
  monitoring      = var.monitoring
}

datasources

data "aws_ami" "amzn_linux" {
  most_recent = true
  owners      = ["amazon"]

  filter {
    name   = "name"
    values = ["amzn2-ami-kernel-*"]
  }
  filter {
    name   = "root-device-type"
    values = ["ebs"]
  }
  filter {
    name   = "virtualization-type"
    values = ["hvm"]
  }
  filter {
    name   = "architecture"
    values = ["x86_64"]
  }
}

Joe Niland avatar
Joe Niland

The AMI value is not set for module "instance"

Mahesh avatar

I tried adding the AMI ID, both hardcoded and from the data source.

module "instance" {
  source = "cloudposse/ec2-instance/aws"
  # Cloud Posse recommends pinning every module to a specific version
  # version = "x.x.x"

  ssh_key_pair    = var.ssh_key_pair
  instance_type   = var.instance_type
  vpc_id          = var.vpc_id
  security_groups = var.security_groups
  #ami            = "ami-0d63de463e6604d0a"
  ami             = data.aws_ami.amzn_linux.id
  subnet          = var.subnet
  name            = "ec2"
  namespace       = "eg"
  stage           = "dev"
  monitoring      = var.monitoring
}

Mahesh avatar

module.instance.aws_instance.default[0]: Creating…
2024-02-05T1311.749+0530 [ERROR] provider.terraform-provider-aws_v5.34.0_x5.exe: Response contains error diagnostic: tf_proto_version=5.4 tf_req_id=2be98416-ed4b-a900-b440-414c8c681655 tf_rpc=ApplyResourceChange diagnostic_severity=ERROR diagnostic_summary="creating EC2 Instance: Unsupported: The requested configuration is currently not supported. Please check the documentation for supported configurations. status code: 400, request id: e8da9f9e-5ed7-4c8a-85c9-6c7be60a129a" diagnostic_detail= tf_provider_addr=registry.terraform.io/hashicorp/aws tf_resource_type=aws_instance @caller=github.com/hashicorp/[email protected]/tfprotov5/internal/diag/diagnostics.go:62 @module=sdk.proto timestamp=2024-02-05T1311.749+0530
2024-02-05T1311.753+0530 [ERROR] vertex "module.instance.aws_instance.default[0]" error: creating EC2 Instance: Unsupported: The requested configuration is currently not supported. Please check the documentation for supported configurations. status code: 400, request id: e8da9f9e-5ed7-4c8a-85c9-6c7be60a129a
╷
│ Error: creating EC2 Instance: Unsupported: The requested configuration is currently not supported. Please check the documentation for supported configurations.
│ status code: 400, request id: e8da9f9e-5ed7-4c8a-85c9-6c7be60a129a
│
│ with module.instance.aws_instance.default[0],
│ on .terraform\modules\instance\main.tf line 104, in resource "aws_instance" "default":
│ 104: resource "aws_instance" "default" {

Joe Niland avatar
Joe Niland

If you run TF_LOG=trace terraform apply, you may get more info from the 400 error

Mahesh avatar

│ Error: creating EC2 Instance: Unsupported: The requested configuration is currently not supported. Please check the documentation for supported configurations.
│ status code: 400, request id: d5719aa4-c001-4de5-a78f-319672e5cbeb
│
│ with module.instance.aws_instance.default[0],
│ on .terraform\modules\instance\main.tf line 104, in resource "aws_instance" "default":
│ 104: resource "aws_instance" "default" {
│
╵
2024-02-05T1640.739+0530 [DEBUG] provider.stdio: received EOF, stopping recv loop: err="rpc error: code = Unavailable desc = error reading from server: EOF"
2024-02-05T1640.796+0530 [DEBUG] provider: plugin process exited: path=.terraform/providers/registry.terraform.io/hashicorp/aws/5.34.0/windows_386/terraform-provider-aws_v5.34.0_x5.exe pid=22176
2024-02-05T1640.797+0530 [DEBUG] provider: plugin exited

Mahesh avatar

I don’t see much information with trace

Mahesh avatar

After disabling ebs_optimized, the issue was resolved.

Joe Niland avatar
Joe Niland

Ah yes, it is on by default and you are using t2, which does not support EBS optimization

Joe Niland avatar
Joe Niland

How did you determine it was the issue?

Mahesh avatar

Carefully went through the logs from trace :)

Joe Niland avatar
Joe Niland

Ok good to know!
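
For future readers, here is a minimal sketch of the fix discussed above. It assumes the cloudposse/ec2-instance module exposes an ebs_optimized input that defaults to true (as implied in this thread); t2 instance types do not support EBS optimization, so RunInstances rejects that combination with the "Unsupported" error.

module "instance" {
  source = "cloudposse/ec2-instance/aws"
  # version = "x.x.x" # pin to a specific version

  ssh_key_pair    = var.ssh_key_pair
  instance_type   = var.instance_type # t2.medium does not support EBS optimization
  ebs_optimized   = false             # assumed input name; the change that resolved the 400 error
  ami             = data.aws_ami.amzn_linux.id
  vpc_id          = var.vpc_id
  subnet          = var.subnet
  security_groups = var.security_groups

  name      = "ec2"
  namespace = "eg"
  stage     = "dev"
}

Alternatively, moving to a current-generation type such as t3.medium (which does support EBS optimization) avoids the conflict without turning the flag off.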

2024-02-04

2024-02-05

2024-02-07

AdamP avatar

Let’s say I have a private RDS instance, and a 3rd party needs to be “whitelisted” to access said RDS instance. Without PrivateLink or any site-to-site VPNs, I think the best solution would be a Network Load Balancer in a public subnet that routes allowed traffic to the private subnets, allowing the 3rd party to hit the private RDS instance. I think that’s the simplest solution, assuming the small handful of 3rd parties aren’t AWS customers and we can’t do any cross-account roles or anything like that. Thoughts?

Darren Cunningham avatar
Darren Cunningham

How I’ve solved this in the past is to put a jumphost in the private subnets (because the RDS was in an isolated subnet), then restrict access to the bastion to a list of known IPs. This comes with maintenance overhead due to routine patching and audit-logging requirements, but when you have limited options it was the best I could come up with.

Andrey Taranik avatar
Andrey Taranik
Scalable Applications - Amazon RDS Proxy - AWS

Amazon RDS Proxy improves database efficiency and application scalability by allowing applications to pool and share connections established with the database.

AdamP avatar

There will be proxies involved for sure, however that doesn’t solve the external access aspect, since

“RDS Proxy can be used only within a VPC, and can’t be publicly accessible (although the DB instance can be). If you connect from outside a private network, then your connection times out.”
• https://repost.aws/knowledge-center/rds-proxy-connection-issues#

Elaborating on my scenario and my thoughts: it will likely be an NLB in a public subnet that routes allowed traffic to the private subnet where the RDS Proxy lives, which then connects to the private RDS instance. I think that’s the only option besides a jump host, but the NLB approach seems easier. We’ll see; I’m going to spin up some things in my sandbox VPC and test things out.

Darren Cunningham avatar
Darren Cunningham

NLB -> RDS will be easier if your RDS is in a subnet that can be routed to from the public subnet (though that’s a fairly weak network security posture). If you do that, you should make sure to lock down the security group to only known IPs. It’s probably best to have an SG per external client group too, and ideally each rule would have a clear description.
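
A minimal Terraform sketch of that kind of tightly scoped rule (the client name, port, CIDR, and ticket reference are hypothetical placeholders):

# One security group per external client, each ingress rule scoped to that
# client's published IPs and carrying a clear description.
resource "aws_security_group" "client_acme" {
  name        = "rds-ingress-client-acme"
  description = "Database access for third-party ACME only"
  vpc_id      = var.vpc_id
}

resource "aws_security_group_rule" "client_acme_db" {
  type              = "ingress"
  security_group_id = aws_security_group.client_acme.id
  from_port         = 5432 # placeholder DB port
  to_port           = 5432
  protocol          = "tcp"
  cidr_blocks       = ["203.0.113.10/32"] # placeholder: ACME's static egress IP
  description       = "ACME ETL service, ticket OPS-123 (placeholder)"
}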

AdamP avatar

edit: Sorry I misread your recent message

Darren Cunningham avatar
Darren Cunningham

By “fairly weak” I mean that all it takes is one person doing the wrong thing to the security group (like dropping a 0.0.0.0/0 rule on it while testing) and now you have an RDS exposed to the world.

AdamP avatar

totally, I’ve seen that before at previous jobs LOL

Darren Cunningham avatar
Darren Cunningham

but necessary evils sometimes

Darren Cunningham avatar
Darren Cunningham

oh I’ve seen orgs put their RDS in public subnets because they wanted it to be easy for data engineers to connect third party services back to them

Darren Cunningham avatar
Darren Cunningham

I screamed and ran for the hills

AdamP avatar

that is pretty much the same scenario I’m dealing with

Darren Cunningham avatar
Darren Cunningham

I didn’t actually run, I spelled out the security implications clearly and documented the sign-offs from the higher ups on the project that they owned the risk.

AdamP avatar

I like that risk ownership aspect a lot, I will keep that in my playbook for future things like this for sure

Joe Perez avatar
Joe Perez

I’ve used that for ALB targets, but I don’t believe it supports RDS endpoints

Joe Perez avatar
Joe Perez

I’ve also run into the “data team needs to connect DBs to Fivetran” situation, and the PrivateLink support is a $40k upsell

AdamP avatar

Wow! What an upsell. DBT wanted $20k for enterprise so I could connect DBT to Okta for SAML. No thanks

Joe Perez avatar
Joe Perez

Yeah, not too happy that baseline security stuff is paywalled

michaeljaweed avatar
michaeljaweed

Hi everyone, I have hopefully an easy question.
I’m using some Python modules that are not natively available on AWS Lambda. I’m seeing this error message:

{
  "errorMessage": "Unable to import module 'transcribe-job.app': No module named 'pusher'",
  "errorType": "Runtime.ImportModuleError",
  "requestId": "0218478d-f56c-4b59-89f7-15d43e26a665",
  "stackTrace": []
}

I’m wondering how I can fix this issue. I’m currently using a Lambda layer that has these dependencies prepackaged. I’m doing a number of imports, but the only one it seems to be complaining about is pusher. Has anyone else experienced this, and what were your solutions? I’m hoping to solve this with the Lambda layer because I don’t want to ship the dependencies inside the app.zip payload. I understand pusher is probably not installed, hence the error, which is why I packaged it in the Lambda layer. When I inspect the Lambda layer, I see pusher is there.

import boto3
import botocore
import botocore.session
import json
import os
import pusher
import pymysql.cursors
import webvtt
from aws_secretsmanager_caching import SecretCache, SecretCacheConfig
from botocore.exceptions import ClientError, NoCredentialsError
from dotenv import load_dotenv
from flashtext import KeywordProcessor
from io import StringIO
from urllib.parse import urlparse

For more context, this is also using sls (Serverless Framework):

service: lambda-transcribe

provider:
  name: aws
  runtime: python3.10
  region: us-west-2
  stage: ${opt:stage, 'pg'}

functions:
  proleagueLambdaTranscribe:
    environment: ${file(env.${opt:stage, self:provider.stage}.json)}
    handler: transcribe.app.lambda_handler
    name: ${self:provider.stage}-proleague-lambda-transcribe
    layers:
      - arn:aws:lambda:${self:provider.region}:${aws:accountId}:layer:pythonAppDependencies:${opt:layerVersion, '4'}
    events:
      - eventBridge:
          eventBus: default
          pattern:
            source:
              - "aws.transcribe"
            detail-type:
              - "Transcribe Job State Change"
            detail:
              TranscriptionJobStatus:
                - "COMPLETED"

michaeljaweed avatar
michaeljaweed

Figured it out after 2 days, finally. It was an issue with how Lambda unpacks the layers: it expects everything under a python/ directory at the root of the zip. I just zipped it up with that structure and it works.
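
For anyone packaging layers with Terraform instead of sls, here is a hedged sketch of the same fix (directory layout and paths are hypothetical; the layer name mirrors this thread). The key point is that Python layer zips must place packages under python/ (or python/lib/python3.x/site-packages/) so Lambda adds them to the import path.

# Layout on disk (hypothetical):
#   layer/
#     python/          <- required top-level directory for Python layers
#       pusher/
#       pymysql/
#       ...
data "archive_file" "python_deps" {
  type        = "zip"
  source_dir  = "${path.module}/layer" # the directory that contains python/
  output_path = "${path.module}/python_deps.zip"
}

resource "aws_lambda_layer_version" "python_deps" {
  layer_name          = "pythonAppDependencies"
  filename            = data.archive_file.python_deps.output_path
  source_code_hash    = data.archive_file.python_deps.output_base64sha256
  compatible_runtimes = ["python3.10"]
}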

2024-02-09

Balazs Varga avatar
Balazs Varga

Do I need to reboot writers when I change the ACU? I still see that I’m hitting the max connections, even though I increased the max capacity of my Serverless v2 instance.

E-Love avatar

Anyone else get bitten by the new charge for all public IPv4 IPs as of Feb 1? Turns out running a SaaS with a single tenant architecture on EKS (one namespace per customer) with the AWS LB controller (without using IngressGroups) is a recipe for a ton of public IPs (number of AZs times number of ALBs).

E-Love avatar

Our quick fix is to go with IngressGroups but I’m curious if others have made the IPv6 migration or are using other techniques to reduce the number of IPv4 IPs they’re using.

We use Cloudflare as our WAF so we’re also thinking of standing up Cloudflare Tunnels for all ingress (currently using security groups to restrict Ingress to Cloudflare IPs).

Cloudflare Tunnel · Cloudflare Zero Trust docs

Cloudflare Tunnel provides you with a secure way to connect your resources to Cloudflare without a publicly routable IP address. With Tunnel, you do …

Dan Miller (Cloud Posse) avatar
Dan Miller (Cloud Posse)

We host our EKS clusters in a private network then use the ALB controller for ingress. So we’ve definitely been hit with running out of IP space, but not the charge for public IPs.

E-Love avatar

Yeah, that’s how we do it too. It was just a surprise that if you don’t use the alb.ingress.kubernetes.io/group.name annotation, the controller allocates a new ALB per Ingress (not terribly surprising in hindsight), and since we have at least one Ingress per namespace, that ballooned our IP usage fast!

Dan Miller (Cloud Posse) avatar
Dan Miller (Cloud Posse)

Oh I see. Yeah, that group.name annotation has got me before too, but we use the same ALB for many Ingresses across namespaces, so our use case isn’t really the same.
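
A hedged sketch of the IngressGroup approach via the Terraform kubernetes provider (the namespace, host, service, and group names are hypothetical). Every Ingress that sets the same group.name annotation is merged onto one shared ALB instead of each getting its own, which is what keeps the public IPv4 count down:

resource "kubernetes_ingress_v1" "tenant_app" {
  metadata {
    name      = "app"
    namespace = "tenant-a" # hypothetical per-customer namespace
    annotations = {
      # All Ingresses with the same group.name share a single ALB
      "alb.ingress.kubernetes.io/group.name"  = "shared-tenants"
      "alb.ingress.kubernetes.io/scheme"      = "internet-facing"
      "alb.ingress.kubernetes.io/target-type" = "ip"
    }
  }

  spec {
    ingress_class_name = "alb"
    rule {
      host = "tenant-a.example.com"
      http {
        path {
          path      = "/"
          path_type = "Prefix"
          backend {
            service {
              name = "app"
              port {
                number = 80
              }
            }
          }
        }
      }
    }
  }
}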


2024-02-11

2024-02-13

Matt Gowie avatar
Matt Gowie

Hey folks – We typically enforce S3 encryption on our buckets via the below policies / via the allow_ssl_requests_only + allow_encrypted_uploads_only flags in the cloudposse/s3-bucket/aws module.

Policies:

		{
			"Sid": "DenyIncorrectEncryptionHeader",
			"Effect": "Deny",
			"Principal": "*",
			"Action": "s3:PutObject",
			"Resource": "arn:aws:s3:::allma-ue1-production-rag-system-text/*",
			"Condition": {
				"StringNotEquals": {
					"s3:x-amz-server-side-encryption": "AES256"
				}
			}
		},
		{
			"Sid": "DenyUnEncryptedObjectUploads",
			"Effect": "Deny",
			"Principal": "*",
			"Action": "s3:PutObject",
			"Resource": "arn:aws:s3:::allma-ue1-production-rag-system-text/*",
			"Condition": {
				"Null": {
					"s3:x-amz-server-side-encryption": "true"
				}
			}
		},

I am just seeing this update in the AWS Using server-side encryption with Amazon S3 docs:
Amazon S3 now applies server-side encryption with Amazon S3 managed keys (SSE-S3) as the base level of encryption for every bucket in Amazon S3. Starting January 5, 2023, all new object uploads to Amazon S3 are automatically encrypted at no additional cost and with no impact on performance. The automatic encryption status for S3 bucket default encryption configuration and for new object uploads is available in AWS CloudTrail logs, S3 Inventory, S3 Storage Lens, the Amazon S3 console, and as an additional Amazon S3 API response header in the AWS Command Line Interface and AWS SDKs. For more information, see Default encryption FAQ.
This seems to say to me that we don’t need to enforce those policies any longer, since all objects will get that same AES256 encryption applied regardless.

Is that correct? Anyone else gone down that rabbit hole before?

Using server-side encryption with Amazon S3 managed keys (SSE-S3) - Amazon Simple Storage Service

With server-side encryption, Amazon S3 manages encryption and decryption for you.

jose.amengual avatar
jose.amengual

yes that is what it means


jose.amengual avatar
jose.amengual

I used to do KMS before this existed, and then we switched to this, which is FAR easier.

Matt Gowie avatar
Matt Gowie

Thoughts on removing those flags from the module? I may upstream that change.

Igor Rodionov avatar
Igor Rodionov

@Matt Gowie Yeah, your PR is more than welcome. Message me the PR link and I will handle getting it merged. Thanks

Matt Gowie avatar
Matt Gowie

Will do.
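
A hedged sketch of the simplification being discussed, using the module flags named earlier in the thread (verify the exact variable names against the module version you pin). The TLS-only policy is still worth keeping; only the encryption-header deny statements become redundant, since S3 has applied SSE-S3 (AES256) to every new object by default since January 5, 2023, and those deny statements can reject clients that simply omit the header:

module "bucket" {
  source = "cloudposse/s3-bucket/aws"
  # version = "x.x.x" # pin to a specific version

  # Still worthwhile: deny plain-HTTP (non-TLS) requests
  allow_ssl_requests_only = true

  # Redundant now that SSE-S3 is the default for every new object
  allow_encrypted_uploads_only = false

  name      = "example-bucket" # placeholder
  namespace = "eg"             # placeholder
  stage     = "dev"            # placeholder
}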

2024-02-14

Corky avatar

Hey y’all,

Right now, we’re thinking about the best way to manage RDS SQL Server logins. Does anyone have preferred methods for managing something like that declaratively? We were considering doing this in Terraform since logins are per-instance and could live with the db instance creation, but there would be a lot of moving pieces to be able to connect to a non-publicly-available RDS instance.

Joe Perez avatar
Joe Perez

I don’t use SQL Server, but I imagine you can use the same process. You can set up a “jumpbox” which uses AWS SSM Session Manager to remote into the VPC without exposing it publicly. Then you can use SSM port forwarding from the jumpbox to the RDS instance (with the proper security groups configured).

Joe Perez avatar
Joe Perez

So at least the authentication portion happens via AWS SSO with MFA. Another solution to look into would be Teleport.
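
A minimal sketch of the jumpbox side of that suggestion (the AMI and subnet variables and the names are hypothetical): an instance with no inbound ports open, reachable only through Session Manager, which port-forwarding sessions (e.g. the AWS-StartPortForwardingSessionToRemoteHost document) can then tunnel through to the RDS endpoint.

resource "aws_iam_role" "jumpbox" {
  name = "jumpbox-ssm" # hypothetical
  assume_role_policy = jsonencode({
    Version = "2012-10-17"
    Statement = [{
      Effect    = "Allow"
      Action    = "sts:AssumeRole"
      Principal = { Service = "ec2.amazonaws.com" }
    }]
  })
}

resource "aws_iam_role_policy_attachment" "jumpbox_ssm" {
  role       = aws_iam_role.jumpbox.name
  policy_arn = "arn:aws:iam::aws:policy/AmazonSSMManagedInstanceCore"
}

resource "aws_iam_instance_profile" "jumpbox" {
  name = "jumpbox-ssm"
  role = aws_iam_role.jumpbox.name
}

resource "aws_instance" "jumpbox" {
  ami                  = var.jumpbox_ami_id    # hypothetical; any AMI with the SSM agent
  instance_type        = "t3.micro"
  subnet_id            = var.private_subnet_id # hypothetical
  iam_instance_profile = aws_iam_instance_profile.jumpbox.name
  # No inbound security group rules needed; access goes through SSM.
}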

Corky avatar

Thanks Joe!

We’ve got a TCP proxy pod in an EKS cluster (where the application that connects to the DB lives) that we can use. We were able to use a Terraform module to kubectl port-forward to the pod, and a provider to create SQL logins, but it kind of feels like a square-peg-in-a-round-hole situation. I didn’t know if this was a common/accepted approach or if there might be a better pattern.

Corky avatar

Also, our pipeline runs a separate Terraform plan and then applies from the saved plan output, so we’d need the tunnel to be open during both plan and apply. This is turning out to be a bit tricky due to the semantics around whether or not a data source is executed; e.g., if state shows that the data source will remain the same, it will not be re-read.

2024-02-16

Sean Turner avatar
Sean Turner

Faced with an interesting engineering problem.

We work with a lot of GIS data, which geologists interact with via QGIS, a desktop client for working with GIS layers. Architecturally, this is a PostgreSQL RDS instance in AWS. Geologists are geographically distributed and therefore suffer latency issues when interacting with the RDS instance in Oregon from Africa or Australia. Access patterns involve frequent reads and less-than-moderate writes. Generally, geologists wouldn’t be interacting with the same project(s), and general conflict resolution should be sufficient otherwise.

pgActive (the RDS PostgreSQL plugin for an active-active or multi-writer replication configuration) and Aurora Global Database for PostgreSQL (write forwarding) both don’t replicate DDL statements (e.g. CREATE), which takes these solutions out of the running (QGIS uses CREATE frequently).

We’re looking at pgEdge, which seems to be a young startup that can replicate DDL statements and looks very compelling, but I wanted to see if there are any other vendors, or if anyone else has done some serious thinking about these problems and has insight.

Cheers!

pgEdge Fully Distributed PostgreSQL

Distributed Postgres optimized for the network edge for low latency and ultra-high availability.

Jeremy G (Cloud Posse) avatar
Jeremy G (Cloud Posse)

I have not done it myself, but you probably want to deploy middleware such as PgBouncer, Pgpool-II, or HAProxy along with RDS read replicas in each of the regions where you have users, and have the users connect to the middleware near them.


Sean Turner avatar
Sean Turner

Fortunately, our number of users is not substantial enough to require a connection pooler. Also, we can’t use read replicas because writes are part of the access pattern.

Jeremy G (Cloud Posse) avatar
Jeremy G (Cloud Posse)

The point of the proxy, in your use case, is not connection pooling, it is allowing reads to be served by the local read replica, while forwarding writes and DDL to the distant master.

Jack Langston avatar
Jack Langston

Can you provide more detail about your access pattern?

The latency from Australia to Oregon is approximately 150ms.

If you have multiple sequential reads in order to render the UX, that can create a multiple second delay that might be the primary source of complaints. Using read replicas as Jeremy suggested would resolve that issue.

Writes are usually not done sequentially so 150ms to talk with the master seems very reasonable.

Active-active (or edge postgres) won’t speed up writes without giving up consistency guarantees that could leave your data in a borked state. If you look at the docs for pgEdge, you can see that they don’t give up the consistency guarantee; it propagates the writes to all of the masters, which will likely make it slower than your current setup.

Overall, active-active is significantly more operational overhead to maintain than read replicas, and this isn’t really the primary use case for using it (primary use is fault tolerance). Running a connection pooler is straightforward compared to an active-active setup.

Jack Langston avatar
Jack Langston

I misspoke. After looking at pgEdge in more detail (personal curiosity), it looks like they do async write replication via a conflict resolution algorithm documented here.

Ultimately, if you went that route, you are guaranteed only eventual consistency (different databases might have different data) coupled with reduced durability (committed writes can be dropped if they cause conflicts). Those are two important changes from your current operating model that you should ensure the underlying application can handle before you choose that path.

Jeremy G (Cloud Posse) avatar
Jeremy G (Cloud Posse)

Using active-active masters is an enormous headache and IMHO not nearly worth it for performance improvements in an internal tool that is primarily reading. Read-replicas plus a smart proxy is a much better solution in this case. I think the proxies specifically designed for this kind of work will even take care of things like preventing reads of stale data immediately after a write.

At the very least, I recommend you give it a try before trying any kind of multi-master solution.

Sean Turner avatar
Sean Turner


The point of the proxy, in your use case, is not connection pooling, it is allowing reads to be served by the local read replica, while forwarding writes and DDL to the distant master
Ah okay, that’s quite cool! I thought that sort of pattern would only be possible on the application (QGIS) side, if, say, QGIS exposed different connection types (read / write).

This is potentially super interesting because I would much rather run AWS Read Replicas. We didn’t think this would work in the past because the workload involved writes also.
Ultimately, if you went that route, you are guaranteed only eventual consistency (different databases might have different data) coupled with reduced durability (committed writes can be dropped if they cause conflicts). Those are two important changes from your current operating model that you should ensure the underlying application can handle before you choose that path.
Yeah. And this should be within the acceptable parameters for us at the moment. Our company really only has internal users so there’s really only 60 or so daily users making reads and a smaller portion of those users making writes.

Sean Turner avatar
Sean Turner

Thanks all

Sean Turner avatar
Sean Turner

How would the connection pooler know how to differentiate reads and writes? Does it look at the underlying SQL Query?

Sean Turner avatar
Sean Turner

I’m potentially misinterpreting things, in that the connection poolers are likely unable to automatically route reads to replicas and writes to the main node, but this has given me another idea: configure different connections in QGIS based on whether the workflow is read-only or not.

Jeremy G (Cloud Posse) avatar
Jeremy G (Cloud Posse)

These are more proxies than connection poolers, and they route requests based on the underlying SQL command.
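
A hedged sketch of the read-replica half of that setup (identifiers, regions, and instance class are hypothetical); the proxy or per-workflow connections described above would then point local reads at the nearby replica and writes at the Oregon primary:

# Cross-region read replica close to e.g. the Australian users.
provider "aws" {
  alias  = "sydney"
  region = "ap-southeast-2"
}

resource "aws_db_instance" "gis_replica_sydney" {
  provider            = aws.sydney
  identifier          = "gis-replica-sydney"
  replicate_source_db = aws_db_instance.gis_primary.arn # hypothetical primary in us-west-2
  instance_class      = "db.r6g.large"
  publicly_accessible = false
  skip_final_snapshot = true
}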


2024-02-19

2024-02-21

jonjitsu avatar
jonjitsu

Has anyone created a serverless API Gateway + Lambda system that was not exposed to the internet but only accessible from an on-premises system through some dedicated connection like private connect? Looking at this https://aws.amazon.com/blogs/compute/integrating-amazon-api-gateway-private-endpoints-with-on-premises-networks/ it seems to be possible. I would have a VPC purely for connecting on-prem to AWS and allowing it to access the API Gateway endpoint.

Integrating Amazon API Gateway private endpoints with on-premises networks | Amazon Web Services

This post was written by Ahmed ElHaw, Sr. Solutions Architect Using AWS Direct Connect or AWS Site-to-Site VPN, customers can establish a private virtual interface from their on-premises network directly to their Amazon Virtual Private Cloud (VPC). Hybrid networking enables customers to benefit from the scalability, elasticity, and ease of use of AWS services while […]

Brian avatar

Yes. I have in the past. As the title of the blog hints, it requires private VPC endpoints (aka PrivateLink endpoints).
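
A hedged sketch of the AWS side of that pattern (names and variables are hypothetical): a PRIVATE-type REST API plus an interface VPC endpoint for execute-api in the VPC that terminates the Direct Connect or VPN, with a resource policy restricting invocations to that endpoint.

# Interface endpoint for API Gateway in the VPC reachable from on-prem.
resource "aws_vpc_endpoint" "execute_api" {
  vpc_id              = var.vpc_id
  service_name        = "com.amazonaws.${var.region}.execute-api"
  vpc_endpoint_type   = "Interface"
  subnet_ids          = var.private_subnet_ids
  security_group_ids  = [var.endpoint_sg_id]
  private_dns_enabled = true
}

resource "aws_api_gateway_rest_api" "internal" {
  name = "internal-api" # hypothetical

  endpoint_configuration {
    types            = ["PRIVATE"]
    vpc_endpoint_ids = [aws_vpc_endpoint.execute_api.id]
  }
}

# Only allow invocations that arrive through the VPC endpoint.
resource "aws_api_gateway_rest_api_policy" "internal" {
  rest_api_id = aws_api_gateway_rest_api.internal.id
  policy = jsonencode({
    Version = "2012-10-17"
    Statement = [{
      Effect    = "Allow"
      Principal = "*"
      Action    = "execute-api:Invoke"
      Resource  = "${aws_api_gateway_rest_api.internal.execution_arn}/*"
      Condition = {
        StringEquals = { "aws:SourceVpce" = aws_vpc_endpoint.execute_api.id }
      }
    }]
  })
}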

