#aws (2023-06)
Discussion related to Amazon Web Services (AWS)
Archive: https://archive.sweetops.com/aws/
2023-06-01
“A pull through cache is a way to cache images you use from an upstream repository. Container images are copied and kept up-to-date without giving you a direct dependency on the external registry. If the upstream registry or container image becomes unavailable, then your cached copy can still be used.” https://aws.amazon.com/blogs/containers/announcing-pull-through-cache-for-registry-k8s-io-in-amazon-elastic-container-registry/
Introduction Container images are stored in registries and pulled into environments where they run. There are many different types of registries from private, self-run registries to public, unauthenticated registries. The registry you use is a direct dependency that can have an impact on how fast you can scale, the security of the software you run, […]
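For reference, a cache rule like the one in the post is a one-liner to create; a hedged CLI sketch, assuming the registry.k8s.io upstream the post describes:
aws ecr create-pull-through-cache-rule \
  --ecr-repository-prefix k8s-io \
  --upstream-registry-url registry.k8s.io
# Images pulled via <account>.dkr.ecr.<region>.amazonaws.com/k8s-io/<image>
# are then cached in your private registry and kept up to date.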
2023-06-05
Hello! I’m trying to create a 301 redirect with S3 and CloudFront. The main idea is to redirect traffic from support.example.com to our Atlassian service desk portal URL. CloudFront’s behaviour is to redirect HTTP to HTTPS, and there’s a valid certificate from ACM added to CloudFront for *.example.com. However, I get this error from Chrome when trying to test the redirect: NET::ERR_CERT_COMMON_NAME_INVALID. Any ideas what could be wrong?
Check the name of the bucket. IIRC, the name of the bucket has to match the domain name pretty much exactly. It’s been a while and I haven’t looked at the docs yet, but that’s one thing to consider.
For those who are using AWS Identity Center (SSO)
• Are you using primarily permission sets for cross-account access?
• Are you using primarily self managed roles for cross-account access? Why do you use one over the other?
I hope I understand the question correctly, but I use permission sets for cross-account access that is focused around users and groups. When it comes to services and automation, I use self-managed roles.
With permission sets, it’s primarily a mix of inline policies and AWS managed policies. So far we haven’t really used the customer managed policies option of permission sets.
Yes, that’s what I am interested in. Do you run EKS clusters?
I do not, the org I’m working in is primarily ECS.
With permission sets, AWS sets up those SSO roles with random numbers in the ARNs across accounts. Do you use those anywhere to configure access, or allow something else to assume the role?
We use SSO roles with random numbers in terraform plans. The random numbers are appended to the role names, so role names can still be used by applying some basic regex
2023-06-08
2023-06-09
Hi. I want to import an existing ElastiCache cluster into terraform. Not sure what to do about the replication group. Should I create a replication group and import the cluster into it, or what?
Your best bet is to get the specific spec using describe_replication_group and then write the terraform resource to match. Then import it, yes. Do a terraform plan and if you note any deviations, then fix the resource definition to match.
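A hedged sketch of that flow (the replication group ID my-redis is a placeholder):
# Dump the live spec to mirror in HCL:
aws elasticache describe-replication-groups --replication-group-id my-redis
# Then import the resource and iterate until the plan is clean:
terraform import aws_elasticache_replication_group.this my-redis
terraform plan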
Thanks @jimp, let me do that
I described the cluster and got two clusters. Then I was trying to import both individually, but they did not succeed at the terraform plan stage.
I’ll now follow what you have suggested.
2023-06-12
Guys, how can I redeploy tasks after I push another Docker image to ECR?
When I press “Update service” and tick “force deploy”, it does not do anything.
So tired of going manually through deleting a service -> waiting until the deletion finishes (otherwise my tasks fail, probably due to the quota limit, so they have to finish deleting first) -> creating another service from scratch…
Are you updating the Task Definition before pressing “update service”?
to reference the new docker image
@Nat Williams no, I don’t. But I have set some tag like …us-east-1.amazonaws.com/test-classifier-fargate:arm-fp16
the tag is always the same
hmm. It sounds like the default ECS behaviour should be to check the repo for a newer version every time
you’re on ECS, right?
I just sort of assumed that from the “Update service” and “force deploy” verbiage
@Nat Williams ECS Fargate
normally you need to use a different image tag
Yeah, ideally I guess you’d be updating the task definition with a specific version each time
@Hao Wang why? :(
Is there any way to use the same tag without having to set up awful CodeDeploy?
Sure, just create a new revision of the Task Definition and update the service to use it
@Nat Williams so it’s either I set up CodeDeploy or manually create another task revision? (or maybe via CLI somehow so it just duplicates it)
Using different image tags is a best practice especially in prod env
I mean, you’re already manually forcing the deploy, so it’s not that big a change
@Hao Wang but i’d still need to update the task definition, right?
yeah, if using a different tag
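A hedged sketch of that flow via the CLI (cluster, service, and task definition names are placeholders):
# Register a new task definition revision pointing at the new image tag,
# then point the service at it:
aws ecs register-task-definition --cli-input-json file://taskdef.json
aws ecs update-service --cluster my-cluster --service my-service \
  --task-definition my-taskdef --force-new-deployment
# (omitting the revision number uses the latest ACTIVE revision)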
AWS support would know more
have run into some other issues and finally AWS support worked them out, it is a black box for us
ec2 hosted ecs might use the locally cached image of that tag to avoid the network cost. Fargate would pull fresh on a redeploy of the same tag. You might be able to disable image caching if using ec2, dunno how.
@Michael Galey not using ECS + EC2, but ECS + Fargate
then it’d work. Does your ECR repo have tag immutability checked? If so, your 2nd push wouldn’t overwrite the first one. Or there’s something else that’s more user-side: your force-deployed tasks failing health checks, for instance. So check the start date of the tasks, and view the logs of any recently stopped instances
but in general i’d highly recommend using unique ids for deploys to solve possible confusion issues like this
you could also pull the image locally and do docker inspect or connect to it to see if your recent change is there
@Michael Galey could you please tell me where i can find that immutability tag? :)
and how to use those unique ids :(
how do you build images? codepipeline?
@Michael Galey nope. I build locally, push locally and then Fargate takes my image from ECR and deploys
add something like this to your deploy script? It assumes the code is committed
COMMIT_HASH=$(git rev-parse --short HEAD)
IMAGE_TAG_VER=v-${COMMIT_HASH:=latest}
docker build ... -t <repo url>:$IMAGE_TAG_VER
docker push <repo url>:$IMAGE_TAG_VER
<deploy command> <repo_url>:$IMAGE_TAG_VER
@Michael Galey without <deploy command> <repo_url>:$IMAGE_TAG_VER
have no idea what the deploy command should be
whatever it is now, you’d just be looking for a parameter for the image, i haven’t used this stuff but quick googles show things like https://github.com/awslabs/fargatecli
CLI for AWS Fargate
@Michael Galey but if it doesn’t work in the UI, how can it even work in the CLI or in some third-party apps?
I tried deploying with AWS cli. Does not work
Something like
aws ecs update-service --cluster ${{ inputs.ecs_cluster_name }} --service ${{ inputs.ecs_service_name }} --force-new-deployment
did you check the actual tasks are successfully starting?
@Michael Galey they are up and running now
“Fargate does not cache images, and therefore the whole image is pulled from the registry when a task runs.”
everything works, LB handles requests to them just fine
did you inspect the latest image?
@Michael Galey locally?
pull the image from ecr, and see if your intended change is in there
and see if theres a date via docker inspect
it’s there
you’d have to clear your local cache
and the code isn’t working in the deployed version?
code is working. AWS is not redeploying on “force redeploy”
do the tasks start date line up with the redeploy command?
@Michael Galey Nope. Old date. They do not redeploy
oh ok, not a cache thing then, not sure
command looks right to me
@Michael Galey which command?
aws ecs update-service --cluster <<cluster-name>> --service <<service-name>> --force-new-deployment --region <<region>>
@Michael Galey i would rather make sure it works in UI, then try to get the command working
AWS support is horrible
support: All restrictions on your account have been lifted. me: What were the restrictions? support: https://i.imgur.com/TuC0If7.jpg me: You said “All restrictions on your account have been lifted.”. So what were the restrictions? support: https://i.imgur.com/imFbdjL.jpg
support: “I understand”
are they on crack?
oh it is common to have such restrictions, AWS should have sent some email to root account
@Hao Wang but they’re refusing to tell me the restrictions…
hard to know the details for this case which is not related to security
Any thoughts/opinions on orgformation as an alternative to Control Tower/Landing Zones?
great to know this project, just took a look, it is a CFN wrapper with JS/TS
This component is responsible for provisioning the full account hierarchy along with Organizational Units (OUs). It includes the ability to associate Service Control Policies (SCPs) to the Organization, each Organizational Unit and account.
2023-06-13
Yup. We are all waiting around and can’t do anything about it.
us-east-1 with Lambda and API Gateway both being down is gonna be a bad time for a large swath of AWS
CloudFormation and Lambda are affected. I wonder if this only impacts provisioning managed services
Realtime overview of issues and outages with all kinds of services. Having issues? We help you find out what is wrong.
2023-06-14
Has anyone implemented a TCP/UDP proxy for instances in AWS? Pure forwarding of ports to different instances, like port 1 -> instance 1, port 2 -> instance 2, etc.? I wonder if there is a container with nginx or something else that has this built in, to make it easier instead of cooking my own image. I could use NLBs, but NLBs are layer 4 and do not support SGs, so once it’s public I need to use NACLs to close off access, and I would like to avoid that
I use socat to make some private things available to different clouds and it’s worked well so far, been running 6 months or so with no issues.
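For the original port-per-instance ask, a hedged socat sketch (addresses and ports are placeholders; one process per mapping):
socat TCP-LISTEN:8001,fork,reuseaddr TCP:10.0.1.10:8001 &
socat TCP-LISTEN:8002,fork,reuseaddr TCP:10.0.1.11:8002 &
# UDP works the same way:
socat UDP-LISTEN:9001,fork,reuseaddr UDP:10.0.1.10:9001 &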
mmm I think I have the same issue as with the nginx container, that I can’t pass multiple lines to the entrypoint to create multiple configurations
use NLB. NACL management is less work than managing an entire proxy
that is true
2023-06-16
Is there any easy way to launch a Docker container on AWS from ECR without a complex cluster + task + service setup on ECS?
If there is such a complex setup just for playing around with one server, there is no point in ECR; better to set up EC2 manually…
Use the AWS App Runner service to go directly from an existing container image or a source code repository to a running web service in the AWS Cloud with CI/CD capability.
You can also deploy docker images to lambda
Take a look at copilot, it still starts ecs cluster and other resources but in a simpler way.
You no longer need to provision by yourself.
Lambda? I don’t want to configure api gateway so I could just make requests
@tommy thanks
@Fizz thanks
Lambda can be accessed via http now
Add a dedicated HTTP(S) endpoint to your Lambda function using a function URL.
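A hedged sketch of wiring that up from the CLI (the function name is a placeholder):
aws lambda create-function-url-config --function-name my-func --auth-type NONE
# A public, unauthenticated URL also needs a matching resource policy:
aws lambda add-permission --function-name my-func \
  --statement-id url-public --action lambda:InvokeFunctionUrl \
  --principal "*" --function-url-auth-type NONE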
@Fizz it’s still not the best option. Lambda imposes specific requirements on the Docker image, whereas EC2 does not
@Fizz App Runner costs $0.064 per hour? That’s crazy. EC2 costs around $10 a month, whereas this would cost almost $50.
@tommy
Copilot such an awful tool.
Takes a lot of time to deploy just one small Docker image.
It takes less time to deploy a Fargate task manually.
You could use AWS Lightsail for this. $7/month for the nano size (0.25 vCPU, 0.5 GB RAM).
@Stoor awful tool as well. Don’t remember why tho
@Stoor but thank you very much
Awful tool? What do you mean? It’s doing exactly what it needs to be doing.
this elasticache module doesn’t seem to conveniently handle global clusters https://github.com/cloudposse/terraform-aws-elasticache-redis - is that correct?
Terraform module to provision an ElastiCache Redis Cluster
right now I’m using this module to create 1 cluster then bring in the global cluster resources with ordinary aws terraform resources
Terraform module to provision an ElastiCache Redis Cluster
hi John, my guess is people do not use global clusters in Redis much, like it was with the RDS module a while ago
but that can be implemented
PRs are welcome
Just making sure I’m going down the right path with using the module and resources for the rest. The secondary inherits most of the primary’s settings so it basically works.
I think taking the approach of the RDS module and doing a for_each for the replication group, or different count logic, and adding
resource "aws_elasticache_global_replication_group" "example" {
  global_replication_group_id_suffix = "example"
  primary_replication_group_id       = aws_elasticache_replication_group.primary.id
}
should be sufficient
Does anybody have experience with AWS Lightsail?
Is there any way to make the launch script work? If I execute these commands in a shell script by running ./install.sh, I can then find that my packages are installed, whereas with this, when I SSH into an instance, there are no such packages…
How to select arm/x86 for a lightsail container service?
Do you know why my Lightsail deployment fails?
No launch command, environment variables, or open ports specified.
It does have CMD in my Dockerfile
One of the worst development experiences with Lightsail too
And why does a deployment on Lightsail take so much time? Crazy…
Deployment to Lightsail takes way more time than to Fargate…
Seeing this in Lightsail service deployment logs. Why? So for Fargate this Docker image is good and for Lightsail it’s not, correct?
It’s maybe because the image is built for ARM but it deploys on x86.
Amazing, what an awful UX, Amazon.
Lightsail tho
Blaming incorrect architecture on the platform is something though. Maybe try running a windows executable
2023-06-17
Yo. Please help me. I created a new task definition with .5 vCPU instead of .25 vCPU.
How can I update all the current tasks in a service to .5 vCPU?
I pressed “Update service” button, then checked “Force new deployment”, then pressed “Update” button, got “Service updated:” message and my tasks are still .25 vCPU. Why?
I checked the “Deployments” tab and the last deployment has been “In progress” for an eternity now…
It’s faster to remove a service and create a new service to update tasks… What the hell…
Guys, could you please recommend any Fargate autoscaling tutorials?
Don’t recommend AWS docs please.
I set it up and it does not work…
I’ve got many clients in the same situation, it is frustrating, let us calm down first. I think the cause is some basic thing being ignored, like the image architecture as Alex mentioned
Cause of what exactly? :)
cannot know the cause for now with the shared information, do you have the source code or a doc?
@Hao Wang source code of what?)
the source code or instructions you followed
I was frustrated when I started using Docker in 2014, version 0.9
You are at the psychological turning point of learning tech. Don’t give up, but try different platforms, AWS may not be a good fit for you
The other platforms will have similar issues though, since this seems not to be an AWS issue
So back to the first point, which is that some basic stuff was overlooked… Do you have a writeup or source code?
Please tell me how to test docker lambda locally.
I found this https://docs.aws.amazon.com/lambda/latest/dg/images-test.html
Ran a docker image like this docker run -p 9000:8080 ...
I got: 18 Jun 2023 01:34:24,542 [INFO] (rapid) exec '/var/runtime/bootstrap' (cwd=/var/task, handler=)
Went to http://127.0.0.1:9000/2023-06-18/functions/function/invocations , got nothing
figured it out. You have to use this exact URL:
http://localhost:9000/2015-03-31/functions/function/invocations
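i.e., with the runtime interface emulator running, an invocation looks like this (the payload is just an example):
curl -XPOST "http://localhost:9000/2015-03-31/functions/function/invocations" \
  -d '{"image_url": "https://example.com/img.png"}'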
Why the hell does the local Lambda have a different event structure from the pushed function’s event?
Here is my Lambda’s handler:
def handler(event, context):
    image_url = event.get("image_url")
    token = event.get("headers", {}).get("Authorization")
    return {
        "headers": {"Content-Type": "application/json"},
        "statusCode": "status_code",
        "body": image_url,
        "token": token,
        "event": event,
    }
Here is my Python request:
import requests

payload = {
    "image_url": "https://...",
    "return_version": True,
}
headers = {
    "Authorization": "tes",
}
res = requests.post("https://...", json=payload, headers=headers)
print(res.json())
Here is what I get in Python from the deployed Lambda invocation:
None
Here is what I get from the local Lambda invocation:
{'headers': {'Content-Type': 'application/json'}, 'statusCode': 'status_code', 'body': 'https://...', 'token': None, 'event': {'image_url': 'https://...', 'return_version': True}}
Here is what i get when i press “Test” in AWS console: https://i.imgur.com/Vmuy01T.png
Why the hell is it the same function, but with 3 different outputs? How am I supposed to test it locally if it gives me a different result in the deployed state?
And what is even this None???
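(A likely explanation for the three shapes: a Function URL wraps the request in an API Gateway v2-style event, while the local emulator and the console “Test” button pass your JSON through as the event itself. A hedged comparison; URLs and payloads are placeholders:)
# Direct invocation (local emulator or console "Test"): the JSON you
# POST *is* the event, so event.get("image_url") works.
curl -XPOST "http://localhost:9000/2015-03-31/functions/function/invocations" \
  -d '{"image_url": "https://example.com/img.png"}'
# Function URL invocation: the same JSON arrives as a string in
# event["body"], and header names are lowercased ("authorization"),
# so event.get("image_url") and .get("headers").get("Authorization")
# both come back empty.
curl -XPOST "https://<url-id>.lambda-url.<region>.on.aws/" \
  -H 'Authorization: tes' \
  -d '{"image_url": "https://example.com/img.png"}'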
woosha
I think you’re suffering from the same thing that I suffered from as I was getting started: slow down and approach the problem a little more pragmatically. If you’re attempting to implement something that has been done before, then if it’s not working the problem is not the language/framework/platform, it’s your understanding thereof. That’s not a judgement statement, but an encouragement to slow down and break the problem down into digestible bits.
I’m not trying to be crude/discouraging but just trying to help break things down in a meaningful way. If the technology was truly as lacking or ass backwards as you’re making it out to be, then AWS wouldn’t be nearly as successful as they are.
@Darren Cunningham don’t you agree that giving you different event shapes in dev and prod is such a stupid thing for Lambda to do?
Is there any way to get faster Lambda responses? My lambda takes too long to process requests compared to a regular EC2…
are you able to provide some more detail?
• “too long” — what’s the limit you want to stay under?
• what is the response time Lambda vs EC2?
• what runtime?
◦ what memory allocation are you giving it?
• what is the access pattern? meaning, how are you invoking the Lambda vs EC2
◦ is this a local invocation or remote?
• what does the lambda do? aka, does it access resources within the VPC? if so, are you deploying it into the VPC?
Generally it’s not surprising that an EC2 is going to process a single or a few hundred requests faster than a Lambda. Lambda shines in its ability to “scale infinitely” (besides rate limits, etc).
EC2 is like owning a car, you can get in it and go wherever you want, whenever you want, without waiting. But you own the maintenance thereof. Lambda is a taxi, it’s not always faster (sometimes it is) but it’s worry-free…outside of the cost, and the more you know how to work the system, the more value you get out of it.
sticking with my car analogy, you can get a 1000 taxis to show up at your door step a lot easier, faster and cheaper than if you tried to rent 1000 cars for a night
@Darren Cunningham seems like I figured it out. I had to increase the memory, which implicitly increases CPU and makes responses faster. But thank you (:
Generally I want to stay under 2s while not paying so much for a 2-CPU lambda
Lambda responds in 2s on 2048 memory.
EC2 responds in under 1s on 1 vCPU. But only on pure multi-threaded Flask without any WSGI thing like gunicorn (which makes responses longer)
Runtime: public.ecr.aws/lambda/python:3.10-arm64
I gave it 512 MB of memory. On 2048 it’s faster, but I don’t need that much memory, I only need more CPU
access pattern: Function URL
remote invocation
does not access resources. Makes AI CPU inference (yolov8 ONNX)
If your lambda is called infrequently, the increased latency is the result of slow lambda start-up, which is kinda expected for lambdas in general. You might want to play with provisioned concurrency and see if it helps for your case: https://aws.amazon.com/blogs/compute/new-for-aws-lambda-predictable-start-up-times-with-provisioned-concurrency/
Since the launch of AWS Lambda five years ago, thousands of customers such as iRobot, Fender, and Expedia have experienced the benefits of the serverless operational model. Being able to spend less time on managing scaling and availability, builders are increasingly using serverless for more sophisticated workloads with more exacting latency requirements. As customers have […]
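A hedged example of turning it on (name and numbers are placeholders; note it applies to a published version or alias, not $LATEST):
aws lambda put-provisioned-concurrency-config \
  --function-name my-func --qualifier prod \
  --provisioned-concurrent-executions 2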
It’s still too expensive, but thanks
2023-06-18
Guys, could you please recommend any good course video or article on auto-scaling Fargate for beginners?
Everything I read and watch so far gives me more questions than answers…
Isn’t it a case of updating the task definition to increase or decrease counts via CloudWatch alarms?
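For ECS/Fargate it is usually Application Auto Scaling driving the service’s DesiredCount off a CloudWatch metric; a hedged sketch (cluster and service names are placeholders):
aws application-autoscaling register-scalable-target \
  --service-namespace ecs \
  --resource-id service/my-cluster/my-service \
  --scalable-dimension ecs:service:DesiredCount \
  --min-capacity 1 --max-capacity 10
# Target tracking adds/removes tasks to hold average CPU near 70%:
aws application-autoscaling put-scaling-policy \
  --service-namespace ecs \
  --resource-id service/my-cluster/my-service \
  --scalable-dimension ecs:service:DesiredCount \
  --policy-name cpu70 --policy-type TargetTrackingScaling \
  --target-tracking-scaling-policy-configuration '{"TargetValue": 70.0, "PredefinedMetricSpecification": {"PredefinedMetricType": "ECSServiceAverageCPUUtilization"}}'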
@james192 don’t know. Which counts?
Could you please recommend any good tutorials/courses on deploying Docker images to EKS and setting up auto-scaling?
Again, so tired of watching videos and reading articles which give me more questions than answers…
to be frank, stopping learning AWS would be a good option for you for now. Picking it up again in a few months would be a good strategy
@Hao Wang why? I need to deploy my app to it now, not in a few months
2023-06-19
2023-06-20
Guys, do you know how to edit a cloudwatch alarm for EC2 auto-scaling?
Was unable to Google anything on it (maybe i Googled it wrong…)
I need it to scale up if CPU is over 70% for 30 seconds and scale down if CPU is under 70% for 30 seconds
To edit a CloudWatch alarm for EC2 auto-scaling, you can follow these steps:
1. Sign in to the AWS Management Console and open the Amazon CloudWatch service.
2. In the navigation pane, click on “Alarms” under the “CloudWatch” section.
3. Locate the alarm that is associated with your EC2 auto-scaling group and select it.
4. Click on the “Actions” dropdown menu and choose “Edit.”
5. In the “Create/Edit Alarm” wizard, you can modify the alarm configuration to match your requirements.
- Under the “Conditions” section, select the “Static” option for the “Threshold type.”
- For the “Whenever” condition, choose “Greater” and enter “70” in the text box.
- Set the “Period” to 30 seconds.
- Enable the “Consecutive periods” option and set it to “1.”
- Choose the appropriate “Statistic” (e.g., “Average” CPU utilization) and adjust the “Datapoints to alarm” if needed.
6. Under the “Actions” section, click on the “Add notification action” button if you want to receive notifications when the alarm state changes.
7. Optionally, you can configure auto-scaling actions when the alarm state is triggered.
- Click on the “Add Scaling Action” button.
- Choose the appropriate scaling policy for scaling up and scaling down.
- Configure the desired scaling adjustments, such as the number of instances to add or remove.
- Save the scaling actions.
8. Review your changes and click on the “Save” button to update the alarm.
The edited CloudWatch alarm will now trigger scaling actions for your EC2 auto-scaling group based on the specified CPU utilization thresholds and duration.
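For the specific 70%-for-30-seconds ask, a hedged CLI equivalent (the ASG name and policy ARN are placeholders; note that standard EC2 metrics are 1-minute resolution, so a true 30-second period would need high-resolution custom metrics):
aws cloudwatch put-metric-alarm \
  --alarm-name my-asg-cpu-high \
  --namespace AWS/EC2 --metric-name CPUUtilization \
  --dimensions Name=AutoScalingGroupName,Value=my-asg \
  --statistic Average --period 60 --evaluation-periods 1 \
  --threshold 70 --comparison-operator GreaterThanThreshold \
  --alarm-actions <scale-out-policy-arn>
# A mirror alarm with LessThanThreshold wired to a scale-in policy
# covers the scale-down side.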
did you read my message?
How do you add CloudFlare in front of an AWS Load Balancer? Spent an hour trying to figure out how to make it work, no luck.
Without NGINX forwarding of course, since it seems like a huge redundant overhead
Hell. I tried refreshing instances in auto-scaling group.
I thought that the logic for this is the following:
- Create a new instance
- Make sure that its port 80 is accessible
- Drop old instance, remove it from the auto-scaling group
But the logic is like this
- Remove old instance from the auto-scaling group
- Create a new instance
- Drop old instance
How do I make it work like it should (the 1st case)?
the purpose of an auto-scaling group is not to manipulate instances manually
if you need more instances launched, increase the min size
if you want to refresh all running instances (for any reason), there is a menu item in the AWS console “Refresh Instances” (if you click on it, the ASG will replace all old instances with new ones making sure the min desired size is always running)
This reminds me of the termination policy of AWS:
If you did not assign a specific termination policy to the group, Amazon EC2 Auto Scaling uses the default termination policy. It selects the Availability Zone with two instances, and terminates the instance that was launched from the oldest launch template or launch configuration. If the instances were launched from the same launch template or launch configuration, Amazon EC2 Auto Scaling selects the instance that is closest to the next billing hour and terminates it.
I haven’t used the policy for a while but still remember the pain lol
If you invest too much into ASG, you’ll miss the beauty of k8s
Learn k8s directly
@Andriy Knysh (Cloud Posse) not manipulate manually? But how do I update instances with new code? I have a launch script in the “user data” which clones a GitHub repo. How can I update all my instances automatically?
if you want to refresh all running instances (for any reason), there is a menu item in the AWS console “Refresh Instances”
I did exactly this, and accessing my load balancer was giving me HTTP errors for a while, meaning it dropped an existing instance from the group while another instance was not ready. Why?
@Hao Wang what’s the ASG?)
@Hao Wang does k8s launch Docker inside EC2 instances? Or does it create EC2 instances directly from a Docker image without the inner Docker overhead?
It sounds like you could learn a lot from some introduction tutorials to auto scaling. AWS have lots of good video introductions to resources. If you feel masochistic you can also try to read the docs
But how to update instances with new code? I have a launch script in the “user data” which clones github repo. How can i update all my instances automatically?
you use a Launch template, update any parameter in it, and the ASG will update all instances to use the new version of the Launch template
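A hedged CLI sketch of that loop (template ID, ASG name, and preferences are placeholders):
# Publish a new launch template version, e.g. with updated user data...
aws ec2 create-launch-template-version \
  --launch-template-id lt-0123456789abcdef0 \
  --source-version '$Latest' \
  --launch-template-data '{"UserData": "<base64-encoded script>"}'
# ...then roll the ASG onto it:
aws autoscaling start-instance-refresh \
  --auto-scaling-group-name my-asg \
  --preferences '{"MinHealthyPercentage": 90, "InstanceWarmup": 60}'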
if you know terraform, this is how it’s done https://github.com/cloudposse/terraform-aws-ec2-autoscale-group/blob/main/main.tf
resource "aws_launch_template" "default" {
  count = module.this.enabled ? 1 : 0

  name_prefix = format("%s%s", module.this.id, module.this.delimiter)

  dynamic "block_device_mappings" {
    for_each = var.block_device_mappings
    content {
      device_name  = lookup(block_device_mappings.value, "device_name", null)
      no_device    = lookup(block_device_mappings.value, "no_device", null)
      virtual_name = lookup(block_device_mappings.value, "virtual_name", null)

      dynamic "ebs" {
        for_each = lookup(block_device_mappings.value, "ebs", null) == null ? [] : ["ebs"]
        content {
          delete_on_termination = lookup(block_device_mappings.value.ebs, "delete_on_termination", null)
          encrypted             = lookup(block_device_mappings.value.ebs, "encrypted", null)
          iops                  = lookup(block_device_mappings.value.ebs, "iops", null)
          kms_key_id            = lookup(block_device_mappings.value.ebs, "kms_key_id", null)
          snapshot_id           = lookup(block_device_mappings.value.ebs, "snapshot_id", null)
          volume_size           = lookup(block_device_mappings.value.ebs, "volume_size", null)
          volume_type           = lookup(block_device_mappings.value.ebs, "volume_type", null)
        }
      }
    }
  }

  dynamic "credit_specification" {
    for_each = var.credit_specification != null ? [var.credit_specification] : []
    content {
      cpu_credits = lookup(credit_specification.value, "cpu_credits", null)
    }
  }

  disable_api_termination = var.disable_api_termination
  ebs_optimized           = var.ebs_optimized
  update_default_version  = var.update_default_version

  dynamic "elastic_gpu_specifications" {
    for_each = var.elastic_gpu_specifications != null ? [var.elastic_gpu_specifications] : []
    content {
      type = lookup(elastic_gpu_specifications.value, "type", null)
    }
  }

  image_id                             = var.image_id
  instance_initiated_shutdown_behavior = var.instance_initiated_shutdown_behavior

  dynamic "instance_market_options" {
    for_each = var.instance_market_options != null ? [var.instance_market_options] : []
    content {
      market_type = lookup(instance_market_options.value, "market_type", null)

      dynamic "spot_options" {
        for_each = (instance_market_options.value.spot_options != null ?
          [instance_market_options.value.spot_options] : [])
        content {
          block_duration_minutes         = lookup(spot_options.value, "block_duration_minutes", null)
          instance_interruption_behavior = lookup(spot_options.value, "instance_interruption_behavior", null)
          max_price                      = lookup(spot_options.value, "max_price", null)
          spot_instance_type             = lookup(spot_options.value, "spot_instance_type", null)
          valid_until                    = lookup(spot_options.value, "valid_until", null)
        }
      }
    }
  }

  instance_type = var.instance_type
  key_name      = var.key_name

  dynamic "placement" {
    for_each = var.placement != null ? [var.placement] : []
    content {
      affinity          = lookup(placement.value, "affinity", null)
      availability_zone = lookup(placement.value, "availability_zone", null)
      group_name        = lookup(placement.value, "group_name", null)
      host_id           = lookup(placement.value, "host_id", null)
      tenancy           = lookup(placement.value, "tenancy", null)
    }
  }

  user_data = var.user_data_base64

  dynamic "iam_instance_profile" {
    for_each = var.iam_instance_profile_name != "" ? [var.iam_instance_profile_name] : []
    content {
      name = iam_instance_profile.value
    }
  }

  monitoring {
    enabled = var.enable_monitoring
  }

  # https://github.com/terraform-providers/terraform-provider-aws/issues/4570
  network_interfaces {
    description                 = module.this.id
    device_index                = 0
    associate_public_ip_address = var.associate_public_ip_address
    delete_on_termination       = true
    security_groups             = var.security_group_ids
  }

  metadata_options {
    http_endpoint               = (var.metadata_http_endpoint_enabled) ? "enabled" : "disabled"
    http_put_response_hop_limit = var.metadata_http_put_response_hop_limit
    http_tokens                 = (var.metadata_http_tokens_required) ? "required" : "optional"
    http_protocol_ipv6          = (var.metadata_http_protocol_ipv6_enabled) ? "enabled" : "disabled"
    instance_metadata_tags      = (var.metadata_instance_metadata_tags_enabled) ? "enabled" : "disabled"
  }

  dynamic "tag_specifications" {
    for_each = var.tag_specifications_resource_types
    content {
      resource_type = tag_specifications.value
      tags          = module.this.tags
    }
  }

  tags = module.this.tags

  lifecycle {
    create_before_destroy = true
  }
}

locals {
  launch_template_block = {
    id      = one(aws_launch_template.default[*].id)
    version = var.launch_template_version != "" ? var.launch_template_version : one(aws_launch_template.default[*].latest_version)
  }
  launch_template = (
    var.mixed_instances_policy == null ? local.launch_template_block : null
  )
  mixed_instances_policy = (
    var.mixed_instances_policy == null ? null : {
      instances_distribution = var.mixed_instances_policy.instances_distribution
      launch_template        = local.launch_template_block
      override               = var.mixed_instances_policy.override
  })
  tags = { for key, value in module.this.tags : key => value if value != "" && value != null }
}

resource "aws_autoscaling_group" "default" {
  count = module.this.enabled ? 1 : 0

  name_prefix               = format("%s%s", module.this.id, module.this.delimiter)
  vpc_zone_identifier       = var.subnet_ids
  max_size                  = var.max_size
  min_size                  = var.min_size
  load_balancers            = var.load_balancers
  health_check_grace_period = var.health_check_grace_period
  health_check_type         = var.health_check_type
  min_elb_capacity          = var.min_elb_capacity
  wait_for_elb_capacity     = var.wait_for_elb_capacity
  target_group_arns         = var.target_group_arns
  default_cooldown          = var.default_cooldown
  force_delete              = var.force_delete
  termination_policies      = var.termination_policies
  suspended_processes       = var.suspended_processes
  placement_group           = var.placement_group
  enabled_metrics           = var.enabled_metrics
  metrics_granularity       = var.metrics_granularity
  wait_for_capacity_timeout = var.wait_for_capacity_timeout
  protect_from_scale_in     = var.protect_from_scale_in
  service_linked_role_arn   = var.service_linked_role_arn
  desired_capacity          = var.desired_capacity
  max_instance_lifetime     = var.max_instance_lifetime
  capacity_rebalance        = var.capacity_rebalance

  dynamic "instance_refresh" {
    for_each = (var.instance_refresh != null ? [var.instance_refresh] : [])
    content {
      strategy = instance_refresh.value.strategy
      dynamic "preferences" {
        for_each = (length(instance_refresh.value.preferences) > 0 ? [instance_refresh.value.preferences] : [])
        content {
          instance_warmup        = lookup(preferences.value, "instance_warmup", null)
          min_healthy_percentage = lookup(preferences.value, "min_healthy_percentage", null)
        }
      }
      triggers = instance_refresh.value.triggers
    }
  }

  dynamic "launch_template" {
    for_each = (local.launch_template != null ? [local.launch_template] : [])
    content {
      id      = local.launch_template_block.id
      version = local.launch_template_block.version
    }
  }

  dynamic "mixed_instances_policy" {
    for_each = (local.mixed_instances_policy != null ? [local.mixed_instances_policy] : [])
    content {
      dynamic "instances_distribution" {
        for_each = ( mixed_in…
@Oleh Kopyl ASG = Auto Scaling Group. Yeah, you can think of it as Docker in EC2 for now, but under the hood it is much more complex
Docker is just a middleman (runtime) which can be replaced by others, like containerd
@Alex Jurkiewicz docs? no, thank you very much.
If you could share a couple of good videos which actually helped you personally, i would appreciate it
@Andriy Knysh (Cloud Posse) so it does the update automatically upon each template version creation?
@Andriy Knysh (Cloud Posse) i don’t know Terraform. But should I?
@Hao Wang I mean that I don’t want to have Docker in Docker. So if messing around with the user data of a launch template is the only option, I’d better do that than pay for 1 CPU I don’t use (which would be used under the hood to have Docker inside Docker)
It is not DinD (Docker in Docker)
@Hao Wang k8s?
yeah
lol better to dig in more, back then I didn’t have ChatGPT…
@Hao Wang thanks :)
My pleasure, keep learning
@Oleh Kopyl If you just want to refresh the instances in the ASG , this could be helpful. https://docs.aws.amazon.com/autoscaling/ec2/userguide/asg-instance-refresh.html
if you are using terraform, you can add the setting below to your ASG configuration: https://registry.terraform.io/providers/hashicorp/aws/latest/docs/resources/autoscaling_group#instance_refresh
2023-06-22
I launch my EC2 instance with a launch script (user data). It ends with “python3 launch-app.py”. Where are its logs in Ubuntu? Meaning, where are the stderr and stdout?
can be found under /var/log
it should be managed by systemd, otherwise cloud-init will never finish running
I am not very sure, but I remember I had an Ubuntu server running with a similar setup to yours and was trying to find the stderr and stdout logs, and I ended up finding nothing until you write those logs to a file. This article says the same: https://askubuntu.com/questions/1030938/where-do-i-find-stderr-logs (command output, i.e. stdout and stderr, is not logged unless you write it to a file). This could be helpful, you can send your user data logs: https://repost.aws/knowledge-center/ec2-linux-log-user-data
Another option is to look at the cloudinit logs.
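On a stock Ubuntu AMI those live at fixed paths (the standard cloud-init locations):
sudo tail -f /var/log/cloud-init-output.log   # stdout/stderr of the user-data script
sudo less /var/log/cloud-init.log             # cloud-init's own run log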
@Hao Wang yeah, but there are multiple files there. So I was wondering which one has the logs I need…
I could as well check all the files, since there are not so many, and avoid posting here.
Thank you very much anyway
Is checking all the files fun? Lol starting with cloud-init*
if python3 launch-app.py runs, cloud-init will hang forever
@Hao Wang not fun, true
if python3 launch-app.py runs, cloud-init will hang forever
@Hao Wang what do you mean by “hang forever”?
yeah, forever
cloud init is not meant to run long-running services. It’s meant for doing EC2 first-run bootstrapping. It won’t even run on reboot. Your cloud-init script should create a systemd service file, enable the service, and start the service.
if you did run the app at the end of the launch script, then all the stdout/err will be in the /var/log/cloud* files.
cloud init is not meant to run long running services
Sure, but why can’t it?
@jonjitsu
cloud init is an agent which runs setup that eventually ends. It is not a service manager. For example, it won’t run on reboot, so if you try to run a service at the end of a cloud-init script it won’t survive a reboot. All linuxes come with service managers. Ubuntu’s is systemd, so in your cloud-init you can create the systemd service file and systemctl enable, then systemctl start it, and your service will be properly managed, including in the case of reboots.
I don’t know what happens when you try to run a never ending process at the end of a cloud init script or if that is even valid according to the cloud init specs.
@jonjitsu just yesterday I rewrote it to systemd, since I needed restarts when the app fails (OOM kill)
#!/bin/bash
git clone https://<token>@github.com/kopyl/crossdot-yolo.git && \
mv crossdot-yolo/onnx /home/ubuntu/onnx && \
rm -r crossdot-yolo && \
mv /home/ubuntu/onnx/* /home/ubuntu/ && \
rm -r /home/ubuntu/onnx/ && \
echo "wget https://bootstrap.pypa.io/get-pip.py && \
python3 get-pip.py && \
python3 -m pip install --ignore-installed flask && \
python3 -m pip install psycopg2-binary && \
python3 -m pip install onnxruntime==1.13.1 && \
python3 -m pip install opencv-python-headless==4.7.0.72 " >> i.sh && \
chmod 777 i.sh && \
./i.sh && \
echo "[Unit]
Description=Flask app
After=multi-user.target
[Service]
Environment=AWS_POSTGRES_DB_HOST=.
Environment=AWS_POSTGRES_DB_USER=.
Environment=AWS_POSTGRES_DB_PASSWORD=.
Environment=ONNX_MODEL_PATH=/home/ubuntu/model.onnx
Environment=PYTHONUNBUFFERED=1
Type=simple
Restart=always
ExecStart=/usr/bin/python3 /home/ubuntu/flask-server-postgres.py
[Install]
WantedBy=multi-user.target" >> /etc/systemd/system/crossdot-flask-inference.service && \
systemctl daemon-reload && \
systemctl enable crossdot-flask-inference.service && \
systemctl start crossdot-flask-inference.service && \
systemctl status crossdot-flask-inference.service
@jonjitsu
agent which runs setup which eventually ends
Do you have any proof? Not trying to be rude, just really want to read it from some official source
@jonjitsu by the way, is it going to start on its own after a reboot, or do you need some additional configs?
I don’t really have any proof. It might be out there though, you’ll have to check the docs.
It should start on its own after reboot because you ran systemctl enable ...
@jonjitsu was not able to find it in the docs. How can you know, then, if you have no proof?
2023-06-23
2023-06-26
AWS S3 Buckets never cease to amaze me with their peculiar nature. https://www.cloudyali.io/blogs/aws-s3-bucket-creation-date-discrepancy-in-master-and-other-regions
AWS S3 bucket creation date may be reported differently in different regions. To get the S3 bucket creation date correctly call list api in us-east-1.
Good post. Straightforward and brings receipts
2023-06-27
I have an issue with Fargate. It scales up fast (from 1 to 60 instances in 1 minute), but scales down too slowly (from 60 to 1 instance in 59 min, meaning it scales down 1 instance per minute).
Can I have more control over it? I need it to scale up in 1 minute and down in 1 minute too (from whatever amount of instances I have to whatever amount is needed at the moment, be it 1 or 30 or anything else)
if you change the service desired count to 1 you are saying it is taking 59 mins to remove all the extra containers?
I’m not changing it to 1
It changes to 1 on its own
It reduces the “desired” count by 1 per minute on its own
I need it to reduce to one in 1 minute, not in 59
why does it go from 1 to 60 back down to 1 in one minute? Is this a batch job or something? what mechanism are you using to scale it from 1 to 60?
Is it possible to have app restarts on Fargate? As if you launched it with systemd as a service…
I was getting 5xx errors from Fargate probably due to OOM kills. App restart would fix this crap….
when a container dies it’s not getting restarted by fargate?
@jonjitsu what do you mean by “dies” exactly?
@jonjitsu I was getting a 5xx HTTP error, so restarts obviously weren’t working
You can try the force deployment option in the ECS service.
@Oleh Kopyl how are you controlling the scaling? Scaling down should not take that long
With CloudWatch metrics
Does SageMaker Real-Time Inference scale up and down?
So I don’t pay for instances which are not used
Seems to be the case
Use this step-by-step, hands-on guide to learn how to deploy a trained machine learning model to a real-time inference endpoint.
Hey all.
I’m pondering over a project I’m working on at the moment and was hoping to get some advice or thoughts from other people.
I have to design the architecture for a 3-tier nodejs application which consists of a simple web front-end, an API component, and a database. My initial thoughts are to go serverless and deploy the web and API components on Lambda and try to keep things light and quick. I am concerned here, though, about the potential lack of flexibility with the front-end. I understand that you can have a Lambda function return HTML, but I don’t know how well it would work for the application’s progression in the future.
Alternatively, I can containerise both the web and API components and move them onto ECS which would cost more but allow for greater flexibility and if need be a migration to Kubernetes if required down the track.
Has anybody got any thoughts on this? Have you deployed front-end on Lambda and had it work well or poorly?
I think “light and quick” correlates more with using the technologies you know well, rather than using specific tools that have buzz
I think that’s some wisdom I needed to hear
Some say Fargate does restarts. But does it restart the whole image or just the app from CMD?
I was getting some 5xx errors from Fargate due to OOM kills. OOM kills are okay, but 5xx errors are not. With my EC2 instance (no Docker), systemd always restarts my Python app, and since I have a load balancer, I never get 5xx errors (the worst I can get is a .5s delay on a request).
Meaning that Fargate seemingly restarts the whole image instead of just my python app (the thing in CMD, like ["python", "main.py"]).
If that’s really the case, is there any way to force Fargate to just restart my app (the same way systemd does it)?
I was trying to get systemd to work in Docker but was getting this error: /bin/sh: 4: systemd: not found.
Even after I did apt update && apt install --reinstall systemd -y, I was still getting errors like this:
System has not been booted with systemd as init system (PID 1). Can't operate.
Failed to connect to bus: Host is down
I tried using service instead of systemctl, but it had errors too:
# service crossdot-flask-inference.service on
crossdot-flask-inference.service: unrecognized service
when a container fails healthcheck, the entire thing is replaced
containers aren’t meant to be touched beyond the initial process being started. If you want the app to automatically restart, add a supervisor.
Systemd is one approach, but it’s pretty heavyweight. You could also use a shell script which contains something like
while true; do python main.py ; done
What is supervisor? This thing? https://dev.to/rabeeaali/install-supervisor-on-aws-eb-with-laravel-5g8a
Is my python app going to be relaunched in case of an OOM kill if I launch it with the shell script?
when a container fails healthcheck, the entire thing is replaced
Damn.. Not something i need…
no, supervisor is an “init system”. Generally, containers run without one. Init systems are designed for long-running servers, but that’s not how containers work. As you see, they follow the philosophy “if it stops working, kill it and get a new one”
2023-06-28
aws serverless v1. how can I restore from a backup and not from a snapshot?
What do you mean by backup if not a snapshot?
In RDS I see snapshots and backups. I wanted to restore from a backup. Guess that is good only for point-in-time restore
@Balazs Varga do you still need to restore from backup? Maybe @Alex Jurkiewicz can help
yes, to a different location. I think I can use AWS Backup to copy the backups over to a different region and then restore from that backup
thanks
2023-06-29
Is anyone well versed in GitHub Actions and AWS? I’m having an issue deploying my container image to ECR, I keep getting Error: Not authorized to perform sts:AssumeRoleWithWebIdentity
We have significant experience, that sounds like there is an issue with your trust policy of the role
I gave the role admin access, I know it’s not good practice but I just wanted to make sure it was working
that’s not the trust policy
what’s the trust policy
{
  "Version": "2012-10-17",
  "Statement": [
    {
      "Effect": "Allow",
      "Principal": {
        "Federated": "arn:aws:iam::<account-id>:oidc-provider/token.actions.githubusercontent.com"
      },
      "Action": "sts:AssumeRoleWithWebIdentity",
      "Condition": {
        "StringLike": {
          "token.actions.githubusercontent.com:aud": "sts.amazonaws.com",
          "token.actions.githubusercontent.com:sub": "repo:dandiggas/firstwebapp"
        }
      }
    }
  ]
}
I think you are missing a *
repo:dandiggas/firstwebapp:*
ok ill try that now
Yeah still not working, keep getting this error message from https://github.com/Dandiggas/FirstWebApp/actions/runs/5412372323/jobs/9836381905:
Run aws-actions/configure-aws-credentials@v2
(node:1742) NOTE: We are formalizing our plans to enter AWS SDK for JavaScript (v2) into maintenance mode in 2023.
Please migrate your code to use AWS SDK for JavaScript (v3).
For more information, check the migration guide at https://a.co/7PzMCcy
(Use node --trace-warnings ... to show where the warning was created)
Error: Not authorized to perform sts:AssumeRoleWithWebIdentity
What happened when you tried removing the condition altogether?
It worked, that’s what happened lol
Thanks a lot! I’m quite new to this so making a lot of errors
so you need some sort of condition, but that means there is a mismatch between your condition and the reality
Yeah i just fixed it with the condition
what’s the new condition?
{
  "Version": "2012-10-17",
  "Statement": [
    {
      "Effect": "Allow",
      "Principal": {
        "Federated": "arn:aws:iam::<account-id>:oidc-provider/token.actions.githubusercontent.com"
      },
      "Action": "sts:AssumeRoleWithWebIdentity",
      "Condition": {
        "StringLike": {
          "token.actions.githubusercontent.com:aud": "sts.amazonaws.com",
          "token.actions.githubusercontent.com:sub": "repo:Dandiggas/FirstWebApp:*"
        }
      }
    }
  ]
}
Thanks Warren
no problem
Hey. Earlier this year we began using the terraform-aws-sso module to manage our human access to our AWS accounts. It works really well and has been a lifesaver, so upfront, thank you to everyone who added to it.
However I think I am missing something, as only recently did we have a need to make a new account assignment, and because I have a depends_on for my Okta module (to make sure the Okta groups are created before the account assignment is attempted), terraform is forcing a replacement of all account assignments despite the code only adding one.
Removing the depends_on fixes it in my plan, but I worry it will fail because it isn’t aware of the dependency on my Okta module.
I did some searching and I think that this PR addressed this issue already by adding a variable to handle the dependency issue.
The variable identitystore_group_depends_on description states the value should be “a list of parameters to use for data resources to depend on”.
I don’t understand what parameters it’s referring to? Is it a list of all the Okta groups I create?
I suggest cross posting link to this message in #terraform
Will do. Thanks Eric.
@Simon Weil you clearly know what you’re doing since it’s your PR and the code in your example is super clean. Any ideas on how to fix this?
2023-06-30
I have a few questions related to organizations:
• I know I need to select a management account, but with a delegated role can I have a user in a member account manage the organization?
• can I limit this delegated role to an OU?
• if I delete the management account, will it delete all the other AWS accounts in the organization?
@Ben Smith (Cloud Posse)
1. To do this you’d have to use AWS Organizations delegation to delegate organization management to a delegated role. So you could have your AWS Org delegated to an aws-team. By default your management account is the root account; it contains your state bucket and is where your SuperAdmin role deploys the accounts component, which creates the member accounts.
2. The AWS Organizations delegated role can be limited to an OU, meaning you just need to specify that in the role’s permissions.
3. I’ve never done this, as this is where we deploy our TF state and manage organization security policies. It appears from the AWS docs that it would delete the organization, resulting in your member accounts becoming standalone accounts.
Learn how to delete an AWS organization that you no longer need.