SweetOps #refarch for March, 2024

Cloud Posse Reference Architecture

2024-03-27

Taimur Gibson

Hey, getting a weird error when trying to deploy ecs-services tasks. Some, but not all of our services are failing to deploy with this error:

│ Error: creating ECS Task Definition (taskname): ClientException: When networkMode=awsvpc, the application protocol must be one of [http, http2, grpc]
│ 
│   with module.ecs_alb_service_task[0].aws_ecs_task_definition.default[0],
│   on .terraform/modules/ecs_alb_service_task/main.tf line 49, in resource "aws_ecs_task_definition" "default":
│   49: resource "aws_ecs_task_definition" "default" {
│ 

Taimur Gibson

06:16:07 PM

We can’t find any meaningful difference between the services that deploy and the ones that don’t

Taimur Gibson

06:16:24 PM

According to this: https://github.com/cloudposse/terraform-aws-components/tree/main/modules/ecs-service#input_containers

Taimur Gibson

06:16:39 PM

appProtocol is an optional string and we don’t have it set for any of the other tasks that are working

Taimur Gibson

06:16:48 PM

and setting it also doesn’t seem to matter

Dan Miller (Cloud Posse)

06:23:23 PM

based on the error message, it sounds like awsvpc network mode requires one of those 3 app protocols. When you tried setting appProtocol, what happened?

Taimur Gibson

06:27:26 PM

didn’t make a difference, same error

Taimur Gibson

06:28:17 PM

that should go under port_mappings correct?

Dan Miller (Cloud Posse)

06:29:20 PM

yes that should be right. Could you share how you configured that variable?

Taimur Gibson

06:31:14 PM

            port_mappings: 
                - containerPort: 80
                hostPort: 80
                protocol: tcp
                appProtocol: http

Dan Miller (Cloud Posse)

06:33:58 PM

I’m assuming Slack reformatted that right? YAML indentation is picky

            port_mappings: 
              - containerPort: 80
                hostPort: 80
                protocol: tcp
                appProtocol: http

Taimur Gibson

06:34:16 PM

yes, that’s what we have

Taimur Gibson

06:34:57 PM

it doesn’t seem to be picking up the appProtocol var

Taimur Gibson

06:35:13 PM

it’s weird because this works fine for some other ecs-services with nearly the same config

Dan Miller (Cloud Posse)

06:35:47 PM

yeah that is bizarre. Can you share a config that is working? What’s the difference?

Taimur Gibson

06:38:53 PM

              port_mappings:
                - containerPort: 8080
                  hostPort: 8080
                  protocol: tcp

Dan Miller (Cloud Posse)

06:39:39 PM

oh lol. I’m trying to reproduce locally, one minute

Taimur Gibson

06:45:35 PM

we’re on version: 1.417.0

Dan Miller (Cloud Posse)

06:59:56 PM

I can’t reproduce this. Whenever I add the appProtocol, it is passed all the way through. Could you try describing a component that is working and a component that isn’t working? Then check the values that are passed to terraform:

For example, describe the component:

atmos describe component ecs/platform/service/echo-server -s plat-use2-sandbox

Then double-check that each entry under port_mappings includes appProtocol.

Then also check that task network_mode is awsvpc

Dan Miller (Cloud Posse)

07:00:35 PM

atmos describe component ecs/platform/service/echo-server -s plat-use2-sandbox
...

vars:
...
  containers:
    service:
...
      port_mappings:
      - appProtocol: http
        containerPort: 8080
        hostPort: 8080
        protocol: tcp
...
  task:
...
    network_mode: awsvpc
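
The check described above could be scripted roughly as follows. This is a minimal sketch, not part of the Cloud Posse tooling: it assumes the describe output has been parsed into a dict (for example from JSON, if your atmos version supports `--format json`), and the function name `check_component` is hypothetical. The key names (`vars`, `containers`, `port_mappings`, `task`, `network_mode`) are taken from the output quoted in this thread, and the allowed protocol list is copied from the error message.

```python
import json

# Allowed values, copied from the ClientException text in this thread.
ALLOWED_APP_PROTOCOLS = {"http", "http2", "grpc"}


def check_component(described):
    """Return a list of findings for a parsed `atmos describe component` result.

    `check_component` is a hypothetical helper name used for illustration.
    """
    component_vars = described.get("vars") or {}
    task = component_vars.get("task") or {}
    containers = component_vars.get("containers") or {}

    findings = []
    network_mode = task.get("network_mode")
    if network_mode != "awsvpc":
        findings.append(f"task.network_mode is {network_mode!r}, not 'awsvpc'")

    # Walk every container and report port mappings without a usable appProtocol.
    for name, container in containers.items():
        for mapping in (container or {}).get("port_mappings") or []:
            port = mapping.get("containerPort")
            app_protocol = mapping.get("appProtocol")
            if app_protocol is None:
                findings.append(f"{name}: port {port} has no appProtocol")
            elif app_protocol not in ALLOWED_APP_PROTOCOLS:
                findings.append(
                    f"{name}: port {port} has unsupported appProtocol {app_protocol!r}"
                )
    return findings


# Sample input shaped like the describe output quoted in this thread.
sample = json.loads("""
{
  "vars": {
    "containers": {
      "service": {
        "port_mappings": [
          {"appProtocol": "http", "containerPort": 8080,
           "hostPort": 8080, "protocol": "tcp"}
        ]
      }
    },
    "task": {"network_mode": "awsvpc"}
  }
}
""")

print(check_component(sample))  # prints []
```

Running it against both a working and a failing component makes the difference in the values passed to terraform easy to compare. Note that a missing appProtocol is only reported, not treated as fatal, since the working services in this thread did not set it either.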

Dan Miller (Cloud Posse)

07:04:29 PM

oh, maybe you have a lifecycle rule configured for the task definition? In that case terraform could be ignoring your changes. When you add or change appProtocol and run terraform plan, does it show changes?
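
One way to answer that question without reading the whole plan is to inspect its JSON form. The sketch below assumes you have saved a plan file and converted it with `terraform show -json <planfile>`; how the plan file is produced under atmos is left as an assumption. The function name `task_definition_actions` is hypothetical, and the resource address in the sample is the one from the error message at the top of this thread.

```python
import json


def task_definition_actions(plan):
    """Map each aws_ecs_task_definition address to its planned actions.

    `plan` is the parsed output of `terraform show -json <planfile>`.
    The helper name is hypothetical, used here for illustration only.
    """
    actions = {}
    for change in plan.get("resource_changes") or []:
        if change.get("type") == "aws_ecs_task_definition":
            actions[change["address"]] = change["change"]["actions"]
    return actions


# Illustrative plan fragment, not real output from this thread.
sample_plan = json.loads("""
{
  "resource_changes": [
    {
      "address": "module.ecs_alb_service_task[0].aws_ecs_task_definition.default[0]",
      "type": "aws_ecs_task_definition",
      "change": {"actions": ["no-op"]}
    }
  ]
}
""")

for address, planned in task_definition_actions(sample_plan).items():
    print(address, planned)
```

If the task definition still shows `no-op` after appProtocol was changed, terraform is not seeing the change, which would be consistent with an ignore rule or with the value never reaching the resource.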

Taimur Gibson

07:12:19 PM

standby, we think we might have a larger issue with ECS clusters that is manifesting as this error for some weird reason

Taimur Gibson

07:12:32 PM

we deployed a new ECS cluster yesterday and might have broken some stuff by accident

Taimur Gibson

07:12:47 PM

if it’s still busted after the ECS cluster is rebuilt, I’ll give this a shot and follow up. thank you!

Dan Miller (Cloud Posse)

07:13:10 PM

sounds good. let me know!

Taimur Gibson

04:46:00 PM

yeah this ended up being a totally separate issue with a weird error message for some reason

Taimur Gibson

04:46:14 PM

it looks like it was partly being caused by some cached task definitions in the ecs s3 mirror

Taimur Gibson

04:46:26 PM

we cleared those out and things seem OK now

Dan Miller (Cloud Posse)

04:47:03 PM

yeah I believe this PR was the fix for anyone else looking up this thread in the future: https://github.com/cloudposse/terraform-aws-components/pull/1008

#1008 `ecs-service` better task definition merging

what

ECS Service Upstream for better support of partial task definition.

why

• Fixes issue with bad merges on s3 task definition
• Map_secrets not being updated
