SweetOps #terraform-aws-modules for December, 2023

Discussions related to https://github.com/terraform-aws-modules

Archive: https://archive.sweetops.com/terraform-aws-modules/

2023-12-03

Josh B.

02:59:36 PM

When someone gets a chance, please review https://github.com/cloudposse/terraform-aws-mq-broker/pull/70

#70 Allow custom broker name

what

Allow a user to set broker_name to a custom name.

why

Flexibility

references

Andriy Knysh (Cloud Posse)

04:00:30 PM

@Josh B. thanks, please see a few comments

Gabriela Campana (Cloud Posse)

02:47:22 PM

@Josh B. more comments

2023-12-05

Nate

02:51:22 PM

Good morning!

I am trying to use the bastion module (terraform-aws-components/modules/bastion) to create an SSM-based bastion host. However, the instance seems to be stuck right after the “Mounting additional volume…” message. I see the following message printed in the system logs:

[  345.411338] cloud-init[2351]: waiting for device /dev/sdh

Here is the configuration used as part of the Atmos stack:

    bastion-ssm:
      metadata:
        component: infra/bastion
        inherits:
          - bastion/defaults
      vars:
        enabled: true
        name: bastion-ssm
        availability_zones: ["us-east-1a", "us-east-1b"]
        instance_type: t3.micro

Any ideas on what is causing this and how to get past this stage?

Dan Miller (Cloud Posse)

04:33:56 PM

what happens after you see that “Mounting additional volume…” message?

Does Terraform fail or timeout with another message? Or does Terraform successfully apply the component and the instance itself fails to complete?

Dan Miller (Cloud Posse)

04:35:10 PM

for reference, that message comes from this step of user-data https://github.com/cloudposse/terraform-aws-components/blob/main/modules/bastion/templates/user-data.sh#L3-L9

# Mount additional volume
echo "Mounting additional volume..."
while [ ! -b $(readlink -f /dev/sdh) ]; do echo 'waiting for device /dev/sdh'; sleep 5 ; done
blkid $(readlink -f /dev/sdh) || mkfs -t ext4 $(readlink -f /dev/sdh)
e2label $(readlink -f /dev/sdh) sdh-volume
grep -q ^LABEL=sdh-volume /etc/fstab || echo 'LABEL=sdh-volume /mnt ext4 defaults' >> /etc/fstab
grep -q "^$(readlink -f /dev/sdh) /mnt " /proc/mounts || mount /mnt
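
For illustration only (this is not from the component): the `while` loop above never exits when no volume is attached at /dev/sdh, which matches the hang Nate describes. A minimal sketch of a bounded wait, assuming a hypothetical helper named `wait_for_block_device` and an assumed 60-second default timeout, might look like this:

```shell
#!/usr/bin/env bash
# Hypothetical helper (not part of the bastion component): wait for a block
# device, but give up after a timeout instead of looping forever.
wait_for_block_device() {
  local device="$1"
  local timeout="${2:-60}"   # seconds to wait in total (assumed default)
  local interval="${3:-5}"   # seconds between checks (same as the original loop)
  local waited=0

  while [ ! -b "$(readlink -f "$device")" ]; do
    if [ "$waited" -ge "$timeout" ]; then
      echo "gave up waiting for device $device after ${waited}s" >&2
      return 1
    fi
    echo "waiting for device $device"
    sleep "$interval"
    waited=$((waited + interval))
  done
}
```

A caller could then skip the mount steps rather than block the rest of user-data, for example `wait_for_block_device /dev/sdh 60 5 || echo "no additional volume attached, skipping mount"`.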

Nate

04:38:11 PM

Yes, that’s correct. The message is coming from user-data. Terraform completes successfully, but I am not able to access the instance as it is not completely set up yet.

Dan Miller (Cloud Posse)

04:49:48 PM

I was able to reproduce this. checking for a quick fix

Dan Miller (Cloud Posse)

05:09:04 PM

https://github.com/cloudposse/terraform-aws-components/pull/917

#917 fix: `bastion` userdata volume mount

what

• removed volume mount steps from bastion userdata

why

• there is no additional volume or block storage attached to a bastion instance. So userdata would get stuck at this step and fail to complete.

[  345.411338] cloud-init[2351]: waiting for device /dev/sdh

references

• https://sweetops.slack.com/archives/CDYGZCLDQ/p1701787882992429

Dan Miller (Cloud Posse)

05:26:56 PM

We’ve merged this PR now. You should be able to remove that userdata step and retry

Nate

10:09:56 PM

Awesome! Thanks @Dan Miller (Cloud Posse) Will try it again tonight.

Nate

03:39:57 AM

@Dan Miller (Cloud Posse) The instance seems to be now stuck on the docker installation step. I am including a portion of the system log:

[   96.330320] cloud-init[2216]: yum-config-manager --save --setopt=<repoid>.skip_if_unavailable=true
[   96.331804] cloud-init[2216]: Cannot find a valid baseurl for repo: amzn2-core/2/x86_64
[   96.333142] cloud-init[2216]: Could not retrieve mirrorlist <https://amazonlinux-2-repos-us-east-1.s3.dualstack.us-east-1.amazonaws.com/2/core/latest/x86_64/mirror.list> error was
[   96.335805] cloud-init[2216]: 12: Timeout on <https://amazonlinux-2-repos-us-east-1.s3.dualstack.us-east-1.amazonaws.com/2/core/latest/x86_64/mirror.list>: (28, 'Connection timeout after 5000 ms')
[   96.338840] cloud-init[2216]: Dec 06 00:09:54 cloud-init[2216]: util.py[WARNING]: Package upgrade failed
[   96.437710] cloud-init[2216]: Dec 06 00:09:54 cloud-init[2216]: cc_package_update_upgrade_install.py[WARNING]: 1 failed with exceptions, re-raising the last one
[   96.440334] cloud-init[2216]: Dec 06 00:09:54 cloud-init[2216]: util.py[WARNING]: Running module package-update-upgrade-install (<module 'cloudinit.config.cc_package_update_upgrade_install' from '/usr/lib/python2.7/site-packages/cloudinit/config/cc_package_update_upgrade_install.pyc'>) failed
[   96.930433] cloud-init[2323]: Cloud-init v. 19.3-46.amzn2.0.1 running 'modules:final' at Wed, 06 Dec 2023 00:09:55 +0000. Up 96.86 seconds.
[   97.016364] cloud-init[2323]: Installing docker...

Nate

03:49:49 PM

Are these two parameters required for the instance to start properly?

image_container: infrastructure:latest
image_repository: "111111111111.dkr.ecr.us-east1.amazonaws.com/example/infrastructure"

Dan Miller (Cloud Posse)

05:09:00 PM

no those are not required. when we run this internally, we only pass these options:

components:
  terraform:
    bastion:
      vars:
        enabled: true
        name: bastion
        instance_type: t3.micro
        inbound_ssh_enabled: true
        associate_public_ip_address: true # deploy to public subnet and associate public IP with instance
        security_group_rules:
          - key         : ssh_us-east-2
            type        : ingress
            from_port   : 22
            to_port     : 22
            protocol    : tcp
            description : Permit core-network to connect from us-east-2
            cidr_blocks : ["10.x.x.x/20"]

Dan Miller (Cloud Posse)

05:11:47 PM

the docker installation step is working for this use-case, but it’s in a public subnet

Nate

05:12:44 PM

Right - I was just going to say that. I am trying to stand up an SSM version of the bastion host.

Nate

05:12:53 PM

in a private subnet

Dan Miller (Cloud Posse)

05:13:45 PM

that should be fine as well, but you may need to update userdata

Nate

05:15:14 PM

Is that because I am launching the instance in a private subnet? Any guidance on which parts of the user data I may need to update?

Dan Miller (Cloud Posse)

05:20:19 PM

I suspect it’s because the instance in the private subnet can’t install the packages listed in userdata. I would suggest adding a flag to disable userdata if it isn’t required for your use-case. Or you can add a flag to disable userdata but package all requirements in a custom AMI and then pass that AMI to the component

Nate

05:26:31 PM

We don’t have any specific need for custom user data for now. I will disable it through a flag and try it out. Thanks @Dan Miller (Cloud Posse)

Nate

05:02:49 PM

@Dan Miller (Cloud Posse) It turns out we do need to install basic tools like psql in the image - so we needed to bring the user data section back in. However, we finally figured out the reason for the issues - the instance did not have a route to the internet to download the packages. We updated the security group and subnet routes, and all is working as expected now. Thanks for your help!
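
For illustration only (not from the thread or the component): a missing route like this could be surfaced early with a pre-flight check at the top of user-data, before any package installation. The sketch below assumes `curl` is available on the image and uses a hypothetical helper named `check_egress`; the 5-second default simply mirrors the 5000 ms timeout in the system log above.

```shell
#!/usr/bin/env bash
# Hypothetical pre-flight check (not part of the bastion component): fail fast
# when the instance cannot reach a URL, rather than timing out later in yum.
check_egress() {
  local url="$1"
  local timeout="${2:-5}"   # connect timeout in seconds (assumed default)

  # HEAD request only; --fail turns HTTP errors (4xx/5xx) into a non-zero exit.
  if curl --silent --head --fail --connect-timeout "$timeout" \
          --output /dev/null "$url"; then
    echo "egress ok: $url"
  else
    echo "cannot reach $url; check subnet routes and security group egress" >&2
    return 1
  fi
}
```

It could be pointed at the mirror list URL from the log, for example `check_egress "https://amazonlinux-2-repos-us-east-1.s3.dualstack.us-east-1.amazonaws.com/2/core/latest/x86_64/mirror.list" || exit 1`.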

Dan Miller (Cloud Posse)

06:09:06 PM

great! happy to have helped

Alex Jurkiewicz

01:48:20 AM

just wanted to check in and say THANKS for migrating to semver last year. I just upgraded several old stacks with pre-semver cloudposse modules, my life was made much easier

Erik Osterman (Cloud Posse)

05:47:14 PM

Thanks @Alex Jurkiewicz - glad to hear it
