#pr-reviews (2024-11)
Pull Request Reviews for Cloud Posse Projects
2024-11-01
what
Add support for zonal shift config for the cluster. Closes #242
why
See linked ticket.
references
See linked ticket.
additional info
The <br> to <br/> changes are caused by a new version of terraform-docs and are intended; see terraform-docs/terraform-docs#787
2024-11-04
2024-11-06
2024-11-07
2024-11-08
Hey all, please review https://github.com/cloudposse/terraform-aws-rds-cluster/pull/236
…new instance
what
• output instance endpoints (see the sketch below)
• add aws_rds_cluster_instance attributes that force a new instance to the random_pet resource
why
• I need the actual instance endpoints for the Datadog DMS integration; the default dashboards work better with the exact instance identifier.
• Currently, if any of these attributes change (db_subnet_group_name, engine), it will bypass the random_pet and attempt to create instances with the same identifier.
references
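For context, a minimal sketch of what the instance-endpoint output might look like (illustrative only, not the PR’s exact code):

output "instance_endpoints" {
  description = "List of DNS endpoints of the cluster instances"
  value       = aws_rds_cluster_instance.default[*].endpoint
}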
Can we revert this please? Or at least put the last change behind a conditional; currently there is potential for data loss.
Describe the Bug
I’m currently using 1.12.0, which is good and all.
I’d like to upgrade to 1.14.0 which has a bug fix in an output that I’d like to utilize.
BUT a PR merged for 1.13.0 is causing the DB instance to be recreated due to the way the attribute is defined (even if the resulting value isn’t changing).
See that PR here #236 (review)
https://sweetops.slack.com/archives/CB6GHNLG0/p1733335202382489
Expected Behavior
It should not force a recreate of the resource. If it forces a recreate, it should be a MAJOR bump.
Steps to Reproduce
Set the version to 1.13.0; it will force replacement
@Andriy Knysh (Cloud Posse)
@mihai.plesa This is a complicated issue and will take significant time to resolve, if it can be resolved at all. My best guess is a solution will involve some kind of hack.
All of which is to say, please be patient, and stick with v1.12.0 for now.
Perhaps we can revert the original PR causing this recreate, then implement that as a MAJOR bump with a breaking change instead of just a minor change
@Jeremy G (Cloud Posse)
I’m a bit rusty on RDS. @mihai.plesa @Yangci Ou please tell me, given that new DB instances will be created before the existing ones are deleted, how much of a problem is this upgrade going to cause you?
My recollection is that since Terraform creates new instances first, the data should get migrated to the new instances automatically before the old ones are deleted and the database should remain functional throughout. You probably want to specify extra long timeouts via timeouts_configuration and take a snapshot first.
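A minimal sketch of that advice, assuming the module’s timeouts_configuration input accepts a list of create/update/delete durations (check the module’s variables.tf for the exact shape):

# Partial module configuration -- only the timeout-related piece is shown.
module "rds_cluster" {
  source  = "cloudposse/rds-cluster/aws"
  version = "1.12.0"

  # ... existing cluster configuration ...

  # Give instance replacement plenty of time during the upgrade.
  timeouts_configuration = [
    {
      create = "4h"
      update = "4h"
      delete = "4h"
    }
  ]
}

# Take a manual snapshot first, e.g.:
#   aws rds create-db-cluster-snapshot \
#     --db-cluster-identifier <cluster-id> \
#     --db-cluster-snapshot-identifier pre-upgrade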
The PR does fix a real bug so it is kind of weird to make it optional or revert it. Moving it to a new major version does not really solve the problem if the automatic rotation of DB instances is an issue, because most people are going to want to upgrade, and Cloud Posse encourages PRs to be on the latest version (although earlier versions can be targeted for maintenance).
That said, it is a pretty rarely encountered bug, given that changing the engine destroys the entire cluster, so the only real issue is if you want to change the name of the db subnet group. I could live with keeping this bug in the code and just documenting it.
@Erik Osterman (Cloud Posse) what are your thoughts?
Eh, the problem is with RDS & Terraform; I’m pretty sure it doesn’t do the data migration as you mentioned. It does not automatically do this: “Terraform creates new instances first, the data should get migrated to the new instances automatically before the old ones are deleted and the database should remain functional throughout.”
There’d be some manual intervention: take a final database snapshot, then spin up the new database from that snapshot. It’s a lot of effort for a no-downtime blue/green deployment just to upgrade a Terraform module version.
Throwing out a random suggestion/idea: what if the attribute that # forces replacement could be put into a lifecycle ignore block under a conditional? It’d create another variable to maintain backwards compatibility for folks that don’t want to go through the effort of recreating the DB just to upgrade the module from 1.12 -> 1.13, but this requires documentation and maintenance for such a specific case.
I have suggested the same. Nobody should be in the position of getting downtime or potential full data loss just for something so minor.
All RDS eventually goes through engine upgrades/changes that force recreation (which is why the other suggestion was a MAJOR bump), but the key thing in this instance is that I don’t think people would want to go through the hassle of this recreate just for the Terraform state to be happy.
A major bump would mean that users who want newer features or bug fixes in newer module versions still have to go through the danger. It should be behind a conditional or reverted fully.
~OK. The plan is to make it conditional by adding a feature flag: in the v1 module, the conditional defaults to the old behavior; in the new v2 module, the conditional defaults to the new behavior, but you can set it to retain the old behavior.~ As a side note, @Yangci Ou, because this is our “cluster” module, these instances are inside a cluster, either an Aurora cluster or a Multi-AZ RDS cluster, and in either case AWS should migrate the data for you. I haven’t tested it, though, and I imagine it could take a long time with a big database if you are not using Aurora (where the storage is separate from the compute instances).
That sounds like a good plan.
Ah, thanks for that side note info. I’m curious about that too, so I’ll test that out personally in a dummy environment.
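For reference, a rough sketch of how the feature-flag idea above could gate the new keepers. All names here (legacy_instance_keepers, local.new_keepers, local.db_subnet_group_name) are hypothetical, not the module’s actual interface, and as the following messages show, the idea was ultimately dropped in favor of a revert:

variable "legacy_instance_keepers" {
  type        = bool
  default     = true # a v1.x module would keep the old behavior; a v2 module could flip this default
  description = "If true, keep the pre-1.13 keeper set so existing instances are not replaced"
}

locals {
  # The keepers added in the disputed PR; values assumed to be computed elsewhere in the module.
  new_keepers = {
    db_subnet_group_name = local.db_subnet_group_name
    engine               = var.engine
  }
}

resource "random_pet" "instance" {
  keepers = merge(
    { instance_class = var.instance_type },
    # Include the new keepers only when the legacy flag is turned off.
    { for k, v in local.new_keepers : k => v if !var.legacy_instance_keepers }
  )
}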
Upon further reflection, I’m going to just revert the changes unless I hear strong objections. Here’s why:
The added keepers (that cause the instances to be viably updated) break existing deployments, as noted, and that is bad. Granted, they fix a problem with deploying updated instances, but in every added case, ~the update also destroys and replaces the entire DB cluster~ the change of the added variable also causes the entire cluster to be destroyed and recreated.
Because of the high potential for data loss when replacing a cluster, I am actually in favor of this module failing when you try to do that. The workaround is to manually destroy the existing cluster first. That gives people fair warning that their data is at risk, and reduces the chance of accidental data loss because of misconfiguration.
@Erik Osterman (Cloud Posse)
So the decision is just to create a new version by reverting that PR, correct?
That is my recommendation, but I want to get consensus on it before moving ahead.
Sounds good. I’m good with any way as long as there’s a way to make it not recreate (either conditional variable or a specific version) while maintaining newer versions
Hey all, jumping in late here
I still have to read through all the comments above but I would like to stress THERE IS NO DATA LOSS
The data is part of the cluster not the individual instances
@Jeremy G (Cloud Posse) You state that “the update also destroys and replaces the entire DB cluster.” That is not true for the changes in the latest MR.
@Yangci Ou From what I can see, you’re confusing the cluster with the instances; deleting an instance does not delete the data in the cluster.
And before I implemented the random_pet and the create_before_destroy, any change that forced replacement of the instances would delete all the existing instances before creating new ones.
@Ray Finch I’m sorry my wording was confusing.
The change you made in the most recent PR was to add to the keepers:
• cluster_identifier
• db_subnet_group_name
• engine
Any change to those destroys and replaces the entire DB cluster along with the instances, which results in data loss.
Hi @Jeremy G (Cloud Posse) no, that is not correct. A change to those keepers only affects the instances, NOT the entire cluster
And it uses create_before_destroy so the new instances are created before the old ones are deleted
we have run this against our production databases with all applications running
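For readers following along, a rough sketch of the pattern Ray describes; resource and variable names are illustrative, not the module’s exact code:

resource "random_pet" "instance" {
  # Regenerated whenever a keeper changes, which in turn forces replacement of
  # the instances below (their identifiers embed the pet name).
  prefix = local.cluster_identifier # assumed to be computed elsewhere in the module
  keepers = {
    instance_class = var.instance_type
  }
}

resource "aws_rds_cluster_instance" "default" {
  identifier         = random_pet.instance.id
  cluster_identifier = local.cluster_identifier
  instance_class     = var.instance_type
  engine             = var.engine

  lifecycle {
    # New instances come up before the old ones are destroyed, so the cluster
    # (and the data it holds) stays available while the instances rotate.
    create_before_destroy = true
  }
}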
@Ray Finch This has nothing to do with the keepers or random_pet. Those variables also affect the cluster itself, and Terraform will replace the cluster when those variables change.
yeah, I see what you’re saying
The previous set of keepers were ones that affected only the instances, not the cluster.
although is that true for the cluster identifier in this case?
it’s a calculated value from two other resources
actually, not sure the statement above is true
although is that true for the cluster identifier in this case?
- Yes, when the cluster identifier is changed, that forces a replacement of the cluster
- Because the cluster identifier is part of the output of the random_pet (via the random_pet.prefix input), its output changes when the cluster identifier changes anyway, so it is as if the cluster identifier were one of the keepers
the cluster identifier in this case is calculated as follows:
coalesce(join("", aws_rds_cluster.primary[*].id), join("", aws_rds_cluster.secondary[*].id))
so it could change and not have the cluster get recreated
also, according to the documentation, db_subnet_group_name does not force a replacement
and it’s also a calculated value
join("", aws_db_subnet_group.default[*].name)
From my test (output abbreviated)
# module.rds_cluster.aws_rds_cluster.primary[0] must be replaced
+/- resource "aws_rds_cluster" "primary" {
~ allocated_storage = 1 -> (known after apply)
~ arn = "arn:aws:rds:us-east-2:<redacted>:cluster:eg-test-rds-cluster-nuru" -> (known after apply)
...
~ db_subnet_group_name = "eg-test-rds-cluster-nuru" -> "nuru-rds-cluster" # forces replacement
ok, that’s not in the docs:
db_subnet_group_name - (Optional) DB subnet group to associate with this DB cluster. NOTE: This must match the db_subnet_group_name specified on every aws_rds_cluster_instance in the cluster.
The documentation on what changes force a replacement is lacking to non-existent.
So a couple of points
the main one being that none of my changes cause data loss
and reverting the PR is just transferring the pain to those already on v1.13.0 or higher
also, I would argue the code is correct in that if any of the keepers change you would want a new random pet
To recap:
• I agree that the changes in the PR are legitimate bug fixes, as I said in my earlier statements. However, they are not a complete fix for the issue and, after consideration, I prefer the older behavior because it reduces the chance of accidental data loss.
• The old and current module will fail if you change only the db_subnet_group_name or engine, because it will try to replace the cluster but use the same cluster identifier, which is not allowed. This could be fixed, but because replacing the cluster results in data loss, I would prefer it to fail.
• The behavior of the old and current module when only the cluster identifier changes has remained the same: it will successfully replace the cluster. Therefore, as a practical matter, the changes in the PR do not change the behavior of the module, except that they force new instances on anyone who had deployed them with an earlier version.
As it did with the 1.6 deploy
I do see your point that the scenarios where any of these change are unlikely
I for one wouldn’t migrate from MySQL to Postgres by changing engine in my Terraform module
I’m OK with removing any attribute from the random_pet keepers that would force a cluster replacement anyway
OK, good, because as it happens, all the attributes you added in this PR fall under that category: they would force a cluster replacement anyway.
Rollback and other enhancements released as v1.15.0
thank you, will test soon
2024-11-10
2024-11-12
Hey all! Can I request some help / someone to look at Cloud Posse’s terraform-aws-alb module? The CI is failing to release a new version after the most recent PR merge. It’s throwing an error about the README.md.gotmpl, but I’ve noticed that it hasn’t even been changed in the past 6 months. Hopefully someone with more context can help out? Here is the error https://github.com/cloudposse/terraform-aws-alb/pull/190#issuecomment-2470455978 - Thanks!
@Igor Rodionov
Cc @Gabriela Campana (Cloud Posse)
@Igor Rodionov please lmk when you address this
@Erik Osterman (Cloud Posse) May I have an approval https://github.com/cloudposse/build-harness/pull/390 ?
what
• Added an explicit default value for the README_INCLUDES environment variable
why
• Starting from gomplate 4.2.0, includes require paths defined explicitly, starting with "./" or "/"
references
• https://sweetops.slack.com/archives/CUJPCP1K6/p1731416487743859 • https://github.com/cloudposse/terraform-aws-alb/actions/runs/11795616491/job/32869718771 • hairyhenderson/gomplate#2255
@Yangci Ou fixed and released https://github.com/cloudposse/terraform-aws-alb/releases/tag/v2.1.0
If you’re affected by this in any other repos, let us know and we can retrigger the builds
2024-11-13
Please review s3 replication role fix
https://github.com/cloudposse/terraform-aws-s3-bucket/pull/250
Thanks!
Please review correct usage of the new destination.bucket key
https://github.com/cloudposse/terraform-aws-s3-bucket/pull/256
Thank you!
Please review EventBridge support in event notifications
https://github.com/cloudposse/terraform-aws-s3-bucket/pull/255
Thank you!
2024-11-14
Please review fix for variable name
https://github.com/cloudposse/terraform-aws-s3-bucket/pull/258
There are merge conflicts that must be resolved first
Thanks Veronika. I fixed the conflicts.
Before merging, I was hoping to chat with someone about this comment to see what the proper step is for this.
https://github.com/cloudposse/terraform-aws-s3-bucket/pull/258#pullrequestreview-2434928344
cc @Jeremy G (Cloud Posse) if you have a sec
@RB I would prefer to reject (or at least place on indefinite hold) this PR, because it is a breaking change that adds no new functionality
We can leave it open with “do not merge” and incorporate it into the next major version update, but I don’t think it warrants a version update and breaking change on its own. I want to reserve those for new features that are incompatible with old features or for removing deprecated features.
Yes, I figured this, but was wondering if we could do something like put the old var in variables-deprecated and, if either is used, make it have the same impact. That way it would conform to Cloud Posse naming conventions without a breaking change, no?
It’s on my to-do list to document Cloud Posse’s policy on exactly that. @Erik Osterman (Cloud Posse) How do you want to handle this?
I think we need to use release branches and merge this into those future release branches
won’t adding the major label have the same effect?
@Erik Osterman (Cloud Posse) I meant, I would prefer to defer this change until we had more significant breaking changes, because I don’t have a problem with the current variable name. If you think it should go out now with a major version bump, then I will update the docs sooner rather than later about how to deprecate the old variable.
Couldn’t it go out without a major bump if the new var is added and the old var is moved to variables-deprecated?
Then later, when the major release is ready, the variables-deprecated can be wiped?
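A rough sketch of the deprecation pattern being proposed, using a hypothetical rename from old_name to new_name (not the actual variables in this PR):

# variables-deprecated.tf -- keep the old variable working, but mark it deprecated
variable "old_name" {
  type        = string
  default     = null
  description = "DEPRECATED: Use new_name instead."
}

# variables.tf -- the new, conventionally named variable
variable "new_name" {
  type        = string
  default     = null
  description = "Replaces old_name; takes precedence when both are set."
}

# main.tf -- both variables have the same effect; the new one wins if both are set
locals {
  effective_name = var.new_name != null ? var.new_name : var.old_name
}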
Please review s3 request payment config
https://github.com/cloudposse/terraform-aws-s3-bucket/pull/259
Thank you!
2024-11-15
2024-11-19
hey, remote-state module version bump: https://github.com/cloudposse-terraform-components/aws-spacelift-admin-stack/pull/4
we will merge this PR next week https://github.com/cloudposse-terraform-components/aws-spacelift-admin-stack/pull/4#issuecomment-2486671429
Hey all, Can I please have this PR reviewed https://github.com/cloudposse/terraform-aws-ec2-instance/pull/212
Currently it’s blocking me and forcing me to go a longer way round to solve (adding resources on top of the localised module)
Thanks Gabriela, is there a possible timeline for when this can be reviewed and merged? thanks @Gabriela Campana (Cloud Posse) @Igor Rodionov
Hi @hamza-25 We cannot promise an ETA, as client work is prioritized over community support. I’ve tagged our engineers on the PRs. This is a best effort by a small team. We appreciate your persistence, and we’ll get to it when we can.
@hamza-25 thanks for your contribution. The PR was merged and released: https://github.com/cloudposse/terraform-aws-ec2-instance/releases/tag/v1.6.1
Thanks a lot, both of you!
I can now get rid of my own local changes, much appreciated
2024-11-20
2024-11-22
Please review adding an option to use inline policies instead of managed policies for IAM roles
https://github.com/cloudposse/terraform-aws-iam-role/pull/79
what
• support inline policies
why
• Most of the time, users want to create a specific policy for a specific role and don’t realize that the same policy can be accidentally reused for another purpose, which makes it difficult to delete the role and policy
• Inline policies do not need to be tagged
• If a managed policy is attached to multiple roles, updating it impacts all of those roles (see the sketch below)
references
• closes #78
Thanks!
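For readers unfamiliar with the difference, a hedged sketch contrasting the two approaches. Resource names are illustrative, and it assumes aws_iam_role.default and the policy document are defined elsewhere:

# Inline policy: embedded in the role, deleted with it, cannot be reused or tagged.
resource "aws_iam_role_policy" "inline" {
  name   = "example-inline"
  role   = aws_iam_role.default.id
  policy = data.aws_iam_policy_document.example.json
}

# Managed policy: a standalone, taggable resource that can be attached to many roles,
# so updating it affects every role it is attached to.
resource "aws_iam_policy" "managed" {
  name   = "example-managed"
  policy = data.aws_iam_policy_document.example.json
}

resource "aws_iam_role_policy_attachment" "managed" {
  role       = aws_iam_role.default.name
  policy_arn = aws_iam_policy.managed.arn
}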
2024-11-26
https://github.com/cloudposse/terraform-aws-eks-cluster/pull/245 please
(feedback about the AI bot review: it’s garbage and it wasted my time)
@Igor Rodionov
@z0rc3r these were absolutely not garbage; these were great recommendations. We should strive for better variable descriptions, so the user has everything they need to use the variable.
As merged, the descriptions are incomplete.
We are using these suggestions ourselves, and improving the code quality of #atmos https://github.com/cloudposse/atmos/pulls?q=is%3Apr+is%3Aclosed+is%3Amerged
Plus, CodeRabbit learns from feedback. If you have constructive criticism of why they are not good, you can share that.
Case in point:
- description = "Configuration block for the support policy to use for the cluster"
+ description = "Configuration block for the support policy to use for the cluster. The support_type can be set to 'Standard' or 'Extended' (default). For more information, see <https://docs.aws.amazon.com/eks/latest/userguide/service-quotas.html#service-quotas-eks-support>"
I prefer the second description over the first one.
I’m not dealing with LLM recommendations
Feel free to close the pr
@z0rc3r oops, I merged it a couple of hours ago
Honestly, what’s bad about the recommendations? What’s trash about them?
Links are incorrect, and recommended values use the wrong capitalization. In the end, they rehash the official AWS docs in a slightly convoluted way.
Fair point about invalid links, if true. However, showing what the acceptable values are is valuable.
Hard disagree. My preference is to leave this on provider level, when module replicates behaviour from the provider. Which is the case here.
Thanks for sharing your views.
As an end user of other modules, I often don’t care what is in the variable descriptions. I always read the code to understand its behaviour and how resources interact with each other.
A general problem with any AI stuff is that it can hallucinate things, so in the end it wastes the contributor’s time on fact checking. Plus, it doesn’t add value. If you want to give more info, just link to the provider or AWS docs.
Often Terraform modules have such poor descriptions that reviewing the provider docs is required. But IMO a well-defined module has thoroughly documented variables, outputs, examples, terratests, and a README. We often spend longer on everything else than on the modules themselves.
@johncblandii
Could it be merged, please? Tests passed, and @RB already approved it.
@johncblandii bumping this up
@Andriy Knysh (Cloud Posse)