#pr-reviews (2024-11)

Pull Request Reviews for Cloud Posse Projects

2024-11-01

z0rc3r avatar
#243 Add support for zonal shift configuration

what

Add support for zonal shift configuration for the cluster. Closes #242

why

See linked ticket.

references

See linked ticket.

additional info

The <br> to <br/> changes are caused by a new version of terraform-docs and are intentional; see terraform-docs/terraform-docs#787
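
For context, a minimal sketch of what enabling zonal shift on the cluster might look like, assuming the aws_eks_cluster resource's zonal_shift_config block (placeholder names and ARNs, not the module's actual code):

resource "aws_eks_cluster" "example" {
  name     = "example"
  role_arn = "arn:aws:iam::123456789012:role/example-eks-cluster" # placeholder role

  vpc_config {
    subnet_ids = ["subnet-aaaa1111", "subnet-bbbb2222"] # placeholder subnets
  }

  # Amazon Application Recovery Controller (ARC) zonal shift for the cluster.
  zonal_shift_config {
    enabled = true
  }
}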

1

2024-11-04

2024-11-06

2024-11-07

2024-11-08

Ray Finch avatar
Ray Finch
#236 output instance endpoints, add attributes to random_pet that force a …

…new instance

what

• output instance endpoints
• add aws_rds_cluster_instance attributes that force a new instance to the random_pet resource

why

• I need the actual instance endpoints for the Datadog DMS integration; the default dashboards work better with the exact instance identifier.
• Currently, if any of these attributes change (db_subnet_group_name, engine), the change bypasses the random_pet and Terraform attempts to create instances with the same identifier.

references

1
mihai.plesa avatar
mihai.plesa

can we revert this please? or at least put the last change behind a conditional. currently there is potential of data loss.

mihai.plesa avatar
mihai.plesa
#241 1.13.0 is forcing a recreate of the cluster instance

Describe the Bug

I’m currently using 1.12.0, which is good and all.

I’d like to upgrade to 1.14.0 which has a bug fix in an output that I’d like to utilize.

BUT a PR merged for 1.13.0 is causing the DB instance to be recreated due to the way the attribute is defined (even though the end result value isn’t changing).

See that PR here #236 (review)

https://sweetops.slack.com/archives/CB6GHNLG0/p1733335202382489

Expected Behavior

It should not force a recreate of the resource. If it forces a recreate, it should be a MAJOR bump.

Steps to Reproduce

Set version to 1.13.0, will force replacement

Screenshots

No response

Environment

No response

Additional Context

No response

Gabriela Campana (Cloud Posse) avatar
Gabriela Campana (Cloud Posse)

@Andriy Knysh (Cloud Posse)

Jeremy G (Cloud Posse) avatar
Jeremy G (Cloud Posse)

@mihai.plesa This is a complicated issue and will take significant time to resolve, if it can be resolved at all. My best guess is a solution will involve some kind of hack.

All of which is to say, please be patient, and stick with v1.12.0 for now.

Yangci Ou avatar
Yangci Ou

Perhaps we can revert the original PR causing this recreate, then implement it as a MAJOR bump with a breaking change instead of just a minor change.

Gabriela Campana (Cloud Posse) avatar
Gabriela Campana (Cloud Posse)

@Jeremy G (Cloud Posse)

Jeremy G (Cloud Posse) avatar
Jeremy G (Cloud Posse)

I’m a bit rusty on RDS. @mihai.plesa @Yangci Ou please tell me, given that new DB instances will be created before the existing ones are deleted, how much of a problem is this upgrade going to cause you?

My recollection is that since Terraform creates new instances first, the data should get migrated to the new instances automatically before the old ones are deleted and the database should remain functional throughout. You probably want to specify extra long timeouts via timeouts_configuration and take a snapshot first.

The PR does fix a real bug so it is kind of weird to make it optional or revert it. Moving it to a new major version does not really solve the problem if the automatic rotation of DB instances is an issue, because most people are going to want to upgrade, and Cloud Posse encourages PRs to be on the latest version (although earlier versions can be targeted for maintenance).

That said, it is a pretty rarely encountered bug, given that changing the engine destroys the entire cluster, so the only real issue is if you want to change the name of the DB subnet group. I could live with keeping this bug in the code and just documenting it.

@Erik Osterman (Cloud Posse) what are your thoughts?
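
For reference, a minimal sketch of the kind of extra-long timeouts being suggested, shown directly on the underlying instance resource with placeholder values (the module exposes this via its timeouts_configuration input, as mentioned above):

resource "aws_rds_cluster_instance" "example" {
  identifier         = "example-instance"
  cluster_identifier = "example-cluster" # placeholder
  instance_class     = "db.r6g.large"
  engine             = "aurora-postgresql"

  # Give instance creation, modification, and deletion plenty of time.
  timeouts {
    create = "120m"
    update = "120m"
    delete = "120m"
  }
}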

Yangci Ou avatar
Yangci Ou

Eh, the problem is with RDS & Terraform; I’m pretty sure it doesn’t do the data migration as you mentioned. It does not automatically do this: “Terraform creates new instances first, the data should get migrated to the new instances automatically before the old ones are deleted and the database should remain functional throughout.”

There’d be some manual intervention: take a final database snapshot, then spin up the new database from that snapshot. It’s a lot of effort for a no-downtime blue/green deployment just to upgrade a Terraform module version.

Yangci Ou avatar
Yangci Ou

Throwing out a random suggestion/idea: what if the attribute that forces replacement could be put behind a conditional (e.g. a lifecycle ignore block or a flag)? It’d create another variable to maintain backwards compatibility for folks who don’t want to go through the effort of recreating the DB just to upgrade the module from 1.12 to 1.13, but it requires documentation and maintenance for such a specific case.
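
A minimal sketch of what such a conditional might look like: gate the newly added keepers behind a feature flag so existing deployments keep their current random_pet. Variable and value names here are hypothetical, not the module's actual interface:

variable "enable_extended_keepers" {
  type        = bool
  default     = false
  description = "When true, include the additional keepers (db_subnet_group_name, engine), which forces new DB instances."
}

locals {
  # Placeholder values standing in for the module's real inputs.
  instance_type        = "db.r6g.large"
  db_subnet_group_name = "example-subnet-group"
  engine               = "aurora-postgresql"
}

resource "random_pet" "instance" {
  # Only the extended keeper set forces a new pet (and therefore new instances).
  keepers = merge(
    { instance_type = local.instance_type },
    var.enable_extended_keepers ? {
      db_subnet_group_name = local.db_subnet_group_name
      engine               = local.engine
    } : {}
  )
}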

mihai.plesa avatar
mihai.plesa

I have suggested the same. Nobody should be in the position of getting downtime or potential full data loss just for something so minor.

Yangci Ou avatar
Yangci Ou

All RDS clusters eventually go through engine upgrades/changes that force recreation (which is why the other suggestion was a MAJOR bump), but the key thing in this instance is that I don’t think people would want to go through the hassle of this recreate just for Terraform state to be happy.

mihai.plesa avatar
mihai.plesa

A major bump would mean that users who want newer features or bug fixes in newer module versions still have to go through the danger. It should be behind a conditional or reverted fully.

Jeremy G (Cloud Posse) avatar
Jeremy G (Cloud Posse)

~OK. The plan is to make it conditional by adding a feature flag:~ ~in the v1 module, the conditional defaults to the old behavior~ ~in the new v2 module, the conditional defaults to the new behavior, but you can set it to retain the old behavior~

As a side note, @Yangci Ou, because this is our “cluster” module, these instances are inside a cluster, either an Aurora cluster or a Multi-AZ RDS cluster, and in either case AWS should migrate the data for you. I haven’t tested it, though, and I imagine it could take a long time with a big database if you are not using Aurora (where the storage is separate from the compute instances).

1
Yangci Ou avatar
Yangci Ou

That sounds like a good plan.

Ah thanks for that side note info, I’m curious about that too so I’ll test that out personally in a dummy environment.

Jeremy G (Cloud Posse) avatar
Jeremy G (Cloud Posse)

Upon further reflection, I’m going to just revert the changes unless I hear strong objections. Here’s why:

The added keepers (that cause the instances to be viably updated) break existing deployments, as noted, and that is bad. Granted, they fix a problem with deploying updated instances, but in every added case, ~the update also destroys and replaces the entire DB cluster~ the change of the added variable also causes the entire cluster to be destroyed and recreated.

Because of the high potential for data loss when replacing a cluster, I am actually in favor of this module failing when you try to do that. The workaround is to manually destroy the existing cluster first. That gives people fair warning that their data is at risk, and reduces the chance of accidental data loss because of misconfiguration.

Jeremy G (Cloud Posse) avatar
Jeremy G (Cloud Posse)

@Erik Osterman (Cloud Posse)

Yangci Ou avatar
Yangci Ou

So the decision is just to create a new version by reverting that PR, correct?

Jeremy G (Cloud Posse) avatar
Jeremy G (Cloud Posse)

That is my recommendation, but I want to get consensus on it before moving ahead.

Yangci Ou avatar
Yangci Ou

Sounds good. I’m good with any way as long as there’s a way to make it not recreate (either conditional variable or a specific version) while maintaining newer versions

Ray Finch avatar
Ray Finch

Hey all, jumping in late here

Ray Finch avatar
Ray Finch

I still have to read through all the comments above but I would like to stress THERE IS NO DATA LOSS

Ray Finch avatar
Ray Finch

The data is part of the cluster not the individual instances

Ray Finch avatar
Ray Finch

@Jeremy G (Cloud Posse) You state that “the update also destroys and replaces the entire DB cluster.” That is not true for the changes in the latest MR.

Ray Finch avatar
Ray Finch

@Yangci Ou From what I can see, you’re confusing the cluster with the instances; deleting an instance does not delete the data in the cluster.

Ray Finch avatar
Ray Finch

And before I implemented the random_pet and the create_before_destroy, any change that forced replacement of the instances would delete all the existing instances before creating new ones.
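
A simplified sketch of the pattern being described, with placeholder values rather than the module's actual code: the instance identifier is derived from random_pet, so when a keeper changes, the new instances are created under new names before the old ones are destroyed:

resource "random_pet" "instance" {
  keepers = {
    instance_type = "db.r6g.large" # changing a keeper generates a new pet
  }
}

resource "aws_rds_cluster_instance" "example" {
  identifier         = "example-${random_pet.instance.id}"
  cluster_identifier = "example-cluster" # placeholder; the cluster itself is untouched
  instance_class     = random_pet.instance.keepers.instance_type
  engine             = "aurora-postgresql"

  lifecycle {
    # The replacement instance is created (and joins the cluster) before the old one is removed.
    create_before_destroy = true
  }
}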

Jeremy G (Cloud Posse) avatar
Jeremy G (Cloud Posse)

@Ray Finch I’m sorry my wording was confusing.

The change you made in the most recent PR was to add to the keepers:

• cluster_identifier

• db_subnet_group_name

• engine

Any change to those destroys and replaces the entire DB cluster along with the instances, which results in data loss.

Ray Finch avatar
Ray Finch

Hi @Jeremy G (Cloud Posse) no, that is not correct. A change to those keepers only affects the instances, NOT the entire cluster.

Ray Finch avatar
Ray Finch

And it uses create_before_destroy so the new instances are created before the old ones are deleted

Ray Finch avatar
Ray Finch

we have run this against our production databases with all applications running

Jeremy G (Cloud Posse) avatar
Jeremy G (Cloud Posse)

@Ray Finch This has nothing to do with the keepers or random_pet. Those variables also affect the cluster itself, and Terraform will replace the cluster when those variables change.

Ray Finch avatar
Ray Finch

Ray Finch avatar
Ray Finch

yeah, I see what you’re saying

Jeremy G (Cloud Posse) avatar
Jeremy G (Cloud Posse)

The previous set of keepers were ones that affected only the instances, not the cluster.

Ray Finch avatar
Ray Finch

although is that true for the cluster identifier in this case?

Ray Finch avatar
Ray Finch

it’s a calculated value

Ray Finch avatar
Ray Finch

from two other resources

Ray Finch avatar
Ray Finch

actually, not sure the statement above is true

Jeremy G (Cloud Posse) avatar
Jeremy G (Cloud Posse)

although is that true for the cluster identifier in this case?

  1. Yes, when the cluster identifier is changed, that forces a replacement of the cluster
  2. Because the cluster identifier is part of the output of the random_pet (via the random_pet.prefix input), its output changes when the cluster identifier changes anyway, so it is as if cluster identifier were one of the keepers
Ray Finch avatar
Ray Finch

the cluster identifier in this case is calculated as follows:

coalesce(join("", aws_rds_cluster.primary[*].id), join("", aws_rds_cluster.secondary[*].id))
Ray Finch avatar
Ray Finch

so it could change and not have the cluster get recreated

Ray Finch avatar
Ray Finch

also, according to the documentation db_subnet_group_name does not force a replacement

Ray Finch avatar
Ray Finch

and it’s also a calculated value

join("", aws_db_subnet_group.default[*].name)
Jeremy G (Cloud Posse) avatar
Jeremy G (Cloud Posse)

From my test (output abbreviated)

  # module.rds_cluster.aws_rds_cluster.primary[0] must be replaced
+/- resource "aws_rds_cluster" "primary" {
      ~ allocated_storage                     = 1 -> (known after apply)
      ~ arn                                   = "arn:aws:rds:us-east-2:<redacted>:cluster:eg-test-rds-cluster-nuru" -> (known after apply)

...

      ~ db_subnet_group_name                  = "eg-test-rds-cluster-nuru" -> "nuru-rds-cluster" # forces replacement
1
Ray Finch avatar
Ray Finch

ok, that’s not in the docs:

db_subnet_group_name - (Optional) DB subnet group to associate with this DB cluster. NOTE: This must match the db_subnet_group_name specified on every aws_rds_cluster_instance in the cluster.
Jeremy G (Cloud Posse) avatar
Jeremy G (Cloud Posse)

The documentation on what changes force a replacement is lacking to non-existent.

1
Ray Finch avatar
Ray Finch

So a couple of points

Ray Finch avatar
Ray Finch

the main one being that none of my changes cause data loss

Ray Finch avatar
Ray Finch

and reverting the PR is just transferring the pain to those already on v1.13.0 or higher

Ray Finch avatar
Ray Finch

also, I would argue the code is correct in that if any of the keepers change you would want a new random pet

Jeremy G (Cloud Posse) avatar
Jeremy G (Cloud Posse)

To recap:

• I agree that the changes in the PR are legitimate bug fixes, as I said in my earlier statements. However, they are not a complete fix for the issue and, after consideration, I prefer the older behavior because it reduces the chance of accidental data loss.

• The old and current module will fail if you change only the db_subnet_group_name or engine, because it will try to replace the cluster but use the same cluster identifier, which is not allowed. This could be fixed, but because replacing the cluster results in data loss, I would prefer it to fail.

• The behavior of the old and current module when only the cluster identifier changes has remained the same. It will successfully replace the cluster. Therefore, as a practical matter, the changes in the PR do not change the behavior of the module, except that it forces new instances on anyone who had deployed them with an earlier version.

Ray Finch avatar
Ray Finch

As it did with the 1.6 deploy

Ray Finch avatar
Ray Finch

I do see your point in that the scenario where any of these change is unlikely

Ray Finch avatar
Ray Finch

I for one wouldn’t migrate from mysql to postgres by changing engine in my terraform module

Ray Finch avatar
Ray Finch

I’m OK with removing any attribute from the random_pet keepers that would force a cluster replacement anyway

Jeremy G (Cloud Posse) avatar
Jeremy G (Cloud Posse)

OK, good, because as it happens, all the attributes you added in this PR fall under that category: they would force a cluster replacement anyway.

Jeremy G (Cloud Posse) avatar
Jeremy G (Cloud Posse)

Rollback and other enhancements released as v1.15.0

mihai.plesa avatar
mihai.plesa

thank you, will test soon

mihai.plesa avatar
mihai.plesa

looking good to me

1
Yangci Ou avatar
Yangci Ou

Just upgraded. Looks good to me too. Thanks all!

1

2024-11-10

2024-11-12

Yangci Ou avatar
Yangci Ou

Hey all! Can I request some help / someone to look at Cloud Posse’s terraform-aws-alb module? The CI is failing to release a new version after the most recent PR merge. It’s throwing an error about README.md.gotmpl, but I’ve noticed that file hasn’t even been changed in the past 6 months. Hopefully someone with more context can help out? Here is the error: https://github.com/cloudposse/terraform-aws-alb/pull/190#issuecomment-2470455978 - Thanks!

1
1
Erik Osterman (Cloud Posse) avatar
Erik Osterman (Cloud Posse)

@Igor Rodionov

Cc @Gabriela Campana (Cloud Posse)

1
1
Gabriela Campana (Cloud Posse) avatar
Gabriela Campana (Cloud Posse)

@Igor Rodionov please lmk when you address this

Igor Rodionov avatar
Igor Rodionov

@Erik Osterman (Cloud Posse) May I have an approval https://github.com/cloudposse/build-harness/pull/390 ?

#390 [readme] Fix includes

what

• Added an explicit default value for the README_INCLUDES environment variable

why

• Starting from gomplate 4.2.0, includes require an explicit path definition starting with “./” or “/”

references

https://sweetops.slack.com/archives/CUJPCP1K6/p1731416487743859
https://github.com/cloudposse/terraform-aws-alb/actions/runs/11795616491/job/32869718771
hairyhenderson/gomplate#2255

1
Yangci Ou avatar
Yangci Ou

awesome, thanks so much! appreciate the quick help!!

1
Erik Osterman (Cloud Posse) avatar
Erik Osterman (Cloud Posse)

If you’re affected by this in any other repos, let us know and we can retrigger the builds

1
1
1
Gabriela Campana (Cloud Posse) avatar
Gabriela Campana (Cloud Posse)

Please tag me if it happens again, @Yangci Ou

1
1

2024-11-13

RB avatar

Please review correct usage of new destination.bucket key

https://github.com/cloudposse/terraform-aws-s3-bucket/pull/256

1
RB avatar

Thank you!

RB avatar

Please review eventbridge support in event notifications

https://github.com/cloudposse/terraform-aws-s3-bucket/pull/255

1
RB avatar

Thank you!

2024-11-14

RB avatar
Veronika Gnilitska avatar
Veronika Gnilitska

Merge conflicts that must be resolved first

RB avatar

Thanks Veronika. I fixed the conflicts.

Before merging, I was hoping to chat with someone about this comment to see what the proper step is for this.

https://github.com/cloudposse/terraform-aws-s3-bucket/pull/258#pullrequestreview-2434928344

cc @Jeremy G (Cloud Posse) if you have a sec

Jeremy G (Cloud Posse) avatar
Jeremy G (Cloud Posse)

@RB I would prefer to reject (or at least place on indefinite hold) this PR, because it is a breaking change that adds no new functionality

We can leave it open with do not merge and incorporate it into the next major version update, but I don’t think it warrants a version update and breaking change on its own. I want to reserve those for new features that are incompatible with old features or removing deprecated features.

RB avatar

Yes, I figured this, but I was wondering if we could do something like put the old var in variables-deprecated and, if either is used, make it have the same impact. That way it would conform to Cloud Posse naming conventions without a breaking change, no?

Jeremy G (Cloud Posse) avatar
Jeremy G (Cloud Posse)

It’s on my to-do list to document Cloud Posse’s policy on exactly that. @Erik Osterman (Cloud Posse) How do you want to handle this?

Erik Osterman (Cloud Posse) avatar
Erik Osterman (Cloud Posse)

I think we need to use release branches and merge this into those future release branches

Erik Osterman (Cloud Posse) avatar
Erik Osterman (Cloud Posse)

Won’t adding the major label have the same effect?

Jeremy G (Cloud Posse) avatar
Jeremy G (Cloud Posse)

@Erik Osterman (Cloud Posse) I meant, I would prefer to defer this change until we had more significant breaking changes, because I don’t have a problem with the current variable name. If you think it should go out now with a major version bump, then I will update the docs sooner rather than later about how to deprecate the old variable.

RB avatar

Couldn’t it go out without a major bump if the new var is added and the old var is moved to variables-deprecated?

Then later, when the major release is ready, the variables-deprecated file can be wiped?
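
A minimal sketch of the deprecation pattern being proposed (hypothetical variable names, not the s3-bucket module's actual inputs): keep the old variable working in variables-deprecated.tf while the new name becomes canonical:

# variables-deprecated.tf
variable "old_bucket_name" {
  type        = string
  default     = null
  description = "DEPRECATED: use new_bucket_name instead."
}

# variables.tf
variable "new_bucket_name" {
  type        = string
  default     = null
  description = "Replaces old_bucket_name."
}

locals {
  # Prefer the new variable; fall back to the deprecated one so existing
  # configurations keep working without a breaking change.
  bucket_name = var.new_bucket_name != null ? var.new_bucket_name : var.old_bucket_name
}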

RB avatar
1
RB avatar

Thank you!

2024-11-15

2024-11-19

Michal Tomaszek avatar
Michal Tomaszek
Michal Tomaszek avatar
Michal Tomaszek

np, got this info from Igor already thanks!

1
hamza-25 avatar
hamza-25

Hey all, Can I please have this PR reviewed https://github.com/cloudposse/terraform-aws-ec2-instance/pull/212

Currently blocking me and forcing me to go a longer way round to solve (adding resources on top of the localised module)

1
Gabriela Campana (Cloud Posse) avatar
Gabriela Campana (Cloud Posse)

@Igor Rodionov

1
hamza-25 avatar
hamza-25

Thanks Gabriela, is there a possible timeline for when this can be reviewed and merged? thanks @Gabriela Campana (Cloud Posse) @Igor Rodionov

Gabriela Campana (Cloud Posse) avatar
Gabriela Campana (Cloud Posse)

Hi @hamza-25 We cannot promise an ETA, as client work is prioritized over community support. I’ve tagged our engineers on the PRs. This is a best effort by a small team. We appreciate your persistence, and we’ll get to it when we can.

1
Igor Rodionov avatar
Igor Rodionov

@hamza-25 thanks for your contribution. The PR has been merged and released: https://github.com/cloudposse/terraform-aws-ec2-instance/releases/tag/v1.6.1

1
hamza-25 avatar
hamza-25

Thanks a lot, both of you!

I can now get rid of my own local changes, much appreciated.

1

2024-11-20

2024-11-22

RB avatar

Please review adding an option to use inline policies instead of managed policies for iam roles

https://github.com/cloudposse/terraform-aws-iam-role/pull/79

#79 feat: support inline policies

what

• support inline policies

why

• Most of the time users want to create a specific policy for a specific role and don’t realize that the same policy can be accidentally reused for another purpose, which makes it difficult to delete the role and policy
• Inline policies do not need to be tagged
• If a managed policy is updated and is attached to multiple roles, it will now impact multiple roles

references

• closes #78
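
For illustration, a minimal sketch of the difference (hypothetical names, not the module's code): an inline policy is embedded in the role and deleted with it, while a managed policy is a standalone resource that can be attached to, and therefore affect, several roles:

data "aws_iam_policy_document" "assume" {
  statement {
    actions = ["sts:AssumeRole"]
    principals {
      type        = "Service"
      identifiers = ["ec2.amazonaws.com"]
    }
  }
}

data "aws_iam_policy_document" "example" {
  statement {
    actions   = ["s3:GetObject"]
    resources = ["arn:aws:s3:::example-bucket/*"]
  }
}

resource "aws_iam_role" "example" {
  name               = "example-role"
  assume_role_policy = data.aws_iam_policy_document.assume.json
}

# Inline: scoped to this role only, removed when the role is destroyed.
resource "aws_iam_role_policy" "inline" {
  name   = "example-inline"
  role   = aws_iam_role.example.id
  policy = data.aws_iam_policy_document.example.json
}

# Managed: independent lifecycle, can be attached to multiple roles.
resource "aws_iam_policy" "managed" {
  name   = "example-managed"
  policy = data.aws_iam_policy_document.example.json
}

resource "aws_iam_role_policy_attachment" "managed" {
  role       = aws_iam_role.example.name
  policy_arn = aws_iam_policy.managed.arn
}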

1
1
RB avatar

Thanks!

2024-11-26

z0rc3r avatar

https://github.com/cloudposse/terraform-aws-eks-cluster/pull/245 please

(feedback about the AI bot review: it’s garbage and it wasted my time)

1
Gabriela Campana (Cloud Posse) avatar
Gabriela Campana (Cloud Posse)

@Igor Rodionov

Erik Osterman (Cloud Posse) avatar
Erik Osterman (Cloud Posse)

@z0rc3r these were absolutely not garbage; these were great recommendations. We should strive for better variable descriptions, so the user has everything they need to use the variable.

Erik Osterman (Cloud Posse) avatar
Erik Osterman (Cloud Posse)

As merged, the descriptions are incomplete.

Erik Osterman (Cloud Posse) avatar
Erik Osterman (Cloud Posse)

We are using these suggestions ourselves, and improving the code quality of #atmos https://github.com/cloudposse/atmos/pulls?q=is%3Apr+is%3Aclosed+is%3Amerged

Erik Osterman (Cloud Posse) avatar
Erik Osterman (Cloud Posse)

Plus, CodeRabbit learns from feedback. If you have constructive criticism of why they are not good, you can share that.

Erik Osterman (Cloud Posse) avatar
Erik Osterman (Cloud Posse)

Case in point:

-  description = "Configuration block for the support policy to use for the cluster"
+  description = "Configuration block for the support policy to use for the cluster. The support_type can be set to 'Standard' or 'Extended' (default). For more information, see <https://docs.aws.amazon.com/eks/latest/userguide/service-quotas.html#service-quotas-eks-support>"
Erik Osterman (Cloud Posse) avatar
Erik Osterman (Cloud Posse)

I prefer the second description over the first one.

z0rc3r avatar

one man’s trash is another man’s treasure

1
z0rc3r avatar

I’m not dealing with LLM recommendations.

z0rc3r avatar

Feel free to close the pr

Igor Rodionov avatar
Igor Rodionov

@z0rc3r oops, I merged it a couple of hours ago.

Erik Osterman (Cloud Posse) avatar
Erik Osterman (Cloud Posse)

Honestly, what’s bad about the recommendations? What’s trash about them?

z0rc3r avatar

Links are incorrect, and the recommended values use the wrong capitalization. In the end they rehash the official AWS docs in a slightly convoluted way.

Erik Osterman (Cloud Posse) avatar
Erik Osterman (Cloud Posse)

Fair point about invalid links, if true. However, showing what the acceptable values are is still worthwhile.

z0rc3r avatar

Hard disagree. My preference is to leave this at the provider level when the module replicates behaviour from the provider, which is the case here.

Erik Osterman (Cloud Posse) avatar
Erik Osterman (Cloud Posse)

Thanks for sharing your views.

z0rc3r avatar

As an end user of other modules I often don’t care what is in the variable descriptions. I always read the code to understand its behaviour and how resources interact with each other.

z0rc3r avatar

A general problem with any AI stuff is that it can hallucinate things, so in the end it wastes the contributor’s time doing the fact-checking. Plus it doesn’t add value. If you want to give more info, just link to the provider or AWS docs.

Erik Osterman (Cloud Posse) avatar
Erik Osterman (Cloud Posse)

Often Terraform modules have such poor descriptions that reviewing the provider docs is required. But IMO a well-defined module has thoroughly documented variables, outputs, examples, terratests, and a README. We often spend longer on everything else than on the modules themselves.

2
Moritz avatar
1
Gabriela Campana (Cloud Posse) avatar
Gabriela Campana (Cloud Posse)

@johncblandii

Moritz avatar

Could it be merged, please? Tests passed, and @RB also approved it already.

Gabriela Campana (Cloud Posse) avatar
Gabriela Campana (Cloud Posse)

@johncblandii bumping this up

Gabriela Campana (Cloud Posse) avatar
Gabriela Campana (Cloud Posse)

@Andriy Knysh (Cloud Posse)

Moritz avatar

Thanks!

1

2024-11-27

2024-11-28

2024-11-29
