#pr-reviews (2024-11)
Pull Request Reviews for Cloud Posse Projects
2024-11-01
what
Add support for zonal shift config for the cluster. Closes #242
why
See linked ticket.
references
See linked ticket.
additional info
The <br> to <br/> changes are caused by a new version of terraform-docs and are intended; see terraform-docs/terraform-docs#787
2024-11-04
2024-11-06
2024-11-07
2024-11-08
Hey all, please review https://github.com/cloudposse/terraform-aws-rds-cluster/pull/236
…new instance
what
• output instance endpoints (see the sketch below)
• add aws_rds_cluster_instance attributes that force a new instance to the random_pet resource
why
• I need the actual instance endpoints for the Datadog DMS integration; the default dashboards work better with the exact instance identifier.
• Currently, if any of these attributes change (db_subnet_group_name, engine), it will bypass the random_pet and attempt to create instances with the same identifier.
references
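For context, a minimal sketch of what the instance-endpoint output might look like (illustrative only, not the PR’s exact code):

output "instance_endpoints" {
  description = "List of DNS endpoints of the cluster instances"
  value       = aws_rds_cluster_instance.default[*].endpoint
}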
Can we revert this please? Or at least put the last change behind a conditional; currently there is potential for data loss.
Describe the Bug
I’m currently using 1.12.0, which is good and all.
I’d like to upgrade to 1.14.0 which has a bug fix in an output that I’d like to utilize.
BUT a PR merged for 1.13.0 is causing the DB instance to be recreated due to the way the attribute is defined (even if the resulting value isn’t changing).
See that PR here #236 (review)
https://sweetops.slack.com/archives/CB6GHNLG0/p1733335202382489
Expected Behavior
It should not force a recreate of the resource. If it forces a recreate, it should be a MAJOR bump.
Steps to Reproduce
Set the version to 1.13.0; it will force replacement
@Andriy Knysh (Cloud Posse)
@mihai.plesa This is a complicated issue and will take significant time to resolve, if it can be resolved at all. My best guess is a solution will involve some kind of hack.
All of which is to say, please be patient, and stick with v1.12.0 for now.
Perhaps we can revert the original PR causing this recreate, then implement that as a MAJOR bump with a breaking change instead of just a minor change
@Jeremy G (Cloud Posse)
I’m a bit rusty on RDS. @mihai.plesa @Yangci Ou please tell me, given that new DB instances will be created before the existing ones are deleted, how much of a problem is this upgrade going to cause you?
My recollection is that since Terraform creates new instances first, the data should get migrated to the new instances automatically before the old ones are deleted and the database should remain functional throughout. You probably want to specify extra long timeouts via timeouts_configuration and take a snapshot first.
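A minimal sketch of that advice, assuming the module’s timeouts_configuration input accepts a list of create/update/delete durations (check the module’s variables.tf for the exact shape):

# Partial module configuration -- only the timeout-related piece is shown.
module "rds_cluster" {
  source  = "cloudposse/rds-cluster/aws"
  version = "1.12.0"

  # ... existing cluster configuration ...

  # Give instance replacement plenty of time during the upgrade.
  timeouts_configuration = [
    {
      create = "4h"
      update = "4h"
      delete = "4h"
    }
  ]
}

# Take a manual snapshot first, e.g.:
#   aws rds create-db-cluster-snapshot \
#     --db-cluster-identifier <cluster-id> \
#     --db-cluster-snapshot-identifier pre-upgrade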
The PR does fix a real bug so it is kind of weird to make it optional or revert it. Moving it to a new major version does not really solve the problem if the automatic rotation of DB instances is an issue, because most people are going to want to upgrade, and Cloud Posse encourages PRs to be on the latest version (although earlier versions can be targeted for maintenance).
That said, it is a pretty rarely encountered bug, given that changing the engine destroys the entire cluster, so the only real issue is if you want to change the name of the db subnet group. I could live with keeping this bug in the code and just documenting it.
@Erik Osterman (Cloud Posse) what are your thoughts?
Eh, the problem is with RDS & Terraform; I’m pretty sure it doesn’t do the data migration as you mentioned. It does not automatically do this: “Terraform creates new instances first, the data should get migrated to the new instances automatically before the old ones are deleted and the database should remain functional throughout.”
There’d be some manual intervention: take a final database snapshot, then spin up the new database from that snapshot. It’s a lot of effort for a no-downtime blue/green deployment just to upgrade a Terraform module version.
Throwing out a random suggestion/idea: what if the attribute that # forces replacement could be put into a lifecycle ignore block under a conditional? It’d create another variable to maintain backwards compatibility for folks that don’t want to go through the effort of recreating the DB just to upgrade the module from 1.12 -> 1.13, but this requires documentation and maintenance for such a specific case.
I have suggested the same. Nobody should be in the position of getting downtime or potential full data loss just for something so minor.
All RDS eventually goes through engine upgrades/changes that force recreation (which is why the other suggestion was a MAJOR bump), but the key thing in this instance is that I don’t think people would want to go through the hassle of this recreate just for the Terraform state to be happy.
A major bump would mean that users who want newer features or bug fixes in newer module versions still have to go through the danger. It should be behind a conditional or reverted fully.
~OK. The plan is to make it conditional by adding a feature flag: in the v1 module, the conditional defaults to the old behavior; in the new v2 module, the conditional defaults to the new behavior, but you can set it to retain the old behavior.~ As a side note, @Yangci Ou, because this is our “cluster” module, these instances are inside a cluster, either an Aurora cluster or a Multi-AZ RDS cluster, and in either case AWS should migrate the data for you. I haven’t tested it, though, and I imagine it could take a long time with a big database if you are not using Aurora (where the storage is separate from the compute instances).
That sounds like a good plan.
Ah, thanks for that side note info. I’m curious about that too, so I’ll test that out personally in a dummy environment.
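For reference, a rough sketch of how the feature-flag idea above could gate the new keepers. All names here (legacy_instance_keepers, local.new_keepers, local.db_subnet_group_name) are hypothetical, not the module’s actual interface, and as the following messages show, the idea was ultimately dropped in favor of a revert:

variable "legacy_instance_keepers" {
  type        = bool
  default     = true # a v1.x module would keep the old behavior; a v2 module could flip this default
  description = "If true, keep the pre-1.13 keeper set so existing instances are not replaced"
}

locals {
  # The keepers added in the disputed PR; values assumed to be computed elsewhere in the module.
  new_keepers = {
    db_subnet_group_name = local.db_subnet_group_name
    engine               = var.engine
  }
}

resource "random_pet" "instance" {
  keepers = merge(
    { instance_class = var.instance_type },
    # Include the new keepers only when the legacy flag is turned off.
    { for k, v in local.new_keepers : k => v if !var.legacy_instance_keepers }
  )
}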
Upon further reflection, I’m going to just revert the changes unless I hear strong objections. Here’s why:
The added keepers (that cause the instances to be viably updated) break existing deployments, as noted, and that is bad. Granted, they fix a problem with deploying updated instances, but in every added case, ~the update also destroys and replaces the entire DB cluster~ the change of the added variable also causes the entire cluster to be destroyed and recreated.
Because of the high potential for data loss when replacing a cluster, I am actually in favor of this module failing when you try to do that. The workaround is to manually destroy the existing cluster first. That gives people fair warning that their data is at risk, and reduces the chance of accidental data loss because of misconfiguration.
@Erik Osterman (Cloud Posse)
So the decision is just to create a new version by reverting that PR, correct?
That is my recommendation, but I want to get consensus on it before moving ahead.
Sounds good. I’m good with any way as long as there’s a way to make it not recreate (either conditional variable or a specific version) while maintaining newer versions
Hey all, jumping in late here
I still have to read through all the comments above but I would like to stress THERE IS NO DATA LOSS
The data is part of the cluster not the individual instances
@Jeremy G (Cloud Posse) You state that “the update also destroys and replaces the entire DB cluster.” That is not true for the changes in the latest MR.
@Yangci Ou From what I can see, you’re confusing the cluster with the instances; deleting an instance does not delete the data in the cluster.
And before I implemented the random_pet and the create_before_destroy, any change that forced replacement of the instances would delete all the existing instances before creating new ones.
@Ray Finch I’m sorry my wording was confusing.
The change you made in the most recent PR was to add to the keepers:
• cluster_identifier
• db_subnet_group_name
• engine
Any change to those destroys and replaces the entire DB cluster along with the instances, which results in data loss.
Hi @Jeremy G (Cloud Posse) no, that is not correct. A change to those keepers only affects the instances, NOT the entire cluster
And it uses create_before_destroy so the new instances are created before the old ones are deleted
we have run this against our production databases with all applications running
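For readers following along, a rough sketch of the pattern Ray describes; resource and variable names are illustrative, not the module’s exact code:

resource "random_pet" "instance" {
  # Regenerated whenever a keeper changes, which in turn forces replacement of
  # the instances below (their identifiers embed the pet name).
  prefix = local.cluster_identifier # assumed to be computed elsewhere in the module
  keepers = {
    instance_class = var.instance_type
  }
}

resource "aws_rds_cluster_instance" "default" {
  identifier         = random_pet.instance.id
  cluster_identifier = local.cluster_identifier
  instance_class     = var.instance_type
  engine             = var.engine

  lifecycle {
    # New instances come up before the old ones are destroyed, so the cluster
    # (and the data it holds) stays available while the instances rotate.
    create_before_destroy = true
  }
}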
@Ray Finch This has nothing to do with the keepers or random_pet. Those variables also affect the cluster itself, and Terraform will replace the cluster when those variables change.
yeah, I see what you’re saying
The previous set of keepers were ones that affected only the instances, not the cluster.
although is that true for the cluster identifier in this case?
it’s a calculated value from two other resources
actually, not sure the statement above is true
although is that true for the cluster identifier in this case?
- Yes, when the cluster identifier is changed, that forces a replacement of the cluster
- Because the cluster identifier is part of the output of the random_pet (via the random_pet.prefix input), its output changes when the cluster identifier changes anyway, so it is as if the cluster identifier were one of the keepers
the cluster identifier in this case is calculated as follows:
coalesce(join("", aws_rds_cluster.primary[*].id), join("", aws_rds_cluster.secondary[*].id))
so it could change and not have the cluster get recreated
also, according to the documentation, db_subnet_group_name does not force a replacement
and it’s also a calculated value
join("", aws_db_subnet_group.default[*].name)
From my test (output abbreviated)
# module.rds_cluster.aws_rds_cluster.primary[0] must be replaced
+/- resource "aws_rds_cluster" "primary" {
~ allocated_storage = 1 -> (known after apply)
~ arn = "arn:aws:rds:us-east-2:<redacted>:cluster:eg-test-rds-cluster-nuru" -> (known after apply)
...
~ db_subnet_group_name = "eg-test-rds-cluster-nuru" -> "nuru-rds-cluster" # forces replacement
ok, that’s not in the docs:
db_subnet_group_name - (Optional) DB subnet group to associate with this DB cluster. NOTE: This must match the db_subnet_group_name specified on every aws_rds_cluster_instance in the cluster.
The documentation on what changes force a replacement is lacking to non-existent.
So a couple of points
the main one being that none of my changes cause data loss
and reverting the PR is just transferring the pain to those already on v1.13.0 or higher
also, I would argue the code is correct in that if any of the keepers change you would want a new random pet
To recap:
• I agree that the changes in the PR are legitimate bug fixes, as I said in my earlier statements. However, they are not a complete fix for the issue and, after consideration, I prefer the older behavior because it reduces the chance of accidental data loss.
• The old and current module will fail if you change only the db_subnet_group_name or engine, because it will try to replace the cluster but use the same cluster identifier, which is not allowed. This could be fixed, but because replacing the cluster results in data loss, I would prefer it to fail.
• The behavior of the old and current module when only the cluster identifier changes has remained the same: it will successfully replace the cluster. Therefore, as a practical matter, the changes in the PR do not change the behavior of the module, except that they force new instances on anyone who had deployed them with an earlier version.
As it did with the 1.6 deploy
I do see your point that the scenarios where any of these change are unlikely
I for one wouldn’t migrate from MySQL to Postgres by changing engine in my Terraform module
I’m OK with removing any attribute from the random_pet keepers that would force a cluster replacement anyway
OK, good, because as it happens, all the attributes you added in this PR fall under that category: they would force a cluster replacement anyway.
Rollback and other enhancements released as v1.15.0
thank you, will test soon
2024-11-10
2024-11-12
Hey all! Can I request some help / someone to look at Cloud Posse’s terraform-aws-alb module? The CI is failing to release a new version after the most recent PR merge. It’s throwing an error about the README.md.gotmpl, but I’ve noticed that it hasn’t even been changed in the past 6 months. Hopefully someone with more context can help out? Here is the error https://github.com/cloudposse/terraform-aws-alb/pull/190#issuecomment-2470455978 - Thanks!
@Igor Rodionov
Cc @Gabriela Campana (Cloud Posse)
@Igor Rodionov please lmk when you address this
@Erik Osterman (Cloud Posse) May I have an approval https://github.com/cloudposse/build-harness/pull/390 ?
what
• Added an explicit default value for the README_INCLUDES environment variable
why
• Starting from gomplate 4.2.0, includes require paths defined explicitly, starting with "./" or "/"
references
• https://sweetops.slack.com/archives/CUJPCP1K6/p1731416487743859 • https://github.com/cloudposse/terraform-aws-alb/actions/runs/11795616491/job/32869718771 • hairyhenderson/gomplate#2255
@Yangci Ou fixed and released https://github.com/cloudposse/terraform-aws-alb/releases/tag/v2.1.0
If you’re affected by this in any other repos, let us know and we can retrigger the builds
2024-11-13
Please review s3 replication role fix
https://github.com/cloudposse/terraform-aws-s3-bucket/pull/250
Thanks!
Please review correct usage of the new destination.bucket key
https://github.com/cloudposse/terraform-aws-s3-bucket/pull/256
Thank you!
Please review EventBridge support in event notifications
https://github.com/cloudposse/terraform-aws-s3-bucket/pull/255
Thank you!
2024-11-14
Please review fix for variable name
https://github.com/cloudposse/terraform-aws-s3-bucket/pull/258
There are merge conflicts that must be resolved first
Thanks Veronika. I fixed the conflicts.
Before merging, I was hoping to chat with someone about this comment to see what the proper step is for this.
https://github.com/cloudposse/terraform-aws-s3-bucket/pull/258#pullrequestreview-2434928344
cc @Jeremy G (Cloud Posse) if you have a sec
@RB I would prefer to reject (or at least place on indefinite hold) this PR, because it is a breaking change that adds no new functionality
We can leave it open with “do not merge” and incorporate it into the next major version update, but I don’t think it warrants a version update and breaking change on its own. I want to reserve those for new features that are incompatible with old features or for removing deprecated features.
Yes, I figured this, but was wondering if we could do something like put the old var in variables-deprecated and, if either is used, make it have the same impact. That way it would conform to Cloud Posse naming conventions without a breaking change, no?
It’s on my to-do list to document Cloud Posse’s policy on exactly that. @Erik Osterman (Cloud Posse) How do you want to handle this?
I think we need to use release branches and merge this into those future release branches
won’t adding the major label have the same effect?
@Erik Osterman (Cloud Posse) I meant, I would prefer to defer this change until we had more significant breaking changes, because I don’t have a problem with the current variable name. If you think it should go out now with a major version bump, then I will update the docs sooner rather than later about how to deprecate the old variable.
Couldn’t it go out without a major bump if the new var is added and the old var is moved to variables-deprecated?
Then later, when the major release is ready, the variables-deprecated can be wiped?
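A rough sketch of the deprecation pattern being proposed, using a hypothetical rename from old_name to new_name (not the actual variables in this PR):

# variables-deprecated.tf -- keep the old variable working, but mark it deprecated
variable "old_name" {
  type        = string
  default     = null
  description = "DEPRECATED: Use new_name instead."
}

# variables.tf -- the new, conventionally named variable
variable "new_name" {
  type        = string
  default     = null
  description = "Replaces old_name; takes precedence when both are set."
}

# main.tf -- both variables have the same effect; the new one wins if both are set
locals {
  effective_name = var.new_name != null ? var.new_name : var.old_name
}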
Please review s3 request payment config
https://github.com/cloudposse/terraform-aws-s3-bucket/pull/259
Thank you!
2024-11-15
2024-11-19
hey, remote-state module version bump: https://github.com/cloudposse-terraform-components/aws-spacelift-admin-stack/pull/4
we will merge this PR next week https://github.com/cloudposse-terraform-components/aws-spacelift-admin-stack/pull/4#issuecomment-2486671429
Hey all, Can I please have this PR reviewed https://github.com/cloudposse/terraform-aws-ec2-instance/pull/212
Currently it’s blocking me and forcing me to go a longer way round to solve (adding resources on top of the localised module)
Thanks Gabriela, is there a possible timeline for when this can be reviewed and merged? thanks @Gabriela Campana (Cloud Posse) @Igor Rodionov
Hi @hamza-25 We cannot promise an ETA, as client work is prioritized over community support. I’ve tagged our engineers on the PRs. This is a best effort by a small team. We appreciate your persistence, and we’ll get to it when we can.
@hamza-25 thanks for your contribution. The PR was merged and released: https://github.com/cloudposse/terraform-aws-ec2-instance/releases/tag/v1.6.1
Thanks a lot, both of you!
I can now get rid of my own local changes, much appreciated
2024-11-20
2024-11-22
Please review adding an option to use inline policies instead of managed policies for IAM roles
https://github.com/cloudposse/terraform-aws-iam-role/pull/79
what
• support inline policies
why
• Most of the time, users want to create a specific policy for a specific role and don’t realize that the same policy can be accidentally reused for another purpose, which makes it difficult to delete the role and policy
• Inline policies do not need to be tagged
• If a managed policy is attached to multiple roles, updating it impacts all of those roles (see the sketch below)
references
• closes #78
Thanks!
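For readers unfamiliar with the difference, a hedged sketch contrasting the two approaches. Resource names are illustrative, and it assumes aws_iam_role.default and the policy document are defined elsewhere:

# Inline policy: embedded in the role, deleted with it, cannot be reused or tagged.
resource "aws_iam_role_policy" "inline" {
  name   = "example-inline"
  role   = aws_iam_role.default.id
  policy = data.aws_iam_policy_document.example.json
}

# Managed policy: a standalone, taggable resource that can be attached to many roles,
# so updating it affects every role it is attached to.
resource "aws_iam_policy" "managed" {
  name   = "example-managed"
  policy = data.aws_iam_policy_document.example.json
}

resource "aws_iam_role_policy_attachment" "managed" {
  role       = aws_iam_role.default.name
  policy_arn = aws_iam_policy.managed.arn
}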
2024-11-26
https://github.com/cloudposse/terraform-aws-eks-cluster/pull/245 please
(feedback about the AI bot review: it’s garbage and it wasted my time)
@Igor Rodionov
@z0rc3r these were absolutely not garbage; these were great recommendations. We should strive for better variable descriptions, so the user has everything they need to use the variable.
As merged, the descriptions are incomplete.
We are using these suggestions ourselves, and improving the code quality of #atmos https://github.com/cloudposse/atmos/pulls?q=is%3Apr+is%3Aclosed+is%3Amerged
Plus, CodeRabbit learns from feedback. If you have constructive criticism of why they are not good, you can share that.
Case in point:
- description = "Configuration block for the support policy to use for the cluster"
+ description = "Configuration block for the support policy to use for the cluster. The support_type can be set to 'Standard' or 'Extended' (default). For more information, see <https://docs.aws.amazon.com/eks/latest/userguide/service-quotas.html#service-quotas-eks-support>"
I prefer the second description over the first one.
I’m not dealing with LLM recommendations
Feel free to close the pr
@z0rc3r oops, I merged it a couple of hours ago
Honestly, what’s bad about the recommendations? What’s trash about them?
Links are incorrect, and recommended values use the wrong capitalization. In the end, they rehash the official AWS docs in a slightly convoluted way.
Fair point about invalid links, if true. However, showing what the acceptable values are is valuable.
Hard disagree. My preference is to leave this on provider level, when module replicates behaviour from the provider. Which is the case here.
Thanks for sharing your views.
As an end user of other modules, I often don’t care what is in the variable descriptions. I always read the code to understand its behaviour and how resources interact with each other.
A general problem with any AI stuff is that it can hallucinate things, so in the end it wastes the contributor’s time on fact checking. Plus, it doesn’t add value. If you want to give more info, just link to the provider or AWS docs.
Often Terraform modules have such poor descriptions that reviewing the provider docs is required. But IMO a well-defined module has thoroughly documented variables, outputs, examples, terratests, and a README. We often spend longer on everything else than on the modules themselves.
@johncblandii
Could it be merged, please? Tests passed, and @RB already approved it.
@johncblandii bumping this up
@Andriy Knysh (Cloud Posse)