#terraform (2022-11)
Discussions related to Terraform or Terraform Modules
Archive: https://archive.sweetops.com/terraform/
2022-11-01
2022-11-02
v1.3.4 1.3.4 (November 02, 2022) BUG FIXES: Fix invalid refresh-only plan caused by data sources being deferred to apply (#32111) Optimize the handling of condition checks during apply to prevent performance regressions with large numbers of instances (#32123)…
When executing a refresh-only plan, it is not valid to plan a data source read. If the data source config is not known during planning, the only valid update would be the prior state, if there is a…
The handling of conditions during apply can have adverse effects on very large configs, due to the handling of all existing instances even when there are no changes. With extremely large configurat…
Hello, looking for opinions: should the application create sqs queues or let terraform manage them? I say terraform should manage them since it’s infrastructure. New dev manager wants the app to manage them. What do you think?
Is the sqs queue ephemeral or persistent?
Persistent
if it’s persistent, then it sounds like terraform would make more sense
you may want to ask why your manager wants the app to manage sqs. I’d pull on that thread to see what problems he’s trying to workaround.
To avoid having to run terraform before a deployment is his concern.
how does he feel about vpcs, subnets, eks clusters, peering, s3 buckets, etc
or what about shared infra vs application infra
it would be easier to just deploy the infra required once and then deploy the app as many times as you like
less code to maintain, less chance of breaking something, and now you no longer need sqs:create perms on the iam role of the app
also less code to unit test
2022-11-03
Hi All, I’m also in the mood to get some different opinions… What do others do with their terraform modules?
The flip side to this question is how do you version/track the updates when breaking changes are introduced?
I am a firm proponent of using release tags in module repos and then using those tags to install the module. I’ve been burned in the past by using main or something similar and then having a breaking change bring plan and apply to a halt.
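for instance (module name and tag here are hypothetical, just to illustrate the pinning):
module "vpc" {
  # registry source pinned to an exact release
  source  = "example-org/vpc/aws"
  version = "1.2.0"
}
or, for a git source, pin to a tag with ?ref:
module "vpc" {
  source = "git::https://github.com/example-org/terraform-example-vpc.git?ref=v1.2.0"
}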
Thanks everyone, appreciate the answers… For those who keep up to date at all times, do you use Dependabot or a similar service?
ManagedKaos, we’ve got release tags on everything; that’s why we are falling behind. At the moment I’m not notified on new releases
Renovate here: https://github.com/renovatebot/renovate
Universal dependency update tool that fits into your workflows.
Hi All, does anyone know how to configure a single CloudTrail trail with multiple data event sources (like S3, Lambda, DynamoDB) using Terraform?
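for reference, a minimal sketch of a single trail with multiple data event selectors (bucket names and ARN values are hypothetical; check the aws_cloudtrail docs for the exact selector types and values):
resource "aws_cloudtrail" "this" {
  name           = "all-data-events"
  s3_bucket_name = "example-trail-logs" # hypothetical pre-existing bucket for trail logs

  event_selector {
    read_write_type           = "All"
    include_management_events = true

    # log S3 object-level events for one bucket (trailing slash covers all objects)
    data_resource {
      type   = "AWS::S3::Object"
      values = ["arn:aws:s3:::example-data-bucket/"]
    }

    # log invocations for all Lambda functions
    data_resource {
      type   = "AWS::Lambda::Function"
      values = ["arn:aws:lambda"]
    }

    # log item-level events for all DynamoDB tables
    data_resource {
      type   = "AWS::DynamoDB::Table"
      values = ["arn:aws:dynamodb"]
    }
  }
}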
2022-11-04
I made a thing if anyone is looking for RDS IAM Authentication done with Terraform. https://gitlab.com/setheryops/rds-iam-auth
Creates a simple RDS instance and then uses IAM authentication to give access to other users in your org based on role assumption.
awesome write up, that 200 connections/sec limitation seems interesting. I also wanted to mention that you should take a look at AWS Session Manager so that you don’t have to create a bastion open to the public internet
2022-11-05
2022-11-07
Does anyone have any information about when associate 003 will be available? I’m looking to get certified but don’t really want to sit 002 if 003 is coming soon
Hey everyone! When it comes to having a git-ops workflow for applying terraform changes, do most people squash & merge code before running apply or vice versa? I’m setting up CI/CD terraform workflows for multiple repositories and keep running into this philosophical question. It would be great to hear any friction points people have experienced in either setup
i generally avoid squash merge, and just accept the merge commit so git sees the same commit is present in both the pr branch and in the main branch, but i don’t think that detail is the primary question here
we currently run a plan as part of the pr review, then merge, then run apply
but i’ve seen enough problems that only occur on apply, that i definitely see the value in the way atlantis does it by default, and run the apply just before merging the pr
that way you have a chance to fix any apply-time issues before merging the work
Running apply before merge is what i’m leaning towards because of that very reason. It’s fairly often that we have to do some amount of tinkering before the apply is successful.
In your current flow, are there situations where other developers working within the same project are blocked because the terraform code on the main branch has not been cleanly applied? How do you manage this problem?
sure, you can get blocked by a broken apply either way. in both cases, it is important to keep the “contributor” (developer or otherwise) engaged in the result and any follow-on actions
terraform apply is almost inherently serial. so in either approach, before applying, any pr needs to be up-to-date with fully resolved conflicts, and have a clean, approved, understood plan
Yeah that’s a fair point about it being serial. I guess the apply before merge route makes that more intentional.
Did you all ever experiment with using a merge queue for applying changes?
You may be able to limit the impact of the serial nature by splitting things into lots of states… One state may be blocked due to an apply-time issue, but devs working other, independent states may not be
agreed, the smaller the tf projects the less of a problem this becomes. Unfortunately we have a fair number of large, legacy projects in a monorepo that need to be broken down
Heh, it’s hard to find a good balance between small states and a workflow that makes sense and is manageable for your team
I was just reading a blog by slack on how they do it, and it sounded like a lot of large states lol, and still needed a lot of home-built tooling to manage it all
At Slack, we use Terraform for managing our Infrastructure, which runs on AWS, DigitalOcean, NS1, and GCP. Even though most of our infrastructure is running on AWS, we have chosen to use Terraform as opposed to using an AWS-native service such as CloudFormation so that we can use a single tool across all of our …
I read this one last week as well, it sounded pretty painful actually
but that’s probably because I can’t help but to think of my experiences with Jenkins while reading their blog
Yes, I noped pretty hard. But also, I feel a lot of their pain lol. These workflows are hard, no matter the tool!
So true!
Does anyone know if it’s possible to set the name of an access point created with terraform-aws-efs? It seems to want to name every access point with the same name as the filesystem itself which isn’t helpful to identifying which access point is which by name.
Always interesting to read about how others run terraform, especially large teams with large infrastructures… https://slack.engineering/how-we-use-terraform-at-slack/
Hi guys, I’d like to ask if anyone knows the best way to import existing AWS infrastructure into Terraform?
Probably use terraformer
Best thing to do is to use terraformer, but don’t just commit its output and wash your hands of it.
For example, if you want to terraform all of your hosted zone records, you do not want to dump all the records and put it in main.tf and call it done
You want to put each record with its respective service and use the output of terraformer as a template
@Arash Bahrami does that make sense?
@RB wow thanks I’ll try it out also thanks for the advice, yeah I’d rather a clean and understandable directory with each service in different files
You should understand, though, that using terraformer only gives you a representation of existing resources as code.
it does not do an import right out of the gate. (at least not that I am aware of).
So if you want to manage/control your existing resources with TF, you will still need to import them into the TF state.
In my experience, terraformer is excellent for:
• getting a copy of something that is already running, perhaps built in the console, so you can customize it as a module and deploy it elsewhere (vs managing the thing that is already deployed.)
• getting a copy of existing code that is not backed up as IaC just in case you have to build it over again. Essentially the generated code becomes a backup or a reference.
2022-11-08
Hello there, I’m looking at using the amazing terraform-aws-elasticache-redis module, but stumbled across a missing option to disable the *Auto upgrade minor versions* setting. I’ve seen the previously created issues on this (#117, #70), which were raised when the parameter was not editable from the API, which is no longer the case, based on the API documentation. It works as expected for cluster versions 6+. I’ve created an issue as well, #182. Thank you for creating and maintaining this module
Creates a Redis (cluster mode disabled) or a Redis (cluster mode enabled) replication group.
looks like you could submit a pull request to add a variable and pass the value through to auto_minor_version_upgrade
around here: https://github.com/cloudposse/terraform-aws-elasticache-redis/blob/master/main.tf#L115
resource "aws_elasticache_replication_group" "default" {
Thanks Alex, Will try that
Yes please, we always encourage contributions!
After cloning the ECS repo, it gives this variable error. Of course, it can be fixed with a sed-and-replace for the proper variable declaration. This makes me wonder whether the repo is outdated?
I posted some weeks ago about how I could help and open some PRs to suggest these fixes, but got no attention. I was just looking to collaborate with this amazing CloudPosse infra.
What is the ecs repo
ecs_cluster
Link please?
Terraform module for provisioning an ECS cluster
Please put in a pr if you see an issue. We love pull requests!
Ohh, sure, thanks.
Also why are you using an ancient version of terraform
If you use the latest or at least 0.12 you wouldn’t see the issue
Laziness
dont be lazy, use the non-beta terraform
i’m finding that specifying role_arn here is ignored for the s3 bucket. it’s used for the dynamo lock lookup, but for the s3 bucket init, it doesn’t use the role_arn defined here https://www.terraform.io/language/settings/backends/s3#assume-role-configuration and instead uses the local credentials.
Terraform can store state remotely in S3 and lock that state with DynamoDB.
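for context, this is roughly the backend shape being discussed - a minimal sketch with hypothetical names, where role_arn points at a role in the account that owns the state bucket:
terraform {
  backend "s3" {
    bucket         = "example-tfstate-bucket"
    key            = "env/dev/terraform.tfstate"
    region         = "us-east-1"
    dynamodb_table = "example-tf-locks"
    # role terraform assumes for state operations
    role_arn       = "arn:aws:iam::222222222222:role/terraform-state"
  }
}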
That is very interesting. I figured the role_arn was used to initialize the tfstate specifically for the bucket.
how can i double confirm? i’m running a terragrunt init now with log-level set to debug. hopefully that shows me which role it uses for state lookup?
i’d use native terraform to verify it, that way you can reduce the number of additional variables in the equation
do you have an assume_role_policy or just a role_arn?
just a role_arn
please scrap what i said earlier.
it is definitely using the role_arn, but the init fails unless i give my local aws creds permissions to read the bucket
to clarify, i have an atlantis pod in EKS in AWS account A.
the remote_state config has a role_arn specified which exists in Account B.
the init fails unless Account A has access to the bucket, even though Account B has full access to the bucket.
is there documentation on this?
this is pure assume role stuff, nothing Atlantis related
and assume role requires the trust relationship in Account B and an explicit allow policy for the sts:AssumeRole action on the role used to assume in Account A
plus a trust policy to assume the service role
then it should work
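roughly like this, with hypothetical account IDs and role names:
# Account B: trust policy on the state-access role, allowing Account A's role to assume it
data "aws_iam_policy_document" "trust_account_a" {
  statement {
    actions = ["sts:AssumeRole"]
    principals {
      type        = "AWS"
      identifiers = ["arn:aws:iam::111111111111:role/atlantis"] # hypothetical Account A role
    }
  }
}

# Account A: explicit allow on the Atlantis role to assume the Account B role
data "aws_iam_policy_document" "assume_account_b" {
  statement {
    actions   = ["sts:AssumeRole"]
    resources = ["arn:aws:iam::222222222222:role/terraform-state"] # hypothetical Account B role
  }
}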
definitely not atlantis related, i was asking if there was terragrunt docs available
and assume role requires the trust relationship in Account B and an explicit allow policy for the sts:AssumeRole action on the role used to assume in Account A
Yes this is all done. but still fails unless i add a bucket policy that allows Account A perms to the bucket. even though the role_arn is specifying Account B
I’m doing this right now and I’m using Account B role in it
but I had an issue like yours using ECS, where I was using the exec role instead of the task role (which is the one that assumes roles)
maybe you have a similar issue?
but I had an issue like yours using ECS, where I was using the exec role instead of the task role (which is the one that assumes roles)
can you explain this further please?
in ECS you have two roles : one for the task execution (init), one for the running task ( task role)
the exec role is for things like giving access to pull from a registry, or s3, or something at the START/INITIALIZATION of the container lifecycle
the TASK role is the one that takes effect AFTER the container is up
so once Atlantis is up, it uses the TASK role to assume roles, run terraform, or access other things
How do people feel about the live vs modules directory/src approach? It annoys me that everything just isn’t in a single directory. I’m trying to understand the reasoning of something like:
terraform/my_modules/my_mod/{main.tf,variables.tf}
terraform/live/{dev,staging,qa,prod}/{main.tf,variables.tf,versions.tf}
and then live goes:
module "my_mod" {
source = "../../my_modules/my_mod"
}
something feels off about this, but maybe it’s cleaner or more convenient or something?
i do this pretty regularly. though instead of “live” i call them “stacks”. each stack is essentially a root module and has terraform state
the modules/ directory lets me centralize business logic around one or more modules, as well as gives me a single place to update versions of external modules
@Jonas Steinberg in a real system each live env (ie “stack”) will be slightly different. Eg dev might have 1 EC2 instance, qa 3 and prod 30. These would be defined in the my_mod/variables.tf, and passed as args in the my_mod module block of the stack.
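e.g. a hypothetical prod stack:
# live/prod/main.tf
module "my_mod" {
  source         = "../../my_modules/my_mod"
  instance_count = 30 # dev would pass 1, qa would pass 3
}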
If that doesn’t help maybe you can be more specific?
I don’t know. I guess it’s the same as with anything else. You keep interfaces, abstract classes and the like in separate directories from the main app src.
Thanks for the responses!
2022-11-09
Hi, I am currently having an issue using the module: <https://github.com/cloudposse/terraform-aws-ec2-client-vpn>
If I don’t set the variable name I get this error:
╷
│ Error: "name" isn't a valid log group name (alphanumeric characters, underscores, hyphens, slashes, hash signs and dots are allowed): ""
│
│ with module.client_vpn.module.cloudwatch_log.aws_cloudwatch_log_group.default[0],
│ on .terraform/modules/client_vpn.cloudwatch_log/main.tf line 17, in resource "aws_cloudwatch_log_group" "default":
│ 17: name = module.log_group_label.id
│
╵
If I do set it I get this one:
╷
│ Error: failed creating IAM Role (ci-vpn-log-group): InvalidInput: Duplicate tag keys found. Please note that Tag keys are case insensitive.
│ status code: 400, request id: *********************
│
│ with module.client_vpn.module.cloudwatch_log.module.role.aws_iam_role.default[0],
│ on .terraform/modules/client_vpn.cloudwatch_log.role/main.tf line 29, in resource "aws_iam_role" "default":
│ 29: resource "aws_iam_role" "default" {
│
I don’t really see a way to make it works, does somebody have an idea ?
The first error makes sense. We usually always set a name because that’s tied to the null label in context.tf, and every resource is named based on that
The second error is unfamiliar to me
One thing you could do is refer to the examples and see if you can use a similar approach, because that’s the one that’s tested on every new PR
It would also help to create a written ticket in that repo with the following
• Reproducible examples containing your hcl and exact inputs
• Errors that you found and worked around
• References to similar errors found on the web
A workaround could be to just disable logging for now, since logging_enabled=true is what’s creating the log group
https://github.com/cloudposse/terraform-aws-ec2-client-vpn#input_logging_enabled
Thx, so just to clarify. For the first error, I think it is due to the fact that in the complete test you pass the context to the module. In the Readme however you don’t, so by default “name” is used and is refused as a name for the log-group.
The second error appears due to the fact that I try to correct it without passing a full context (since we use a different model in my company)
I just do:
module "client_vpn" {
source = "cloudposse/ec2-client-vpn/aws"
version = "0.13.0"
name = module.tags.id
}
Which overrides the name and works, until I get the duplicate tags error, which I think is caused by:
+ tags = {
+ "Attributes" = "log-group"
+ "Name" = "my-custom-name"
}
+ tags_all = {
+ "Attributes" = "log-group"
+ "Name" = "my-custom-name"
+ }
I don’t know if that clarifies the issue for you. I am just wondering now if I can use it without using the context
it’s more clear now. please create an issue and then we can work towards resolving it
for now you can either try disabling the logging, if that’s the only duplicate tag issue, or try passing in a context with only the name filled
we also love PRs on our documentation and fixes to our modules
Ok I am going to try to see what I can do
v1.4.0-alpha20221109 1.4.0 (Unreleased) BUG FIXES: The module installer will now record in its manifest a correct module source URL after normalization when the URL given as input contains both a query string portion and a subdirectory portion. Terraform itself doesn’t currently make use of this information and so this is just a cosmetic fix to make the recorded metadata more correct. (#31636)…
1.4.0 (Unreleased) BUG FIXES:
The module installer will now record in its manifest a correct module source URL after normalization when the URL given as input contains both a query string portion …
This fixes an issue in the String() for ModuleSourceRemote in which it does not consider query strings (if present). Before this fix, any subdirectory would simply be appended to the ModuleSourceRe…
Hello All! I wanted to share a post I created about simplifying your terraform workflow with a wrapper https://www.taccoform.com/posts/tf_wrapper_p1/
Overview Cloud providers are complex. You’ll often ask yourself three questions: “Is it me?”, “Is it Terraform?”, and “Is it AWS?” The answer will be yes to at least one of those questions. Fighting complexity can happen at many different levels. It could be standardizing the tagging of cloud resources, creating and tuning the right abstraction points (Terraform modules) to help engineers build new services, or streamlining the IaC development process with wrappers.
2022-11-10
Hi is anyone able to help me with the EC2 Client VPN Module?
https://github.com/cloudposse/terraform-aws-ec2-client-vpn
My first run of this module completed successfully. I then made some changes to the configuration and ever since, my applies fall over on the SSM Parameter creation. I have also completely changed the naming standard so they are ‘fresh resources’ - it still seems to fall over.
╷
│ Error: error creating SSM Parameter (/staging-awsvpn.key): ParameterAlreadyExists: The parameter already exists. To overwrite this value, set the overwrite option in the request to true.
│
│ with module.ec2_client_vpn.module.self_signed_cert_server.aws_ssm_parameter.private_key[0],
│ on .terraform/modules/ec2_client_vpn.self_signed_cert_server/ssm.tf line 12, in resource "aws_ssm_parameter" "private_key":
│ 12: resource "aws_ssm_parameter" "private_key" {
Any help in regards this would be extremely appreciated!
managed to work around this by running the tf apply cycle and, before pressing apply, deleting the parameters manually, then allowing the apply run to proceed.
The below resources seem to constantly want to update/recreate on every apply run
• module.ec2_client_vpn.module.self_signed_cert_ca.aws_ssm_parameter.certificate will be updated
• module.ec2_client_vpn.module.self_signed_cert_ca.aws_ssm_parameter.private_key[0] will be updated in-place
• module.ec2_client_vpn.module.self_signed_cert_server.aws_acm_certificate.default[0] will be updated in-place
• module.ec2_client_vpn.module.self_signed_cert_server.aws_ssm_parameter.certificate[0] will be updated in-place
• module.ec2_client_vpn.module.self_signed_cert_server.aws_ssm_parameter.private_key[0] will be updated in-place
• module.ec2_client_vpn.module.self_signed_cert_server.tls_locally_signed_cert.default[0] must be replaced
has anyone been able to pass any pre-defined variables from, say, Github or Gitlab into Terraform Cloud? Looking to pass in say the $GITLAB_USER_EMAIL variable to use in Terraform Cloud (via TF_VAR_user_email or something). The documentation (https://developer.hashicorp.com/terraform/enterprise/run/run-environment#environment-variables) indicates a set of pre-defined environment variables are injected automatically, however I haven’t been able to find a way to pass in other variables… cheers!
2022-11-11
2022-11-12
hello guys. i am using this module for creating an emr cluster: https://github.com/cloudposse/terraform-aws-emr-cluster While creating the task instance group, i want the task instance group to be spot and bid_price to behave as use_on_demand_as_max_price. How do i pass that? It’s failing when i pass bid_price="OnDemandPrice". I don’t see any example for this issue. Can anyone help?
Terraform module to provision an Elastic MapReduce (EMR) cluster on AWS
what is it failing with?
task_instance_group_bid_price = "OnDemandPrice" is what i used.
Error: ValidationException: The bid price is invalid. Revise the configuration and resubmit. │ status code: 400, request id: c07e612b-fd9d-4324-a9ab-522276c0cee8
OnDemandPrice is an invalid value; you have to express a number
bid_price - (Optional) If set, the bid price for each EC2 instance in the instance group, expressed in USD. By setting this attribute, the instance group is being declared as a Spot Instance, and will implicitly create a Spot request. Leave this blank to use On-Demand Instances.
https://registry.terraform.io/providers/hashicorp/aws/latest/docs/resources/emr_instance_group
yes, but i don’t want to specify a number. i want to use_on_demand_as_max_price. The AWS Management Console gives that option.
terraform aws provider doesn’t currently support it
Description
The EMR API natively provides a way to set the maximum Spot price to the On-Demand price when the Market parameter is set to SPOT and BidPrice is NULL.
This is really important when you pay attention to price.
This feature allows us to follow the current Spot price while capping our EMR workload at the On-Demand price.
This is more cost efficient and more secure for your workload.
This information is not in the AWS API reference, but you can find it in the boto3 documentation.
boto3 EMR bidPrice and Market parameters
BidPrice (string) --
The maximum Spot price you are willing to pay for EC2 instances.
An optional, nullable field that applies if the MarketType for the instance group is specified as SPOT.
Specify the maximum spot price in USD.
If the value is NULL and SPOT is specified, the maximum Spot price is set equal to the On-Demand price.
Affected Resource(s)
• aws_emr_cluster
Potential Terraform Configuration
We can add a new parameter like Market which can take two values: ON_DEMAND | SPOT. The bid_price parameter, if set to blank, sends a null value in the API payload.
instance_group {
instance_role = "CORE"
instance_type = "c4.large"
instance_count = "1"
ebs_config {
size = "40"
type = "gp2"
volumes_per_instance = 1
}
market = "SPOT | ON-DEMAND"
bid_price = ""
}
Actual CODE
terraform-provider-aws/aws/resource_aws_emr_cluster.go
func expandBidPrice(config *emr.InstanceGroupConfig, configAttributes map[string]interface{}) {
if bidPrice, ok := configAttributes["bid_price"]; ok {
if bidPrice != "" {
config.BidPrice = aws.String(bidPrice.(string))
config.Market = aws.String("SPOT")
} else {
config.Market = aws.String("ON_DEMAND")
}
}
}
Info wrong in docs:
The info about bid_price on the master instance group is wrong.
Actually, I can bid on the master instance group.
bid_price - (Optional) If set, the bid price for each EC2 instance in the instance group, expressed in USD.
By setting this attribute, the instance group is being declared as a Spot Instance, and will implicitly create a Spot request.
Leave this blank to use On-Demand Instances.
bid_price can not be set for the MASTER instance group, since that group must always be On-Demand
i see this issue is closed. but why?
the issue is not closed
yeah sorry.Just checked.
So, what can i do to solve this problem?
best you can do is probably upvote the issue, add a comment, or attempt to add support to the provider by opening a PR.
Sure. Thanks for your time.
no problem
2022-11-14
Deciphering Terraform Module Vars I inherited some Terraform to manage, but I am quite new to HCL and so I am struggling to decipher this commonly used code. I have no one to pass on this knowledge to me, and so I find myself here. Below is a snippet, but what I am looking for is an explanation of what is going on. I am particularly interested in the var definitions and how that translates to finding a value.
#main.tf module “alb_multi_target” { for_each = local.alb_multi_target_components source =…
I inherited some Terraform to manage but I am quite new to HCL and so I am struggling to decipher this commonly used code. I have no one to pass on this knowledge to me and so I find myself here. T…
2022-11-15
How can I use additional_tag_map to change the “Name” tag for the security groups created by the module? https://registry.terraform.io/modules/cloudposse/emr-cluster/aws/latest?tab=inputs
Can you describe how you’re trying to change the name?
I thought I could take advantage of additional_tag_map to overwrite the “Name” tag generated by terraform-null-label with a value I provided as a string.
@Erik Osterman (Cloud Posse)?
@Jeremy G (Cloud Posse)
additional_tag_map is for a special use case, the details of which I forget but you can find in Git history. In any case, it does not do what you want, nor is it intended to. As a general rule, you cannot overwrite the Name tag because Name is special and gets the generated label id, which is the main point of null-label.
2022-11-16
2022-11-17
Hi, I found there were always changes when I run terraform plan with module cloudposse/lb-s3-bucket/aws, although I haven’t changed my config. It never gets to the synchronized state.
Here is an example:
module "alb_log_s3_bucket" {
source = "cloudposse/lb-s3-bucket/aws"
version = "0.15.0"
name = "my-test-alb-bucket-rchen"
stage = var.environment
namespace = var.namespace
attributes = [var.region]
lifecycle_rule_enabled = true
enable_glacier_transition = false
expiration_days = 31
noncurrent_version_expiration_days = 30
}
Just tested version 0.12.0. It doesn’t have this issue.
0.14.1 also works. So it must be broken in 0.15.0, which upgrades cloudposse/s3-log-storage/aws from 0.24.0 to 0.26.0
Submitted an issue: https://github.com/cloudposse/terraform-aws-lb-s3-bucket/issues/57
After spending most of my time provisioning AWS resources in Terraform, I decided to try provisioning resources across multiple Cloud/SaaS offerings. This exploration opened up a new level of orchestration https://www.taccoform.com/posts/tfg_p6/ (link fixed)
Overview When working in our respective cloud service providers, we tend to get tunnel vision and think only in terms of compute, networking, and storage for our task at hand. This may be feasible for all in scenarios, but the reality is that we most likely leverage multiple SaaS offerings to get the best experience possible. Using DigitalOcean for infrastructure, Cloudflare for CDN/WAF, GitHub for code repositories, and Datadog for logging/metrics.
v1.3.5 Version 1.3.5
Version 1.3.5
v1.3.5 1.3.5 (November 17, 2022) BUG FIXES: Prevent crash while serializing the plan for an empty destroy operation (#32207) Allow a destroy plan to refresh instances while taking into account that some may no longer exist (#32208)…
Some prior refactors left the destroyPlan method a bit confusing, and ran into a case where the previous run state could be returned as nil. Get rid of the no longer used pendingPlan value, and tra…
In order to complete the terraform destroy command, a refresh must first be done to update state and remove any instances which have already been deleted externally. This was being done with a refr…
2022-11-18
any way to overcome this? https://github.com/hashicorp/terraform-provider-aws/issues/10329
Description
Beginning in September 2019, improved VPC networking for AWS Lambda began rolling out in certain AWS Commercial regions. Due to the underlying AWS infrastructure changes associated with this improved networking for Lambda, an unexpected consequence was a slight change in the Elastic Network Interface (ENI) description that Terraform used to manually delete those in those EC2 Subnets and Security Groups as well as an increased amount of time to delete them. During this Lambda service deployment, it was noticed by HashiCorp, AWS, and the community that deleting Elastic Compute Cloud (EC2) Subnets and Security Groups previously associated with Lambda Functions were now receiving DependencyViolation
errors after those Terraform resources’ default deletion timeouts (20 minutes and 10 minutes respectively). These errors during a Terraform apply operation may look like the following:
$ terraform destroy
...
Error: errors during apply: 2 problems:
- Error deleting subnet: timeout while waiting for state to become 'destroyed' (last state: 'pending', timeout: 20m0s)
- Error deleting security group: DependencyViolation: resource sg-xxxxxxxxxxxx has a dependent object
status code: 400, request id: xxxxxxxx-xxxx-xxxx-xxxx-xxxxxxxxxx
Please note: not all DependencyViolation
errors like the above are associated with this Lambda service change. The DependencyViolation
error occurs when *any* infrastructure is still associated with an EC2 Subnet or Security Group during deletion. This may occur due to multiple, separate Terraform configurations working with the same subnet/security group or infrastructure manually associated with the subnet/security group.
Working on top of a community contribution (thanks, @ewbankkit and @obourdon!) and in close communication with the AWS Lambda service team to determine the highest percentile deletion times, Terraform AWS Provider version 2.31.0 and later includes automatic handling of the updated ENI description and handles the increased deletion times for the new Lambda infrastructure. See the Terraform documentation on provider versioning for information about upgrading Terraform Providers.
For Terraform environments that cannot be updated to Terraform AWS Provider version 2.31.0 or later yet, this issue can be mitigated by setting the customizable deletion timeouts available for these two Terraform resources to at least 45 minutes and ensuring any Lambda execution IAM Role permissions with ec2:DeleteNetworkInterface
are explicitly ordered after the deletion of associated subnets/security groups so the Lambda service has permissions to delete the ENIs it created in your VPC before those permissions are removed.
Example configuration for Terraform AWS Provider versions 2.30.0 and earlier:
resource "aws_iam_role_policy_attachment" "example" {
policy_arn = "arn:aws:iam::aws:policy/service-role/AWSLambdaVPCAccessExecutionRole"
role = "${aws_iam_role.example.id}"
}
resource "aws_subnet" "example" {
# ... other configuration ...
timeouts = {
delete = "45m"
}
depends_on = ["aws_iam_role_policy_attachment.example"]
}
resource "aws_security_group" "example" {
# ... other configuration ...
timeouts = {
delete = "45m"
}
depends_on = ["aws_iam_role_policy_attachment.example"]
}
In those earlier versions of the Terraform AWS Provider, if the IAM Role permissions are removed before Lambda is able to delete its Hyperplane ENIs, the subnet/security groups deletions will continually fail with a DependencyViolation
error as those ENIs must be manually deleted. Those ENIs can be discovered by searching for the ENI description AWS Lambda VPC ENI*
.
Example AWS CLI commands to find Lambda ENIs (see the AWS CLI documentation for additional filtering options):
# EC2 Subnet example
$ aws ec2 describe-network-interfaces --filter 'Name=description,Values="AWS Lambda VPC ENI*",Name=subnet-id,Values=subnet-12345678'
# EC2 Security Group example
$ aws ec2 describe-network-interfaces --filter 'Name=description,Values="AWS Lambda VPC ENI*",Name=group-id,Values=sg-12345678'
Example AWS CLI command to delete an ENI:
$ aws ec2 delete-network-interface --network-interface-id eni-12345678
While the deletion issues are now handled (either automatically in version 2.31.0 or later, or manually with the configuration above), the increased deletion time for this infrastructure is less than ideal. HashiCorp and AWS are continuing to closely work together on reducing this time, which will likely be handled by additional changes to the AWS Lambda service without any necessary changes to Terraform configurations. This issue serves as a location to capture updates relating to those service improvements.
I wonder if there’s anything that could speed it up, or do we literally just wait for them to fix the bug
open a ticket with aws support i guess? been an issue for years now
it takes exactly
aws_security_group.lambda_sg: Still destroying... [id=sg-00c13d8760cf5ced7, 27m50s elapsed]
aws_security_group.lambda_sg: Destruction complete after 27m51s
Releasing state lock. This may take a few moments...
every time
that’s awful
state rm first lol
or maybe suggest a flag on security groups to skip waiting, similar to the flag on cloudfront distributions
let me try state rm…
what is that flag in cloudfront?
well, it’s the opposite, since it’s on creation, but the deployment takes forever, so you can set wait_for_deployment = false and it will just continue on and let aws do its thing in the background…
https://registry.terraform.io/providers/hashicorp/aws/latest/docs/resources/cloudfront_distribution#wait_for_deployment
so, for security groups, maybe wait_for_destroy?
yes, that would work
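for reference, the cloudfront flag looks like this (other distribution config elided):
resource "aws_cloudfront_distribution" "example" {
  # ... other configuration ...
  wait_for_deployment = false # return once the config is submitted instead of waiting for Deployed
}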
my workflow looks like this:
workflows:
deploy-all:
description: Deploy terraform projects in order
steps:
- command: terraform deploy fetch-location -s stackname
- command : terraform deploy fetch-weather -s stackname
- command : terraform deploy output-results -s stackname
Has anyone used the terraform-provider-github? 200k downloads for the latest release, lots of issues raised, so it seems to be well used. But only 500 stars on github, so I’m just curious to find out its limitations before I start using it.
it’s kind of a pain to use on files. but otherwise it worked fine for me
what do you mean by “to use on files”, what files?
you can manage files with the provider
but it’s a mess if you have branch protection, or use a pr-based workflow
i was trying to use it to template an initial repository with a bunch of common files. but i ended up not doing that, because i couldn’t get it to work very well
did you try using “bypass branch protections” permission on the role/account used by terraform?
it was a couple years ago, i don’t remember what i tried or didn’t. all i remember is it wasn’t worth it
Thanks for the info @loren
We use it a bit to manage teams and repositories belonging to different teams, as well as populating repos with templates on creation.
If I change a file that is used in a terraform module and trigger an apply, it can create a lot of noise in all the repos that are managed by terraform, and including that module. (similar to what loren mentions about: pain on files )
Yeah, I was just thinking that maybe using lifecycle ignore_changes on file contents might work, for at least the initial file contents of the repo
Makes sense. Once a file is managed by terraform, it should be modified only through terraform, which would be easy to forget in a repo.
But I might rather just create a template repo, and create new repos from that template, using the github provider mostly just to create the repo from that template, and to manage repo settings
lifecycle and github-provider-to-create-repo-from-template is worth trying
I think the biggest challenge is if you’re using it on a large organization with hundreds of repos, you can easily exhaust rate limits if you don’t architect the terraform with that in mind.
e.g. doing a plan on 600+ repos is basically guaranteed to exhaust rate limits.
2022-11-19
2022-11-21
I am trying out Getting started with Atmos from the Cloud Posse Developer Hub. The workflow command seems to fail:
✗ . [none] (HOST) 02-atmos ⨠ atmos workflow deploy-all -s example
Error: required flag(s) "file" not set
Usage:
atmos workflow [flags]
if I try with -f option I get the following error
⧉ Tutorials
✗ . [none] (HOST) 02-atmos ⨠ atmos workflow deploy-all -s example -f /tutorials/02-atmos/stacks/example.yaml
yaml: unmarshal errors:
line 1: cannot unmarshal !!seq into config.WorkflowConfig
line 3: cannot unmarshal !!str `example` into config.WorkflowDefinition
not sure what I am missing
what does this look like in your atmos.yaml?
workflows:
# Can also be set using `ATMOS_WORKFLOWS_BASE_PATH` ENV var, or `--workflows-dir` command-line arguments
# Supports both absolute and relative paths
base_path: "stacks/workflows"
what does your workflow file look like? that error could be a yaml issue
I didn’t change anything from git clone
workflows:
# Can also be set using 'ATMOS_WORKFLOWS_BASE_PATH' ENV var, or '--workflows-dir' command-line arguments
# Supports both absolute and relative paths
base_path: "stacks/workflows"
workflows
workflows:
deploy-all:
description: Deploy terraform projects in order
steps:
- job: terraform deploy fetch-location
- job: terraform deploy fetch-weather
- job: terraform deploy output-results
✗ . [none] (HOST) 02-atmos ⨠ atmos version v1.9.1
@Dan Miller (Cloud Posse)
(also note, we’re moving these atmos tutorials to https://atmos.tools/category/tutorials)
Atmos is a workflow automation tool to manage complex configurations with ease. It’s compatible with Terraform and many other tools.
I checked this location too… same instructions, and the workflow command doesn’t work. Is there a particular version of atmos I need to pin to get this working? There are also a few other command lines that don’t work based on the document. For example, this command doesn’t work:
atmos terraform backend generate component tfstate-backend --stack ue2-root
it should actually be
atmos terraform generate backend tfstate-backend --stack ue2-root
is it atmos terraform backend generate in the docs?
yes
can you share the link, because In the docs I see it correct
Your first environment on AWS | atmos: https://atmos.tools/tutorials/first-aws-environment
Atmos is a workflow automation tool to manage complex configurations with ease. It’s compatible with Terraform and many other tools.
do you mind creating an issue in the atmos repo about this?
Found a bug? Maybe our Slack Community can help.
Describe the Bug
This document https://atmos.tools/tutorials/first-aws-environment
Has commands that reference creating the terraform backend
atmos terraform backend generate component tfstate-backend --stack ue2-root
Expected Behavior
The correct command line should be
atmos terraform generate backend tfstate-backend --stack ue2-root
Steps to Reproduce
Steps to reproduce the behavior:
This document https://atmos.tools/tutorials/first-aws-environment
Has commands that reference creating the terraform backend
check the Build and generate our tfstate-backend section
Any suggestions on the workflow command issue?
2022-11-22
2022-11-23
Does terraform have a mechanism of knowing that a plan file was already executed?
A plan is returned as stale when you try to apply it a second time. The exact mechanism i don’t know.
I’m guessing an apply against the plan file would yield no changes and would provide an exit code that corresponds to no changes
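relatedly, plan’s -detailed-exitcode gives a machine-readable “no changes” signal:
$ terraform plan -detailed-exitcode -out=tfplan
# exit code 0 = no changes, 1 = error, 2 = changes present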
Hi, I’m trying to implement the terraform-aws-datadog-lambda-forwarder and have enabled forwarder_rds. I keep running into the following error:
Error: error creating IAM Policy rds: MalformedPolicyDocument: Resource must be in ARN format or "*".
status code: 400, request id: 0d4b0d8a-3fee-4f6f-8149-d2f049c9286e
with module.datadog_lambda_forwarder.aws_iam_policy.lambda_forwarder_rds[0], on .terraform/modules/datadog_lambda_forwarder/lambda-rds.tf line 53, in resource "aws_iam_policy" "lambda_forwarder_rds": 53: resource "aws_iam_policy" "lambda_forwarder_rds" {
I tried many things to get past it. I just cannot figure out what resource or variable must exist to get this to pass…
Well, as the error says, the resource "aws_iam_policy" "lambda_forwarder_rds" has a malformed policy document, because the Resource must be in ARN format or “*”.
Maybe share your code, and we can see what you have?
Yes:
module "datadog_integration" {
source = "cloudposse/datadog-integration/aws"
version = "1.0.0"
context = module.datadog_label.context
name = "datadog"
integrations = ["all"]
host_tags = local.datadog_host_tags
}
resource "aws_ssm_parameter" "datadog_key" {
name = "/datadog/datadog_api_key"
description = "Datadog key"
type = "SecureString"
value = var.datadog_api_key
}
module "datadog_lambda_forwarder" {
source = "cloudposse/datadog-lambda-forwarder/aws"
version = "1.0.0"
forwarder_rds_enabled = true
depends_on = [aws_ssm_parameter.datadog_key]
}
the upper one, datadog/integration/aws is already working correctly
you need to populate the value for the variable dd_api_key_source. the default for dd_api_key_source.identifier currently is just "" which needs to be either * or an actual ARN
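i.e. something like this (a sketch assuming the module’s dd_api_key_source object shape - double-check against the module’s variables.tf):
module "datadog_lambda_forwarder" {
  source  = "cloudposse/datadog-lambda-forwarder/aws"
  version = "1.0.0"

  forwarder_rds_enabled = true

  dd_api_key_source = {
    resource   = "ssm" # assumed supported source type
    identifier = aws_ssm_parameter.datadog_key.name
  }
}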
Hi team, I am implementing the module terraform-aws-ecs-cloudwatch-sns-alarms and there is no input variable for datapoints_to_alarm. This is part of the additional configuration. Should I make the changes and raise a PR, or is there a new release coming with these changes?
Yes prs are always welcome! We may not get to it right away but we will eventually
And if I make the changes and raise the PR, how long after that might I be able to use it in my project?
Hi team, I am getting the below error when trying to run ‘terraform plan’ to check for changes to K8s service accounts. There are loads of service accounts, but the error is being thrown only for a few of them; has anyone come across this or do you have any suggestions plz?
Error: Get "<https://eks_control_plane_endpoint/api/v1/namespaces/namespace-name/serviceaccounts/serviceaccount-name>": dial tcp: lookup eks-control-plane-api-endpoint on 0.0.0.0:53: read udp 172.17.0.2:35383->0.0.0.5:53: i/o timeout
DNS misconfigured? It looks like eks_control_plane_endpoint is being resolved to 0.0.0.5 or that a DNS request (since port 53 is mentioned) is being made to that address.
Are all the failures showing a similar error, like same domain?
thanks for your reply; I have masked the DNS and yes all the failures are showing the same DNS. let me know if you need any more details.
2022-11-24
2022-11-25
Anyone got an example of using the cloudposse/s3-bucket/aws module to create 2 buckets in different regions with cross-region replication set up between them?
Did you solve this? I have a module for this somewhere.
No I didn’t… I ended up postponing it as I’m on a very tight deadline to make improvements to the infra before an audit..
If you can share the module you wrote it will be much appreciated
Here you go: https://github.com/deploymode/terraform-aws-modules/tree/master/modules/s3-replication
I pulled it out of another project and made a few untested tweaks, so let me know if you have any issues.
2022-11-27
Target group ‘arn:aws:elasticloadbalancing:…:110072843540:targetgroup/mytargetgroup/1d7561147f16c315’ is currently in use by a listener or a rule
I’m trying to change the target group in terraform but I get the above error. Is there a way to fix it in pure terraform? I know I can go to the UI and remove the target from the lb, but I’d rather let terraform do it correctly
maybe share your terraform code as well?
Oooh, great point - I may not have imported the listener into terraform, that’s why it doesn’t know to recreate it
2022-11-28
2022-11-29
Hi guys. I have been using terraform for some years, but still struggle from time to time with for_each stuff: compiling complex objects from locals etc. using multiple for_each loops and extracting needed values. Does anyone have a good recommendation for a book, documentation, video guide or blog post that would help me deal with these in terraform? Anyone who faced similar issues and found a good resource, paid or free, and can recommend something, would be appreciated.
Honestly I can relate. I can’t actually write a complex for_each without google, and I’ve contributed code to the terraform aws provider. But with a few examples from CloudPosse from relevant modules and from internal modules, I can usually infer the syntax that I need to write. So, my suggestion is look for examples.
Of course there are terraform’s docs, but I haven’t found them helpful with complex objects (the simpler ones I can do by myself). For books, I’ve seen praise for Terraform Up & Running, but I can’t speak to whether the complicated stuff has enough coverage. So I still go with looking for examples, and adjust them to my needs.
Thanks @Denis, I have read Up & Running on oreilly.com; it has simple one-liner for-loop examples with just key/value stuff, easy ones, so not really helpful. Examples from modules that other people use, including cloudposse, are also something I’m using, but I figured I want to understand it and be able to write them without looking for examples, as that takes time and I got tired of it. But anyway, thanks for your suggestions!
Understood. Personally, I’m not writing complex for_each loops often, so my cost/benefit ratio for understanding it deeply is reduced; that’s where my suggestions are coming from. But yeah, it’s good to dive deep on it if it makes a difference.
This was pretty good… https://brendanthompson.com/posts/2022/10/terraform-for-expression
Using the for expression in Terraform to filter, group, order and mutate information. With this knowledge in hand you will easily be able to construct complex objects based on existing information/configuration or from configuration passed in via input variables or ingested. Easily create multiple instances of resources or data sources using the for_each meta-argument.
Thanks! This is useful
has anyone seen this error
Error: Error retreiving Projects: "InvalidInputException: Invalid project ARN: region does not match caller's region"
I’ve seen something similar when a provider is configured for only one specific region. But this may be something different. Maybe share the terraform code?
it was originally built in the wrong region; i changed the module along with the provider to a new region and then it barfed… with that non-descriptive error. I actually found the error by logging to a file with error logs
folks - guidance on the best way to keep configuration around but not have the resources active? case in point is an eks cluster with a number of node groups defined with the terraform-aws-eks-node-group module. I don’t need a specific node group that was dedicated to some testing, but may want to bring it up later. maybe this is as much a terraform question as this specific module, but what’s the best way to accomplish this? I could obviously just comment out the block, but perhaps there’s another way within the module? thx
Set the min and desired qty to 0
that’s…too easy. thanks
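a hypothetical sketch of that, assuming the module exposes the usual scaling inputs (check the module’s variables.tf for exact names):
module "test_node_group" {
  source = "cloudposse/eks-node-group/aws"
  # ... other configuration ...

  # keep the definition around, but run zero nodes
  min_size     = 0
  desired_size = 0
  max_size     = 1
}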
2022-11-30
In the terraform aws rds docs there is this serverless v2 example: “An aws_rds_cluster_instance resource must also be added to the cluster”
resource "aws_rds_cluster" "example" {
cluster_identifier = "example"
engine = "aurora-postgresql"
engine_mode = "provisioned"
engine_version = "13.6"
database_name = "test"
master_username = "test"
master_password = "must_be_eight_characters"
serverlessv2_scaling_configuration {
max_capacity = 1.0
min_capacity = 0.5
}
}
resource "aws_rds_cluster_instance" "example" {
cluster_identifier = aws_rds_cluster.example.id
instance_class = "db.serverless"
engine = aws_rds_cluster.example.engine
engine_version = aws_rds_cluster.example.engine_version
}
Did I understand this correctly, and is it really the case that you cannot create a serverless v2 cluster without an aws_rds_cluster_instance resource?
Hi! has anyone had this problem before? i made a mess with the state, destroying everything and migrating it: https://github.com/hashicorp/terraform/pull/2376 i ran the last script in there:
while read -r addr
do
if [[ "$addr" == "module."** ]]
then
module="${addr%.*.*}"
addr="${addr#$module.}"
echo terraform taint -module="${module//module./}" "$addr"
else
echo terraform taint "$addr"
fi
done < <(terraform state list | grep "aws_security_group_rule")
but when i apply, the error is still the same. I tried deleting all the security group rules by hand and it didn’t work either.
v1.3.6 1.3.6 (November 30, 2022) BUG FIXES: Terraform could crash if an orphaned resource instance was deleted externally and had condition checks in the configuration (#32246) Module output changes were being removed and re-added to the stored plan, impacting performance with large numbers of outputs (…
If when refreshing an orphaned instance the provider indicates it has already been deleted, there is no reason to create a change for that instance. A NoOp change should only represent an object th…