#terraform (2022-07)
Discussions related to Terraform or Terraform Modules
Archive: https://archive.sweetops.com/terraform/
2022-07-01
god, i feel so useless. I have 0 Terraform knowledge; i’m trying to modify an existing template and the errors it throws don’t help me much in fixing them. This is the template -> https://github.com/kaisoz/terraform-nextcloud-ec2-rds-s3 My first mission is moving it from RDS MySQL to RDS Postgres. To do that, i searched for another example that uses RDS Postgres -> https://github.com/terraform-aws-modules/terraform-aws-rds/blob/v4.4.0/examples/complete-postgres/main.tf I tried to copy some bits and here’s where i fail. I set
engine_version, family, major_engine_version
right under engine attribute at https://github.com/kaisoz/terraform-nextcloud-ec2-rds-s3/blob/master/modules/data/mysql/main.tf#L7 but terraform validate says
An argument named "engine_version" is not expected here.
(for each new attr that i set)
Rechecking the other example, i noticed that i’m placing the attrs under a “resource” block, so i tried moving them to https://github.com/kaisoz/terraform-nextcloud-ec2-rds-s3/blob/master/main.tf#L40 with the same result.
Okay, i’m placing them in the wrong place, i get it. But how do i know where i should be placing them?
oh i think i’ve found it, those unexpected arguments are variables set by the other example, they’re not listed here https://registry.terraform.io/providers/hashicorp/aws/latest/docs/resources/db_instance
oh, that template is also using aws_db_parameter_group, that’s where those attrs come from
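For anyone hitting the same wall, the short version is that arguments are only valid where they are declared: `family` and `major_engine_version` are inputs of the terraform-aws-modules/rds wrapper module (which uses them to build an `aws_db_parameter_group`), not arguments of a bare `aws_db_instance`. A hedged sketch of how the raw resources relate (values illustrative, not taken from the template):

```hcl
# family belongs to the parameter group, not the instance
resource "aws_db_parameter_group" "this" {
  name   = "nextcloud-postgres"
  family = "postgres13"
}

resource "aws_db_instance" "this" {
  engine               = "postgres"
  engine_version       = "13.7"
  instance_class       = "db.t3.micro"
  allocated_storage    = 20
  parameter_group_name = aws_db_parameter_group.this.name
}
```

If the value is being passed into a module block instead, the module must declare a matching `variable` for it, otherwise you get exactly the "argument is not expected here" error.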
Hi Team, can anyone educate me on what this Bridgecrew / Code analysis step in GitHub is and how to correct it? https://github.com/cloudposse/terraform-aws-emr-cluster/pull/54
what
• Adding missing tags parameter to the aws_iam_instance_profile resource.
why
• Prevents linting tools from failing • Tags the resource appropriately in line with the tagging strategy.
references
@Paulo Castro this is better in #pr-reviews
I will try to find out more about the BridgeCrew step
2022-07-02
2022-07-04
Hello, i’m new to TF. I have some TF code that creates a GCP bucket with a service account that should also have a condition. But for some reason the service account is created but not linked to the bucket. What i mean by that is: you can find the newly created SA in the Service Accounts section under IAM & Admin, but not under Permissions on the created bucket. This is what the TF looks like
module "bucket_service_account" {
source = "terraform-google-modules/service-accounts/google"
version = "~> 3.0.1"
project_id = var.project_id
description = "Storage Service Account - Created by TF"
names = [var.bucket_service_account_name]
generate_keys = true
}
module "google_storage_bucket" {
source = "terraform-google-modules/cloud-storage/google"
names = [var.bucket_name]
location = var.bucket_region
project_id = var.project_id
prefix = var.prefix
versioning = {
enabled = var.versioning
}
}
module "google_storage_bucket_iam_member" {
source = "terraform-google-modules/iam/google//modules/storage_buckets_iam"
storage_buckets = ["var.bucket_name"]
mode = "additive"
bindings = {
"roles/storage.objectAdmin" = [
"[email protected]",
]
}
conditional_bindings = [
{
role = "roles/storage.objectAdmin"
title = var.condition
description = "Condition to ensure newly created service account is restriced to the created bucket"
expression = "resource.name == \"var.prefix-var.bucket_region-var.bucket_name\""
members = ["[email protected]"]
}
]
}
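One likely cause (an untested guess from reading the snippet): in the `storage_buckets_iam` call, the bucket name and the condition expression are quoted literal strings, so the binding targets a bucket literally named `var.bucket_name` instead of your bucket. A sketch of the corrected call, assuming the same variables; the `iam_email` output name is an assumption, so check it against the service-accounts module docs:

```hcl
module "google_storage_bucket_iam_member" {
  source = "terraform-google-modules/iam/google//modules/storage_buckets_iam"

  # unquoted reference, not the literal string "var.bucket_name"
  storage_buckets = [var.bucket_name]
  mode            = "additive"

  bindings = {
    "roles/storage.objectAdmin" = [
      # output name assumed; the module exposes the SA's IAM member string
      module.bucket_service_account.iam_email,
    ]
  }

  conditional_bindings = [
    {
      role        = "roles/storage.objectAdmin"
      title       = var.condition
      description = "Restrict the new service account to the created bucket"
      # interpolate the variables inside the CEL expression;
      # bucket resource names take the form projects/_/buckets/NAME
      expression  = "resource.name == \"projects/_/buckets/${var.prefix}-${var.bucket_region}-${var.bucket_name}\""
      members     = [module.bucket_service_account.iam_email]
    }
  ]
}
```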
Hello hello. We’re trying to augment our internal web-based deployment tool to use TerraformCloud instead of pure scripts. But, assigning a TFC Workspace to each customer causes friction in the implementation. Which makes me doubt if I’m using Workspaces correctly. Is it a good design to assign a TFC Workspace name per customer environment?
Each customer, that buys our product, has a number of resources created for them and assigned to said customer. For example, cloud VMs. Our internal web-based tool, in essence, will wrap the creation of Terraform Workspaces, and their managed resources, for each customer using a tf module. We chose TFC so we can run multiple backend instances (high availability) and have TFC sync an environment’s status.
But: A tf module can’t read the Workspace name from a tf variable. This makes us “sed” the tf module and inject the Workspace name. e.g., customer X will have a Workspace named “customer-x”. This implementation feels wrong. As if Workspace names were meant to be fixed/rare.
Is it a good design to assign a TFC Workspace name per customer environment? Is using Workspace tags a better design? Can, at least, one Workspace tag be read from a tf variable, if it is a better design?
We ended up doing something similar to this using a TFE workspace to manage other TFE workspaces. We look at a yaml file to set the unique variables on the workspace (and any workspace specific settings).
Alternatively you could do it in one big state, but that’s not great.
Great! Thanks for helping
Is there a concern for blast radius by doing it this way? Won’t the state file be unnecessarily large if and when you end up with 100’s and 1000’s of workspaces where it’s all essentially in a single state file? (Under the workspace that’s managing all the other workspaces). Is there a way to split this up?
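For reference, the workspace-of-workspaces pattern described above can be sketched with the tfe provider, driving `tfe_workspace` from a YAML file via `for_each` (the file shape and variable names are hypothetical):

```hcl
locals {
  # customers.yaml is a hypothetical file, e.g.:
  #   customer-x: { environment: production }
  #   customer-y: { environment: staging }
  customers = yamldecode(file("${path.module}/customers.yaml"))
}

resource "tfe_workspace" "customer" {
  for_each     = local.customers
  name         = "customer-${each.key}"
  organization = var.tfe_organization
  tag_names    = ["customer", each.value.environment]
}

resource "tfe_variable" "environment" {
  for_each     = local.customers
  key          = "environment"
  value        = each.value.environment
  category     = "terraform"
  workspace_id = tfe_workspace.customer[each.key].id
}
```

On the blast-radius question: the management state only holds lightweight workspace and variable objects (not the customers' infrastructure), and because everything is keyed by customer name under `for_each`, adding or removing one customer only touches that customer's entries. If the count still grows uncomfortable, the same pattern can be sharded into several management workspaces, each owning a subset of the YAML.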
2022-07-05
Hi, all!
When using modules from git or the registry, is there a way to prevent multiple downloads?
For example, the following results in separate copies of the Git repo within my .terragrunt-cache/.../.terraform/modules:
module "iam_group_with_assumable_roles_policy" {
source = "terraform-aws-modules/iam/aws//modules/iam-group-with-assumable-roles-policy"
version = "5.2.0"
...
}
module "iam_assumable_role" {
source = "terraform-aws-modules/iam/aws//modules/iam-assumable-role"
version = "5.2.0"
...
}
I’m pretty sure the module is only downloaded once. There are just two copies on the filesystem
also maybe post in #terragrunt
2022-07-06
v1.3.0-alpha20220706 1.3.0 (Unreleased) NEW FEATURES:
Optional attributes for object type constraints: When declaring an input variable whose type constraint includes an object type, you can now declare individual attributes as optional, and specify a default value to use if the caller doesn’t set it. For example: variable “with_optional_attribute” { type = object({ a = string # a required attribute b = optional(string) # an optional attribute c = optional(number, 127) # an…
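The example in the release notes is cut off above; reconstructed as I read the feature announcement (the `127` default is from the snippet itself):

```hcl
variable "with_optional_attribute" {
  type = object({
    a = string                # a required attribute
    b = optional(string)      # an optional attribute, defaults to null
    c = optional(number, 127) # an optional attribute with a default value
  })
}
```

A caller may then set only `a`; Terraform fills in `b = null` and `c = 127` instead of rejecting the value for missing attributes.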
2022-07-07
Hello, team! Please accept my apology if my question sounds silly. I am using the cloudposse SFTP transfer module. I have a user map that I want to use in two different module instances. My first module instance was created a few days ago. With the second instance, Terraform returns an error saying an IAM role is already defined for all users in the user map. I understand why: the same user map is given to the second SFTP transfer module instance, which tries to create roles with the same ARNs/IDs.
How can I solve this? Thank you.
Hrmmmmm so since the modules use null-label
the names of all resources should be easily disambiguated. Are you naming each instance differently?
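For illustration, with null-label-based modules each instance usually gets unique resource names by varying `name` or `attributes`. A sketch from memory (module source, version, and the `sftp_users` input are assumptions, so double-check them against the module's README):

```hcl
module "sftp_internal" {
  source  = "cloudposse/transfer-sftp/aws"
  version = "x.y.z" # pin to a real release

  namespace  = "acme"
  name       = "sftp"
  attributes = ["internal"] # disambiguates generated IAM role names

  sftp_users = local.user_map
}

module "sftp_partner" {
  source  = "cloudposse/transfer-sftp/aws"
  version = "x.y.z"

  namespace  = "acme"
  name       = "sftp"
  attributes = ["partner"] # different attribute -> different role names

  sftp_users = local.user_map
}
```

If both instances use identical label inputs, null-label produces identical IDs, which would explain the "IAM role already defined" collision.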
2022-07-10
Hi…
I am using the WAFv2 module from CloudPosse and am wondering how I can implement mixed and_statement and not_statement rules…
My current rule, defined using aws provider WAFv2 resources, looks like this (see code snippet), but it would seem that the module does not support this pattern… or I just didn’t understand how to do it. Not a bug, but maybe a feature request if it isn’t currently supported
rule {
name = "AllowMSAPIFromOffice"
priority = 1
action {
block {}
}
statement {
and_statement {
statement {
byte_match_statement {
search_string = "/test-site"
field_to_match {
uri_path {}
}
text_transformation {
priority = 0
type = "NONE"
}
positional_constraint = "STARTS_WITH"
}
}
statement {
not_statement {
statement {
ip_set_reference_statement {
arn = aws_wafv2_ip_set.msys[0].arn
}
}
}
}
}
}
visibility_config {
cloudwatch_metrics_enabled = false
metric_name = "msys-test-site-metrics"
sampled_requests_enabled = true
}
}
Yep it does seem unsupported
The way you have it set up: there’s the rule (1), then the statement (2), then the “and statement” (3), then the byte match statement (4), and the module doesn’t allow the level-3 portion of the “and statement”
@Tomer we’re always looking for contributors and ways to improve our modules. Would you be willing to write up a ticket or contribute this feature?
Would love to!
Unsure I will be able to contribute the feature, but I could at least make the feature request and maybe even start something for discussion and consideration
IMO, because of the insanity of aws_wafv2_webacl’s resource schema, all modules implementing it are really ugly and require you to write in a module-specific format to create the correct resource-specific schema, which then doesn’t exactly match the AWS-level representation anyway.
It’s simpler to manage WAFv2 without a module, because as soon as your needs grow beyond the module there’s a painful ejection phase. And ultimately, WAF modules don’t have too many convenience features.
This is most certainly true… The issue comes in when we want to start doing funky loops over local values and creating a similar WAF for each… it tends to turn into a looping nightmare where each connected resource must reference other resources by index or key. Kind of a rock-and-a-hard-place scenario
This need in the AWS WAF resource definition to nest not_, or_ or and_ statements within other statement blocks really throws a spanner in the works and seems arbitrarily complex
I smell a feature request to AWS provider
2022-07-11
Hi!
I’m creating a cluster with the eks cluster module, and I’d like to know if it’s possible to specify a specific port range when allowing access to the cluster from a CIDR block.
Thanks!
I’m asking this because I’m creating a private cluster, inside a private subnets, and I want to only allow HTTP traffic to a very specific port, and not opening the whole cluster to the internet.
If you agree that this should be configurable, I’m happy to send a PR adding this ability
2022-07-12
Hello, I am using cloudposse/ecs-web-app/aws and everything works great, except I am not sure how to have ECS start a task when a new definition is uploaded. I set ignore_changes_task_definition = false but it fully deletes the service and creates a new one. Is this the default behavior, or can we just launch a new task in the existing service and, once that’s live, kill the old one?
you can move the resource from the old address to the new address in state to prevent the recreation
Not sure what you mean. Can you plz elaborate.
terraform state mv xyz abc
Hello, I’ve raised a bug for the terraform-aws-sso project where any changes to policy definitions results in a mass destroy and (re-)create of terraform resources; we followed the example tf definitions but I’m wondering if we’re doing something wrong and if we can avoid this behaviour… any thoughts would be appreciated https://github.com/cloudposse/terraform-aws-sso/issues/29
Found a bug? Maybe our Slack Community can help.
Describe the Bug
The bug seems to be that terraform wants to destroy and recreate all account assignments even if only one policy statement, assigned to one account is changed. In our case, we have 34 assignments and all 34 assignments are planned for destroy and creation with every change. Could this be due to the fact that the assignments are defined in a list, any changes to which triggers a recreation of the listed resources?
Expected Behavior
We expected only the policy and if necessary, the account(s) assigned to plan for changes, not all accounts.
Steps to Reproduce
Steps to reproduce the behavior:
- Define multiple permission sets:
module "permission_sets" {
  source  = "cloudposse/sso/aws//modules/permission-sets"
  version = "0.6.2"
  permission_sets = [
    {
      name               = local.permission_sets.standard,
      session_duration   = local.session_duration,
      description        = local.description,
      relay_state        = local.relay_state,
      tags               = local.tags,
      inline_policy      = data.aws_iam_policy_document.abacv1.json,
      policy_attachments = ["arn:aws:iam::aws:policy/ReadOnlyAccess"]
    },
    {
      name               = local.permission_sets.billing_read,
      session_duration   = local.session_duration,
      description        = local.description,
      relay_state        = local.relay_state,
      tags               = local.tags,
      inline_policy      = data.aws_iam_policy_document.billing_readonly.json,
      policy_attachments = []
    },
    {
      name               = local.permission_sets.budget,
      session_duration   = local.session_duration,
      description        = local.description,
      relay_state        = local.relay_state,
      tags               = local.tags,
      inline_policy      = data.aws_iam_policy_document.budget_management.json,
      policy_attachments = []
    }
  ]
}
2. Define multiple account assignments:
module "account_assignments" {
  source     = "cloudposse/sso/aws//modules/account-assignments"
  version    = "0.6.2"
  depends_on = [module.permission_sets]
  account_assignments = [
    {
      name               = local.permission_sets.admin,
      session_duration   = local.session_duration,
      description        = local.description,
      relay_state        = local.relay_state,
      tags               = local.tags,
      inline_policy      = "",
      policy_attachments = ["arn:aws:iam::aws:policy/AdministratorAccess"]
    },
    {
      account             = local.accounts.production,
      permission_set_arn  = module.permission_sets.permission_sets[local.permission_sets.admin].arn,
      permission_set_name = local.permission_sets.admin,
      principal_type      = local.principal_type,
      principal_name      = local.principal_names.devops
    },
    {
      account             = local.accounts.staging,
      permission_set_arn  = module.permission_sets.permission_sets[local.permission_sets.admin].arn,
      permission_set_name = local.permission_sets.admin,
      principal_type      = local.principal_type,
      principal_name      = local.principal_names.devops
    }
  ]
}
3. Plan and apply Terraform
4. Change a single policy doc
5. Plan shows all account assignments are to be replaced, convoluting the true change in the plan, which is the single policy change
# module.account_assignments.aws_ssoadmin_account_assignment.this["XXXXXXXXXXXXXXXX-G-aws-sso-devops-AdministratorAccess"] must be replaced
-/+ resource "aws_ssoadmin_account_assignment" "this" {
  ~ id           = "XXXXXXXXXXXXXXXX,AWS_ACCOUNT,arn:aws:sso:::permissionSet/ssoins-72238ef87c17d7ab/ps-XXXXXXXXXXXXXXXX,arn:aws:sso:::instance/ssoins-XXXXXXXXXXXXXXXX" -> (known after apply)
  ~ instance_arn = "arn:aws:sso:::instance/ssoins-XXXXXXX" -> (known after apply) # forces replacement
  ~ principal_id = "XXXXXXXXXXXXXXXX" -> (known after apply) # forces replacement
    (4 unchanged attributes hidden)
}
The above replacement plan is displayed for all account assignments, even though the policy change affects a lesser number.
Screenshots
In our implementation, we have 34 account assignments and upon simply commenting out some lines in one policy assigned in one account, we see a plan to destroy 34 and add 34 resources, whilst registering the change to a single policy:
Plan: 34 to add, 1 to change, 34 to destroy.
Environment (please complete the following information):
multi-account, multi-environment
Anything that will help us triage the bug will help. Here are some ideas:
• Terraform required_version = “>= 1.1.0” • Module version “0.6.2”
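The suspicion in the report matches a well-known Terraform behavior and is one common cause of mass replacement: resources indexed by list position (count) are addressed by index, so inserting or removing one element shifts every later address, while for_each keyed on a stable string does not. A generic sketch of the difference (variable shapes assumed for illustration; the real module may also be forcing replacement via data-source values that are only known after apply):

```hcl
# count: addresses are this[0], this[1], ... — reordering or removing one
# element of the input list re-addresses every later element, and the
# provider plans a destroy/create for each shifted address.
resource "aws_ssoadmin_account_assignment" "by_index" {
  count              = length(var.account_assignments)
  instance_arn       = var.instance_arn
  permission_set_arn = var.account_assignments[count.index].permission_set_arn
  principal_id       = var.account_assignments[count.index].principal_id
  principal_type     = var.account_assignments[count.index].principal_type
  target_id          = var.account_assignments[count.index].account
  target_type        = "AWS_ACCOUNT"
}

# for_each: addresses are this["prod-Admin-devops"], ... — stable keys mean
# a change to one entry never re-addresses the others.
resource "aws_ssoadmin_account_assignment" "by_key" {
  for_each = {
    for a in var.account_assignments :
    "${a.account}-${a.permission_set_name}-${a.principal_name}" => a
  }
  instance_arn       = var.instance_arn
  permission_set_arn = each.value.permission_set_arn
  principal_id       = each.value.principal_id
  principal_type     = each.value.principal_type
  target_id          = each.value.account
  target_type        = "AWS_ACCOUNT"
}
```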
2022-07-13
v1.2.5 1.2.5 (July 13, 2022) BUG FIXES: Report correct error message when a prerelease field is included in the required_version global constraint. (#31331) Fix case when extra blank lines were inserted into the plan for unchanged blocks. (#31330)
Closes #28148 Any required_version that contains a prerelease field will fail the later constraint check anyway. This change ensures the reported error message makes clear that prerelease versions …
Closes #28217 Fix the issue where multiple blank lines were inserted into the plan for every unchanged block type. There are two cases for unchanged blocks, either the previous block is of the same…
Hey friends!
I have been impacted by this bugfix in Terraform AWS provider v4.22.0:
ISSUE: https://github.com/hashicorp/terraform-provider-aws/issues/25680
PULL REQUEST: https://github.com/hashicorp/terraform-provider-aws/pull/25681/files
I have described the regression which is affecting my current configuration in these comments:
- https://github.com/hashicorp/terraform-provider-aws/issues/25680#issuecomment-1183812969
- https://github.com/hashicorp/terraform-provider-aws/issues/25680#issuecomment-1183818419 Very simple PR, very simple fix, but feels like it introduces other issues to aws credentials chain flow. Curious if anyone else is experiencing something similar?
In theory (according to Authentication and Configuration), provider configuration is just the first place where the provider tries to find AWS configuration; it shouldn’t stop there if that fails.
2022-07-15
i am using an assume_role block in my terraform aws provider block like this:
provider "aws" {
assume_role {
role_arn = "arn:aws:iam::123456789012:role/ROLE_NAME"
}
}
when running terraform however, it seems like that role isn’t being used.
how can i confirm what role terraform is using?
create a data block for aws_caller_identity and output the whole thing to see all the attributes?
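e.g., a minimal sketch:

```hcl
data "aws_caller_identity" "current" {}

output "whoami" {
  # arn shows the assumed-role session, e.g.
  # arn:aws:sts::123456789012:assumed-role/ROLE_NAME/aws-go-sdk-...
  value = data.aws_caller_identity.current
}
```

If the `arn` output is not the assumed role, the provider fell back to the ambient credentials in the environment.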
Where are you running Terraform? EC2 - ECS - CodeBuild also what is ~/.aws/config?
running terragrunt inside an Atlantis pod on EKS
Your EKS pod probably needs an IRSA which can assume role_arn
You can associate an IAM role with a Kubernetes service account. This service account can then provide AWS permissions to the containers in any pod that uses that service account. With this feature, you no longer need to provide extended permissions to the
yeah it has it
but it couldn’t see the state bucket
figured it out
does the ‘assume_role’ in the provider block get used to look up the state file in S3? or do the local credentials get used for reading the state file?
2022-07-16
does the ‘assume_role’ in the provider block get used to look up the state file in S3? or do the local credentials get used for reading the state file?
It’s different. You may use the AWS provider for resources, but have an entirely different remote state backend. So they don’t share that configuration
Terraform can store state remotely in S3 and lock that state with DynamoDB.
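In other words, the backend authenticates independently of the provider; if reading the state bucket also requires the role, it has to be configured on the backend itself. A sketch (bucket, table, and role names illustrative; `role_arn` was the S3 backend's assume-role argument in Terraform 1.x of this era):

```hcl
terraform {
  backend "s3" {
    bucket         = "my-state-bucket"
    key            = "env/prod/terraform.tfstate"
    region         = "us-east-1"
    dynamodb_table = "terraform-locks"
    # the backend's own assume-role config, separate from the provider's
    role_arn       = "arn:aws:iam::123456789012:role/StateAccess"
  }
}

provider "aws" {
  assume_role {
    # used for managing resources, NOT for reading/writing state
    role_arn = "arn:aws:iam::123456789012:role/ROLE_NAME"
  }
}
```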
2022-07-17
2022-07-18
Running into an interesting issue with the ec2-instance module… I need to spin up 4 instances, 2 in each of 2 regions… So I’m passing providers = { aws = aws.<alias> } to each of the module { } blocks… I’m getting 2 of the 4 running into issues during the plan stage saying the query returns no results, but they aren’t for the same alias; it’s actually one of each, while the other 2 aren’t erroring out
even if I reduce it down to only 2 instances, one for each region that is aliased… It errors out on the second. I don’t encounter this same issue when using the security-group module to generate SGs in both regions. The error is on the data resource lookup for “aws_ami.info” as if it’s ignoring the provider supplied to the module. I’ve tried using both 0.41.0 and 0.43.0
That’s very strange. Could you post the HCL in an issue on the upstream module? Perhaps it’s a bug in the module or the aws provider
I think I may have got it resolved… I’d seen one issue on GitHub that didn’t have many details. It looks like I may have gotten variables crossed and passed the AMI ID for the wrong region
I am looking for suggestions on how to tackle a resource situation in AWS. The resource is aws_backup_framework, used for compliance reporting; it checks backup plans and vaults for different controls. Has anyone tried deploying this resource to check multiple backup plans? Say I have a daily, a weekly, and a monthly backup plan with different backup frequency and retention settings. Use of a map will make this messy very fast, IMO. Could be wrong though, hence I need suggestions. What could be a decent method to deploy this? Should I do 1 framework per backup plan in a flat file structure? But then I am repeating myself again and again.
Hey CloudPosse, I’m having some trouble with https://github.com/cloudposse/terraform-aws-ec2-instance . I’m attaching an IAM instance profile to give the instance access to my S3 bucket, but the credentials are never configured when I run any cli command:
admin@ip-10-5-1-11:~$ aws s3 ls
Unable to locate credentials. You can configure credentials by running "aws configure".
admin@ip-10-5-1-11:~$ aws configure list
Name Value Type Location
---- ----- ---- --------
profile <not set> None None
access_key <not set> None None
secret_key <not set> None None
region <not set> None None
I can see the IAM policy, role and profile successfully created and attached to the instance by terraform, I can query the metadata and it shows the role attached:
TOKEN=$(curl -X PUT "http://169.254.169.254/latest/api/token" -H "X-aws-ec2-metadata-token-ttl-seconds: 21600") \
&& curl -H "X-aws-ec2-metadata-token: $TOKEN" -v http://169.254.169.254/latest/meta-data/iam/security-credentials/role
* HTTP 1.0, assume close after body
< HTTP/1.0 200 OK
< Accept-Ranges: bytes
< Content-Length: 1470
< Content-Type: text/plain
< Date: Tue, 19 Jul 2022 05:40:30 GMT
< Last-Modified: Tue, 19 Jul 2022 05:08:23 GMT
< X-Aws-Ec2-Metadata-Token-Ttl-Seconds: 21600
< Connection: close
< Server: EC2ws
<
{
"Code" : "Success",
"LastUpdated" : "2022-07-19T05:08:07Z",
"Type" : "AWS-HMAC",
"AccessKeyId" : "access-id",
"SecretAccessKey" : "secret-key",
"Token" : "token"
}
I’ve used the console to create an instance with the same AMI and profile, and also created an instance with terraform with the same AMI and profile but without the module, only an aws_instance resource, and both have the credentials configured on startup.
I couldn’t find anything in the module that would keep the instance profile from being assumed, but based on the troubleshooting I did, it only seems to happen when using the module.
Any help appreciated.
Terraform module for provisioning a general purpose EC2 host
I’ve seen the same issue before and did not know how to track it down
Have you tried going through these troubleshooting steps?
https://aws.amazon.com/premiumsupport/knowledge-center/troubleshoot-aws-cli-commands-ec2/
Also there were issues yesterday with AWS SSO and today with IAM. Even though IAM is global, it’s all in the region us-east-1. So it’s possible this issue may resolve itself
yeah I tried those, and didn’t help much tbh. I’ve been having this for some time, but only now decided to get to the bottom of it
for posterity: I think I found what causes this: if http_tokens in the metadata_options of the aws_instance resource is set to "required" AND you have awscli version 1 on your instance, the temporary credentials will not be returned (awscli will get a 401 for the /latest/meta-data/iam/security-credentials/ request).
As per the documentation, when it’s set to "optional" and the request carries no token, IMDSv1 credentials are returned, but when it’s "required", IMDSv2 is always enforced.
https://docs.aws.amazon.com/cli/latest/reference/ec2/modify-instance-metadata-options.html
So either setting http_tokens to "optional", or updating the awscli to version 2 in the AMI, fixes this.
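For reference, the setting lives in metadata_options on the instance resource (a sketch; note that upgrading to awscli v2 and keeping "required" is the more secure combination, since it enforces IMDSv2 session tokens):

```hcl
resource "aws_instance" "example" {
  ami           = var.ami_id # hypothetical variable
  instance_type = "t3.micro"

  metadata_options {
    http_endpoint = "enabled"
    # "required" enforces IMDSv2; awscli v1 only speaks IMDSv1 and will
    # fail to fetch instance-profile credentials in that mode.
    http_tokens = "optional"
  }
}
```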
Ah that’s very good to know. It’s probably much easier to upgrade awscli to v2, no?
yeah, probably that’s the way to go
2022-07-19
Any tips/links on how to setup a lambda -> slack/teams webhook for notifications of SNS topics/cloudwatch alarms? Using terraform over around a dozen AWS accounts?
I use AWS chatbot to send messages from Codepipeline to Slack using an SNS topic. Easy to set up and integrates smoothly with the rest of AWS, maybe that might be useful to you.
Ah, that’s a good idea. We do have a couple of users with slack, however the company uses teams, so we need to cover teams also, ideally.
@András Sándor that’s pretty cool. Do you use a terraform module to set that up? Is it public? I’ve been meaning to try out aws chatbot. I’d love to hear more about your set up
@Neil Caldwell there is this module that can help with setting up lambdas
A module for launching Lambda Functions
I think I set it up from the console; that was before I started replacing everything with terraform. It’s pretty much 4 resources: Chatbot, SNS Topic, and an IAM policy for each. But now that I checked, it seems that TF doesn’t support Chatbot as a resource, and every Chatbot TF module out there is just a CloudFormation stack created with TF.
Haven’t used Teams unfortunately, so I have little idea for that.
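For what it’s worth, since the aws provider had no native Chatbot resource at the time, the usual workaround is wrapping the CloudFormation type in aws_cloudformation_stack. A sketch (variable names hypothetical; AWS::Chatbot::SlackChannelConfiguration is the real CloudFormation type):

```hcl
resource "aws_cloudformation_stack" "chatbot_slack" {
  name = "chatbot-slack-alerts"

  template_body = jsonencode({
    Resources = {
      SlackChannel = {
        Type = "AWS::Chatbot::SlackChannelConfiguration"
        Properties = {
          ConfigurationName = "alerts"
          IamRoleArn        = var.chatbot_role_arn
          SlackWorkspaceId  = var.slack_workspace_id
          SlackChannelId    = var.slack_channel_id
          SnsTopicArns      = [var.sns_topic_arn]
        }
      }
    }
  })
}
```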
2022-07-20
Hi everyone, I am trying to iterate over a list of strings (EKS cluster roles) and would like to generate an output for the aws-auth configmap if the role name matches the IAM role ARNs being iterated over in the outer loop. Since the IAM role includes a non-predictable random part, the match should be based on a substring, ideally provided by substituting the cluster role string as a variable. I am thinking of something like this (pseudo-code):
locals {
sso_arns = [for group in local.permission_sets.clustername : {
groups = [group]
username = "${group}:{{SessionName}}"
rolearn = toset([for role in data.aws_iam_roles.sso_roles.names : role if role == group])
}]
map = yamlencode(local.sso_arns)
}
I am unsure about the if condition to match role with group and think that regex filter might make more sense, but then I have the problem of a variable substitution within the expression and I have no idea if this is even possible with Terraform by doing something like this:
rolearn = toset([for role in data.aws_iam_roles.sso_roles.names : role if regex(".*${group}.*", role)])
Obviously, the part with ${group}
would need proper escaping but this is where I am lost and I’d really appreciate any help here. Thanks!
ok, posting my own solution here: it worked by using length(regexall("${group}", role)) > 0 instead of the regex function.
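For anyone searching later, the working shape looks roughly like this (mirrors the pseudo-code above; regexall returns a list of matches, so length(...) > 0 yields a boolean, whereas regex() raises a hard error when nothing matches):

```hcl
locals {
  sso_arns = [for group in local.permission_sets.clustername : {
    groups   = [group]
    username = "${group}:{{SessionName}}"
    # keep only roles whose name contains the group string
    rolearn = [
      for role in data.aws_iam_roles.sso_roles.names :
      role if length(regexall(group, role)) > 0
    ]
  }]

  map = yamlencode(local.sso_arns)
}
```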
Hello everybody. If you want to fully manage the EKS aws-auth config map with terraform using the cloudposse eks module you can take a look at this PR: https://github.com/cloudposse/terraform-aws-eks-cluster/issues/155
Describe the Bug
When using the module to create an EKS cluster, I’m trying to add additional roles to the aws_auth configmap. This only happens with the roles; adding additional users works perfectly.
The behavior changes depending on the kubernetes_config_map_ignore_role_changes config.
• If I leave it as default (false), the worker roles are added, but not my additional roles. • If I change it to true, the worker roles are removed and my additional roles are added.
Expected Behavior
When adding map_additional_iam_roles, those roles should appear on the aws_auth configmap, together with the worker roles, when kubernetes_config_map_ignore_role_changes is set to false.
Steps to Reproduce
Steps to reproduce the behavior:
- Create a cluster and workers with this config (there are some vars not changed, so it’s easy to use):
module "eks_cluster" {
region = var.region
source = "cloudposse/eks-cluster/aws"
version = "2.2.0"
name = var.name
vpc_id = var.vpc_id
subnet_ids = var.private_subnet_ids
endpoint_public_access = false
endpoint_private_access = true
kubernetes_version = var.kubernetes_version
kube_exec_auth_enabled = true
kubernetes_config_map_ignore_role_changes = false
map_additional_iam_roles = [
{
rolearn = "arn:aws:iam::<account-id>:role/myRole"
username = "added-role"
groups = ["system:masters"]
}
]
map_additional_iam_users = [
{
userarn = "arn:aws:iam::<account-id>:user/myUser"
username = "added-user"
groups = ["system:masters"]
}
]
}
module "eks_node_group" {
source = "cloudposse/eks-node-group/aws"
version = "2.4.0"
ami_release_version = [var.ami_release_version]
instance_types = [var.instance_type]
subnet_ids = var.private_subnet_ids
min_size = var.min_size
max_size = var.max_size
desired_size = var.desired_size
cluster_name = module.eks_cluster.eks_cluster_id
name = var.node_group_name
create_before_destroy = true
kubernetes_version = [var.kubernetes_version]
cluster_autoscaler_enabled = var.autoscaling_policies_enabled
module_depends_on = [module.eks_cluster.kubernetes_config_map_id]
}
Screenshots
This example shows when I change the variable kubernetes_config_map_ignore_role_changes from false to true.
Also, I don’t see a difference in the map roles in the data block for both options except for the quoting.
Environment (please complete the following information):
Anything that will help us triage the bug will help. Here are some ideas:
• EKS version 1.22 • Module version 2.2.0
@Jeremy G (Cloud Posse)
@Sebastian Macarescu Thank you very much for this PR.
My apologies if I said something misleading, but I really want to avoid using remote state in this module. My hope was that the kubernetes_config_map_v1_data resource would remove the need for referencing remote state by merging the EKS-created roles with the customer roles.
Please elaborate on your experience with how kubernetes_config_map_v1_data works and fails to work in this scenario.
In short, kubernetes_config_map_v1_data does what it is supposed to do: set up field ownership. The official k8s documentation (https://kubernetes.io/docs/reference/using-api/server-side-apply/#merge-strategy) describes a list merge functionality, but the code for kubernetes_config_map_v1_data does not use it. Therefore ownership is set at the list level, not on individual items. In this case we can even remove kubernetes_config_map_v1_data since it does not fix our issue.
FEATURE STATE: Kubernetes v1.22 [stable] Introduction Server Side Apply helps users and controllers manage their resources through declarative configurations. Clients can create and modify their objects declaratively by sending their fully specified intent. A fully specified intent is a partial object that only includes the fields and values for which the user has an opinion. That intent either creates a new object or is combined, by the server, with the existing object.
I can provide some examples of what the aws-auth config map looks like at specific stages, but in short it’s like this:
• on EKS creation it contains only map_additional_iam_roles
- ownership is set to terraform
• when you create EKS managed node groups, it will contain the initial IAM roles from the EKS module plus the AWS-created IAM roles - ownership is set to lambda
• when you update map_additional_iam_roles
and apply again, it will fail because the lambda is the sole owner of the IAM roles field in the config map. If you force the apply, it will remove the roles added by AWS and reset ownership to terraform
So there is no other way to achieve this without the self-reference
OK, if we need the self-reference, then I do not think the switch to kubernetes_config_map_v1_data
is worth the migration/maintenance effort. You can see how we handle this for our clients here.
I agree with this. I will remove it from the PR and push the new code
@Sebastian Macarescu We are not going to accept a PR that uses remote state because there are too many different places to store remote state. We want to leave that management to the component using the module. What you could do that would be helpful is create an example in the examples/
directory that shows how to use remote state in the component to manage the roles in the auth map. Remember, we have an example of how to do that here, but you probably want your example to not use our remote-state.tf
module. Just refactor the remote state stuff you already wrote to be in the component calling the module rather than in the module itself. (Or just leave the PR as is and close it. You have already contributed a lot, for which we are grateful.)
What if we make the module accept any state type? I would use your remote-state.tf
but that seems to be using yaml config. I’m not really sure how to use that with the current module as is
Correct, our remote-state.tf
gets its configuration from YAML files compatible with atmos
. Atmos is how Cloud Posse handles configuration for all its clients, but we do not want our open source Terraform modules to depend on it because we want them to be widely accessible/useful and do not want to force anyone to use atmos
. So that puts us in a bit of a bind: the only remote state access configuration we want to support is one we do not want in our open source modules. This is why I suggest you publish your solution as an example, rather than try to add it to the module itself.
@Jeremy G (Cloud Posse) I have refactored the PR and removed the remote state data source together with kubernetes_config_map_v1_data
. Instead I used an SSM parameter to remember the previous map_additional_iam_roles
. Can you take another look at it? Thanks
https://github.com/cloudposse/terraform-aws-eks-cluster/pull/157
what
• Use newer kubernetes_config_map_v1_data to force management of config map from a single place and have field ownership • Implement self reference as described here: #155 (comment) in order to detect if any iam roles were removed from terraform config • Preserve any existing config map setting that was added outside terraform; basically terraform will manage only what is passed in variables
why
Mainly the reasons from #155.
references
*closes #155
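A minimal sketch of the SSM-parameter approach described above (the parameter path and variable names here are hypothetical, not the PR’s actual code):

```hcl
# Persist the roles applied on this run so the next run can diff
# against them and detect roles removed from the Terraform config.
resource "aws_ssm_parameter" "applied_iam_roles" {
  name  = "/eks/${var.cluster_name}/map-additional-iam-roles" # hypothetical path
  type  = "String"
  value = jsonencode(var.map_additional_iam_roles)
}
```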
This helped me a lot when using terragrunt to separately deploy cloudposse’s eks and nodegroup modules
Hi Everyone, this might be a silly question but I’m trying to deploy an AWS EKS cluster and running into some issues.
Specifically, I have to use a custom AMI but when I specify it in terraform-aws-eks-node-group
module the nodes aren’t able to join the cluster. The custom AMIs are built on top of existing EKS images with very light updates. Using the runbook mentioned here, it seems the /etc/eks/bootstrap.sh
script in the node wasn’t being run for some reason. I SSM’ed into the instance and had to run it manually by doing sudo /etc/eks/bootstrap.sh ${ClusterName}
Anyone know what’s the best way for me to trigger this script moving forward?
This one was interesting… so I just put the command I ran manually as a before step and it worked. I wonder if it’s sudo that made the difference. Anyways, please disregard. Should be working now
Trying to use Cloudposse SES module (https://registry.terraform.io/modules/cloudposse/ses/aws/latest)
Failing when calling the nested ses_user module. The user and user groups are all created as expected:
module.ses_user.awsutils_expiring_iam_access_key.default[0]: Creating...
╷
│ Error: Error creating access key for user <redacted>: NoSuchEntity: The user with name <redacted> cannot be found.
│ status code: 404, request id: 2075758c-b8a1-44ab-9c6d-d322f23de3fd
│
│ with module.ses_user.awsutils_expiring_iam_access_key.default[0],
│ on .terraform/modules/ses_user/main.tf line 24, in resource "awsutils_expiring_iam_access_key" "default":
│ 24: resource "awsutils_expiring_iam_access_key" "default" {
│
╵
Releasing state lock. This may take a few moments...
trying to use the module https://github.com/cloudposse/terraform-aws-dynamodb with global replicated tables and autoscaling. How do I turn on autoscaling in the other regions? it seems that autoscaling is only performed on the primary region.
Terraform module that implements AWS DynamoDB with support for AutoScaling
2022-07-21
hey team, need some help with CloudPosse kms key module
Error: "name" must begin with 'alias/' and be comprised of only [a-zA-Z0-9/_-]
│
│ with MODULENAME[0].aws_kms_alias.default[0],
│ on .terraform/modules/MODULENAME/main.tf line 15, in resource "aws_kms_alias" "default":
│ 15: name = coalesce(var.alias, format("alias/%v", module.this.id))
│
creating key using kms_key module, using conditional create using count
the alias appears to be the problem
if I take the alias away and try to use terraform aws_kms_alias
then it fails
if I leave alias in CloudPosse module then I get a plan but since I am using conditional, I have to create multiple aliases
well my understanding is that alias is a required input
but the docs say it isn’t
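For reference, a minimal sketch of a conditionally created key with a valid alias (the module version pin and variable names here are hypothetical):

```hcl
# Sketch: conditional KMS key with an explicit alias.
# The alias must begin with "alias/" and contain only [a-zA-Z0-9/_-].
module "kms_key" {
  source  = "cloudposse/kms-key/aws"
  version = "0.12.1" # hypothetical pin

  enabled     = var.create_key            # conditional create
  description = "KMS key for ${var.name}" # hypothetical variable
  alias       = "alias/${var.name}"       # note the required "alias/" prefix
}
```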
Hi! I have a question about the validation of after_cluster_joining_userdata
in the eks-nodes module. It’s a list of commands to run, but the validation condition is that the list has to be smaller than 2. What’s the point of this?
The variable definition is:
variable "after_cluster_joining_userdata" {
type = list(string)
default = []
description = "Additional `bash` commands to execute on each worker node after joining the EKS cluster (after executing the `bootstrap.sh` script). For more info, see <https://kubedex.com/90-days-of-aws-eks-in-production>"
validation {
condition = (
length(var.after_cluster_joining_userdata) < 2
)
error_message = "You may not specify more than one `after_cluster_joining_userdata`."
}
}
It clearly says additional commands (in plural)
Describe the Bug
The validation of the variable after_cluster_joining_userdata
expects the length of the array to be smaller than 2, but the definition says that it can contain commands in plural.
Expected Behavior
The variable to be a string or the validation to allow more than 1 element.
I’ve created a PR with what I think could fix this https://github.com/cloudposse/terraform-aws-eks-node-group/pull/124
what
• Allow using multiple lines of before and after userdata
why
• As the bash commands are added to a file, I don’t see why they have to be only one line.
references
closes #123
@Julio Chana Sorry if this is confusing, but it follows the pattern that optional data is provided in a list of zero or one items. The userdata can be a multi-line string:
after_cluster_joining_userdata = [<<-EOT
# After joining the cluster, execute a couple of commands
echo foo
echo bar
EOT
]
I would prefer you use the input this way rather than extend the options for the input because we use this zero or one pattern a lot and adding exceptions, even helpful ones, seems in the long run to add more confusion than it does remove obstacles, especially across a library of modules as big as Cloud Posse’s.
OK?
2022-07-22
Hello everyone, I am looking to find out how we can integrate Atlantis with Tfsec. Does anyone have experience with it? Thank you in advance
Hey @Ishan Sharma Atlantis per se can’t be integrated with Tfsec.
Atlantis basically listens for Gitlab/Github Terraform pull request webhooks, runs terraform plan, and comments on the pull request with the output of plan
.
Tfsec is easily integrated into a Github Actions workflow, but it can also be easily integrated into any other CI/CD system as a separate stage.
Thank you @Alan Kis for your answer. In the last few hours I had also come to a similar conclusion, and used the Github action tfsec-pr-commenter-action. However, I have encountered another issue: this action does not write comments on the PR for existing terraform code [if not changed], printing errors like
Vault does not have purge protection enabled. .... not writing as not part of the current PR
There seem to be a couple of issues open on the topic as well, but no real solution; ref: https://github.com/aquasecurity/tfsec-pr-commenter-action/issues/46
Have you encountered something similar ?
We have .tf files in various, nested subdirs and are seeing this error. The action is outputting nothing to the PR and seems to print this in the logs for every check.
For example:
No public access block so not restricting public buckets .... not writing as not part of the current PR
Bucket does not have a corresponding public access block. .... not writing as not part of the current PR
Using aquasecurity/[email protected]
you can run scripts or anything on an Atlantis workflow
you can customize your workflow: create a custom image with Atlantis and tfsec, and just modify your plan command to do so
Atlantis integrates with infracost, terragrunt in the same way
In fact, that’s correct @jose.amengual . I kinda like the approach of putting it in a specific stage.
@Ishan Sharma Yes, the issue that you have mentioned is well known, I used different action, I’ll have a look and update here.
The easy solution is to cut a new docker container of atlantis to include tfsec
and any other tools. Then you can add a custom workflow
Hi everyone! I’m trying to build an AWS CodeBuild module but I’m having some difficulty working out how to set different environment variables for each project. I’m using a dynamic block for the environment variables:
dynamic "environment_variable" {
for_each = var.env_vars
content {
name = environment_variable.value.name
value = environment_variable.value.value
type = environment_variable.value.type
}
}
The variable block:
variable "env_vars" {
description = "Environment Variables for CodeBuild Project"
type = list(object(
{
name = string,
value = string,
type = string
}
))
default = []
}
The terraform.tfvars:
env_vars = [
{
name = "PROJECT-1"
value = "NAME-1"
type = "PLAINTEXT"
},
{
name = "REPOSITORY_URI"
value = "*.dkr.ecr.us-east-1.amazonaws.com/project-1"
type = "PLAINTEXT"
}
]
These environment variables would be for project-1, but how would I add different ones for project-2? I created the codebuild projects using an index:
module.pipeline.aws_codebuild_project.codebuild-project[0] -> Project-1
module.pipeline.aws_codebuild_project.codebuild-project[1] -> Project-2
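One way to sketch this (all names here are illustrative, not from the module above) is to key the projects and their environment variables by project name, then drive both the resource and the dynamic block from that map instead of indexing a list:

```hcl
# Hypothetical sketch: per-project environment variables keyed by project name.
variable "projects" {
  type = map(list(object({
    name  = string
    value = string
    type  = string
  })))
  default = {}
}

resource "aws_codebuild_project" "this" {
  for_each     = var.projects
  name         = each.key
  service_role = var.service_role_arn # hypothetical variable

  environment {
    compute_type = "BUILD_GENERAL1_SMALL"
    image        = "aws/codebuild/standard:6.0"
    type         = "LINUX_CONTAINER"

    # Each project gets only its own list of environment variables.
    dynamic "environment_variable" {
      for_each = each.value
      content {
        name  = environment_variable.value.name
        value = environment_variable.value.value
        type  = environment_variable.value.type
      }
    }
  }

  artifacts {
    type = "NO_ARTIFACTS"
  }
  source {
    type      = "NO_SOURCE"
    buildspec = "version: 0.2" # hypothetical minimal buildspec
  }
}
```

With this shape, `env_vars` for project-2 is just another key in the `projects` map, and each project is addressed by name (`aws_codebuild_project.this["project-1"]`) rather than by index.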
Has anyone else experienced the issue with the cloudposse/terraform-aws-eks-cluster module where, if setting oidc_provider_enabled = true
on a new EKS cluster deployment, a terraform plan
will fail at thumbprint_list = [join("", data.tls_certificate.cluster.*.certificates.0.sha1_fingerprint)]
with The given key does not identify an element in this collection value: the collection has no elements
? If I change it to oidc_provider_enabled = false
then the plan executes successfully, but anything further in the Terraform code that expects the OIDC provider will then fail.
Did you check the tf expression for count on the data source tls_certificate.cluster? Maybe there is a clue there as to why the list of certs is empty.
Although it could also be the certificates
field of one of the tls_certificate.cluster
certs that is an empty list.
Of course it’s going to be empty @OliverS, as it’s while attempting to stand up a brand new EKS cluster that doesn’t exist yet. Which is the whole point… stand it up and enable the OIDC provider
I’m having the same issue @Jeremy (UnderGrid Network Services). Were you able to solve this?
@Sol no I hadn’t resolved it. I did note that if I set it to false I can execute a plan but I’d rather not have to do 2 deploys if I can help it.
I am running the example made by Cloud Posse without modifying anything at all, and got the same error. I’m not sure what’s going on. Maybe someone from the Cloud Posse team knows? Invoking @Erik Osterman (Cloud Posse) as my PoC
I’m using the 1.2.5
Terraform version, and I’m seeing the same error with the 1.1.0
I wonder what happens if we would just run the tests in CI against main and see if it happens
I’d developed our EKS deployment incrementally and I believe I enabled the OIDC provider after it had been initially deployed. Now I was working to launch a new EKS cluster from scratch using what I’d written and encountered this problem. I know the EKS cluster module had an update, but looking at the diff I didn’t see anything that would have changed this behavior. I’d initially been using Terraform 1.1.9
but I’ve upgraded to 1.2.5
.
When I get started working in the morning I’ll go back and double check my existing deployment with Terraform 1.2.5
and the latest EKS Cluster module to ensure it’s still solid. I still haven’t launched the new cluster since the plan was failing. I’ve got some network firewall work being done by someone else in the VPC but that shouldn’t affect this but it should be resolved by tomorrow as well.
examples/complete
creates a cluster (even the VPC) from scratch and has oidc_provider_enabled = true
and our automated testing should not allow a release without that succeeding.
This has got to be a change in behavior of either Terraform or a Terraform provider. I’m guessing the provider.
Yes, this is a known bug in the TLS provider v4.0.0. Pushing a hot fix to disallow v4 for now.
Terraform CLI and Provider Versions
terraform
1.2.5
hashicorp/tls
4.0.0
hashicorp/aws
4.23.0
Terraform Configuration
data "tls_certificate" "this" {
url = aws_eks_cluster.this.identity[0].oidc[0].issuer
}
resource "aws_iam_openid_connect_provider" "this" {
client_id_list = ["sts.amazonaws.com"]
thumbprint_list = [data.tls_certificate.this.certificates[0].sha1_fingerprint]
url = aws_eks_cluster.this.identity[0].oidc[0].issuer
}
Expected Behavior
terraform plan
should work normally
Actual Behavior
terraform plan
fails with the following error
╷
│ Error: Invalid index
│
│ on ../../../eksv2/cluster/oidcp.tf line 7, in resource "aws_iam_openid_connect_provider" "this":
│ 7: thumbprint_list = [data.tls_certificate.this.certificates[0].sha1_fingerprint]
│ ├────────────────
│ │ data.tls_certificate.this.certificates is empty list of object
│
│ The given key does not identify an element in this collection value: the collection has no elements.
Steps to Reproduce
Add
data "tls_certificate" "this" {
url = aws_eks_cluster.this.identity[0].oidc[0].issuer
}
resource "aws_iam_openid_connect_provider" "this" {
client_id_list = ["sts.amazonaws.com"]
thumbprint_list = [data.tls_certificate.this.certificates[0].sha1_fingerprint]
url = aws_eks_cluster.this.identity[0].oidc[0].issuer
}
to your state and run terraform plan
How much impact is this issue causing?
High
Logs
No response
Additional Information
Works with downgrading hashicorp/tls
provider version to 3.4.0
.
This use-case is also shown as an example in https://registry.terraform.io/providers/hashicorp/tls/latest/docs/data-sources/certificate
Code of Conduct
☑︎ I agree to follow this project’s Code of Conduct
Thank you @Jeremy G (Cloud Posse) and @Erik Osterman (Cloud Posse)
Amazing, thank you again!
Thanks @Jeremy G (Cloud Posse) !!
Okay so the issue was the v4 AWS provider then?
@Jeremy (UnderGrid Network Services)
Okay so the issue was the v4 AWS provider then?
Close. The issue was the hashicorp/tls
v4 provider. See https://github.com/hashicorp/terraform-provider-tls/issues/244
Ah okay that makes sense. Looks like the provider may have gotten a fix this morning
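Until that fixed provider release is picked up (or the module’s own hot fix lands), pinning hashicorp/tls below v4 in the root module works around the bug; a sketch (version bounds chosen per the downgrade report above):

```hcl
terraform {
  required_providers {
    tls = {
      source  = "hashicorp/tls"
      version = ">= 3.4.0, < 4.0.0" # avoid the v4.0.0 empty-certificates regression
    }
  }
}
```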
Has anyone used qovery’s terraform provider to provision infra, especially an EKS cluster? Just curious about people’s experiences.
2022-07-24
2022-07-25
Hi all, I have provisioned an EKS cluster and two node-groups using the latest version of the cloudposse/eks-node-group/aws
module. The max allocatable pods for a t3a.large
node is 8, and I cannot find a way to specify this value, as the input for extra bootstrap arguments added to the bootstrap.sh script was removed in recent versions. Has anyone else experienced this issue so far?
*the max allocatable pods for a t3a.large
node is 35 when you create a node-group using the EKS console.
@Dimitris Kargatzis Are you saying you’re trying to change the max allocatable pods via the kubelet configuration via bootstrap.sh
?
Have you looked at before_cluster_joining_userdata?
@Matt Gowie Yes this is the case. I tried the bootstrap_additional_options = ["--max_pods=20"]
(this) input with no result. Can I use before_cluster_joining_userdata to specify the max allocatable pods via the kubelet configuration via bootstrap.sh
?
I am also curious to know why the default ‘max allocatable pods’ is 8 when the default AWS proposes for the t3a.large
is 35. Ideally we don’t want to specify this value at all, as it depends on the node type, and node-groups support multiple types; e.g. if the node-group creates a medium node, the ‘max allocatable pods’ for that node type is 17
@Dimitris Kargatzis not 100% sure what the deal is.
@RB might know when he comes online — I’ll defer to him so I don’t lead you astray.
It looks like you’re using --max_pods
but it seems like the correct flag name is --max-pods
see https://docs.aws.amazon.com/eks/latest/userguide/cni-increase-ip-addresses.html
--use-max-pods false --kubelet-extra-args '--max-pods=110'
Learn how to significantly increase the number of IP addresses that you can assign to pods on each Amazon EC2 node in your cluster.
That input bootstrap_additional_options
gets passed into the node userdata, which then uses bootstrap.sh
https://github.com/awslabs/amazon-eks-ami/blob/master/files/bootstrap.sh
I only see the --use-max-pods
flag and it defaults to true
already
echo "--use-max-pods Sets --max-pods for the kubelet when true. (default: true)"
Thanks for your prompt response @RB, much appreciated! Can I set this value to false?
@RB I just tried the bootstrap_additional_options = ["--max-pods=20"]
, still the same. The node-groups creation failed with the following error
╷
│ Error: error waiting for EKS Node Group (quine-eks-prod-cluster:test-test-eks-prod-workers-marlin) to create: unexpected state 'CREATE_FAILED', wanted target 'ACTIVE'. last error: 1 error occurred:
│ * i-0e9c62de251df915b: NodeCreationFailure: Instances failed to join the kubernetes cluster
│
│
│
│ with module.cloudposse-eks.module.eks_test_node_group.aws_eks_node_group.cbd[0],
│ on .terraform/modules/cloudposse-eks.eks_test_node_group/main.tf line 185, in resource "aws_eks_node_group" "cbd":
│ 185: resource "aws_eks_node_group" "cbd" {
You can see the related module below
module "eks_test_node_group" {
source = "cloudposse/eks-node-group/aws"
version = "0.27.2"
subnet_ids = module.subnets.public_subnet_ids
cluster_name = module.eks_cluster.eks_cluster_id
instance_types = ["t3a.small", "t3a.medium", "t3a.large"]
desired_size = 1
min_size = 1
max_size = 2
environment = "test"
namespace = "test"
kubernetes_labels = {
"performance" : "test"
}
additional_tag_map = {
TerraformManaged = true
Performance = "test"
}
bootstrap_additional_options = ["--max-pods=20"]
# Auto-scaling discovery
cluster_autoscaler_enabled = var.cluster_autoscaler_enabled
# Prevent downtime - create the new node group before destroying the old one
create_before_destroy = true
# Prevent the node groups from being created before the Kubernetes aws-auth ConfigMap
module_depends_on = module.eks_cluster.kubernetes_config_map_id
context = module.this.context
}
This may be out of my expertise. I have seen something like this before, but not in regard to maximum pods per instance type. I will defer to @Jeremy G (Cloud Posse)
The error “Instances failed to join the kubernetes cluster” can have many causes. You would need to dig into the logs to figure out what exactly the problem is.
--max-pods
is not an argument to bootstrap.sh
. When --use-max-pods
is true, bootstrap.sh
will set --max-pods
in the kubelet config based on the value in /etc/eks/eni-max-pods.txt
.
I think what you want to be doing is setting:
bootstrap_additional_options = ["--use-max-pods=false"]
kubelet_additional_options = ["--max-pods=20"]
if [[ "$USE_MAX_PODS" = "true" ]]; then
echo "$(jq ".maxPods=$MAX_PODS" $KUBELET_CONFIG)" > $KUBELET_CONFIG
fi
Thanks @Jeremy (UnderGrid Network Services) for your input. However, I am still not able to create an EKS node-group using the bootstrap_additional_options
and kubelet_additional_options
inputs.
See my findings below
• An EKS node-group (AMI release version 1.22.9-20220629
) is created successfully without these inputs
• When I specify these inputs the cloudposse/eks-node-group/aws
module tries to create a new EKS node-group (AMI ami-0fd784d3523cda0fa
) and fails after ≈20 minutes with the following error message
╷
│ Error: error waiting for EKS Node Group (quine-eks-prod-cluster:prod-prod-eks-prod-workers-bird) to create: unexpected state 'CREATE_FAILED', wanted target 'ACTIVE'. last error: 1 error occurred:
│ * i-00fb87d7894617347: NodeCreationFailure: Instances failed to join the kubernetes cluster
│
│
│
│ with module.cloudposse-eks.module.eks_test_node_group.aws_eks_node_group.cbd[0],
│ on .terraform/modules/cloudposse-eks.eks_test_node_group/main.tf line 185, in resource "aws_eks_node_group" "cbd":
│ 185: resource "aws_eks_node_group" "cbd" {
│
I suggest looking at the instance logs to find out what errors are being generated. The default setting for a t3a.large is 35, so I don’t know why you are seeing 8 to begin with, and I suspect the problem is that whatever is causing you to see 8 is causing the setting of 20 to create a problem setting up the instance networking.
As a test, try setting --max-pods=6
and see if that works.
t3a.large 35
@Jeremy (UnderGrid Network Services) Same issue with --max-pods=6
. I am looking at the instance logs now. But, I am still wondering why a different AMI release version is created when I specify these inputs.
we see this regression happening in our infra too - we also discovered that EKS 1.20 and early patch versions of 1.21 work, but anything from late patch versions of 1.21 onward, including 1.22+, is broken.
Hi again @Jeremy (UnderGrid Network Services) @Matt Gowie @RB.
I have created a new EKS node-group using the AWS console, same AMI release version and node type. The pods capacity is 35 as expected.
I have also tried exactly the same Terraform implementation / config (using the cloudposse/eks-cluster/aws
and cloudposse/eks-node-group/aws
Terraform modules) with Kubernetes versions 1.21 and 1.20. Same EKS node-groups with t3a.large
node types were created. The pod capacity for Kubernetes 1.21 was again 8, and for 1.20 it was 35, the default proposed by AWS.
*I have not defined any additional bootstrap or kubelet option.
Let me know if you are aware of this issue; otherwise I will create a new issue on GitHub.
If you can identify the solution we might be able to implement it relatively quickly, but this is not something we are going to be able to investigate to find the root cause anytime soon. Most likely it is some kind of change due to the AMI or VPC CNI or new/changed flags in Kubernetes 1.21.
it would be great to note this in the project, as it’s completely broken.
Auto-scaling does not work and also results in severe underutilisation of the cluster leading to massive cost increases.
@andylamp When you say “it’s completely broken”, do you mean you can no longer create EKS clusters with it?
no, we can - but using this in production is insane.
for example, on a large node - which costs a lot of money - only 8 pods can be used.
if we are talking about using a “toy” example, for sure - it is not broken. You can indeed provision eks clusters. But can they work in a production environment? Definitely not.
Please check that you are using the required version of the required add-ons. See https://docs.aws.amazon.com/eks/latest/userguide/managing-vpc-cni.html
Learn how to add, update, and remove the Amazon EKS Amazon VPC CNI plugin for Kubernetes
this resolution came from contacting AWS support about it.
I was not sure what was happening and we escalated to eks support about it.
@andylamp
this resolution came from contacting AWS support about it.
What resolution are you referring to?
that the tf module was suspect; we provisioned through the regular AWS EKS modules and both the pod capacity per node type and autoscaling worked.
not sure what exactly is happening or what is wrong, but the behaviour is very easy to replicate.
try to provision a cluster with 1.21 or 1.22 using the cloudposse modules with large nodes (or greater) and see the node capacities reported after provisioning.
in all instances it is 8
(and auto-scaling does not work)
we had to tweak the console settings so we could debug what was happening.
What did your debugging find was the root cause of the problem?
nothing yet - we opted to use default aws modules as this was taking too much time.
same issue happens for me now
╷
│ Error: error waiting for EKS Node Group (ireland-staging-eks-shard1-cluster:ireland-staging-eks-shard1-workers-cardinal) to create: unexpected state 'CREATE_FAILED', wanted target 'ACTIVE'. last error: 1 error occurred:
│ NodeCreationFailure: Instances failed to join the kubernetes cluster
│
│
│
│ with module.eks_node_group.aws_eks_node_group.cbd[0],
│ on .terraform/modules/eks_node_group/main.tf line 195, in resource "aws_eks_node_group" "cbd":
│ 195: resource "aws_eks_node_group" "cbd" {
│
Which logs can I check on the EC2 instance? I can log in to it, but have no idea why it fails to join. I’m using the plain modules, both the cluster and the node group, with common settings, nothing special. Might it help to specify a different AMI for the launch template somehow?
@Dmitry Shmakov I’ve just had a nearly 2 hour AWS support chat late last night because I had some nodes failing to join the cluster. Can you check the DHCP options for the VPC you’re attempting to place the cluster in?
hmm, it shows this
it’s a new VPC that gets created and peered with our internal one I use for devops access to other VPCs. The peering works, so terraform keeps running and can work with the cluster (I’m applying some helm charts on the 2nd pass from another folder and it kinda works, just to check I have access to the API, but nodes never start, so the first pass of the infra fails at the node group step)
Okay so it isn’t the same issue I was encountering… You already have domain-name-servers
set to AmazonProvidedDNS
. The VPC I was trying to deploy into had a custom domain-name
and domain-name-servers
pointing to a Windows AD DC, and nodes were unable to join until I set domain-name-servers
to use the AWS-provided DNS.
I see, an interesting direction though. I just noticed that this new VPC has no DNS Hostnames enabled; maybe that’s the reason, if the hostname of a node that tries to join from the boot script is not what the cluster expects? Will try that fix
oh yeah.. you have to have DNS hostnames and DNS resolution both enabled… all 3 of those settings are pre-reqs I was told
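For anyone creating the VPC themselves, the two flags in question look like this (the CIDR is illustrative):

```hcl
# Both DNS settings must be enabled, or nodes can fail to join the cluster.
resource "aws_vpc" "eks" {
  cidr_block           = "10.0.0.0/16" # hypothetical CIDR
  enable_dns_support   = true          # DNS resolution
  enable_dns_hostnames = true          # DNS hostnames
}
```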
Glad that helped @Dimitris Kargatzis
Thank you @Jeremy (UnderGrid Network Services)
Hey there, I’m pretty new to EKS and I’m also facing the max-pods limitation that is below the node’s native pod limit. Although I’ve tried to follow the discussion, I’m currently not able to draw a conclusion.
As far as I understood, the node’s max pod value should default to the instance type’s native pod limit, but for some networking reasons it could not be applied. Is that correct?
If so, I’m wondering what in my terraform setup I need to check in order to not run into this issue.
Hello @Ben! I am not sure if this is related to networking issues. But yes, the node’s max pod capacity should be the same as the instance type’s pod capacity. I worked around this issue by replacing the module provided by Cloudposse with the related Terraform resources provided by AWS. Let me know if you are interested in this solution.
Hi @Dimitris Kargatzis, thank you! I already got it solved: https://sweetops.slack.com/archives/CCT1E7JJY/p1677056510367349?thread_ts=1676993089.822849&cid=CCT1E7JJY The main issue was that I did have the right config in Terraform at one point, but applying it did not fix the issue, as the old pod limit value was still baked into the node group’s launch template. So I needed to trigger a launch template update, which seems like kind of a workaround, but it finally helped.
Yeah, thank you, @z0rc3r! As you suggested, updating the node group to the latest AMI version did the trick . Now my new nodes show the 98 pods limit, which is the expected number.
bubbling this question back to the top. https://sweetops.slack.com/archives/CB6GHNLG0/p1658352038677179
trying to use the module https://github.com/cloudposse/terraform-aws-dynamodb with global replicated tables and autoscaling. How do I turn on autoscaling in the other regions? it seems that autoscaling is only performed on the primary region.
@Eric Edgar It’s possible this isn’t supported by the module as the community hasn’t run into this use-case yet.
Do you know how you would implement this if you were developing your own Terraform module around Dynamo? Can you fork the existing module and add support for what you’re looking for?
FYI — I don’t know much about Dynamo or this specific functionality. I’m just answering your question given you’re looking for something that likely isn’t there.
I am not sure how I would tackle it yet myself. As a workaround we would need to create a provider for each region and then add the autoscaling rule via that provider, based on a data lookup. Crossing regions is tricky in a module, I would assume.
are there other modules that perform actions that cross regions?
Ah, if it’s not a native Dynamo thing to enable via the existing provider then it’s not likely we would support it. Cloud Posse modules usually stick to one region unless crossing regions is truly crucial to the happy-path implementation of that module. What you’re describing is likely not happy path, but a more advanced use-case.
I would suggest either updating your root module to handle your case, OR you could fork the module, make your changes inside it, and maintain it there.
hrmm yeah I guess we might need to wrap the module based on that thought..
Because of how Terraform providers work, it’s not possible to define providers dynamically at runtime. So to create multi-region infrastructure, you generally want to use single-region modules and explicitly instantiate them in all required regions:

provider "aws" {
  region = "us-east-1"
}

provider "aws" {
  region = "eu-west-1"
  alias  = "ew1"
}

provider "aws" {
  region = "ap-southeast-2"
  alias  = "ase2"
}

module "dynamodb_primary" {
  # ...
}

module "dynamodb_replica_ew1" {
  providers    = { aws = aws.ew1 }
  source_table = module.dynamodb_primary.table_arn
  # ...
}

module "dynamodb_replica_ase2" {
  providers    = { aws = aws.ase2 }
  source_table = module.dynamodb_primary.table_arn
  # ...
}
(or something. I don’t know what the module’s interface/dynamodb’s API for global replication looks like)
2022-07-26
I remember hearing in one of the meetups that cloudposse has a configuration component (I can’t remember if it is a provider, module or just scripts), what is it called?
we have a yaml configuration that is deep merged by atmos
to create tfvars, generate a backend, select a workspace dynamically, and run terraform commands
https://github.com/cloudposse/atmos
we use root modules that we call components
to create infrastructure. we upstream our components here.
https://github.com/cloudposse/terraform-aws-components
you can learn more about this setup here
here is an example of how we set this up with customers
https://github.com/cloudposse/atmos/tree/master/examples/complete
The Cloud Posse Terraform Provider for various utilities (e.g. deep merging, stack configuration management)
This is what does the YAML deep merging
So basically, we have a CLI, and a provider
(we will be moving the functionality into an atmos provider, but haven’t gotten back to it)
definitely interesting as there is significant overlap with how I’ve been doing things but custom for each project
Not enough docs though, so i’d have to experiment, I’ll see… is there perhaps a recording of a demo somewhere?
I’m particularly interested in the deepmerge of yaml configuration files to generate tfvars files, BUT ALSO the generation of the corresponding config schema in variables.tf. Terraform does not make that easy.
2022-07-27
v1.2.6 (July 27, 2022) ENHANCEMENTS: Add a warning and guidance when terraform init fails to fully populate the .terraform.lock.hcl file. (#31399) Add a direct link to the relevant documentation when terraform init fails on missing checksums. (https://github.com/hashicorp/terraform/issues/31408)…
This PR addresses part of the concerns raised in #29958 It also builds on the change in #31389 terraform init will now print a warning when any provider will only have a single checksum written int…
terraform init will now tell users to try terraform providers lock whenever it cannot download a provider due to a checksum mismatch. The error message explaining a provider could not be downloaded…
crazy question - is there a way to get the running version of Terraform in HCL?
(besides data external)
There is a terraform provider but it only has a data source to read from remote state
https://registry.terraform.io/providers/hashicorp/terraform/latest/docs
you mentioned the data external provider and that may be the best way to get the terraform version
what is the problem you’re trying to solve ?
You can set the allowed terraform versions to run a configuration. That might do what you need
This is a CLI hack but it works:
Code:
variable "tf_version" {
type = string
default = "v1.2.6"
}
output "tf_version" {
value = var.tf_version
}
CLI:
tf apply -auto-approve -var="tf_version=$(terraform -version | head -1 | awk '/Terraform/{ print $2 }')"
Output:
tf_version = "v1.2.6"
Switch to a different version (ie, using tfswitch
) and rerun:
10:43 $ tf apply -auto-approve -var="tf_version=$(terraform -version | head -1 | awk '/Terraform/{ print $2 }')"
Changes to Outputs:
~ tf_version = "v1.2.6" -> "v1.1.8"
You can apply this plan to save these new output values to the Terraform state, without changing any real
infrastructure.
Apply complete! Resources: 0 added, 0 changed, 0 destroyed.
Outputs:
tf_version = "v1.1.8"
I have the same question for modules… We had a simple use case: To label all resources with the version of the module that last modified them. We thought we had a (domain specific) solution using the git provider to reflect the repo that was pulled by terraform, but we found it to be unreliable because of some implementation details in Terraform core. Anyone ever come across this?
Simple answer is to set it as an input variable
But if you want a module to tag its own resources with its version, you can hardcode the version into the module
@Robert Jordan you might be able to use something like Yor from Bridgecrew https://github.com/bridgecrewio/yor
Extensible auto-tagger for your IaC files. The ultimate way to link entities in the cloud back to the codified resource which created it.
I guess I forgot to mention the automatic tagging and release part of the problem… We did look into Yor, but it didn’t quite do what we needed. I think we actually could have done it by running a release step that does as @Alex Jurkiewicz suggests and hardcode it automatically. ah well… next time. Thanks for the suggestions!
Hi, I keep getting the below issue in Terraform Cloud.
Any help would be highly appreciated, as it is blocking the pipeline.
Error: Invalid Configuration for Read-Only Attribute

  with module.ec2_client_vpn.module.self_signed_cert_ca.tls_self_signed_cert.default,
  on .terraform/modules/ec2_client_vpn.self_signed_cert_ca/main.tf line 62, in resource "tls_self_signed_cert" "default":
      key_algorithm = var.private_key_algorithm

Cannot set value for this attribute as the provider has marked it as read-only. Remove the configuration line setting the value.
Refer to the provider documentation or contact the provider developers for additional information about configurable and read-only attributes that are supported.
Try pinning the tls provider too
I’d pin it to ~> 3
https://registry.terraform.io/providers/hashicorp/tls/latest/docs
Terraform version 1.1.5
source  = "cloudposse/ec2-client-vpn/aws"
version = "0.10.1"
terraform {
required_version = ">= 1.0.0"
required_providers {
aws = {
source = "hashicorp/aws"
version = ">= 4.21.0"
}
random = {
source = "hashicorp/random"
version = ">= 3.1.0"
}
awsutils = {
source = "cloudposse/awsutils"
version = ">= 0.11.0"
}
}
}
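The required_providers block above doesn’t pin tls, so init can resolve v4.x, where key_algorithm on tls_self_signed_cert became read-only (it is now derived from the private key). A sketch of the entry to add, following the ~> 3 suggestion above:

```hcl
terraform {
  required_providers {
    # Pin tls below v4, where key_algorithm became read-only
    tls = {
      source  = "hashicorp/tls"
      version = "~> 3.0"
    }
  }
}
```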
2022-07-28
hi all, please can someone give me a hint on how to approach writing a module for
https://registry.terraform.io/providers/hashicorp/aws/latest/docs/resources/backup_framework
https://registry.terraform.io/providers/hashicorp/aws/latest/docs/resources/backup_report_plan
A nudge in the right direction would be of great help
Why do you want to write a module?
a) each framework will address a different plan, as each plan has different settings like backup frequency and backup duration
b) the report looks at framework and stores a copy in csv/json format but has the capability to report on various frameworks
I have written Terraform code for frameworks, but it’s like 12 different frameworks for 12 different backup plans
also, I am not getting anywhere with backup_report_plan
if there’s any other way to get around it, I am happy to go down that road @Alex Jurkiewicz
@dan callum check out : https://github.com/cloudposse/terraform-aws-backup
Terraform module to provision AWS Backup, a fully managed backup service that makes it easy to centralize and automate the back up of data across AWS services such as EBS volumes, RDS databases, DynamoDB tables, EFS file systems, and AWS Storage Gateway volumes.
thanks @Matt Gowie I’ve looked at that, that is for backup
I am looking for backup framework and backup reports
You might need to be more specific about the technical problem you are having. I and others probably don’t have experience with backup framework, but we can advise on Terraform’s language features
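As a Terraform-language starting point for collapsing 12 near-identical frameworks into one block, you can drive aws_backup_framework with for_each over a map. This is only a sketch: the variable shape and the retention control shown are hypothetical, not from this thread:

```hcl
variable "frameworks" {
  type = map(object({
    retention_days = number
  }))
}

resource "aws_backup_framework" "this" {
  for_each = var.frameworks
  name     = each.key

  control {
    name = "BACKUP_RECOVERY_POINT_MINIMUM_RETENTION_CHECK"
    input_parameter {
      name  = "requiredRetentionDays"
      value = tostring(each.value.retention_days)
    }
  }
}
```

The same map can then feed an aws_backup_report_plan per framework, so adding a 13th plan is just another map entry.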
i think I got the framework part alright, it is not at cloudposse level but it does what I want
the reporting is what I need some help with
I have to word this better… for getting help
2022-07-29
Terraform module to provision an Elastic MapReduce (EMR) cluster on AWS
Does this module support instance fleet?
Not sure many of us know what it supports or doesn’t support. You’ll have to dig into the code + variables to figure that out.
If doesn’t include support for what you’re looking for then be sure to add support on a fork and put it up on PR!
Hi all, is there any way to provide a self-service mechanism for app teams to publish Terraform modules into a central private registry? Assuming app teams don’t have direct access to publish in Terraform Cloud, and only the DevOps team can publish
Uploading modules to S3? App people could download the modules but not upload, controlled by IAM policies?
in this case, some app teams do source their custom modules directly from Github on-prem. some do source directly from third party sources e.g Hashi public registry
we just migrated from multi-org TFE to single org TFC.
some app folks are asking for easy way to publish modules without needing to consult DevOps team
@olad I guess you should look at all the rest of the TACOS out there that might be a better fit for your needs - this blog post might help you as well - https://medium.com/@elliotgraebert/four-great-alternatives-to-hashicorps-terraform-cloud-6e0a3a0a5482
It was also talked about in the latest Office hours - https://youtu.be/gWbBF-bflPw?t=925
- Disclaimer - I am the CTO and co-founder of env0
Picking the Best IaC CI Platform
@olad something we’ve specifically built is a first class module repository, since it is built on top of authress it has really flexible permissions and role management policies. If you want a link or a demo, let me know.
looks like only org owner can publish and delete modules. unfortunately for us we only have a single org for all app teams. so devops admins will always be needed to publish modules to the registry.
@olad we are actually working on more fine-grained RBAC, to give users the specific roles to publish and delete modules from the private module registry.
2022-07-30
2022-07-31
is there a way around using count & for_each together in a module? I know that both cannot be used at the same time, but what I am trying to do is write a module which iterates over a map and creates resources using for_each, while at the same time using a count-style check on a boolean value in the map to conditionally create each resource
if the boolean value is false then don’t create it, and if true then create it
any suggestions/workarounds?
easy peasy
module "hello_world" {
  source   = "./modules/hello"
  for_each = var.is_this_true ? local.the_values : {}
}
basically, pass an empty map to the module if your condition is not met; otherwise pass the map that needs to be iterated.
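Since in the original question the boolean lives inside each map entry, another option is to filter the map itself with a for expression, so no count is needed at all. A sketch assuming each value carries a hypothetical enabled flag:

```hcl
module "hello_world" {
  source = "./modules/hello"
  # keep only the entries whose (hypothetical) enabled flag is true
  for_each = { for k, v in local.the_values : k => v if v.enabled }
}
```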