#terraform (2024-03)
Discussions related to Terraform or Terraform Modules
Archive: https://archive.sweetops.com/terraform/
2024-03-01
Trying to get a bit more awareness in the Terraform community that state files need to be well secured. If anyone is interested, I’m happy to share some research I published going from state file edit access to code execution in a pipeline.
Is this with native terraform or does the research also mention opentofu?
I read recently that opentofu encrypts the state, so secrets can be used there somewhat more safely than in terraform
#309 was the first change in Terraform that I could find that moved to store sensitive values in state files, in this case the password value for Amazon RDS. This was a bit of a surprise for me, as previously I’ve been sharing our state files publicly. I can’t do that now, and I feel pretty nervous about the idea of storing state files in version control at all (and definitely can’t put them on GitHub or anything).
If Terraform is going to store secrets, then some sort of field-level encryption should be built in as well. In the meantime, I’m going to change things around to use https://github.com/AGWA/git-crypt on sensitive files in my repos.
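For anyone wanting to try the same, a minimal git-crypt setup might look like the following sketch. The `*.tfstate` patterns are an assumption about which files you consider sensitive; adjust them to your own layout:

```shell
# Tell git-crypt which files to transparently encrypt before they hit the repo.
# The patterns below are illustrative, not an official recommendation.
cat > .gitattributes <<'EOF'
*.tfstate filter=git-crypt diff=git-crypt
*.tfstate.backup filter=git-crypt diff=git-crypt
EOF

# One-time initialisation in the repo (requires git-crypt to be installed):
#   git-crypt init
#   git-crypt add-gpg-user YOUR_GPG_KEY_ID
```

After `git-crypt init`, matching files are encrypted on commit and decrypted on checkout for users whose GPG keys were added.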
Here’s the opentofu RFC on client-side state encryption, https://github.com/opentofu/opentofu/issues/874
Summary
This feature adds the option to encrypt local state files, remote state, and plan files. Encryption is off-by-default.
Partial encryption, when enabled, only encrypts values marked as sensitive to protect credentials contained in the
state. Full encryption, when enabled, protects against any information disclosure from leaked state or plans.
Problem Statement
OpenTofu state and plans contain lots of sensitive information.
The most obvious examples are credentials such as primary access keys to storage, but even ignoring any credentials,
state often includes a full map of your network, including every VM, kubernetes cluster, database, etc.
That is a treasure trove for an attacker who wishes to orient themselves in your private network.
Unlike runtime information processed by OpenTofu, which only lives in memory and is discarded when the run ends,
state and plans are persisted. In large installations, state is not (just) stored in local files because multiple
users need access to it. Remote state backend options include simple storage (such as storage accounts, various
databases, …), meaning these storage options do not “understand” the state, but there are also extended backends,
which do wish to gain information from state. The persistent nature and (often) cloud storage of state increases
the risk of it falling into the wrong hands.
Large corporations and financial institutions have compliance requirements for storage of sensitive information.
One frequent requirement is encryption at rest using a customer managed key. This is exactly what this feature
provides, and if you use it intelligently, even the cloud provider storing your state will not have access to the
encryption key at all.
User-facing description
OpenTofu masks sensitive values in its printed output, but those very same sensitive values are written to state:
example snippet from a statefile for an Azure storage account with the primary access key (of course not a real one)
Pay particular attention to the line listing the primary access key. The storage account listed here doesn’t exist,
but if it did, the primary access key would give an attacker full access to all the data on the storage account.
Getting Started
Note: the exact format of the configuration is likely to change as we test out the implementation and figure out
the precise details. So don’t rely too much on exact field names or format of the contents at this point in time.
With the feature this RFC is about, you could simply set an environment variable before running OpenTofu:
export TF_STATE_ENCRYPTION='{"backend":{"method":{"name":"full"},"key_provider":{"name":"passphrase","config":{"passphrase":"foobarbaz"}}}}'
For readability let’s spell out the value of the environment variable even though you wouldn’t normally set it like this:
export TF_STATE_ENCRYPTION='{
  "backend": {
    "method": {
      "name": "full"
    },
    "key_provider": {
      "name": "passphrase",
      "config": {
        "passphrase": "foobarbaz"
      }
    }
  }
}'
And suddenly, your remote state looks like this:
{
  "encryption": {
    "version": 1,
    "method": {
      "name": "full",
      "config": {}
    }
  },
  "payload": "e93e3e7ad3434055251f695865a13c11744b97e54cb7dee8f8fb40d1fb096b728f2a00606e7109f0720aacb15008b410cf2f92dd7989c2ff10b9712b6ef7d69ecdad1dccd2f1bddd127f0f0d87c79c3c062e03c2297614e2effa2fb1f4072d86df0dda4fc061"
}
This is the same state as before, only fully encrypted with AES256 using a key derived from the passphrase you provided.
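The RFC doesn’t pin down the exact key-derivation and cipher details, but you can get a feel for passphrase-based encryption of a state-like JSON document with plain openssl. This is purely an analogy, not OpenTofu’s actual wire format; the filenames and passphrase are made up:

```shell
# A stand-in for a state file (not a real state).
echo '{"version": 4, "resources": []}' > state.json

# Encrypt with a key derived from the passphrase (PBKDF2 + AES-256-CBC here;
# the RFC says AES256 but doesn't specify CLI-equivalent options).
openssl enc -aes-256-cbc -pbkdf2 -salt \
  -pass pass:foobarbaz -in state.json -out state.json.enc

# Decrypt with the same passphrase to recover the original document.
openssl enc -d -aes-256-cbc -pbkdf2 \
  -pass pass:foobarbaz -in state.json.enc -out state.decrypted.json

diff state.json state.decrypted.json && echo "round-trip ok"
```

The point is the same as in the RFC: without the passphrase, the ciphertext is an opaque blob.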
Actually, most of the settings shown in the environment variable have sensible defaults, so this also works:
export TF_STATE_ENCRYPTION='{
  "backend": {
    "key_provider": {
      "config": {
        "passphrase": "foobarbaz"
      }
    }
  }
}'
You can also specify the 32-byte key directly instead of providing a passphrase:
export TF_STATE_ENCRYPTION='{
  "backend": {
    "method": {
      "name": "full"
    },
    "key_provider": {
      "name": "direct",
      "config": {
        "key": "a0a1a2a3a4a5a6a7a8a9b0b1b2b3b4b5b6b7b8b9c0c1c2c3c4c5c6c7c8c9d0d1"
      }
    }
  }
}'
Whether you use a passphrase or directly provide the key, it comes from an environment variable. Even if your state
is stored in another storage account, no one outside your organisation would have the encryption key.
Your users that run OpenTofu will need it, though.
Better yet, the key can also come from AWS KMS; all you’d need to change for that is the environment variable value:
export TF_STATE_ENCRYPTION='{
  "backend": {
    "method": {
      "name": "full"
    },
    "key_provider": {
      "name": "awskms",
      "config": {
        "region": "us-east-1",
        "key_id": "alias/terraform"
      }
    }
  }
}'
Or retrieve your encryption key from an Azure Key Vault, or GCP Key Mgmt, or Vault. Of course, if you retrieve
the key from the cloud provider your state storage is located at, they have both the state and the key now, so
maybe don’t use the same cloud provider if you worry about attacks from their side (or from government actors):
Using external key retrieval options allows you to place the equivalent configuration in the
remote state configuration, so the configuration is checked in with your code, and still be
properly secure, because now the configuration does not need to include the actual encryption key.
Instead of full state encryption, you can have just the sensitive values encrypted in the state:
export TF_STATE_ENCRYPTION=TODO example
This will make your state look almost exactly like the original unencrypted state, so you can still easily doctor it if
you need to, except that the primary access key is now encrypted, and that the encryption
section is present.
{
  "encryption": {
    "version": 1,
    "methods": {
      "encrypt/SOPS/xyz": {
        ...
      }
    }
  },
  TODO
}
Once Your State Is Encrypted
State encryption is completely transparent. All OpenTofu commands work exactly the same; even tofu state push
and tofu state pull work as expected. The latter downloads the state and prints it in decrypted form, which is useful
if you ever run into the need to manually doctor your state. Lately, that need has become much rarer than it
used to be.
Since the configuration can be set in environment variables, wrappers like Terragrunt work just fine. As do typical
CI systems for OpenTofu such as Atlantis.
Note: We will need to test whether it is possible to use multiple different encryption keys with terragrunt. It may
be that within the same tree, you must stick to one key. We know from experience that terragrunt run-all works
in that scenario.
If your CI system is more involved and insists on reading your state contents, you can’t use full state encryption.
You may still be able to use partial state encryption, configuring it to only encrypt the sensitive values. This will still
prevent exposing your passwords to both the CI system and the state storage, greatly frustrating any threat actors
trying to get into your infrastructure through those attack vectors.
If you want to rotate state encryption keys, or even switch state encryption methods, there is a second
environment variable called TF_STATE_DECRYPTION_FALLBACK. This one is tried for decryption if the primary
configuration in TF_STATE_ENCRYPTION fails to decrypt your state successfully. Encryption, unlike decryption, always
uses only the primary configuration, so you can use this to rotate your key on the next write operation.
Unencrypted state is recognized and automatically bypasses the decryption step. That’s what happens during initial
encryption, or if for some other reason your state happens to be currently unencrypted.
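Putting that rotation mechanism into concrete terms, a key rotation could look something like this. This is a sketch based on the RFC’s draft format, which may still change; the passphrases are placeholders:

```shell
# New key: used for encryption and tried first for decryption.
export TF_STATE_ENCRYPTION='{"backend":{"key_provider":{"config":{"passphrase":"new-passphrase"}}}}'

# Old key: tried for decryption only when the primary configuration fails.
export TF_STATE_DECRYPTION_FALLBACK='{"backend":{"key_provider":{"config":{"passphrase":"old-passphrase"}}}}'

# Any OpenTofu run that writes state now re-encrypts it with the new key;
# once all states have been rewritten, the fallback variable can be dropped.
```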
What can an attacker do if they can edit Terraform state? The answer should be ‘nothing’ but is actually ‘take over your CI/CD pipeline’.
One of the tofu maintainers reached out so I assume it works similarly.
Question. In atmos.yaml we are using “auto_generate_backend_file: true” and seeing in S3 that atmos creates a folder for the component, then a sub-folder for the stack, into which it places the terraform state. When we run the same layout of components/stacks/config against GCP GCS, we see only a state file created, no folders (e.g. core-usc1-auto.tfstate, named after the stage). Has anyone seen this behaviour or can advise? Thanks
did you correctly configure GCP backend in Atmos manifests? something like this:
terraform:
  # Backend
  backend_type: gcs
  backend:
    gcs:
      bucket: "xxxxxxx-bucket-tfstate"
      prefix: "terraform/tfstate"
(also, just a heads up, the #atmos channel is best for these questions)
Anyone with some insight into the Cloudposse AWS modules or routing in S2S VPN Connections in general I posted a question here https://sweetops.slack.com/archives/CDYGZCLDQ/p1709301177485109
Greetings everyone. I’m using the cloudposse/vpn-connection/aws module and I’m facing some issues that I really don’t understand.
My module code is as follows
module "vpn_connection" {
  source  = "cloudposse/vpn-connection/aws"
  version = "1.0.0"

  namespace = var.namespace
  stage     = var.env
  name      = var.vpn_connection_name

  vpc_id                      = var.vpc_id
  vpn_gateway_amazon_side_asn = var.amazon_asn
  customer_gateway_bgp_asn    = var.customer_asn
  customer_gateway_ip_address = var.customer_gateway_ip_address
  route_table_ids             = var.route_table_ids

  vpn_connection_static_routes_only         = true
  vpn_connection_static_routes_destinations = [var.vpn_connection_static_routes_destinations]
  vpn_connection_local_ipv4_network_cidr    = var.vpn_connection_static_routes_destinations
  vpn_connection_remote_ipv4_network_cidr   = var.vpc_cidr
}
route_table_ids should contain a single element found using https://registry.terraform.io/providers/hashicorp/aws/latest/docs/data-sources/route_tables and vpn_connection_static_routes_destinations is a simple IPv4 CIDR coming in as a string
The ‘calling’ of the module
module "vpn-connection" {
  source = "../../modules/vpn-connection"

  namespace           = var.namespace
  env                 = var.environment
  vpn_connection_name = var.vpn_connection_name

  vpc_id       = module.staging-vpc.vpc_id
  amazon_asn   = var.amazon_asn
  customer_asn = var.customer_asn

  customer_gateway_ip_address               = var.customer_gateway_ip_address
  route_table_ids                           = data.aws_route_tables.route_tables_for_vpn_connection_to_public_subnets.ids
  vpn_connection_static_routes_destinations = var.vpn_connection_static_routes_destinations
  vpc_cidr                                  = var.vpc_cidr
}
Shouldn’t I see, in the route tables referenced by route_table_ids, a non-propagated (i.e. static) route to the contents of var.vpn_connection_static_routes_destinations?
I see Route propagation set to No under the route table, which is also what I want.
But where’s my static route?
Anyone with a good example of how to structure ECS resources in Terraform.
Looking to soon build an AWS ECS Fargate Cluster, numerous services (some utilizing CloudMap) and numerous tasks.
How do I organise the task definitions in the code and make use of templating as much as possible?
My idea was to use the following resource types, but I’m in doubt of the structure and what makes the most sense
data template_file referencing a .tpl file in another folder
aws_ecs_task_definition -> container_definitions = data.template_file.shop.rendered
aws_ecs_service -> task_definition = aws_ecs_task_definition.shop.arn
aws_appautoscaling_target
aws_appautoscaling_policy
For the external facing containers I guess I’d also need a lot of ALB resources, I was hoping a module could help me here…
Was initially looking towards this module: https://github.com/terraform-aws-modules/terraform-aws-ecs/blob/master/examples/fargate/main.tf
Any other recommendations or perhaps a repo I can peek at or a blogpost or something similar?
provider "aws" {
  region = local.region
}

data "aws_availability_zones" "available" {}

locals {
  region = "eu-west-1"
  name   = "ex-${basename(path.cwd)}"

  vpc_cidr = "10.0.0.0/16"
  azs      = slice(data.aws_availability_zones.available.names, 0, 3)

  container_name = "ecsdemo-frontend"
  container_port = 3000

  tags = {
    Name       = local.name
    Example    = local.name
    Repository = "https://github.com/terraform-aws-modules/terraform-aws-ecs"
  }
}

################################################################################
# Cluster
################################################################################

module "ecs_cluster" {
  source = "../../modules/cluster"

  cluster_name = local.name

  # Capacity provider
  fargate_capacity_providers = {
    FARGATE = {
      default_capacity_provider_strategy = {
        weight = 50
        base   = 20
      }
    }
    FARGATE_SPOT = {
      default_capacity_provider_strategy = {
        weight = 50
      }
    }
  }

  tags = local.tags
}

################################################################################
# Service
################################################################################

module "ecs_service" {
  source = "../../modules/service"

  name        = local.name
  cluster_arn = module.ecs_cluster.arn

  cpu    = 1024
  memory = 4096

  # Enables ECS Exec
  enable_execute_command = true

  # Container definition(s)
  container_definitions = {
    fluent-bit = {
      cpu       = 512
      memory    = 1024
      essential = true
      image     = nonsensitive(data.aws_ssm_parameter.fluentbit.value)
      firelens_configuration = {
        type = "fluentbit"
      }
      memory_reservation = 50
      user               = "0"
    }

    (local.container_name) = {
      cpu       = 512
      memory    = 1024
      essential = true
      image     = "public.ecr.aws/aws-containers/ecsdemo-frontend:776fd50"
      port_mappings = [
        {
          name          = local.container_name
          containerPort = local.container_port
          hostPort      = local.container_port
          protocol      = "tcp"
        }
      ]

      # Example image used requires access to write to root filesystem
      readonly_root_filesystem = false

      dependencies = [{
        containerName = "fluent-bit"
        condition     = "START"
      }]

      enable_cloudwatch_logging = false
      log_configuration = {
        logDriver = "awsfirelens"
        options = {
          Name                    = "firehose"
          region                  = local.region
          delivery_stream         = "my-stream"
          log-driver-buffer-limit = "2097152"
        }
      }

      linux_parameters = {
        capabilities = {
          drop = [
            "NET_RAW"
          ]
        }
      }

      memory_reservation = 100
    }
  }

  service_connect_configuration = {
    namespace = aws_service_discovery_http_namespace.this.arn
    service = {
      client_alias = {
        port     = local.container_port
        dns_name = local.container_name
      }
      port_name      = local.container_name
      discovery_name = local.container_name
    }
  }

  load_balancer = {
    service = {
      target_group_arn = module.alb.target_groups["ex_ecs"].arn
      container_name   = local.container_name
      container_port   = local.container_port
    }
  }

  subnet_ids = module.vpc.private_subnets
  security_group_rules = {
    alb_ingress_3000 = {
      type                     = "ingress"
      from_port                = local.container_port
      to_port                  = local.container_port
      protocol                 = "tcp"
      description              = "Service port"
      source_security_group_id = module.alb.security_group_id
    }
    egress_all = {
      type        = "egress"
      from_port   = 0
      to_port     = 0
      protocol    = "-1"
      cidr_blocks = ["0.0.0.0/0"]
    }
  }

  service_tags = {
    "ServiceTag" = "Tag on service level"
  }

  tags = local.tags
}

################################################################################
# Supporting Resources
################################################################################

data "aws_ssm_parameter" "fluentbit" {
  name = "/aws/service/aws-for-fluent-bit/stable"
}

resource "aws_service_discovery_http_namespace" "this" {
  name        = local.name
  description = "CloudMap namespace for ${local.name}"
  tags        = local.tags
}

module "alb" {
  source  = "terraform-aws-modules/alb/aws"
  version = "~> 9.0"

  name = local.name

  load_balancer_type = "application"

  vpc_id  = module.vpc.vpc_id
  subnets = module.vpc.public_subnets

  # For example only
  enable_deletion_protection = false

  # Security Group
  security_group_ingress_rules = {
    all_http = {
      from_port   = 80
      to_port     = 80
      ip_protocol = "tcp"
      cidr_ipv4   = "0.0.0.0/0"
    }
  }
  security_group_egress_rules = {
    all = {
      ip_protocol = "-1"
      cidr_ipv4   = module.vpc.vpc_cidr_block
    }
  }

  listeners = {
    ex_http = {
      port     = 80
      protocol = "HTTP"
      forward = {
        target_group_key = "ex_ecs"
      }
    }
  }

  target_groups = {
    ex_ecs = {
      backend_protocol                  = "HTTP"
      backend_port                      = local.container_port
      target_type                       = "ip"
      deregistration_delay              = 5
      load_balancing_cross_zone_enabled = true

      health_check = {
        enabled             = true
        healthy_threshold   = 5
        interval            = 30
        matcher             = "200"
        path                = "/"
        port                = "traffic-port"
        protocol            = "HTTP"
        timeout             = 5
        unhealthy_threshold = 2
      }

      # There's nothing to attach here in this definition. Instead,
      # ECS will attach the IPs of the tasks to this target group
      create_attachment = false
    }
  }

  tags = local.tags
}

module "vpc" {
  source  = "terraform-aws-modules/vpc/aws"
  version = "~> 5.0"

  name = local.name
  cidr = local.vpc_cidr

  azs             = local.azs
  private_subnets = [for k, v in local.azs : cidrsubnet(local.vpc_cidr, 4, k)]
  public_subnets  = [for k, v in local.azs : cidrsubnet(local.vpc_cidr, 8, k + 48)]

  enable_nat_gateway = true
  single_nat_gateway = true

  tags = local.tags
}
These root level components might help
https://github.com/cloudposse/terraform-aws-components/tree/main/modules/ecs
https://github.com/cloudposse/terraform-aws-components/tree/main/modules/ecs-service
If you use atmos, you can reuse the code per region-account using yaml inputs
2024-03-02
2024-03-03
Hello everyone,
I’ve encountered an issue with my Terraform configuration for managing an Amazon RDS database. Here’s the situation:
I initially created an RDS instance from a snapshot using Terraform. Now, I need to update the instance size (e.g., change from db.t2.micro to db.t3.medium). However, when I rerun my Terraform script, it destroys the existing RDS instance and creates a new DB.
Is there a way to avoid this behavior? Ideally, I’d like to modify the existing RDS instance without causing unnecessary downtime or data loss.
Any suggestions or best practices would be greatly appreciated!
You can use the nested block lifecycle. First of all, do a terraform plan and look for the changes making Terraform think that the RDS resource needs to be replaced, then ignore them using the lifecycle nested block:
lifecycle {
  ignore_changes = [
    # Ignore changes to tags, e.g. because a management agent
    # updates these based on some ruleset managed elsewhere.
    tags,
  ]
}
Adding deletion_protection = true is always a good practice to prevent the resource being destroyed.
Thanks @HAMZA AZIZ
2024-03-04
Question. Using Atmos, I have a single stack with a few network components that deployed fine. I added a new GCE component in the same stack file, using the same imports and the same GCS backend for the Terraform state, and we have atmos.yaml set up for automatic state creation and management. I’m seeing weird behaviour where it keeps wanting to destroy all the other components already created. This is a standalone component with no hard dependencies in the Terraform config; we do reference the VPC network name and subnet for the GCE by name, but not via module.X.selflink etc. from the other component. Any ideas? I have spent hours on this. Thanks
can you clarify this :
VPC network name and subnet for the GCE by name
as in data lookup after passing name or ID?
No just hardcoded in stack vars: like “vpc-01” and “subnet-01”
ok
and you say you created like a network component, deployed to the stack and then you added a new component and somehow when planning wants to delete the other components?
Yes. I deployed a few components (VPC, Subnet, NAT GW, Router), all fine. Then I added a GCE bastion to the same stack.yaml file and it wants to destroy all the other components.
Confirmed in describe component it is using the right backend file and workspace name.
that makes no sense unless you used the same name
same name for?
you can’t have the same name for a component in yaml
# (because google_compute_subnetwork.subnetwork is not in configuration)
Different names. It just says it’s not in the configuration, which is why it wants to destroy it.
components:
  terraform:
    example:
      vars:
        enabled: true
    example:
      vars:
        enabled: false
that will set example.enabled = false
So I am not using context or component.tf in the component, so I haven’t used this enabled: true. Could that be an issue?
no that is just an example
are you using terraform state data.source?
no
can you show some of your stack.yaml?
Sure. Where you see XX is just added there to project sensitive of company I work for.
components:
  terraform:
    vpc:
      metadata:
        component: vpc
        inherits:
          - vpc/defaults
      vars:
        enabled: true
        label_key_case: lower
        project_id: auto-v1u
        region: us-central1
        shared_vpc_host: false
        subnets:
          - subnet_name: subnet-01
            subnet_ip: 10.150.2.0/24
            subnet_region: us-central1
            subnet_private_access: true
            subnet_flow_logs: true
            subnet_flow_logs_interval: INTERVAL_5_SEC
            subnet_flow_logs_sampling: 0.5
            subnet_flow_logs_metadata: INCLUDE_ALL_METADATA
        secondary_ranges:
          XX-glb-vpc-auto:
            - ip_cidr_range: "10.158.128.0/17"
              range_name: "us-central1-gke-01-pods"
            - ip_cidr_range: "10.160.208.0/20"
              range_name: "us-central1-gke-01-services"
    cloud_nat:
      subnetworks:
        - name: XX-glb-vpc-auto
    gke:
      vars:
        ip_range_pods: "us-central1-gke-01-pods"
        ip_range_services: "us-central1-gke-01-services"
        master_ipv4_cidr_block: "10.100.0.0/28"
        gke_name: "us-central1-gke-01"
        network: "XX-gbl-auto"
        project_id: "auto-v1u"
        subnetwork: "XX-glb-vpc-auto"
        region: "us-central1"
        machine_type: "e2-medium"
        disk_size_gb: "100"
        location: "us-central1"
        min_count: "1"
        max_count: "100"
        local_ssd_count: "0"
The VPC and Cloud NAT deploy perfect. The GKE or GCE components don’t
do you see a {component-name}/terraform.tfstate folder in your state backend?
every component will have its own state file in a folder with the same name as the component (usually)
So this is interesting. In AWS, that is the setup: each component has a folder, then another folder for the stack. In our GCS here, each stack has a tfstate file at the moment, so three components in one state file.
I’d be really interested in the why behind each component needing its own state file?
That is how atmos works, it uses workspaces heavily
Thanks. This sounds like the issue then. Is there any links or docs that goes into this in more detail?
That way you can have multiple components in one stack, and they all have their own state file that does not interfere with other components and makes the blast radius smaller
big state files are not a cool thing. they make plans and applies slow, and if you think about it from the point of view of separation of concerns, a bit dangerous too
The Terraform Component Remote State is used when we need to get the outputs of a Terraform component.
@Christopher McGill please use atmos
2024-03-06
v1.8.0-beta1 1.8.0-beta1 (March 6, 2024) UPGRADE NOTES: If you are upgrading from Terraform v1.7 or earlier, please refer to the Terraform v1.8 Upgrade Guide.
backend/s3: The use_legacy_workflow argument has been removed to encourage consistency with the AWS SDKs. The backend will now search for credentials in the same order as the default provider chain in the AWS SDKs and AWS CLI.
NEW FEATURES:…
Upgrading to Terraform v1.8
Can someone explain how module.this.enabled is used across your modules? When I try to replicate it in my code, terraform says “there is no module named “this””. I see it used a lot throughout your code and it looks really neat, but I’m missing something. https://github.com/cloudposse/terraform-aws-api-gateway/blob/main/main.tf
locals {
  enabled = module.this.enabled

  create_rest_api_policy = local.enabled && var.rest_api_policy != null
  create_log_group       = local.enabled && var.logging_level != "OFF"
  log_group_arn          = local.create_log_group ? module.cloudwatch_log_group.log_group_arn : null
  vpc_link_enabled       = local.enabled && length(var.private_link_target_arns) > 0
}

resource "aws_api_gateway_rest_api" "this" {
  count = local.enabled ? 1 : 0

  name = module.this.id
  body = jsonencode(var.openapi_config)
  tags = module.this.tags

  endpoint_configuration {
    types = [var.endpoint_type]
  }
}

resource "aws_api_gateway_rest_api_policy" "this" {
  count = local.create_rest_api_policy ? 1 : 0

  rest_api_id = aws_api_gateway_rest_api.this[0].id
  policy      = var.rest_api_policy
}

module "cloudwatch_log_group" {
  source  = "cloudposse/cloudwatch-logs/aws"
  version = "0.6.8"

  enabled              = local.create_log_group
  iam_tags_enabled     = var.iam_tags_enabled
  permissions_boundary = var.permissions_boundary

  context = module.this.context
}

resource "aws_api_gateway_deployment" "this" {
  count = local.enabled ? 1 : 0

  rest_api_id = aws_api_gateway_rest_api.this[0].id

  triggers = {
    redeployment = sha1(jsonencode(aws_api_gateway_rest_api.this[0].body))
  }

  lifecycle {
    create_before_destroy = true
  }

  depends_on = [aws_api_gateway_rest_api_policy.this]
}

resource "aws_api_gateway_stage" "this" {
  count = local.enabled ? 1 : 0

  deployment_id        = aws_api_gateway_deployment.this[0].id
  rest_api_id          = aws_api_gateway_rest_api.this[0].id
  stage_name           = var.stage_name != "" ? var.stage_name : module.this.stage
  xray_tracing_enabled = var.xray_tracing_enabled
  tags                 = module.this.tags

  variables = {
    vpc_link_id = local.vpc_link_enabled ? aws_api_gateway_vpc_link.this[0].id : null
  }

  dynamic "access_log_settings" {
    for_each = local.create_log_group ? [1] : []
    content {
      destination_arn = local.log_group_arn
      format          = replace(var.access_log_format, "\n", "")
    }
  }
}

# Set the logging, metrics and tracing levels for all methods
resource "aws_api_gateway_method_settings" "all" {
  count = local.enabled ? 1 : 0

  rest_api_id = aws_api_gateway_rest_api.this[0].id
  stage_name  = aws_api_gateway_stage.this[0].stage_name
  method_path = "*/*"

  settings {
    metrics_enabled = var.metrics_enabled
    logging_level   = var.logging_level
  }
}

# Optionally create a VPC Link to allow the API Gateway to communicate with private resources (e.g. ALB)
resource "aws_api_gateway_vpc_link" "this" {
  count = local.vpc_link_enabled ? 1 : 0

  name        = module.this.id
  description = "VPC Link for ${module.this.id}"
  target_arns = var.private_link_target_arns
}
see eg https://github.com/cloudposse/terraform-aws-components/blob/3727f96af1ed4a81c00445290e23360af3ee0cfe/modules/vpc/context.tf#L23 A generic “mixin” in a self-contained context.tf
module "this" comes from cloudposse/terraform-null-label/exports
(we vendor that file explicitly in all of our own components)
(forgive the messed up, left-channel audio)
Also,
A post highlighting one of our favorite terraform modules: terraform-null-label. We dive into what it is, why it’s great, and some potential use cases in …
A post highlighting some advanced usage of the terraform-null-label module showing root/child module relationship and implementation of a naming + tagging …
As explained above, module.this is defined in a drop-in file named context.tf that is vendored in from null-label.
By convention, when module.this.enabled is false, the module should create no resources, and all outputs should be null or empty. This configuration is propagated to Cloud Posse Terraform modules (all of which include context.tf) by the assignment
context = module.this.context
If you want to make a variant of the label (see the video above), you instantiate null-label, passing context = module.this.context, but then also passing in overrides or additions, and then use the context, tags, IDs, etc. from that module instantiation going forward.
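A sketch of that variant pattern (the module version, the "logs" attribute, and the S3 example are illustrative assumptions, not from the thread):

```hcl
# Derive a variant label from the current context, e.g. for a companion
# S3 bucket that shares the base name but appends an attribute.
module "logs_label" {
  source  = "cloudposse/label/null"
  version = "0.25.0"

  # Inherit namespace/environment/stage/name/tags from module.this ...
  context = module.this.context

  # ... then add overrides or additions for this variant only.
  attributes = ["logs"]
}

# Use the variant's computed id/tags going forward.
resource "aws_s3_bucket" "logs" {
  bucket = module.logs_label.id
  tags   = module.logs_label.tags
}
```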
I hate asking this, but are there any user modules besides s3-user or iam-system-user? With iam-system-user I ran into a few issues with where the account was created, and it directly attaches policies. I’m still pretty new to TF, but I think I could make something that matches our compliance requirements with a little work; I figured I’d ask before I go writing this. I definitely do not want to use the user, but the vendor can’t provide trust relationship requirements for a role otherwise. New to the community otherwise, so hi everyone.
I think you need to expand on what you’re trying to do. Sounds like system-user would work for you, but I don’t understand the reason why it won’t from what you shared.
Apologies. I need an access key to let a Cisco product into our environment, and yeah, system-user ideally should work, but it needs to be modified to meet my compliance standards. Nothing crazy, except that it attaches permissions directly to the user/key, whereas compliance requires whatever access is needed to be part of a role.
I tried using system-user yesterday as well; it would create a system user as I wanted, but it was creating it in the account SAML goes through, not in my assumed-role account. I believe our copy of the system-user module is modified, so I’m unsure if that’s part of the issue; I didn’t have time to go back and try it again yet.
@Ryan – Okay, if I’m understanding correctly, I think what you may need to do is use iam-system-user
to create your user resource, use terraform-aws-iam-role
to create your role, give the role the right permissions for what Cisco needs to do, and give the system-user the permissions to assume the role. You can do that in a root module that combines those two child modules. Does that make sense and sound like what you need to do?
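A rough sketch of that root-module wiring. Module versions, input/output names are from memory (check each module’s docs), and data.aws_iam_policy_document.cisco_permissions is a placeholder for whatever permissions Cisco actually needs:

```hcl
# System user whose access key the Cisco product will use (no direct permissions).
module "cisco_user" {
  source  = "cloudposse/iam-system-user/aws"
  version = "1.2.0"

  context = module.this.context
}

# Role that actually carries the permissions, satisfying the compliance requirement.
module "cisco_role" {
  source  = "cloudposse/iam-role/aws"
  version = "0.19.0"

  role_description = "Role assumed by the Cisco system user"

  # Trust policy: only the system user may assume this role.
  principals = {
    AWS = [module.cisco_user.user_arn]
  }

  policy_documents = [data.aws_iam_policy_document.cisco_permissions.json]

  context = module.this.context
}

# Let the user assume the role; kept as a separate resource so the two
# modules don't form a dependency cycle.
resource "aws_iam_user_policy" "assume_cisco_role" {
  name = "assume-cisco-role"
  user = module.cisco_user.user_name
  policy = jsonencode({
    Version = "2012-10-17"
    Statement = [{
      Effect   = "Allow"
      Action   = "sts:AssumeRole"
      Resource = module.cisco_role.arn
    }]
  })
}
```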
it was creating it in the account SAML goes through and not in my assumed role account.
The account the resources are created in depends on what is in providers.tf, not on what role you have assumed locally. I would check the logic in your root module and see if that helps.
A Terraform module that creates an IAM role with provided JSON IAM policy documents.
Awesome response thank you. I’ll dig into this a little bit later today.
You were right: no providers. Thank you for setting me down the right path. It was definitely a lightbulb moment when I compared the provider vs. no-provider tf after reading the providers.tf code. I’m still getting the hang of Atmos and Terraform but really enjoying it.
2024-03-07
2024-03-08
Is there a file formatting I can use for “tftpl” template files? Does Jinja2 work? (I’m using Intellij IDE)
@matt @Jeremy White (Cloud Posse)
Not that I know of on VSCode at least.
Interestingly, I just saw this in the latest jetbrains terraform plugin: https://github.com/JetBrains/intellij-plugins/commit/9c3336ff5ead368e3fb120263479340e701a0f32#diff-05a30b5d16dee0a3b96a[…]5a95f6047f23ec697bd356eR40
I don’t think there’s support for it in a lot of the editors, but JetBrains is giving this area some love.
2024-03-11
2024-03-13
Been smashing my head against a wall on this one for a while. We have a set of Kubernetes ingresses defined via kubernetes_ingress_v1 resources, using a kubernetes_ingress_class resource spec’d with the ingress.k8s.aws/alb controller. I need to update the SSL policy for the ALB, but I can’t find documentation on how to define it. The only place that seems relevant is as an annotation in the ingress definition, but that means I have to define it for every ingress that uses that ingress class, which seems inefficient and prone to problems. What happens if two ingresses define different values there?
Does anyone know how to set the SSL Policy for an ALB ingress class?
I assume you’re using the ingress class provided by aws-loadbalancer-controller. If so, here is the schema definition for it. It does have an sslPolicy attribute.
apiVersion: apiextensions.k8s.io/v1
kind: CustomResourceDefinition
metadata:
  annotations:
    controller-gen.kubebuilder.io/version: v0.11.1
  creationTimestamp: "2024-03-13T15:02:07Z"
  generation: 1
  name: ingressclassparams.elbv2.k8s.aws
  resourceVersion: "198995"
  uid: 0fd1a62b-d774-423a-89fe-fbc87fe0cda2
spec:
  group: elbv2.k8s.aws
  names:
    plural: ingressclassparams
    singular: ingressclassparams
    kind: IngressClassParams
    listKind: IngressClassParamsList
  scope: Cluster
  versions:
    - name: v1beta1
      served: true
      storage: true
      schema:
        openAPIV3Schema:
          description: IngressClassParams is the Schema for the IngressClassParams API
          type: object
          properties:
            apiVersion:
              description: "APIVersion defines the versioned schema of this representation
                of an object. Servers should convert recognized schemas to the latest
                internal value, and may reject unrecognized values. More info:
                https://git.k8s.io/community/contributors/devel/sig-architecture/api-conventions.md#resources"
              type: string
            kind:
              description: "Kind is a string value representing the REST resource this
                object represents. Servers may infer this from the endpoint the client
                submits requests to. Cannot be updated. In CamelCase. More info:
                https://git.k8s.io/community/contributors/devel/sig-architecture/api-conventions.md#types-kinds"
              type: string
            metadata:
              type: object
            spec:
              description: IngressClassParamsSpec defines the desired state of IngressClassParams
              type: object
              properties:
                group:
                  description: Group defines the IngressGroup for all Ingresses that
                    belong to IngressClass with this IngressClassParams.
                  type: object
                  required:
                    - name
                  properties:
                    name:
                      description: Name is the name of IngressGroup.
                      type: string
                inboundCIDRs:
                  description: InboundCIDRs specifies the CIDRs that are allowed to
                    access the Ingresses that belong to IngressClass with this
                    IngressClassParams.
                  type: array
                  items:
                    type: string
                ipAddressType:
                  description: IPAddressType defines the ip address type for all
                    Ingresses that belong to IngressClass with this IngressClassParams.
                  type: string
                  enum:
                    - ipv4
                    - dualstack
                loadBalancerAttributes:
                  description: LoadBalancerAttributes define the custom attributes to
                    LoadBalancers for all Ingress that belong to IngressClass with
                    this IngressClassParams.
                  type: array
                  items:
                    description: Attributes defines custom attributes on resources.
                    type: object
                    required:
                      - key
                      - value
                    properties:
                      key:
                        description: The key of the attribute.
                        type: string
                      value:
                        description: The value of the attribute.
                        type: string
                namespaceSelector:
                  description: NamespaceSelector restrict the namespaces of Ingresses
                    that are allowed to specify the IngressClass with this
                    IngressClassParams. * if absent or present but empty, it selects
                    all namespaces.
                  type: object
                  properties:
                    matchExpressions:
                      description: matchExpressions is a list of label selector
                        requirements. The requirements are ANDed.
                      type: array
                      items:
                        description: A label selector requirement is a selector that
                          contains values, a key, and an operator that relates the key
                          and values.
                        type: object
                        required:
                          - key
                          - operator
                        properties:
                          key:
                            description: key is the label key that the selector applies to.
                            type: string
                          operator:
                            description: operator represents a key's relationship to a
                              set of values. Valid operators are In, NotIn, Exists and
                              DoesNotExist.
                            type: string
                          values:
                            description: values is an array of string values. If the
                              operator is In or NotIn, the values array must be
                              non-empty. If the operator is Exists or DoesNotExist,
                              the values array must be empty. This array is replaced
                              during a strategic merge patch.
                            type: array
                            items:
                              type: string
                    matchLabels:
                      description: matchLabels is a map of {key,value} pairs. A single
                        {key,value} in the matchLabels map is equivalent to an element
                        of matchExpressions, whose key field is "key", the operator is
                        "In", and the values array contains only "value". The
                        requirements are ANDed.
                      type: object
                      additionalProperties:
                        type: string
                  x-kubernetes-map-type: atomic
                scheme:
                  description: Scheme defines the scheme for all Ingresses that belong
                    to IngressClass with this IngressClassParams.
                  type: string
                  enum:
                    - internal
                    - internet-facing
                sslPolicy:
                  description: SSLPolicy specifies the SSL Policy for all Ingresses
                    that belong to IngressClass with this IngressClassParams.
                  type: string
                subnets:
                  description: Subnets defines the subnets for all Ingresses that
                    belong to IngressClass with this IngressClassParams.
                  type: object
                  properties:
                    ids:
                      description: IDs specify the resource IDs of subnets. Exactly
                        one of this or `tags` must be specified.
                      type: array
                      minItems: 1
                      items:
                        description: SubnetID specifies a subnet ID.
                        type: string
                        pattern: subnet-[0-9a-f]+
                    tags:
                      description: Tags specifies subnets in the load balancer's VPC
                        where each tag specified in the map key contains one of the
                        values in the corresponding value list. Exactly one of this or
                        `ids` must be specified.
                      type: object
                      additionalProperties:
                        type: array
                        items:
                          type: string
                tags:
                  description: Tags defines list of Tags on AWS resources provisioned
                    for Ingresses that belong to IngressClass with this
                    IngressClassParams.
                  type: array
                  items:
                    description: Tag defines a AWS Tag on resources.
                    type: object
                    required:
                      - key
                      - value
                    properties:
                      key:
                        description: The key of the tag.
                        type: string
                      value:
                        description: The value of the tag.
                        type: string
      subresources: {}
      additionalPrinterColumns:
        - name: GROUP-NAME
          type: string
          description: The Ingress Group name
          jsonPath: .spec.group.name
        - name: SCHEME
          type: string
          description: The AWS Load Balancer scheme
          jsonPath: .spec.scheme
        - name: IP-ADDRESS-TYPE
          type: string
          description: The AWS Load Balancer ipAddressType
          jsonPath: .spec.ipAddressType
        - name: AGE
          type: date
          jsonPath: .metadata.creationTimestamp
  conversion:
    strategy: None
Ok, so I can define this at the load balancer controller? It looks like we implemented that via a helm release. Now to figure out how to override values in that chart…
Thanks!
No problem. If you’re using CloudPosse’s eks/alb-controller, you can add sslPolicy to this section.
Or here, if you’re using their eks/alb-controller-ingress-class…
https://github.com/cloudposse/terraform-aws-components/blob/37d8a5bfa04054231a04bf[…]66a575978352c8/modules/eks/alb-controller-ingress-class/main.tf
ok cool, we are using a kubernetes_manifest resource for the ingress class params. I’m trying this:
resource "kubernetes_manifest" "stack_ingress_public_class_params" {
  provider = kubernetes.cluster

  manifest = {
    apiVersion = "elbv2.k8s.aws/v1beta1"
    kind       = "IngressClassParams"

    metadata = {
      name = "stack-ingress-public"
    }

    spec = {
      group = {
        name = "stack-ingress-public"
      }
      sslPolicy = "ELBSecurityPolicy-TLS13-1-2-2021-06"
    }
  }
}
hmm…. I must have the structure wrong.
Error: Manifest configuration incompatible with resource schema
You may need to check your controller’s (and its CRDs’) version. I provided you the CRD schema for the latest (v2.7) aws-loadbalancer-controller. I am not sure when sslPolicy was added.
How did you pull that schema?
kubectl --context <your-kube-context> get ingressclassparams.elbv2.k8s.aws -o yaml
Correction… This is the command.
kubectl --context <your-kube-context> get crd ingressclassparams.elbv2.k8s.aws -o yaml
Terraform v1.7.5 (March 13, 2024) BUG FIXES:
backend/s3: When using s3 backend and encountering a network issue, the retry code would fail with “failed to rewind transport stream for retry”. Now the retry should be successful. (#34796)
This pull request updates the AWS SDKs to the latest version. The intent is to fix #34528, which is intermittent and can’t be easily reproduced. The root cause was discussed in the relevant AWS SDK…
2024-03-14
Hello, we’ve encountered an issue with the CloudPosse AWS backup vault module. During the destruction of a backup vault, the process tries to remove the backup vault before the recovery points, and because of this sequence the deployment fails.
• Do we need to update the module to be able to remove the recovery points before the backup vault?
• Or could we add a lifecycle in the CloudPosse module?
@Ben Smith (Cloud Posse)
Same issue with the 1.0.0 version
2024-03-15
Hi guys, has anyone else run into this issue? https://github.com/cloudposse/terraform-aws-backup/issues/60
Describe the Bug
If I disable the module by adding a count parameter and then execute my Terraform code, Terraform tries to delete the backup vault but fails because it contains recovery points.
Expected Behavior
Should be able to set count to 0, apply Terraform, and see the backup vault destroyed without issue.
Steps to Reproduce
Set count to 0 on a backup vault with recovery points inside and apply Terraform.
Screenshots: Screenshot 2023-11-16 at 14 33 18
Environment
Module version deployed in our environments: 0.7.1
aws = {
  source  = "hashicorp/aws"
  version = "5.16.1"
}
and terraform 1.0.0
Additional Context
No response
IMHO, deleting backups should be hard to do, so I am not bothered by this behavior. What happens if you run terraform destroy with the module enabled?
The issue still remains. After upgrading to 1.0.0, the error is the same: Error: deleting Backup Vault (MY_BACKUP_VAULT_NAME): InvalidRequestException: Backup vault cannot be deleted because it contains recovery points.
I noticed that even with a depends_on, the module doesn’t take this option into account!
Did you try running terraform destroy with at least one recovery point inside the backup vault? I don’t think so, because it’s a prerequisite for removing a backup vault: all recovery points must be deleted before the vault itself can be deleted.
There is similar behavior around deleting S3 buckets: because of the permanent loss of data, extra precautions are in place.
I believe the best approach is to provide a force_destroy option which would override this protection, but I will ask @Ben Smith (Cloud Posse) to look into it, since he is the most familiar with the topic. CC @Erik Osterman (Cloud Posse)
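Until something like force_destroy exists, one possible workaround is a destroy-time provisioner that empties the vault before it is deleted. This is a hedged sketch: var.backup_vault_name and the module.backup reference are assumptions about your configuration, and note the ordering — this resource must depend on the backup module so it is destroyed (running its provisioner) before the vault:

```hcl
# Destroy-time cleanup: delete all recovery points so the vault itself can be
# removed. Destroy-time provisioners may only reference self, hence triggers.
resource "null_resource" "empty_backup_vault" {
  triggers = {
    vault_name = var.backup_vault_name
  }

  # Depending on the backup module means this resource is destroyed first,
  # so the provisioner runs before Terraform tries to delete the vault.
  depends_on = [module.backup]

  provisioner "local-exec" {
    when    = destroy
    command = <<-EOT
      for arn in $(aws backup list-recovery-points-by-backup-vault \
          --backup-vault-name "${self.triggers.vault_name}" \
          --query 'RecoveryPoints[].RecoveryPointArn' --output text); do
        aws backup delete-recovery-point \
          --backup-vault-name "${self.triggers.vault_name}" \
          --recovery-point-arn "$arn"
      done
    EOT
  }
}
```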
Hi, regarding S3, the issue was due to versioning: the sequence tried to delete the S3 bucket before suspending versioning. We fixed that with a time_sleep resource so the bucket isn’t removed before versioning is suspended, and it works fine, but the same approach doesn’t work with the CloudPosse AWS backup module.
We tried removing the recovery points from a null_resource and adding a depends_on on that null_resource from the module; same issue. Even when the recovery points are removed from the AWS console before the backup vault, in CloudTrail we can see the backup vault being removed before the recovery points, and it fails as well.
For info, the depends_on works fine when we enable the module, but not when we deactivate it.
One question: in your documentation at https://docs.cloudposse.com/modules/library/aws/backup/ the following code is mentioned regarding the retention period:
rules = [
  {
    name              = "${module.this.name}-daily"
    schedule          = var.schedule
    start_window      = var.start_window
    completion_window = var.completion_window
    lifecycle = {
      cold_storage_after = var.cold_storage_after
      delete_after       = var.delete_after
    }
  }
]
But from https://github.com/cloudposse/terraform-aws-backup/blob/1.0.0/docs/migration-0.13.x-0.14.x+.md it’s mentioned the below one :
rules = [
  {
    schedule           = var.schedule
    start_window       = var.start_window
    completion_window  = var.completion_window
    cold_storage_after = var.cold_storage_after
    delete_after       = var.delete_after
  }
]
With the second code, I noticed in the AWS console that the retention period is “Always” and not my specific value:
Terraform module to provision AWS Backup, a fully managed backup service that makes it easy to centralize and automate the backup of data across AWS services such as Amazon EBS volumes, Amazon EC2 instances, Amazon RDS databases, Amazon DynamoDB tables, Amazon EFS file systems, and AWS Storage Gateway volumes.
[!NOTE]
The syntax of declaring a backup schedule has changed as of release 0.14.0
, follow the instructions in the 0.13.x to 0.14.x+ migration guide.
[!WARNING] The deprecated variables have been fully deprecated as of
1.x.x
. Please use the new variables as described in the 0.13.x to 0.14.x+ migration guide.
# Migration from 0.13.x to 0.14.x
Version 0.14.0 of this module implements the ability to add multiple schedules to a backup plan. This requires changing the module’s inputs slightly. Make sure to update your configuration to use the new syntax.
Before:
module "backup" {
  source = "cloudposse/backup/aws"

  schedule           = var.schedule
  start_window       = var.start_window
  completion_window  = var.completion_window
  cold_storage_after = var.cold_storage_after
  delete_after       = var.delete_after
}
After:
module "backup" {
  source = "cloudposse/backup/aws"

  rules = [
    {
      schedule           = var.schedule
      start_window       = var.start_window
      completion_window  = var.completion_window
      cold_storage_after = var.cold_storage_after
      delete_after       = var.delete_after
    }
  ]
}
Now you can have multiple backup schedules:
module "backup" {
  source = "cloudposse/backup/aws"

  rules = [
    {
      name               = "daily"
      schedule           = "cron(0 10 * * ? *)"
      start_window       = 60
      completion_window  = 120
      cold_storage_after = 30
      delete_after       = 180
    },
    {
      name               = "monthly"
      schedule           = "cron(0 12 1 * ? *)"
      start_window       = 60
      completion_window  = 120
      cold_storage_after = 30
      delete_after       = 180
    }
  ]
}
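For comparison, the docs-site variant quoted earlier in this thread nests the retention settings under a lifecycle map inside each rule. A full module call in that form might look like this (the version pin and cron values are illustrative assumptions):

```hcl
module "backup" {
  source  = "cloudposse/backup/aws"
  version = "1.0.0"

  rules = [
    {
      name              = "${module.this.name}-daily"
      schedule          = "cron(0 10 * * ? *)"
      start_window      = 60
      completion_window = 120

      # Retention: move to cold storage after 30 days, delete after 180.
      lifecycle = {
        cold_storage_after = 30
        delete_after       = 180
      }
    }
  ]

  context = module.this.context
}
```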
Do we need to set the lifecycle block for the retention period to be taken into account for the source AWS Backup and recovery points?
thanks
2024-03-16
2024-03-17
Cloud software vendor HashiCorp is exploring options, including a sale, Bloomberg News reported on Friday citing people familiar with the matter.
“HashiCorp has been working with a financial adviser in recent months and has held exploratory talks with other industry players, the report said.”
I’m eager to know who the ‘industry players’ are!
chef is going to buy it and deprecate terraform lol
Perhaps they will get beat out by Puppet.