#refarch (2024-07)
Cloud Posse Reference Architecture
2024-07-02
Hi, how would I use assume_role_conditions in the iam-role module to set a condition to require an STS external ID for role assumption? https://github.com/cloudposse/terraform-aws-components/tree/main/modules/iam-role#input_assume_role_conditions
@Dan Miller (Cloud Posse)
Take a look at the module here: https://github.com/cloudposse/terraform-aws-iam-role/blob/main/variables.tf#L64-L78
Then you can list the conditions with test, variable, and values, following the Terraform resource documentation here:
https://registry.terraform.io/providers/hashicorp/aws/latest/docs/data-sources/iam_policy_document#source_policy_documents
All conditions given in var.assume_role_conditions will be included in the IAM policy document, alongside the allowed actions defined by var.assume_role_actions:
variable "assume_role_actions" {
type = list(string)
default = ["sts:AssumeRole", "sts:TagSession"]
description = "The IAM action to be granted by the AssumeRole policy"
}
variable "assume_role_conditions" {
type = list(object({
test = string
variable = string
values = list(string)
}))
description = "List of conditions for the assume role policy"
default = []
}
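For example, to require an STS external ID, a minimal sketch of the module inputs might look like this (the external ID value, version pin, and other inputs are placeholders, not from this thread):

module "example_role" {
  source  = "cloudposse/iam-role/aws"
  version = "x.x.x" # pin to whichever version you are using

  # Require callers to present a matching external ID when assuming the role
  assume_role_conditions = [
    {
      test     = "StringEquals"
      variable = "sts:ExternalId"
      values   = ["example-external-id"] # placeholder external ID
    }
  ]

  # ... other inputs (principals, policy documents, context, etc.)
}

That should render a StringEquals condition on sts:ExternalId in the generated assume-role (trust) policy, alongside the actions from var.assume_role_actions.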
2024-07-03
@Marat Bakeev following up from office hours today
can you please summarize the issue with the webhook secret? I will rope in our SME on actions runner controller
cc @Jeremy G (Cloud Posse)
Sure. So, the issue is - we’re using actions-runner-controller. Our webhook logs state that there is no webhook secret configured:
2024-07-03T2022Z INFO -github-webhook-secret-token and GITHUB_WEBHOOK_SECRET_TOKEN are missing or empty. Create one following https://docs.github.com/en/developers/webhooks-and-events/securing-your-webhooks and specify it via the flag or the envvar
Yes, sorry, that is a known issue, too, should be fixed this week.
We have created the webhook and its secret in GitHub, and also placed the secret into SSM, and it's available there. But the controller-manager secret does not have the key for this webhook secret
ah, okay, no worries. thanks
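For reference, here is a rough sketch of copying an SSM-stored webhook secret into a Kubernetes secret for the ARC webhook server to read. All names, namespaces, and key names below are assumptions for illustration, not taken from this thread or from the component:

# Sketch only, not the Cloud Posse fix: copy the webhook secret from SSM
# into the Kubernetes secret the ARC webhook server is configured to read.
data "aws_ssm_parameter" "github_webhook_secret" {
  name = "/arc/github-webhook-secret-token" # assumed SSM parameter path
}

resource "kubernetes_secret" "arc_webhook" {
  metadata {
    name      = "controller-manager"    # assumed secret name referenced by the chart
    namespace = "actions-runner-system" # assumed namespace
  }

  data = {
    # assumed key name the webhook server is configured to read
    github_webhook_secret_token = data.aws_ssm_parameter.github_webhook_secret.value
  }
}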
2024-07-18
Do you guys have plans to update the ArgoCD version? I think the one you have enabled (2.5.9) has a security issue, plus later versions add support for ApplicationSet Progressive Syncs.
All our updates are (financially) sponsored by customers or open source contributors
We are open to updating everything :-)
@Michael maybe something interesting for you?
This is a great recommendation! I’ll take a look into it today!
@Marat Bakeev Just to confirm, this is in reference to the CloudPosse packages repositories?
(same ones you’re using, I think)
Created a GitHub issue so I can track the work on this! https://github.com/cloudposse/terraform-aws-components/issues/1079
Describe the Feature
Argo CD versions 0.1.0 through 2.10.0-rc1, 2.9.3, 2.8.7, and 2.7.15 are affected by CVE-2024-22424, a CSRF vulnerability that is exploitable when the attacker has the ability to write HTML to a page on the same parent domain as Argo CD.
Expected Behavior
Propose that we update the default value for Argo's chart from argo/argo-cd 5.19.12 (Argo CD v2.5.9) to an unaffected version (patched in 2.10-rc2, 2.9.4, 2.8.8, and 2.7.16).
Use Case
N/A
Describe Ideal Solution
Update default value for:
variable "chart_version" {
  type        = string
  description = "Specify the exact chart version to install. If this is not specified, the latest version is installed."
  default     = "5.19.12"
}
And validate it works as intended
Alternatives Considered
No response
Additional Context
No response
And here is the PR! https://github.com/cloudposse/terraform-aws-components/pull/1081
what and why
• Argo CD versions 0.1.0 through 2.10.0-rc1, 2.9.3, 2.8.7, and 2.7.15 are affected by CVE-2024-22424, a CSRF vulnerability that is exploitable when the attacker has the ability to write HTML to a page on the same parent domain as Argo CD.
• Propose that we update the default value for Argo's chart from argo/argo-cd 5.19.12 (Argo CD v2.5.9) to an unaffected version (patched in 2.10-rc2, 2.9.4, 2.8.8, and 2.7.16).
notable changes
• Argo CD 2.10 upgraded kubectl from 1.24 to 1.26. This upgrade introduced a change where client-side-applied labels and annotations are no longer preserved when using a server-side kubectl apply
• Note that bundled Helm version has been upgraded from 3.13.2 to 3.14.3
• Starting with Argo CD 2.10.11, the NetworkPolicy for the argocd-redis and argocd-redis-ha-haproxy dropped Egress restrictions. This change was made to allow access to the Kubernetes API to create a secret to secure Redis access
testing
• This version has been tested and verified to work with the existing component configuration
references
2024-07-25
@Marat Bakeev was this fully answered? https://github.com/orgs/cloudposse/discussions/12 Anything we can mark as the answer?
/github subscribe cloudposse/community discussions
:white_check_mark: Subscribed to cloudposse/community. This channel will receive notifications for issues, pulls, commits, releases, deployments, discussions
/github unsubscribe cloudposse/community pulls commits releases deployments issues
Spinning my wheels a bit on this one so figured I’d ask.
In the baseline steps there is a note:
The IAM User for SuperAdmin will be granted access to Terraform State by principal ARN. This ARN is passed to the tfstate-backend stack catalog under allowed_principal_arns. Verify that this ARN is correct now. You may need to update the root account ID.
And possibly related:
With the addition of support for dynamic Terraform roles, our baseline cold start refarch layer now depends on/requires that we have aws-teams and aws-team-roles stacks configured. This is because account-map uses those stacks to determine which IAM role to assume when performing Terraform in the account, and almost every other component uses account-map (indirectly) to choose the role to assume.
However, none of the steps in the baseline seem to provision the roles for accessing the tfstate. Tracing through, it looks like -var=access_roles_enabled=false prevents these roles from being created in the baseline tfstate backend workflow. The full deploy/tfstate workflow isn't run until later in the identity phase.
The result is that the atmos workflow deploy/accounts -f accounts workflow cannot run and does not create the account-map due to an error:
Error: error configuring S3 Backend: IAM Role (arn:aws:iam::1234567890:role/xxx-core-gbl-root-tfstate) cannot be assumed.
The role doesn’t exist so the error is clear, however, passing access_roles_enabled=true doesn’t work since the account-map needs to be created.
@Jeremy White (Cloud Posse)
@Erik Osterman (Cloud Posse) I don’t think we encountered this, no. But we didn’t start with a clean env, maybe it was already created for us when we were trying to reverse-engineer refarch ourselves
2024-07-26
I have deployed an AWS X-Ray DaemonSet on our cluster without any node selector or tolerations, but it's deployed only on a few nodes, not every node in the cluster. I also don't see any pods in a Pending state, as I would expect if resources were insufficient. Wanted to know your thoughts on what the issue could be. (screenshot)
@Jeremy G (Cloud Posse)
When a DaemonSet is first deployed, it is only deployed to Nodes that have enough resources for it, unless you set up a PriorityClass with preemption and assign the DaemonSet to that class. See https://kubernetes.io/docs/concepts/scheduling-eviction/pod-priority-preemption.
FEATURE STATE: Kubernetes v1.14 [stable] Pods can have priority. Priority indicates the importance of a Pod relative to other Pods. If a Pod cannot be scheduled, the scheduler tries to preempt (evict) lower priority Pods to make scheduling of the pending Pod possible. Warning: In a cluster where not all users are trusted, a malicious user could create Pods at the highest possible priorities, causing other Pods to be evicted/not get scheduled.
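As a rough sketch, assuming the Terraform kubernetes provider is in use, a PriorityClass like the following could be created; the name and priority value are illustrative, and the DaemonSet's pod spec would still need priorityClassName pointed at it (e.g. via the chart's values):

# Illustrative only: a PriorityClass that lets DaemonSet pods preempt
# lower-priority Pods so they can still be scheduled on already-full nodes.
resource "kubernetes_priority_class" "daemonset_priority" {
  metadata {
    name = "daemonset-priority" # assumed name
  }

  value       = 1000000 # well above the default priority of 0 for regular workloads
  description = "Priority class for DaemonSets that should preempt regular workloads"
  # Preemption defaults to PreemptLowerPriority, which is what allows eviction.
}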
2024-07-29
After completing nearly all of the identity steps in the refarch, I'm hitting an issue where atmos doesn't seem to be switching from the planner IAM role to the terraform IAM role in some workflows.
For example this fails with a few permission denials since the planner role cannot actually create resources (KMS keys, etc.):
terraform deploy cloudtrail -s core-gbl-root
I've pulled the above out of the deploy workflow from baseline.
Example error:
Error: creating KMS Key: operation error KMS: CreateKey, https response error StatusCode: 400, RequestID: x, api error AccessDeniedException: User: arn:aws:sts::x:assumed-role/x-core-gbl-root-planner/aws-go-sdk-1722266462231447631 is not authorized to perform: kms:TagResource because no identity-based policy allows the kms:TagResource action
The account-map seems to have the correct roles set for plan vs. apply.
No AWS Teams should have access to apply Terraform in the core-root account.
I see now that the managers Team does have terraform access in core-root. Do you know which AWS Team you have assumed before running Terraform?
Within your infra geodesic shell, run this to check:
√ . [foo-identity] (HOST) infrastructure ⨠ aws sts get-caller-identity
{
  "UserId": "ABCD1234:foo-identity",
  "Account": "1234567890",
  "Arn": "arn:aws:sts::1234567890:assumed-role/foo-core-gbl-identity-devops/foo-identity"
}
For example, here I am using the devops team, so I would only have planner access in core-root.
2024-07-30
Is there a way in the refarch to set up S3 event notifications to go to SNS/SQS? Can’t find anything in the documentation about it
@Dan Miller (Cloud Posse)
I don't believe we have anything existing for it, but it looks like it shouldn't be too hard to set up like this:
https://registry.terraform.io/providers/hashicorp/aws/latest/docs/resources/s3_bucket_notification
I'd recommend creating a new component that pulls the aws_sns_topic.topic.arn and aws_s3_bucket.bucket.id by remote state from the s3-bucket / sns-topic components and then adds the notifications (rough sketch below). But of course there are many ways to do it.
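A minimal sketch of such a component, assuming Cloud Posse's remote-state module and typical s3-bucket / sns-topic component names; the module version and the bucket_id / sns_topic_arn output names are assumptions, so check the actual component outputs:

# Sketch only: a new component that subscribes an existing SNS topic to S3 events.
# Component names, module version, and output names below are assumptions.
module "s3_bucket" {
  source  = "cloudposse/stack-config/yaml//modules/remote-state"
  version = "1.5.0" # pin to the version used elsewhere in your stacks

  component = "s3-bucket"
  context   = module.this.context
}

module "sns_topic" {
  source  = "cloudposse/stack-config/yaml//modules/remote-state"
  version = "1.5.0"

  component = "sns-topic"
  context   = module.this.context
}

resource "aws_s3_bucket_notification" "this" {
  bucket = module.s3_bucket.outputs.bucket_id # assumed output name

  topic {
    topic_arn = module.sns_topic.outputs.sns_topic_arn # assumed output name
    events    = ["s3:ObjectCreated:*"]
  }
}

Note that the SNS topic's access policy also has to allow S3 (s3.amazonaws.com) to publish to it, which may or may not already be handled by the sns-topic component.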