#release-engineering (2020-08)
All things CI/CD. Specific emphasis on Codefresh and CodeBuild with CodePipeline.
CI/CD Discussions
Archive: https://archive.sweetops.com/release-engineering/
2020-08-03
More options for running GitHub Actions during a pull request initiated from a fork…. https://github.blog/2020-08-03-github-actions-improvements-for-fork-and-pull-request-workflows/
Today GitHub Actions shipped a series of features designed to improve your workflows when working with PRs from repository forks. New settings for private repository forks Many GitHub customers choose to work in a forking
cc @Erik Osterman (Cloud Posse). I’m sure he’ll be stoked.
thanks @Matt Gowie! this sounds very interesting indeed.
Still gotta digest how this helps us. we were kind of wishing for the ability to “bless” PRs or forks.
sounds like pull_request_target is what we want, but still don’t get it.
@Andriy Knysh (Cloud Posse)
Yeah, I like the comment keyword approach, from someone with write privs to the repo, to kick off a build. And I want to be sure secrets remain protected so they can’t just be echoed in logs or captured and sent elsewhere
ya exactly… even for debugging I’ve added a printenv
This doesn’t yet get there, I don’t think, for public projects… But glad they’re actively iterating and releasing features!
have you seen their public roadmap? it’s pretty exciting how much investment is going into it
and in the near term
Saw it, but haven’t yet spent any time poking around, or trying to gauge velocity
okay, so using this, I think what we can do for infra tests is first check if a blessed label is set (or something like that) - otherwise skip.
Then checkout their branch.
Then run the tests.
it works basically the same as dispatched events, which also always trigger on the default branch, but then check out the merge commit using settings from the payload
2020-08-17
Is anyone else in this Slack using GitLab?
yep. what’s your question?
Awesome. One question I have had is: How are you dealing with boilerplate AWS CLI code in your repos that doesn’t really change from app to app?
I just spent last Friday setting up a separate repo of bash scripts that I can clone in by eval-ing a CI variable… but this seems overly complex and eval is icky.
Looking at using remote template includes now as an alternative. https://docs.gitlab.com/ee/ci/yaml/#includeremote
You’ll have better luck if you boil things down to something more generic. You are trying to have reusable code in different pipelines. Are you trying to reuse whole pipeline stages, or just a bash script? Or something else?
Good questions…. So there are a few things that keep popping up. We are using ECS for most deploys now. We also have 2 AWS accounts (prod and pre-prod). Things I am trying to abstract away are:
• Selection of the correct ECR URL to use when publishing docker images.
• Forcing the deploy of the ECS cluster using the AWS CLI
For the CLI, I’m pulling the official docker image and creating an alias.
ECS has great support in Terraform. Have you looked at using that rather than coming up with a custom flow in Bash?
I have all the code for building out the cluster in TF. For the CI/CD, it’s mostly just building the docker image, pushing it to the right ECR repo, and then triggering the ECS cluster to re-deploy via CLI
@DJ have you checked https://docs.gitlab.com/ee/ci/cloud_deployment/#deploy-your-application-to-the-aws-elastic-container-service-ecs ?
I had not @Maycon Santos! Thanks. That looks like a pretty easy way to push a deploy to ECS. Might borrow some of those techniques..
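For reference, a rough sketch of the pure-Terraform route suggested above; the resource and variable names here are hypothetical, not from the thread, and force_new_deployment is the AWS provider argument that lets you roll tasks onto a new image that reuses the same tag (e.g. :latest) without calling the AWS CLI:
resource "aws_ecs_service" "app" {
  name            = "my-app"
  cluster         = aws_ecs_cluster.main.id
  task_definition = aws_ecs_task_definition.app.arn
  desired_count   = 2
  launch_type     = "FARGATE"

  # roll tasks onto a new image with the same tag on the next apply
  force_new_deployment = true

  network_configuration {
    subnets         = var.private_subnet_ids
    security_groups = [aws_security_group.app.id]
  }
}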
2020-08-20
2020-08-21
2020-08-23
I’m currently working on a Terraform module that needs to create a Kubernetes cluster as well as deploy some helm charts to it. I need it to be as “production-ready” as possible. What’s the best approach right now for using Terraform to deploy things to Kubernetes?
For further context, the module will spin up AWS resources (EC2 instances, security groups, etc), then use the Terraform RKE provider to create the k8s cluster. Here’s an example from Rancher that is close to what I want to do, but they clearly say that it is not meant for production. Here’s my repo if you want to follow along with my progress. I’m working in the feature/initial_dev branch.
While not a set-in-stone requirement, if at all possible I would like to avoid requiring any local-exec or dependencies on any locally installed tools other than Terraform.
Terraform Helm Provider? I don’t know much about it, though it looks to have decently good support
- Does it require helm to be installed on the machine running Terraform?
- Is it being used anywhere successfully in production?
Terraform Helmfile Provider? Probably not much more than an honorable mention since it is so new, but I do :heart: pretty much anything @mumoshu touches :grin:
- Does it require helm, helmfile, helm-diff, helm-git, etc to be installed on the machine running Terraform? (If I am reading correctly, the answer is yes)
Local-exec using helm/helmfile in an idempotent way? Some of my colleagues do this, but I believe it is just too crude to use in production
Terraform Shell Provider? This feels like a souped-up version of local-exec that at least gives me better lifecycle management (thanks @mumoshu for linking to it in the helmfile provider docs)
Flux Helm Operator? the Flux project has a Helm operator that looks really nice. I’d need to get the operator installed, and then need to figure out the best way to get the CRDs applied, but it looks like it has nice potential
FWIW, I’m currently leaning toward the Helm operator. Maybe do something like this?
- Install the operator using the Terraform Helm provider
- Use local-exec or the Shell provider to kubectl apply the CRDs
Advantages to this seem to be that if the Terraform Helm provider does have any issues, it is only being used to deploy the operator, rather than all the other deployments I need.
For things that need to be run locally, I should be able to run whatever I want with docker run rather than having to have actual dependencies installed on the machine running terraform (though that would complicate running terraform inside a container, which is very frequently done, so maybe that isn’t the best approach).
kubectl apply is nicely idempotent, so I don’t have to worry if Terraform wants to run it every time I terraform apply.
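A rough sketch of that docker run / kubectl apply step, assuming the operator chart was already installed via the Helm provider; the image tag, mount paths and manifest names are hypothetical, not from the thread:
resource "null_resource" "helm_operator_crds" {
  # wait for the operator chart installed through the helm provider
  depends_on = [helm_release.helm_operator]

  provisioner "local-exec" {
    # run kubectl from a container so it is not a local dependency;
    # kubectl apply is idempotent, so re-running it on every apply is harmless
    command = "docker run --rm -v $(pwd)/kubeconfig:/tmp/kubeconfig -v $(pwd)/manifests:/manifests bitnami/kubectl:1.18 apply --kubeconfig /tmp/kubeconfig -f /manifests/"
  }
}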
@roth.andy Hey! I’m currently researching towards building something similar to yours, but based on a custom terraform provider (https://github.com/mumoshu/terraform-provider-eksctl). there should be some nuance but i think i can share a few things that might be common
Manage AWS EKS clusters using Terraform and eksctl - mumoshu/terraform-provider-eksctl
for avoiding local-exec for helmfile, i can definitely recommend using https://github.com/mumoshu/terraform-provider-helmfile as you’ve noted
you can even use the helmfile provider to install the helm operator and HelmRelease custom resources, if you want to defer/delegate application deployment to the helm operator
Deploy Helmfile releases from Terraform - mumoshu/terraform-provider-helmfile
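To make that concrete, a sketch of installing the Flux Helm Operator through the helmfile provider; kubeconfig_path and environment_variables come up later in this thread, while the content attribute and the chart values shown are assumptions based on the provider README and the fluxcd chart, not something confirmed here:
resource "helmfile_release_set" "helm_operator" {
  # hypothetical kubeconfig written by Terraform after the cluster is created
  kubeconfig_path = local_file.kubeconfig.filename

  # inline helmfile describing the operator release
  content = <<-EOF
    repositories:
    - name: fluxcd
      url: https://charts.fluxcd.io
    releases:
    - name: helm-operator
      namespace: flux
      chart: fluxcd/helm-operator
      values:
      - helm:
          versions: v3
  EOF
}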
Does it require helm, helmfile, helm-diff, helm-git,
unfortunately yes. however, i’m going to integrate a binary package manager (https://github.com/mumoshu/shoal/) into the helmfile provider so that it can install those binaries on terraform apply
Declarative, Go-embeddable, and cross-platform package manager powered by https://gofi.sh/ - mumoshu/shoal
Thanks @mumoshu, I saw a note about the package manager in an issue on the helmfile provider as well, sounds cool.
Do you think the helmfile provider is in a place where you would feel comfortable using it in production?
I have 2 repos: infrastructure-terraform with 2 layers, and infrastructure-kubernetes which uses helmfile. Maybe I can join the next office hours and show it; I’ve wanted to get feedback on it for quite some time anyway.
@roth.andy I’m working for a client now where I’m building a v0.1 of a very similar project to what you’re targeting.
Couple things of note:
- Running on EKS, we run 1 local-exec to update the local kube configuration after the cluster is created (see the sketch after this message).
- We decided to install Flux and HelmOperator via the Helm provider and then have a separate repo where Flux / HO looks for our HelmReleases for all other CRDs / in-house Applications that we’re deploying. This is in a very early stage, but so far seems to be doing the trick. I also wanted to use Helmfile + the provider with similar reasoning on why you wanted to use them, but Flux was already chosen as the tool of choice by a colleague before I joined the project. Didn’t want to rock the boat too much.
Definitely interested in where you land — be sure to provide updates!
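A minimal sketch of that single kubeconfig-updating local-exec, assuming an EKS cluster created by a module named eks_cluster with an eks_cluster_id output (both names hypothetical):
resource "null_resource" "update_kubeconfig" {
  depends_on = [module.eks_cluster]

  # re-run if the cluster identity ever changes
  triggers = {
    cluster_name = module.eks_cluster.eks_cluster_id
  }

  provisioner "local-exec" {
    command = "aws eks update-kubeconfig --name ${module.eks_cluster.eks_cluster_id} --region ${var.region}"
  }
}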
@Matt Gowie how are you guys liking the helm operator?
• stable?
• easy to update?
• easy to use?
• Stable — Unseen as of right now. Still in pre-prod environments with this new toolset.
• Easy to update — Haven’t updated the Helm Operator itself yet, but updating the Helm Releases that it uses is easy enough since it’s also GitOps and it does a good job of tracking changes there and doing deployments.
• Easy to use — Easy to install for sure. HelmReleases definitely seem less DRY than something like Helmfile, but they are fairly straightforward and it’s easy to customize Flux/HO to do what we want so far. Again, super early stage, but so far a good experience.
:thumbsup: for terraform-provider-helmfile supporting shoal for dependency management. we want to use it with terraform cloud, but lack of automatic dependencies has made it very hard to utilize for this use-case.
i’ve been using the terraform helm provider since late 2018 and it has worked fine. unless things have changed recently, it does require helm on the machine you run terraform from — annoying if you use TF cloud.
but terraform helm provider also doesn’t diff
@Erik Osterman (Cloud Posse) what do you mean by diff? It is aware that changes are made to the helm values but probably just uses the terraform state to show that diff
I’ll be honest, I only use it to deploy cluster-wide service charts (e.g. cert-manager) and logging/monitoring helm charts, and that bit of terraform has not changed much — the only things that I ever changed there are the helm values and chart versions, and that use case has worked fine for me.
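For illustration, a minimal sketch of that kind of usage with the Terraform Helm provider; the cluster connection here assumes EKS data sources (an RKE cluster like the one in this thread would supply its own endpoint and credentials), and the chart and values are just an example:
provider "helm" {
  kubernetes {
    host                   = data.aws_eks_cluster.this.endpoint
    cluster_ca_certificate = base64decode(data.aws_eks_cluster.this.certificate_authority[0].data)
    token                  = data.aws_eks_cluster_auth.this.token
  }
}

resource "helm_release" "cert_manager" {
  name             = "cert-manager"
  repository       = "https://charts.jetstack.io"
  chart            = "cert-manager"
  namespace        = "cert-manager"
  create_namespace = true

  # install the CRDs alongside the chart
  set {
    name  = "installCRDs"
    value = "true"
  }
}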
@mumoshu should the Helmfile provider be able to work in the same lifecycle as the creation of a cluster? I’m set up to do that now, but when I run terraform apply the Helmfile provider immediately tries to run helm diff though the cluster doesn’t even exist yet.
The log says Config not found (the config is a local_file resource that gets created after the cluster gets provisioned)
Wait hold the phone. I might have a bug in my helmfile
Okay now I’m back to thinking it is an issue with the helmfile provider. I do not see any issue with the actual helmfile itself
I’m starting from a completely empty terraform state, and Terraform needs to create the cluster, and then deploy to it using the helmfile provider.
It doesn’t seem like that is something the Helmfile provider supports, since it is immediately trying to connect to a nonexistent cluster
You probably need to do this in your terraform code:
Terraform module for provisioning an EKS cluster - cloudposse/terraform-aws-eks-cluster
This is to defer the attempt to connect to the cluster until after the cluster exists
and responds to healthcheck
then…
@Erik Osterman (Cloud Posse) I think the issue is that the helmfile provider tries to read cluster state before it even starts applying anything. Since a cluster doesn’t even exist yet it obviously can’t do that
I do already have a depends_on on the helmfile release
yes, this is true even of the kubernetes provider for terraform
so that’s why this hack exists to get around that limitation. probably the same thing would work with the helmfile provider.
but the way null_resource works and the curl command means it won’t complete until the cluster is online and available, which means the depends_on blocks the providers from connecting to the cluster until the cluster is online.
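A sketch of that wait-and-depend pattern, loosely modeled on the wait_for_cluster null_resource in cloudposse/terraform-aws-eks-cluster; the RKE resource and attribute names here are hypothetical, and the health URL/auth may need adjusting per distribution:
resource "null_resource" "wait_for_cluster" {
  depends_on = [rke_cluster.this]

  provisioner "local-exec" {
    # keep retrying until the API server answers its health endpoint
    command = "curl --silent --fail --retry 30 --retry-delay 10 --retry-connrefused --insecure --output /dev/null $ENDPOINT/healthz"
    environment = {
      ENDPOINT = rke_cluster.this.api_server_url
    }
  }
}

# anything that talks to the cluster (helm/helmfile releases, kubernetes
# resources) then adds: depends_on = [null_resource.wait_for_cluster]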
okay I’ll take a look. thanks
let me know how it goes - if you get stuck can probably unblock
still happening. happens even on terraform plan
@Andriy Knysh (Cloud Posse) look familiar?
The helmfile provider is trying to run helm diff as part of Terraform’s “Refreshing state…” process, but it can’t because there is no cluster yet and it doesn’t seem like it is able to handle that.
is the cluster on localhost?
no
that’s what kubectl defaults to if it is not pointed at a valid kubeconfig file
so looks like kubeconfig is not read or is incorrect
There is no kubeconfig yet. This terraform project creates the cluster, then the intention is to use the helmfile provider on the brand new cluster immediately
This, for example, works fine with a local-exec that calls helmfile apply, but the helmfile provider wants to run helmfile diff against an as-yet nonexistent cluster when you run terraform plan
It may be that this is just not something that the helmfile provider supports, but I suspect otherwise since @mumoshu said this in a comment on one of the github issues: https://github.com/mumoshu/terraform-provider-helmfile/issues/20#issuecomment-681598191
I am using kubernetes terraform provider provider "kubernetes" { host = data.aws_eks_cluster.cluster.endpoint cluster_ca_certificate = base64decode(data.aws_eks_cluster.cluster.certificat…
I’m thinking this situation is one of the following:
1. I’m doing something wrong - If so I’ll put in a PR to add to the docs once I figure it out
2. I’m experiencing a bug - If so I’ll write up a github issue
3. The Helmfile provider is not meant to be used in the same terraform apply as the creation of the cluster it needs to connect to - If so I’ll put in a PR to add to the docs
I’ll assume #2 for now and write up a github issue
I think start with #2. @mumoshu might have some ideas how to mitigate it. i did some more digging too and don’t see a way right now around it.
@mumoshu should the Helmfile provider be able to work in the same lifecycle as the creation of a cluster? I’m set up to do that now, but when I run terraform apply the Helmfile provider immediately tries to run helm diff though the cluster doesn’t even exist yet.
@roth.andy Hey! I just started reading this thread, but anyway - it should work with the recent versions of the provider I released yesterday
This was the fix (or a workaround for the underlying terraform limitation) required to make it work: https://github.com/mumoshu/terraform-provider-helmfile/commit/681ecb39707823cd3e76ed16c8351aa58f0d11b6 It’s included since v0.4.3
… depends on another, missing resource The provider was running helmfile-diff with the default kubeconfig, not one from helmfile_releaset_set.kubeconfig nor helmfile_release_set.environment_variab…
honestly, i don’t have much experience with terraform-aws-eks-cluster or rke_rancher_master_cluster modules, but, since v0.4.3, the helmfile provider should work as long as you can somehow pass helmfile_release_set a kubeconfig path to the working k8s cluster.
so looks like kubeconfig is not read or is incorrect
This is mostly correct, but to be extra clear - I think of this as rather a Terraform issue.
Terraform tries to call the provider for the diff operation (part of plan) EVEN if the target resource has a missing value that depends on another, not-yet-created tf resource, replacing the missing value with an empty value.
Example:
resource "eksctl_cluster" "mycluster" {
  ...
}

resource "helmfile_release_set" "mystack" {
  kubeconfig_path = eksctl_cluster.mycluster.kubeconfig_path
}
In this case, you’ll probably expect Terraform to trigger diff on helmfile_release_set if and only if the eksctl_cluster is created. But that’s not how Terraform works. Terraform tries to run diff anyway, replacing eksctl_cluster.mycluster.kubeconfig_path with an empty value:
resource "helmfile_release_set" "mystack" {
  kubeconfig_path = ""
}
0.4.3 fixes that, by skipping diff when kubeconfig_path is empty.
kubeconfig_path is new to me. Sounds like that would fix my issue. I’ve been adding KUBECONFIG to the environment variables since that is what the README does. I will submit a PR tomorrow to improve the docs
I wasn’t clear, but you can use either. environment_variables.KUBECONFIG and kubeconfig_path work exactly the same way internally
Oh. Hmm. I was using the latest version today and getting the results above. Could a later version have caused a regression? I will try v0.4.3 tomorrow
The good news is that the provider worked really well when there was a cluster to point at. I’m looking forward to using it more. I had some issues with the new shoal stuff but I only spent a few minutes with it so nothing significant to report yet
Ah okay then it might be another issue. I’m now wondering if it’s just that you’ve actually passed a non-empty kubeconfig path to the helmfile provider, but the path does not exist yet when Terraform calls the provider for diff?
I think you’ve tried to use wait_cluster null_resource to let Terraform defer applying helmfile_release_set until the cluster is created. I believe it would work for apply, but not for diff. diff happens regardless of dependencies among tf resources, as described above.
It’s definitely possible. I will investigate more tomorrow and report back
environment_variables = {
  KUBECONFIG = abspath(local_file.kubeconfig.filename)
}
is what I am using now. Tomorrow I will try changing it to a ternary that more explicitly sets the value to an empty string if the file doesn’t exist
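An untested sketch of that ternary, assuming the kubeconfig is written by a local_file resource named kubeconfig as above; note fileexists() is evaluated at plan time, so on a cold start this resolves to "" (letting v0.4.3+ of the provider skip its diff), and a follow-up plan/apply may still be needed before the real path is picked up:
environment_variables = {
  # empty while the kubeconfig file does not exist yet, real path afterwards
  KUBECONFIG = fileexists(local_file.kubeconfig.filename) ? abspath(local_file.kubeconfig.filename) : ""
}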
Thanks!
The important and not well known point of this issue is that Terraform runs internal diff operations on resources regardless of resource dependencies, emptying any resource attributes if the dependent resources are not created.
So we’ll probably end up needing to somehow “signal” the helmfile provider whether the dependent resource (= cluster) is already created or not.
environment_variables = {
  KUBECONFIG = abspath(local_file.kubeconfig.filename)
}
Is kubeconfig.filename static?
I take it you haven’t yet tried terraform-eksctl-provider with terraform-helmfile-provider in the same terraform workspace?
…on a coldstart
hrmmmmmmm that’s promising then! so this should definitely be possible
Yes it is static. It comes from a local_file resource
Gotcha, so that’s the issue I think
Let’s try making the filename dynamic so that it becomes empty during the Terraform-internal diff phase, which signals the helmfile provider to not fail on an invalid cluster.
Definitely sounds like it. I definitely have some avenues to explore when I get back to it. Thanks!
A possible enhancement could be to skip the diff if the path is an empty string OR the specified file doesn’t exist. Might even try getting a PR in, though my golang skills are nothing to be proud of
Sounds like a great idea!
A PR is definitely welcomed. I’ll also try adding it when I have a chance (currently working on stabilizing the R53 + NLB cluster canary thing, so maybe after that)
2020-08-24
2020-08-25
2020-08-26
Anybody here using tfscan, checkov, or something else in their CI/CD to spot security issues in TF code? If you are, or are considering doing it, can you share how you do it, and what do you do with the results?
@barak can probably answer this one :-)
Hi Yoni :)
Hi :)
Checkov is probably the richest in content and frameworks. You can integrate it at various stages of the CI/CD pipeline and get a sense of misconfigs in IaC (k8s, serverless, terraform, cfn, arm) without the need to provision them. It enables you to shift security testing left in the lifecycle.
For a wider solution that can shift left and right (production security testing) you can try the bridgecrew.io SaaS platform.
P.s. I’m one of the developers of both checkov and bridgecrew :)
@barak is it expected that Checkov doesn’t ‘deep scan’ from a composition (ie, root module) down into its declared modules? I was trying it out and if I scan my module itself it gives me detailed results on the resources in the module. If I go up to the composition where I’m passing in various config vars to do things, the checkov result was only verifying that I didn’t have hardcoded access keys. Should it only be scanning my modules? But then, how do I verify that someone has not passed in configs that would otherwise violate a check?
Hi @Zach, take a look at “scanning third party modules” here: https://www.checkov.io/2.Concepts/Evaluations.html
It’s a workaround, but some found it useful
In this case these are my modules, just pulled in from a different repository, though I suppose that distinction doesn’t really matter since its still an ‘external’ module?
Ohhh hm. Also we have our modules in a ‘mono-repo’ so scanning the entire .terraform might be a mess
this is probably a ‘me’ problem more than Checkov. We have our compositions in one mono repo with all the environments/stages, and all the modules in a second repo that we tag for module ‘releases’
While a little manual (potentially useful with flags/vars available in your build pipeline for which modules you currently care about?), you don’t have to stop at -d .terraform; you can narrow the scope further to individual -d .terraform/modules paths, i.e. checkov -d .terraform/modules/eks
I used tfscan on several projects. It was pretty good but limited in scope.
I use OpenSCAP to scan images.
@barak has joined the channel