#release-engineering (2018-11)
All things CI/CD. Specific emphasis on Codefresh and CodeBuild with CodePipeline.
CI/CD Discussions
Archive: https://archive.sweetops.com/release-engineering/
2018-11-01
Ah bah, no :superfresh: emoji.
haha
we can add that
for those that don’t know Dustin, he is an awesome support engineer at Codefresh, which is what we use for all of our CI/CD
hey @dustinvb
Hi Andriy.
Happy to help out in any way I can.
thanks
we all love Codefresh
I do as well. Was a customer for a year before joining.
2018-11-06
Howdy releasers
I wanted to discuss one topic that has been spinning in my head for a while:
- Reusing docker or other artifacts from PRs or not
Basically, to speed up the build/release cycle, I was thinking of reusing the tested/built artifact from a PR when we deploy to the QA environment, and of course subsequently
yes, reusing the docker image is ideal for deployment to production and preproduction
we tend to treat preproduction and staging slightly different
staging ~ master
preproduction is a tagged release
that tagged release gets promoted to production repos
but do you build it on PR as well? if you don’t, do you rerun all tests on the merged commit?
yes, we rerun on merge
so not quite what you say i guess
the key for us is to reuse the image on deploy to production
but not necessarily for all other steps
Yep, that part I have “pinned”, I was wondering about the other part
we have a pipeline step to promote images and helm charts to production registry
We have basically 2 “stable” branches
- stage (staging/preprod)
- master (prod)
and I wanted to reuse the artifact from the PR to staging, to save some more time
gotcha - yea, we don’t have anything for that. we also tend to do squash+merge, so the commit sha wouldn’t be the same
Yep, you don’t even get a parent commit that way, do you?
TBH, the core of the issue is not even in the pipeline, but mostly that some steps are just too slow (thanks, Java/SBT dependencies)
yea, i can see why you’d want to do that
So to clean up:
- feature/xyz -> staging (PR): build,test
- staging (commit from merge): build, test, release, deploy
Is that your workflow?
(maybe you do direct to master and use tags or some other CD workflow for prod)
- feature/1234/xyz -> staging (pr): build image, run compose integration tests, push image to registry, deploy helm chart to pr-namespace in kubernetes cluster (e.g. pr-1234)
- master (squash merge): build, test, release, deploy
- tag release: build, test, release, deploy to preproduction
- production deploy: promote artifacts to production, deploy
honestly, every customer engagement is slightly different
but it’s more or less like that
Yeah, indeed, I’m just trying to get some ideas
and most of the examples from “blogs” are simple stuff that would not work/fly in production
yea… i can show you some examples if you want to zoom sometime
what are you using for cicd? are you deploying to kubernetes?
We are still on ECS, looking at EKS right now, as we reached the point where ECS is just… annoying and we have enough people to support EKS
At the moment? Travis, but forget about that, as we are migrating to either GitLab CI or Buildkite before EoY hopefully; we outgrew Travis as well
What I don’t like about GitLab CI is that it only supports one pipeline per repo (last I checked)
have you looked at Codefresh?
Yeah, tbh I’m right now leaning toward Buildkite, as you can even dynamically load pipelines
I did, but found it a bit less flexible
Maybe I just did not understand it completely
dynamically load pipeline? what’s that?
Let’s say you have a pipeline like:
- build:
  - run x
- if build == commit:
  - loadpipeline pecigonzalo/this/.pipeline.yml
let me find the docs
Automate your team’s software development processes, from testing through to delivery, no matter the language, environment or toolchain.
and also, seamless triggers: https://buildkite.com/docs/pipelines/trigger-step
In many ways it’s like a nice buildbot (https://buildbot.net/) but pre-done
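The “dynamically load pipeline” idea maps to Buildkite’s pipeline upload mechanism: a step in the static pipeline can generate more steps at runtime and feed them back to the scheduler. A minimal sketch — `buildkite-agent pipeline upload` is the real command, but the step labels, the `make integration-test` target, and the condition for generating the steps are made up for illustration:

```shell
# A step in the static pipeline writes further steps as YAML at runtime...
cat <<'YAML' > dynamic-pipeline.yml
steps:
  - label: "extra integration tests"
    command: "make integration-test"
YAML

# ...and (inside a real Buildkite job) uploads them back to the scheduler:
# buildkite-agent pipeline upload dynamic-pipeline.yml
cat dynamic-pipeline.yml
```

This is what makes Buildkite feel like a “pre-done buildbot”: the pipeline itself is just an artifact a job can compute.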
Now, this is not an easy challenge, as it’s “hard” to identify the built artifact
since we enforce “up to date branches”, I’m testing using git rev-parse --short HEAD^2 on the merge-commit job to identify the artifact by the tag (we tag with the commit) of the branch whose PR was merged and created that commit.
* 2341145 (HEAD -> master, tag: 0.2.1, origin/master, workingset) Merge pull request #15 from thithat/staging
|\
| * d1bd417 Merge pull request #14 from thithat/feature/this-that
| |\
| | * 43ba953 (feature/this-that) Test Version Flow release
* | | 16c043a Merge pull request #13 from thithat/staging
❯ git rev-parse --short HEAD^2
d1bd417
❯ git rev-parse --short HEAD^1
16c043a
thoughts? ideas? what is your current CI workflow? What do you build on PR and on merge?
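A self-contained sketch of that HEAD^2 trick, using a throwaway repo (branch, commit messages, and the registry name are made up): on a non-squash merge commit, the second parent is the tip of the branch that was merged, i.e. the SHA the PR pipeline would have tagged its image with.

```shell
# Throwaway repo demonstrating that HEAD^2 on a merge commit recovers the
# merged branch tip (the SHA a PR build would have tagged its image with).
set -e
repo=$(mktemp -d) && cd "$repo"
git init -q
git config user.email ci@example.com && git config user.name ci
echo v1 > app.txt && git add app.txt && git commit -qm "init"
base=$(git rev-parse --abbrev-ref HEAD)

git checkout -qb feature/this-that
echo v2 > app.txt && git commit -qam "feature work"
feature_sha=$(git rev-parse --short HEAD)

git checkout -q "$base"
git merge -q --no-ff -m "Merge pull request #14" feature/this-that

pr_sha=$(git rev-parse --short HEAD^2)   # second parent = merged branch tip
echo "PR artifact tag: $pr_sha"
# A deploy job could now reuse the PR-built image (registry name made up):
#   docker pull registry.example.com/myapp:"$pr_sha"
```

Note this breaks under squash-merge, since the squashed commit has only one parent and a new SHA.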
so our workflow is slightly different
we have “unlimited staging environments” which correspond to PRs - one per PR; those just build images pinned to 0.0.0-gitsha
entire environments or just images?
2018-11-07
In a similar tone to my previous question, does anyone have a working workflow for using terraform plan from a PR build or similar? Because while it’s the recommended way for terraform, I have yet to see a working workflow for it
I think the “interactive pull requests” model is ideally suited for terraform
the slides linked below show a number of companies (recognizable brands) who use atlantis
our fork of atlantis addresses the immediate shortcomings until they are fixed upstream
@pecigonzalo if you are asking about how to trigger terraform plan/apply
from an open PR, we recently used atlantis
for a GitOps workflow. See #atlantis channel. Also @Erik Osterman (Cloud Posse) recently held a meetup during #connectweek in Pasadena (CA) where he gave a live demo using Atlantis with Terraform to provision AWS user accounts using only Pull Requests
2018-11-08
No, not exactly like that; I’m not a big fan of what Atlantis does
It applies before merge, to me that is anti-pattern
Except that terraform plans are poor
I was thinking of applying the plan generated on the PR, on the merge
They are optimistic at best
So if you merge and apply now what is in git is not deployed
If others are developing against it, they are just as blocked
So what we’ve reconciled is auto merge on successful apply
Well, but that is the same case for a CD container release
But for container releases it’s more stable
You have more under your control
Most terraform failures in my experience are due to bad values
Containers get their values at runtime
Not at compile time (not generally but ideally)
You can achieve what you want easily with Codefresh.
Yeah, that is true, and a fair point, but not this one:
So if you merge and apply now what is in git is not deployed
If others are developing against it, they are just as blocked
In theory in the container you have the same “possibility” of failure
but I agree, it’s not as likely, as it’s easier to catch in tests
TF has a lot of side-effect failure scenarios that plan does not catch
To me atlantis is a practical approach
Not the theoretical ideal
So rather than clutter the master commit history with a bunch of patch releases, rather get it in clean. We preserve a full transcript in the git comments so we have a record of what is deployed.
Even if the PR is closed but half applied, there’s a record of that
Yeah, it’s not a bad approach and I’m not saying Atlantis sucks, much the contrary; even Hashi bought them
I just feel that it’s a “hack” around a bigger issue
a great hack, but a hack anyway
Agreed
In my previous company we had a rake looper to get around TF modules not having count; it worked, and it saved us a lot of time, but in “terraform” terms it was a hack
Yea I have heard of similar hacks… basically around code generation
Slippery slope
I think within a closed ecosystem of a corporate environment that might fly, but it makes it very difficult to write portable code for open source
Let’s orchestrate Terraform configuration files with Ansible! Terrible! - antonbabenko/terrible
Yeah indeed, I’m not so keen on it anymore. Good thing TF 0.12 is around the corner. This was a long time ago
I wouldn’t touch 0.12 for a while…
I just don’t see a way around it that’s practical to solve (unless you’re HashiCorp and have 100mil in fresh green)
Yeah true
I might give it a 2nd chance; at least it will avoid those fix PRs for a stupid terraform value problem
BTW, it’s great to have a place to ping-pong these ideas
Yea totally!! That what this place is for
Yep, thanks !
2018-11-10
Run a plan, push an artefact, namely the plan output. Run an apply of said build artefact number. This could be a gitsha, PR number, Jenkins build number. I don’t see a huge advantage of Atlantis… although I caveat with not having used it, so opinions may be wrong. Have been doing CI/CD of terraform plan + apply for a long time in Jenkins. Not a massive fan of having output in the PR for history. Your git history lives forever; github PRs may not. Running pre-merge isn’t ideal for team scenarios. Have seen a lot of failed applies due to vars, state, lots of TF bugs, race conditions…
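The plan-as-artifact flow described above can be sketched like this. The bucket name and `COMMIT_SHA` variable are assumptions; the terraform flags (`-out`, `-input=false`) are standard CLI flags, and the terraform/aws commands are commented out since they need real infrastructure to run:

```shell
# Derive a stable artifact name from the commit being built.
COMMIT_SHA=${COMMIT_SHA:-$(git rev-parse --short HEAD 2>/dev/null || echo dev)}
PLAN_FILE="plan-${COMMIT_SHA}.tfplan"

# PR job: produce a reviewable, binary plan file and store it by commit SHA.
#   terraform init -input=false
#   terraform plan -input=false -out="$PLAN_FILE"
#   aws s3 cp "$PLAN_FILE" "s3://my-ci-artifacts/tfplans/$PLAN_FILE"

# Merge job: fetch and apply that exact artifact. terraform refuses to apply
# a saved plan whose backing state has changed since the plan was created,
# which guards against the race conditions mentioned above.
#   aws s3 cp "s3://my-ci-artifacts/tfplans/$PLAN_FILE" .
#   terraform apply -input=false "$PLAN_FILE"
echo "$PLAN_FILE"
```

The key property is that what gets applied is the exact plan that was reviewed, not a re-plan that may have drifted.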
Things like tflint can help with some of these
E.g. checking a type of AMI is even available in a region
Won’t catch bugs though.
Is Atlantis just like serverless Jenkins ci/cd for terraform with output posted back to the PR?
Sounds nice if you don’t already have a CI/CD solution/don’t need one for other things?
2018-11-11
ok
2018-11-29
anyone here use github actions yet? do you know if there is a way to cache dependencies between builds?
haven’t seen anything in the docs
no one @cloudposse has been invited to the beta
i reached out to some peeps at GitHub but we don’t have the klout =0
@Gabe are you in the beta?
yeah we just got accepted in
just checked my inbox
nothing yet
i don’t think i got an email… just saw the new actions button on our repo
I got the email after the actions button appeared
ohh… they are also only available on private repos
Ohhhhhhhhh snap
yep only private
hmm yeah… it looks pretty cool so far, but a few things i’ve noticed: not being able to cache dependencies between builds, no control over the size of the machine it runs on (1 CPU, 3.75 GB), and only two concurrent workflows running at a time per repo
pros are that it seems simpler than circle/jenkins and you can create actions that take environment variables so it’s easier to reuse/share actions between repos
… and we just got the email saying they have enabled it for us
i just got the same email a few minutes ago, and Actions on my personal GitHub account
I’m not so lucky. Still waiting
2018-11-30
@Andriy Knysh (Cloud Posse) this screenshot from https://github.com/cloudposse/github-status-updater how do you release to a namespace?
every PR is a new k8s namespace. It’s how we do unlimited staging environments
Thanks!
see @Erik Osterman (Cloud Posse) presentation https://cloudposse.com/devops/unlimited-staging-environments/
@davidvasandani if you have questions or need more info, we can provide it
@Andriy Knysh (Cloud Posse) do you happen to have an example where all these pieces are glued together?
@davidvasandani here’s a simpler and complete working example:
what: Add helmfile for deployment with monochart; add Codefresh build manifest. why: Easy deployment to Kubernetes.
I use it for my demos
basically I took a random app called statup
(self-hosted statuspage.io clone) and deploy it on kubernetes using our monochart
with helmfile
and helm
using codefresh
this supports unlimited staging environments
and automatic destruction when the PR is closed using the pull-request-closed.yaml
pipeline
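The per-PR environment lifecycle can be sketched as two hooks. The release/chart names are placeholders, `--create-namespace` is Helm 3 syntax, and the helm/kubectl calls are commented out because they need a live cluster:

```shell
# Each PR gets its own namespace, named after the PR number.
PR_NUMBER=${PR_NUMBER:-1234}
NAMESPACE="pr-${PR_NUMBER}"

# On PR open/update: deploy the app into its dedicated namespace.
#   helm upgrade --install "$NAMESPACE" ./charts/app \
#     --namespace "$NAMESPACE" --create-namespace
# On PR close (the pull-request-closed.yaml pipeline): tear it all down.
#   helm uninstall "$NAMESPACE" --namespace "$NAMESPACE"
#   kubectl delete namespace "$NAMESPACE"
echo "$NAMESPACE"
```

Deleting the namespace is what makes the environments cheap: everything the PR created lives inside it, so teardown is one call.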
Thanks @Erik Osterman (Cloud Posse) can’t wait to dig into this.
Crap, I realized you need to know all the ENV vars
@davidvasandani if you PM me I can get them to you
give me a few minutes
re: Self-hosted Helm Chart Registry - Codefresh added Managed Helm Repositories after the presentation, so we use it now instead of deploying our own chart museum
@davidvasandani ^