#release-engineering (2020-12)
All things CI/CD. Specific emphasis on Codefresh and CodeBuild with CodePipeline.
CI/CD Discussions
Archive: https://archive.sweetops.com/release-engineering/
2020-12-10
I have an interesting issue and would love some input on how to continue troubleshooting: we have a jenkins pipeline that runs hourly, at the top of the hour, and for a while now it has reliably failed every job it starts at midnight, recovering on the following run. The reason for failing also differs from time to time: one time “docker login” failed, another time the “helm” command timed out, or there was an error about port forwarding. The failures are not seen during the rest of the day, only during that one execution at midnight. Any pointers would be wonderful!
so it sounds like a networking issue
is that on-prem or cloud?
timeouts, port forwarding, docker login: they all need to connect somehow
if it were on-prem, I would guess maybe an automated backup of some router that needs to do a failover, and it happens to run at the same time as the jenkins job
a NAT device losing its ARP tables… way too many possibilities
or it could be a simple DNS issue?
if I were you, I would install some network monitoring software on the jenkins host, capture network traffic, and run a continuous test of DNS lookups and connectivity to external endpoints and internal gateways every 15 seconds or so; record it for a day and then analyse all of that
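A minimal sketch of such a probe, assuming Python is available on the Jenkins host; the hostnames, ports, and gateway addresses below are placeholders for whatever docker login, helm, and the port-forwards actually talk to:

```python
import datetime
import socket
import time

# Placeholders: substitute the registry, chart repo, and gateways
# that the midnight job actually depends on.
DNS_NAMES = ["localhost", "registry.example.com", "charts.example.com"]
TCP_TARGETS = [
    ("127.0.0.1", 8080),            # Jenkins itself, to rule out local issues
    ("registry.example.com", 443),  # docker login endpoint (assumed)
    ("10.0.0.1", 443),              # internal gateway (assumed)
]

while True:
    stamp = datetime.datetime.now().isoformat()
    for name in DNS_NAMES:
        try:
            print(stamp, "dns", name, "->", socket.gethostbyname(name))
        except OSError as err:
            print(stamp, "dns", name, "FAILED:", err)
    for host, port in TCP_TARGETS:
        try:
            with socket.create_connection((host, port), timeout=5):
                print(stamp, "tcp", f"{host}:{port}", "ok")
        except OSError as err:
            print(stamp, "tcp", f"{host}:{port}", "FAILED:", err)
    time.sleep(15)  # sample every 15s; grep the log around 00:00
```

Run it under nohup or a systemd unit for a day, then diff the entries around midnight against the rest of the day.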
Thanks a bunch for the questions and tips. I’ll definitely go for the network monitoring testing, and work from that!
if you add tests to different IPs
make sure to also test localhost and the jenkins server’s own IP
to rule out any internal issue
What if it ran at a different time, not at 00:00? There could be lots of time-of-day factors, including simple resource contention with other things that also start at midnight. If that fixes it, then that narrows down your search space.
I’m wondering what the best practices are, if any, for managing the version of an app as a whole in a microservices world. Say we have tens or hundreds of microservices and a monolith. In the past, we versioned the monolith and that was the version of the app. Dead simple. Now every microservice gets its own version, since each of them has its own lifecycle. And still, by inertia, people tend to consider the version of the monolith as the version of the whole system, which is only partially true now.
What are your thoughts on this?
One thing that came to my mind is the following. Since we define all deployed releases via some tooling (helmfile, for example) and store this in git, we can make a reference to a snapshot of the repo with helmfile values (a commit hash or a tag), which represents a set of services at certain versions. And we can label this with a good old monolith-style version which everybody is aware of, or a calendar version.
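For illustration, a rough sketch of that idea, assuming Python and a local clone of the helmfile-values repo; the path and the calver tag scheme are made up, not from any real setup:

```python
import datetime
import subprocess

def tag_system_release(helmfile_repo: str) -> str:
    """Stamp the deploy-config repo with a calendar-version tag, so the
    human-friendly "app version" resolves to one commit that pins every
    microservice's release."""
    tag = datetime.date.today().strftime("v%Y.%m.%d")
    subprocess.run(["git", "-C", helmfile_repo, "tag", tag], check=True)
    subprocess.run(["git", "-C", helmfile_repo, "push", "origin", tag], check=True)
    return tag

print(tag_system_release("./helmfile-values"))  # e.g. v2020.12.10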
“why does the version matter to anyone?” is usually the question i would start with to figure out what makes sense as a version
that’s not a rhetorical question either; why do people look at the version and what do they want to learn when they do?
Mm, we thought it was too hard to version everything and couldn’t see the value, so we only version the microservices themselves, by git sha.
If dev wants to run everything, as is typically the case for local dev, then they use the latest tag in docker-compose and similar. They can obviously pin as need be.
I’ve been pushing for services to have a ‘version’ only in the sense that it helps them say ‘deploy based on this commit ref’. System ‘version’ is meaningless and just serves as a talking point for Product when dealing with clients. Product was very concerned about knowing what ‘release’ we’re on, and I told them they could just make something up
Yes it’s mostly for POs and clients. They want to differentiate a “state” of a whole system, especially when it’s an enterprise and the system is deployed per client.
2020-12-11
Semver is nice, and in theory tells you if anything has changed that you should be concerned about… in reality it doesn’t get abided by, so… for our services we moved to tagging images with the short hash of the git commit that produced them:
What generated this screenshot? It’s pretty cool. I use gitversion to do automatic semver versioning, so it’s almost the same: you have to bump major versions, but all minor versions and patches are automatically calculated.
Doesn’t work very well in a monorepo though :-)
monorepos and java projects with parent POMs are a different creature… and bring back a little PTSD from code shared across boundaries in all the worst ways…
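A rough sketch of that short-hash tagging scheme; the registry and image name are placeholders:

```python
import subprocess

# Tag the image with the short hash of the commit that produced it.
sha = subprocess.check_output(
    ["git", "rev-parse", "--short", "HEAD"], text=True
).strip()
image = f"registry.example.com/myapp:{sha}"  # placeholder registry/name

subprocess.run(["docker", "build", "-t", image, "."], check=True)
subprocess.run(["docker", "push", image], check=True)
```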
Anybody using the GitHub Actions deployment feature? I use azure devops, but I’m the only one on the team that uses it actively. I was thinking about leveraging it for some more controlled, pull-request-driven terraform deployments. Curious if anybody’s evaluated the pros and cons. I think the deployment capability has been there for a while, but now they have the new visual feature for it, I guess.
Are you referring to GitHub Actions, or something else?
Yes.
I’m most familiar with azure devops, and I find that the central management is easier to control. However, because my repos are in GitHub, I’m trying to work towards more pull request integrations that comment back on the work, as well as some simple package and deployment features. I haven’t heard quite as much about the deployment feature and how smooth a process it is.
Oh, the deployment itself I have less experience with. However, if you’re looking to integrate security analysis (github action), I have experience on that front.
I love GitHub Actions, but holy moly, I really want manual approvers before I deploy into an environment!!!! My typical workflow would send my pull request to a dynamically provisioned PR-specific staging environment and then through test, staging, and production. And I totally need manual approvers between environments!
Requires Enterprise plan for private repos, per the lead dev at Github Universe when I asked.
I really hope that’s not the case
https://github.com/githubevents/universe2020
Kind of weird wording but it seemed like ‘enterprise only’ was the answer
All repos, both public and private, across all GitHub SKUs will get support for deployment history tracking. Public repos on all plans will have full support for Environments, protection rules, and Secrets. Only private repos under an Enterprise plan will have the ability to use those features.
oh and the first rep said ‘under Enterprise plans’ which is why I asked about the other paid tier
2020-12-14
2020-12-17
I’m not sure if this has been discussed before but couldn’t really find anything when searching… Preview environments, branch based environments, ephemeral environments… call them what you will, is anybody doing it? It’s always the dream that a developer can create a branch and some bot pings them on Slack to say to visit the magic url…
Check out the office hours recording from yesterday. Env0, one of the vendors that did a presentation, looks to have some pretty cool stuff around creating ephemeral environments with an automatic destroy after a configurable period of time
I meant to catch up on that today but my headphones died - will take a look, thanks!
Hey yeah I can also help with this @tim.j.birkett. I was doing some research on the topic the other day.
There are many vendors providing a managed solution for this, but I would say building it on your own isn’t a great pain either
Where do you stand?
Thinking of building but wanted to seek out some of the higher level design opinions. For example, you have an application made up of a few microservices and a UI.
The UI would be the simplest thing to deal with, but backend services and data sources (and other infrastructure), not so easy.
Do you clone the entire application (all microservices, UI, and data), or just the microservice being worked on? What are the patterns and views around this?
Maybe starting with a fully distributed microlith isn’t the way to think about it. Starting with something simple, a single container and a DB, like Ghost, or Wordpress perhaps…
I think, as you say, a simple FE app is the easier part. Perhaps it’s a little bit slower when spinning up a huge database. But I was only talking about the FE part when I was thinking of ephemeral environments.
I am keen to see what others think about the BE part
Perhaps a simple MVP would be a way to start with it
you can also check Jenkins X, which has native support for preview env: https://jenkins-x.io/
15 years ago I did that with PHP, lighttpd, and wildcard hostnames mapped to dynamic web roots. It’s amazing how complex the old simple things have gotten.
Can you even start a complete environment in docker compose? Start with the simple things first.
@mfridh good point, for some reason I thought OP was talking about infra. If this is just an app then this gets a heck of a lot easier with things like kubernetes, and having a docker-compose to be able to spin one up locally is a great start
Yes if your compute is containers or functions, ephemeral test environments are achievable. If you are using legacy compute (EC2) then I suggest sticking with hard coded test environments. The complexity is too high
Agree. Containerize all the things and this becomes pretty simple
(We actually use ephemeral environments with AWS Elastic Beanstalk. But I’d argue this is basically equivalent to ECS/k8s, rather than self-managed EC2)
Fargate or Beanstalk is beautiful for that. But I wouldn’t shy away from doing it with EC2 either if I had to. In some situations you could even argue plain EC2 is easier because some of the other fancy stuff even have a dependency on EC2 itself.
“So you want to solve your problem with Kubernetes? Now you have two problems” is not a completely exaggerated silliness.
I may be saying this because of the beautiful 5+ year old VPC DNS caching/forwarding solution I just modernized yesterday and today: on EC2, fully terraformed, on spot instances with ASG lifecycle hooks and all… I bloody like it, to be honest.
When it comes to ECS and fargate there are so many interesting tools I haven’t tried yet.
Here’s one I stumbled on the other day…
Quickly create disposable QA environments: https://github.com/askwonder/wonqa
I managed to have this working with a combination of terraform w/ terragrunt and ECS. Essentially, I grouped all the terragrunt.hcl files in one folder, and then I just do a cp on my CI, which then runs another apply. Not perfect, but it served its purpose.
We’ve achieved this at my work with Kubernetes and internally built tooling. We have hundreds of QA/PR/Demo/Trial sites where every engineer, designer, sales, bdr, marketing person has their own site (or multiple) “environments”, each one essentially being a group of pods deployed into a single namespace (and some aws resources that get auto-provisioned for each site via the internal tool). The internally built tool is the brains that allows granular control of a single QA or PR site (override-able env vars, feature flags, different version, etc), deploy a percentage of sites, or all of them.
It’s actually one of the things I love to highlight when chatting w/ candidates. I too am familiar with being at a company where “dev is down” or “qa is broken” brings the entire engineering team’s efficiency to a halt.
We use IaC (CloudFormation), and the first part of every pipeline is to spin up a clean infra, then deploy the microservices (all containerized) to it. Once the infra & services are deployed, then we run integration tests on the whole thing. Once those pass, we tear it down. In the meantime, we are free to hit the services manually for debugging purposes.
We do this for PRs as well as for deployments. It’s nice, repeatable, and isolated. There are a few shared resources (CloudWatch log groups, VPCs, DBs) but it’s really nice.
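A minimal sketch of that spin-up/test/tear-down loop, assuming plain boto3; the stack name, template file, and test hook are placeholders, not the actual pipeline:

```python
import boto3

def run_integration_tests(stack_name: str) -> None:
    """Placeholder: point the real test suite at the stack's outputs."""

cfn = boto3.client("cloudformation")
stack = "pr-1234-env"  # hypothetical, derived from the PR/branch

with open("infra.yaml") as tpl:  # hypothetical CloudFormation template
    cfn.create_stack(StackName=stack, TemplateBody=tpl.read(),
                     Capabilities=["CAPABILITY_IAM"])
cfn.get_waiter("stack_create_complete").wait(StackName=stack)

try:
    run_integration_tests(stack)
finally:
    # Tear down even if tests fail, so environments don't pile up.
    cfn.delete_stack(StackName=stack)
    cfn.get_waiter("stack_delete_complete").wait(StackName=stack)
```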
@Jonathan Marcus sounds good! I am curious, do you have databases that you populate with data? How long does populating the database usually take, in your experience?
When I say populate with data, I mean a meaningfully large amount of data, very close to the amount a production env might have.
–
And one more question. You say you run integration tests. Do you run them on every push, or how do you handle that?
If we abort a pipeline early then we can end up with orphaned environments, but we have a few ways to mitigate that.
• Each env is named with a hash of the branch the PR is for, and each pipeline starts by tearing down whatever is there. So if you reuse the same branch name a lot (I’m usually on jm-dev by default) then you’ll reuse the same env (see the sketch after this list).
• Our CI pipeline lets us fix errors in place, instead of starting over at each error. This lets us ensure that we can get to the end & clean up each time, so no orphaned environments. Also we don’t have to do a slow spin-up/teardown each time we want to iterate with new code.
• We also have a check that shows us each env and how old it is
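The first bullet might look something like this minimal sketch; the prefix and hash length are assumptions:

```python
import hashlib

def env_name(branch: str, prefix: str = "qa") -> str:
    """Deterministic name: the same branch always maps to the same
    environment, so a re-run tears down and reuses it instead of
    orphaning a fresh one."""
    digest = hashlib.sha1(branch.encode()).hexdigest()[:8]
    return f"{prefix}-{digest}"

print(env_name("jm-dev"))  # same branch -> same env name, every run
```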
@Christos the DB is part of the infra that is shared (along with VPC, log groups, etc). We do that because RDS takes a super long time to spin up. We therefore don’t have to load fresh data into it either.
On each PR, we:
- Deploy new infra
- Unit test microservices
- Deploy microservices to new infra
- Run integration tests against new env
- Tear down
On each push we do the same, except step 5 is replaced with 5) manual QA, 6) blue/green deployment.
And per @tim.j.birkett’s original question, we do also get a Slack notification with the URL to the new env. Yes that is possible, and yes it is as amazing as you’re hoping.
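For anyone wiring this up, the notification step can be as small as an incoming-webhook call; a sketch assuming the requests library and a placeholder Slack webhook URL:

```python
import requests

WEBHOOK_URL = "https://hooks.slack.com/services/T000/B000/XXXX"  # placeholder

def notify(branch: str, env_url: str) -> None:
    # Slack incoming webhooks accept a simple {"text": ...} payload.
    requests.post(
        WEBHOOK_URL,
        json={"text": f"Environment for {branch} is ready: {env_url}"},
        timeout=10,
    )

notify("jm-dev", "https://pr-1234.example.com")  # hypothetical values
```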
I see, thanks for taking the time to explain. It makes sense about the DB, because we are having the problem you mention: spinning up the db is very slow.
We wanted to run some e2e tests on our webapps on each push to the PR, but this is very time-consuming, at least if you want to spin up a new clean db on every push the developer makes. We wanted a clean database so that tests don’t fail unexpectedly when someone has manipulated data in a shared database. This way tests always run against the same data.
Maybe try reusing the same DB instance but making a new table each time. It’ll probably be much faster to do INSERT INTO new_tbl SELECT * FROM src_tbl than to load it fresh from an external source.
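A toy illustration of that trick using sqlite3 so it runs self-contained; the table names and run id are placeholders, and a real setup would target the shared RDS instance instead:

```python
import sqlite3

conn = sqlite3.connect("qa.db")
run_id = "pr1234"  # hypothetical: one cloned table per test run

# Seed table stands in for the shared, pre-populated source data.
conn.execute("CREATE TABLE IF NOT EXISTS users (id INTEGER, name TEXT)")
conn.execute("INSERT INTO users VALUES (1, 'alice'), (2, 'bob')")

# Clone the seeded table instead of reloading from an external source,
# so each run gets clean data without the slow import.
conn.execute(f"CREATE TABLE users_{run_id} AS SELECT * FROM users")
conn.commit()

# ... run tests against users_pr1234 ...

conn.execute(f"DROP TABLE users_{run_id}")  # cheap teardown
conn.commit()
```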
Right, that’s smart
Yeah, we’ve done this as well for 15 years. Also using LVM snapshots historically for repeated db-migration-script verification tests against production data snapshots. My thought is: what am I really testing? My applications? Or Amazon’s infrastructure? So I remove as much of the cloud-provider parts as possible; they get tested enough in many other ways anyway.
Nice, very clever.
@Jonathan Marcus - when you create the PR infra / environment, is it just the microservice you’re changing that gets deployed in step 3? Or all other dependencies that make up the full software “product”?
All of them. We want to test how the new microservice interacts with all the others, so for full coverage we deploy the full product.
The aws amplify console build service does this pretty well, with branches and pr preview builds
2020-12-18
2020-12-20
anyone know of tooling for managing select files across numerous github repositories? mostly thinking about a handful of files that are often identical (or nearly), like .editorconfig, LICENSE, or maybe a github actions yaml… i did find one github action, curious if anyone has experience with it or any other options… https://github.com/kbrashears5/github-action-file-sync
figuring some kind of file templating will be necessary, also…
first thing that comes to mind is a template repo and a makefile or similar to iterate over a list of concrete repos.
I recently copied Cloud Posse’s build-harness / gomplate pattern that they use for READMEs for a client to handle this type of thing. I use it to template drone.yml (Drone CI, similar to Jenkinsfile), gitignore, and .editorconfig files right now across 30+ repos.
It solves the problem of allowing centralized management of files that are cross cutting over many projects, but you do still have to execute commands against the repos to pull the file updates and then go through the PR process. If I had the time, I’d invest in going the mergify route and make sure that the automated PRs that I put up when I do updates across all repos would merge automagically when CI passes.
For sure, it’s the entirety of the workflow that I’m looking for… Pull from one central repo with the templates, for every managed repo compare the current contents of the default branch, if different then branch, update, commit, and open a pr…
The github action I linked is actually pretty close, though it doesn’t look like it supports file templating… I’ve also considered using terraform, with the resource github_repository_file, which can do the diff and the templating but doesn’t feel like it can handle the workflow of checking the files in the default branch while making any changes in a (new) bug/feature branch…
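A minimal sketch of that compare-branch-commit-PR workflow, assuming PyGithub and a template already rendered to disk; the token, repo names, and paths are placeholders:

```python
from github import Github  # PyGithub

TOKEN = "<token>"                     # placeholder
REPOS = ["org/repo-a", "org/repo-b"]  # hypothetical managed repos
PATH = ".editorconfig"

with open(f"rendered/{PATH}") as f:   # output of the templating step
    desired = f.read()

gh = Github(TOKEN)
for name in REPOS:
    repo = gh.get_repo(name)
    current = repo.get_contents(PATH, ref=repo.default_branch)
    if current.decoded_content.decode() == desired:
        continue  # default branch already in sync
    branch = "chore/file-sync"
    base_sha = repo.get_branch(repo.default_branch).commit.sha
    repo.create_git_ref(f"refs/heads/{branch}", base_sha)
    repo.update_file(PATH, "chore: sync shared files", desired,
                     current.sha, branch=branch)
    repo.create_pull(title="chore: sync shared files",
                     body="Automated sync from the template repo",
                     head=branch, base=repo.default_branch)
```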
I feel like the file templating is a must.
I’ve had rough going with the GH provider, and while I love Terraform obviously… using that repo_file resource always seemed like a bad idea to me. That’s without much research though, so if you go down that path I’d be interested in your results.
using the repo_file resource is definitely a little rough… it fails completely when branch protection is enabled, and also seems to have some api restrictions that i do not understand… for example, it currently cannot manage files under the .github/workflows path… https://github.com/terraform-providers/terraform-provider-github/issues/633
went ahead and made a feature request for file templating on that github action… we’ll see where that goes… https://github.com/kbrashears5/github-action-file-sync/issues/7
hmm, gomplate may be close to doing this all on its own… found a ticket opened by @Erik Osterman (Cloud Posse) for the feature i could abuse to make this work… https://github.com/hairyhenderson/gomplate/issues/589
linked from there to another gomplate issue maybe indicating the datasource feature could be abused to retrieve the templates also… https://github.com/hairyhenderson/gomplate/issues/963#issuecomment-710559472
Yes I would like to use gomplate for the same purpose as you mention
There is also https://github.com/vmware-tanzu/carvel-vendir (an easy way to declaratively vendor portions of git repos, github releases, helm charts, docker image contents, etc.)
But no templating
Can’t wait for https://github.com/github/roadmap/issues/98 (centrally managed workflows across multiple repositories in your organization)
@hairyhenderson any updates on https://github.com/hairyhenderson/gomplate/issues/589
oh hi
@Erik Osterman (Cloud Posse) sort of? there are a few new features that help with that, and I’ve been working on refactoring some stuff to help with how data/templates/etc are read in general (so you can read input templates from any URL, etc…). But it’s been slow-going… I’ve been working on switching jobs (starting new job on Monday) so I haven’t been able to dedicate much time to gomplate unfortunately
Congrats on the job transition! Yea, understood. Appreciate the update!
tx
one thing that’s bogging me down too is the changes I need to make are pretty major, and I may need to break API to do it… I don’t want to release gomplate 4.0 yet, but I may need to… we’ll see
We have more and more use-cases for generators/scaffolding (e.g. github actions, terraform modules) and I really want something simple like boilr (abandoned) that’s distributed as a single binary.
oh, just came across copier, which seems built for exactly this repo templating and updating use case… https://copier.readthedocs.io/en/stable/
Ah cool - also seems like an alternative to cookiecutter
I am considering introducing a tool like cookiecutter - leaning towards it because of critical mass. It would be a done deal if it were a standalone binary
copier actually has a comparison page, includes cookiecutter, reads pretty fair to me, https://copier.readthedocs.io/en/stable/comparisons/
What led you to copier? Are you seeking a solution for this right now? @loren
(Sorry I lost context and got confused by another thread)
…for templating GitHub actions and centralizing workflows
A few days ago we released the version we are using now for #codefresh
Our Library of GitHub Actions: https://github.com/cloudposse/actions
Uses gomplate :-)
the idea is still a little undefined… but something like that, yeah. maintain a central “template” repo. use it to create new repos. and use it to periodically “refresh” contents in other repos. the pattern in that pipeline-creator comes pretty close i think… github action in each project that clones the template repo, runs gomplate to template files. i would like to also open a pr if there is a changeset… copier supports git sources and templating, and has options to control whether to overwrite any given file.
Ya, for Codefresh it was a little bit easier because we didn’t need to commit the files back anywhere. We just deploy the pipeline manifests via the API.