SweetOps #release-engineering for May, 2023

CI/CD Discussions

Archive: https://archive.sweetops.com/release-engineering/

2023-05-01

Sudhish KR

Hey Folks,

We at Dgraph Labs use GitHub as our VCS, and we recently migrated our CI/CD setup to GitHub Actions. This was a huge win for us internally, especially in a startup setting like ours. Our wins were broadly in three areas: Compute Costs, Maintenance Efforts & Configuration Time.

With this new setup, we designed & developed Dynamic AutoScaling of GitHub Runners in house. We are thinking of open-sourcing this project. If there is any interest here, please do reach out. We were able to save ~87% of our Compute Costs with this setup.

Blog Link => https://www.sudhishkr.com/posts/20230217_dynamic-autoscaling-of-github-runners/

Dynamic AutoScaling Of GitHub Runners

In this article we explain our transition to GitHub Actions for our CI/CD needs at Dgraph Labs Inc. As a part of this effort we have built (in-house) & implemented a new architecture for “Dynamic AutoScaling of GitHub Runners” to power this setup. In the past, our CI/CD was powered by a self-hosted on-prem TeamCity setup - this turned out to be a little difficult to operate & manage in a startup setting like ours. Transitioning to GitHub Actions & implementing our new in-house built “Dynamic AutoScaling of GitHub Runners” - has helped us reduce our Compute Costs, Maintenance Efforts & Configuration Time across our repositories for our CI/CD efforts (with improved security).

Alex Jurkiewicz

01:47:24 AM

have you seen https://github.com/philips-labs/terraform-aws-github-runner ? How does your solution differ?

philips-labs/terraform-aws-github-runner

Terraform module for scalable GitHub action runners on AWS

Sudhish KR

01:50:33 AM

Hey @Alex Jurkiewicz.. we have actually looked at this, and found it a little expensive & difficult to manage. Our setup is quite straightforward with minimal components.

I have mentioned this in the blog post

Happy to discuss further if there is interest.

Alex Jurkiewicz

02:00:16 AM

gotcha. I’m not very clear on what you mean by expensive – doesn’t the philips solution scale to zero as well?

Sudhish KR

02:02:45 AM

The most difficult part of the Philips solution is its dependency on the number of AWS components it uses: a combination of Lambdas, SQS, API Gateway + Compute resources.

The solution proposed above uses SSM + Compute resources.

The more components we have, the harder it gets to track when one of them fails. This makes the Philips solution a little expensive in terms of maintenance costs and, to some extent, $ costs as well.

What we aimed for & achieved was a simpler solution.

Soren Jensen

05:53:56 PM

It’s always great to see new solutions. But I’m with Alex on this one. The Philips module has been battle tested for a long time and scales to zero. We’ve been using it for nearly 2 years now and haven’t had to debug it once.

Sudhish KR

05:13:23 AM

Hey @Soren Jensen, thanks for the honest feedback. I understand the hesitation when other products are more battle tested & have had better traction.

We will be open-sourcing this setup sometime soon, and I will be happy to share the GitHub link. We would like your feedback (positive or negative - both are welcome :)).

We have battle tested this for ~6 months (not as much as Philips, obviously). Given that we are a database company, our workloads are quite diverse, and it covers most edge cases really well. It handles different kinds of machine type requirements, dynamic scheduling, scaling to 0, and smart re-use of a machine (if it’s on the cusp of finishing, based on historic times). It has some extra smarts, which are in beta mode, but there is more room to optimize.
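[Editor's illustration] The "smart re-use" and "scaling to 0" ideas mentioned above could look roughly like the sketch below. This is not the Dgraph Labs implementation, which is not shown in this thread. It assumes that "historic times" means past durations of the same job, and that re-use is worthwhile when a busy machine is expected to free up sooner than a new one would boot. Every name here (`RunningJob`, `pick_machine_to_reuse`, `desired_runner_count`, `boot_seconds`) is hypothetical.

```python
from dataclasses import dataclass
from statistics import median
from typing import Optional, Sequence


@dataclass
class RunningJob:
    machine_type: str                  # hypothetical label, e.g. "large"
    elapsed_seconds: float             # how long the job has been running
    historic_seconds: Sequence[float]  # assumed: past durations of this job


def expected_remaining(job: RunningJob) -> Optional[float]:
    """Estimate seconds left from the median of past runs; None without history."""
    if not job.historic_seconds:
        return None
    return max(0.0, median(job.historic_seconds) - job.elapsed_seconds)


def pick_machine_to_reuse(
    jobs: Sequence[RunningJob], wanted_type: str, boot_seconds: float
) -> Optional[RunningJob]:
    """Return the busy machine worth waiting for, or None to launch a new one.

    Assumed rule: waiting only pays off if the machine should finish
    sooner than a fresh machine would take to boot.
    """
    candidates = []
    for job in jobs:
        if job.machine_type != wanted_type:
            continue
        remaining = expected_remaining(job)
        if remaining is not None and remaining < boot_seconds:
            candidates.append((remaining, job))
    if not candidates:
        return None
    # Prefer the machine expected to free up soonest.
    return min(candidates, key=lambda pair: pair[0])[1]


def desired_runner_count(queued_jobs: int, busy_runners: int) -> int:
    """Scale-to-zero target: keep no idle machines when nothing is queued or running."""
    return queued_jobs + busy_runners
```

With a job that has run 570 seconds against a historic median of 600, and an assumed 60-second boot time, the sketch would wait for that machine rather than start a new one. With nothing queued and nothing running, the target count is 0.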

#release-engineering (2023-05)

All things CI/CD. Specific emphasis on Codefresh and CodeBuild with CodePipeline.
