#release-engineering (2023-05)

All things CI/CD. Specific emphasis on Codefresh and CodeBuild with CodePipeline.

CI/CD Discussions

Archive: https://archive.sweetops.com/release-engineering/


Sudhish KR

Hey Folks,

We at Dgraph Labs use GitHub as our VCS, and we recently migrated our CI/CD setup to GitHub Actions. This was a huge win for us internally, especially in a startup setting like ours. Our wins were broadly in three areas: Compute Costs, Maintenance Efforts & Configuration Time.

With this new setup, we designed & developed Dynamic AutoScaling of GitHub Runners in house. We are thinking of open-sourcing this project. If there is any interest here, pls do reach out. We were able to save ~87% of our Compute Costs with this setup.

Blog Link => https://www.sudhishkr.com/posts/20230217_dynamic-autoscaling-of-github-runners/

Dynamic AutoScaling Of GitHub Runners

In this article we explain our transition to GitHub Actions for our CI/CD needs at Dgraph Labs Inc. As part of this effort we built & implemented a new in-house architecture, “Dynamic AutoScaling of GitHub Runners”, to power this setup. In the past, our CI/CD was powered by a self-hosted on-prem TeamCity setup, which turned out to be difficult to operate & manage in a startup setting like ours. Transitioning to GitHub Actions and implementing our in-house “Dynamic AutoScaling of GitHub Runners” has helped us reduce our Compute Costs, Maintenance Efforts & Configuration Time across our repositories (with improved security).

Alex Jurkiewicz

have you seen https://github.com/philips-labs/terraform-aws-github-runner ? How does your solution differ?


Terraform module for scalable GitHub action runners on AWS

Sudhish KR

Hey @Alex Jurkiewicz, we have actually looked at this and found it a little expensive & difficult to manage. Our setup is quite straightforward, with minimal components.

I’ve mentioned this in the blog post.

Happy to discuss further if there is interest.

Alex Jurkiewicz

gotcha. I’m not very clear on what you mean by expensive – doesn’t the Philips solution scale to zero as well?

Sudhish KR

The most difficult part of the Philips solution is the number of AWS components it depends on: a combination of Lambdas, SQS, API Gateway + compute resources.

The solution proposed above uses just SSM + compute resources.

The more components we have, the harder it gets to track when one of them fails. This makes the Philips solution a little expensive in terms of maintenance costs, and to some extent w.r.t. $ costs as well.

What we aimed for & achieved was a simpler solution.
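[Editor's note: for illustration only. The scale-to-zero behaviour discussed in this thread could be sketched as a small decision function like the one below. All names, thresholds, and the fleet cap are hypothetical, not taken from the Dgraph implementation; the real inputs would come from the GitHub API (queued workflow jobs) and the cloud provider (runner instance states).]

```python
# Hypothetical scaling decision for self-hosted GitHub runners.
# Inputs are assumed to come from the GitHub API (queued jobs) and
# the cloud provider (idle/busy runner instance counts).

def plan_scaling(queued_jobs: int, idle_runners: int, busy_runners: int,
                 max_runners: int = 20) -> dict:
    """Return how many runner instances to launch or terminate.

    - Launch one runner per queued job that no idle runner can pick up.
    - Terminate idle runners with no queued work (scale to zero).
    """
    # Jobs not covered by currently idle runners
    uncovered = max(queued_jobs - idle_runners, 0)
    # Respect an overall cap on total fleet size
    capacity_left = max(max_runners - (idle_runners + busy_runners), 0)
    launch = min(uncovered, capacity_left)

    # Idle runners beyond what queued work needs can be terminated
    terminate = max(idle_runners - queued_jobs, 0)
    return {"launch": launch, "terminate": terminate}
```

With no queued jobs and two idle runners, this plans zero launches and two terminations, i.e. the fleet scales to zero; with fewer components in the loop (one scheduler plus the runner instances), there are fewer failure points to track.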

Soren Jensen

It’s always great to see new solutions, but I’m with Alex on this one. The Philips module has been battle tested for a long time and scales to zero. We’ve been using it for nearly 2 years now and haven’t had to debug it once.

Sudhish KR

Hey @Soren Jensen, thanks for the honest feedback. I understand the hesitation when other products are more battle tested & have had better traction.

We will be open sourcing this setup sometime soon. And I will be happy to share the github link. We would like your feedback. (positive or negative - both are welcome :))

We have battle tested this for ~6 months (not as much as Philips, obviously). Given that we are a database company, our workloads are quite diverse, and it covers most edge cases really well. It handles different kinds of machine type requirements, dynamic scheduling, scaling to 0, and smart re-use of a machine (if it’s on the cusp of finishing, based on historic run times). It has some extra smarts which are in beta mode, but there is more room to optimize.
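[Editor's note: for illustration only. The “smart re-use” idea mentioned above could be sketched as a simple heuristic: wait for a busy runner if, judging by historic run times, its current job is about to finish and waiting beats booting a fresh instance. The function name, boot time, and inputs below are hypothetical, not from the actual implementation.]

```python
def should_reuse(elapsed_s: float, historic_avg_s: float,
                 boot_time_s: float = 90.0) -> bool:
    """Decide whether to wait for a busy runner instead of booting a new one.

    If the expected remaining time of the current job (historic average
    duration minus elapsed time) is shorter than the time to boot a fresh
    runner, waiting for re-use is the cheaper option.
    """
    expected_remaining = max(historic_avg_s - elapsed_s, 0.0)
    return expected_remaining < boot_time_s
```

For example, a job that historically takes ~600s and has already run for 550s has ~50s left, which is under the assumed ~90s boot time, so waiting for re-use wins.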