#spacelift (2024-05)
2024-05-01
we just adopted Atlantis, and while it seems to do the job, we're not super impressed with its capabilities. we MIGHT look into paid products like spacelift, but curious what a ballpark $ would be for the enterprise plan. I'm sure that's a hard question to answer without specifics, but some general ballpark #s for an environment like this would be much appreciated:
• self hosted GHA via ARC
• ~ 40 terraform repos
• ~40ish users who submit PRs against those repos, but only ~10 who actually need to support the infra. TY!
while I cannot comment on prices directly, the largest factor influencing price is the number of concurrent runners.
at least for our customers, we break the architecture into smaller components, which is great for increasing the speed of plans, reducing the blast radius, and increasing the reusability of "root modules", but it also means more concurrent runs are necessary for it to be practical. If you don't have that many instances of your root modules (what spacelift calls stacks), then it's less of a big deal.
@ you might be able to help
It’s easy to self-host spacelift runners, also…
(it’s easy to self-host, but still important to be aware they are billed by the number of agents deployed, which has led to “bill shock” for our customers when not carefully managed; contrast that with self-hosted GitHub runners, which are free and you can scale to hundreds)
@Zing I’d be happy to discuss this with you further. My calendar is here
2024-05-06
Is there any guidance published regarding splitting an admin stack?
context: our plat admin stack is massive now. it handles dev, beta, qa, and prod for a lot of stacks. this takes a long while to run. i’d like to split stacks by stage and am wondering if we would have any issues (namely would stacks destroy and recreate) and if we could just import into the new admin stack.
@Andriy Knysh (Cloud Posse) @Dan Miller (Cloud Posse)
absolutely. You could definitely split these up by stage in plat. If you haven't seen it yet, take a look at this page that explains how the admin stacks are organized by default: https://docs.cloudposse.com/components/library/aws/spacelift/#stack-configuration
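For illustration, splitting by stage might look roughly like this in the admin-stack catalog config (the component name and context_filters keys below follow that docs page, but double-check them against your component version):

components:
  terraform:
    # one admin stack per stage under the plat tenant (illustrative names)
    admin-stack-plat-dev:
      metadata:
        component: spacelift/admin-stack
      vars:
        context_filters:
          tenants: ["plat"]
          stages: ["dev"]
    admin-stack-plat-prod:
      metadata:
        component: spacelift/admin-stack
      vars:
        context_filters:
          tenants: ["plat"]
          stages: ["prod"]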
In order to move the existing admin-stack from plat to plat-stage, there are a few ways you could handle it:
- Yes you could use imports, but that would be extremely tedious - every single stack you have now would need to be imported. Maybe it could be scripted, but I don’t believe we’ve done that yet.
- Or you could pull the state file locally and migrate it using .tfstate, see our guidance on moving components here, but you'd still need to split the stacks into each stage. We'd have to look into how to do that.
- Finally, what I would do: (1) If you don't have the stack destructor enabled, destroying a Spacelift stack will not destroy the resources in that stack. And (2) we store all Terraform state in S3, so we don't have to worry about deleting state. Please confirm both of these before continuing. That means you can delete stacks in Spacelift entirely and then redeploy them at any point to restore them. This will kill all stack history in Spacelift, but it's by far the easiest way to rearrange stacks in my opinion.
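A quick way to double-check (1) before deleting anything is to confirm the destructor setting in the stack config, something along these lines (the exact setting name here is illustrative, so verify it against the component docs):

settings:
  spacelift:
    # illustrative: with the destructor off, deleting the Spacelift stack
    # does not destroy the resources it manages
    stack_destructor_enabled: false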
makes sense. i figured that’d be the case.
simply changing spaces doesn’t have that impact, does it?
I don't believe so, as long as changing the space doesn't trigger a destroy and recreate with Terraform
gotcha
meeting with Spacelift shortly to discuss this as well so will clarify that too
makes sense. considering a new space per tenant stage with a new admin stack in the same way
as Dan mentioned, the easiest way to move stacks to another admin stack is to destroy them. If the stack destructor is enabled, disable it first and deploy (this is the most dangerous part, especially for production stacks), then change the admin stack
we don’t have any destructors
we did not do any importing when migrating stacks from one admin stack to another b/c:
1) it's a lot of resources, not only stacks; each stack has tens of different resources, and importing all of those is not trivial (although possible)
2) Spacelift keeps the stack history for 3 months (by default, unless you pay for more), so in many cases it's not a big deal to destroy the history
would just like to keep the history, but yeah…that seems highly tedious
2024-05-10
Any thoughts on how Spacelift manages squash commits?
We had a scenario where commit A ran (created 3 ssm params) and commit B ran (deleted those 3).
I've seen some seemingly weird nuance in how commits are handled, where it uses the hashes from the PR and not solely the squash commit ID.
is this an issue with the policy or something deeper in spacelift?
spacelift isn’t checking out the code in a pr, and performing a merge from the target branch. it’s just using the code from the pr branch directly. so it’s easy to get out-of-sync.
i've set up all my repos to require merge commits, and require the branch to be up-to-date before merging. it forces spacelift to re-run the stack after anything is merged, and before merging anything new
if you have access to the feedback portal, thumbs-up this request, https://portal.feedback.us.pendo.io/app/#/case/351390
The problem with requiring updates is that an active repo will slow down the entire process of merging PRs.
Forcing the update will force all admin and affected stacks to run. This could take some time with limited runners available.
yep. but i don’t know any other way. and mostly, i want to require updates anyways, on a terraform project with state. exactly because two different prs may otherwise step on the same resource, and display an invalid/incomplete plan
and if i’m actually retrieving the plan file for the apply step, then the apply ought to fail since the plan will be different than what it wants to do now. and that puts the whole pipeline into a weird place that needs immediate attention to straighten it out
i mean, either way, i have a failed apply, or an apply that succeeded but didn’t do what it said it would do in the review step
@johncblandii I've heard this complaint before too, but I agree with @loren - I don't know another way. Our default is requiring branches to be up to date before merging, and squash commits (not merge commits). Using Mergify, you can automatically update PRs when they fall behind. Isn't the problem with "holding up the other PRs" less about the "hours in the day" and more about developers having to manually update their branches and wait again for spacelift? If that were fully automated, with auto-merge on approval and passing status checks, things would just flow through.
pull_request_rules:
  - name: automatically update pull requests behind main
    conditions:
      - base=main
      - "#commits-behind>0"
    actions:
      update:
Alternatively, I would love to explore merge queues
The merge queue provides the same benefits as the Require branches to be up to date before merging branch protection, but does not require a pull request author to update their pull request branch and wait for status checks to finish before trying to merge.
i like the idea of using mergify to help. i wonder if i can figure out something similar for self-hosted gitlab…
huh. i always wondered what the point of the merge queue feature was. first use case i’ve heard of that actually makes sense
(same here)
It’s funny how a relatable problem statement often can make features so obvious
right? i do still think spacelift needs to fix the “checkout” problem. they store the plan from the propose step, and then use it again during the track step. if the plan file is wrong, and not updated throughout the merge queue, there will still be problems
and what if two prs step on the same resource? what is actually getting reviewed/approved? i dunno. requiring up to date branches before merge just still seems like the only way to stop apply-time errors from clogging the pipeline
We ended up using GHA + Spacelift so we could better detect affected components and stacks.
https://github.com/cloudposse/github-action-atmos-affected-trigger-spacelift
I wonder if this would make it easier to implement, since there's a documented way to do it with GHA.
on:
  merge_group:
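i.e. something like this end to end, as a rough sketch (the job body is just a placeholder, not the real action inputs; check the action's README):

name: atmos-affected
on:
  pull_request:
    branches: [main]
  merge_group:

jobs:
  affected:
    runs-on: ubuntu-latest
    steps:
      - uses: actions/checkout@v4
      # placeholder step: run the affected-stacks detection / Spacelift trigger
      # here, e.g. the cloudposse/github-action-atmos-affected-trigger-spacelift
      # action linked above
      - run: echo "trigger Spacelift for affected stacks"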
and what if two prs step on the same resource? what is actually getting reviewed/approved?
Hmm… so it would kick it back if the Spacelift plan fails (negative status check), but you're right, if it succeeds you're not necessarily applying what was approved.
will definitely check into the merge queue
If you do, let us know how it goes!