#spacelift (2024-05)
2024-05-01
we just adopted Atlantis, and while it seems to do the job, we're not super impressed with its capabilities. we MIGHT look into paid products like spacelift, but curious what a ballpark $ would be for the enterprise plan. I'm sure that's a hard question to answer without specifics, but some general ballpark #s for an environment like this would be much appreciated:
• self hosted GHA via ARC
• ~ 40 terraform repos
• ~40ish users who submit PRs against those repos, but only ~10 who actually need to support the infra. TY!
while I cannot comment on prices directly, the largest factor influencing price is the number of concurrent runners.
at least for our customers, we break the architecture into smaller components, which is great for increasing the speed of plans, reducing the blast radius, and increasing the reusability of "root modules", but it also means more concurrent runs are necessary for it to be practical. If you don't have that many instances of your root modules (what spacelift calls stacks), then it's less of a big deal.
@ you might be able to help
It’s easy to self-host spacelift runners, also…
(it’s easy to self-host, but still important to be aware they are billed by the number of agents deployed, which has led to “bill shock” for our customers when not carefully managed; contrast that with self-hosted GitHub runners, which are free and you can scale to hundreds)
@Zing I’d be happy to discuss this with you further. My calendar is here
2024-05-06
Is there any guidance published regarding splitting an admin stack?
context: our plat admin stack is massive now. it handles dev, beta, qa, and prod for a lot of stacks. this takes a long while to run. i’d like to split stacks by stage and am wondering if we would have any issues (namely would stacks destroy and recreate) and if we could just import into the new admin stack.
@Andriy Knysh (Cloud Posse) @Dan Miller (Cloud Posse)
absolutely. You could definitely split these up by stage in plat. If you haven't seen it yet, take a look at this page that explains how the admin stacks are organized by default: https://docs.cloudposse.com/components/library/aws/spacelift/#stack-configuration
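For illustration, splitting by stage might look roughly like this in the admin-stack catalog config (the component name and context_filters keys below follow that docs page, but double-check them against your component version):

components:
  terraform:
    # one admin stack per stage under the plat tenant (illustrative names)
    admin-stack-plat-dev:
      metadata:
        component: spacelift/admin-stack
      vars:
        context_filters:
          tenants: ["plat"]
          stages: ["dev"]
    admin-stack-plat-prod:
      metadata:
        component: spacelift/admin-stack
      vars:
        context_filters:
          tenants: ["plat"]
          stages: ["prod"]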
In order to move the existing admin-stack from plat to plat-stage, there are a few ways you could handle it:
- Yes you could use imports, but that would be extremely tedious - every single stack you have now would need to be imported. Maybe it could be scripted, but I don’t believe we’ve done that yet.
- Or you could pull the state file locally and migrate it using .tfstate, see our guidance on moving components here, but you'd still need to split the stacks into each stage. We'd have to look into how to do that.
- Finally, what I would do: (1) If you don't have the stack destructor enabled, destroying a Spacelift stack will not destroy the resources in that stack. And (2) we store all Terraform state in S3, so we don't have to worry about deleting state. Please confirm both of these before continuing. That means you can delete stacks in Spacelift entirely and then redeploy them at any point to restore them. This will kill all stack history in Spacelift, but it's by far the easiest way to rearrange stacks in my opinion.
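A quick way to double-check (1) before deleting anything is to confirm the destructor setting in the stack config, something along these lines (the exact setting name here is illustrative, so verify it against the component docs):

settings:
  spacelift:
    # illustrative: with the destructor off, deleting the Spacelift stack
    # does not destroy the resources it manages
    stack_destructor_enabled: false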
makes sense. i figured that’d be the case.
simply changing spaces doesn’t have that impact, does it?
I don't believe so, as long as changing the space doesn't trigger a destroy and recreate with Terraform
gotcha
meeting with Spacelift shortly to discuss this as well so will clarify that too
makes sense. considering a new space per tenant stage with a new admin stack in the same way
as Dan mentioned, the easiest way to move stacks to another admin stack is to destroy them. If the stack destructor is enabled, disable it first and deploy (this is the most dangerous part, especially for production stacks), then change the admin stack
we don’t have any destructors
we did not do any importing when migrating stacks from one admin stack to another b/c:
1) it's a lot of resources, not only stacks; each stack has tens of different resources, and importing all of those is not trivial (although possible)
2) Spacelift keeps the stack history for 3 months (by default, unless you pay for more), so in many cases it's not a big deal to destroy the history
would just like to keep the history, but yeah…that seems highly tedious
2024-05-10
Any thoughts on how Spacelift manages squash commits?
We had a scenario where commit A ran (created 3 ssm params) and commit B ran (deleted those 3).
I've seen some seemingly weird nuance in how commits are handled, where it uses the hashes from the PR and not solely the squash commit ID.
is this an issue with the policy or something deeper in spacelift?
spacelift isn’t checking out the code in a pr, and performing a merge from the target branch. it’s just using the code from the pr branch directly. so it’s easy to get out-of-sync.
i've set up all my repos to require merge commits, and require the branch to be up-to-date before merging. it forces spacelift to re-run the stack after anything is merged, and before merging anything new
if you have access to the feedback portal, thumbs-up this request, https://portal.feedback.us.pendo.io/app/#/case/351390
The problem with requiring updates is that an active repo will slow down the entire process of merging PRs.
Forcing the update will force all admin and affected stacks to run. This could take some time with limited runners available.
yep. but i don’t know any other way. and mostly, i want to require updates anyways, on a terraform project with state. exactly because two different prs may otherwise step on the same resource, and display an invalid/incomplete plan
and if i’m actually retrieving the plan file for the apply step, then the apply ought to fail since the plan will be different than what it wants to do now. and that puts the whole pipeline into a weird place that needs immediate attention to straighten it out
i mean, either way, i have a failed apply, or an apply that succeeded but didn’t do what it said it would do in the review step
@johncblandii I've heard this complaint before too, but I agree with @loren - I don't know another way. Our default is requiring branches to be up to date before merging, and squash commits (not merge commits). Using Mergify, you can automatically update PRs when they fall behind. Isn't the problem with "holding up the other PRs" less about the "hours in the day" and more about developers having to manually update their branches and wait again for spacelift? If that were fully automated, with auto-merge on approval and passing status checks, things would just flow through.
pull_request_rules:
  - name: automatically update pull requests behind main
    conditions:
      - base=main
      - "#commits-behind>0"
    actions:
      update:
Alternatively, I would love to explore merge queues
The merge queue provides the same benefits as the Require branches to be up to date before merging branch protection, but does not require a pull request author to update their pull request branch and wait for status checks to finish before trying to merge.
i like the idea of using mergify to help. i wonder if i can figure out something similar for self-hosted gitlab…
huh. i always wondered what the point of the merge queue feature was. first use case i’ve heard of that actually makes sense
(same here)
It’s funny how a relatable problem statement often can make features so obvious
right? i do still think spacelift needs to fix the “checkout” problem. they store the plan from the propose step, and then use it again during the track step. if the plan file is wrong, and not updated throughout the merge queue, there will still be problems
and what if two prs step on the same resource? what is actually getting reviewed/approved? i dunno. requiring up to date branches before merge just still seems like the only way to stop apply-time errors from clogging the pipeline
We ended up using GHA + Spacelift so we could better detect affected components and stacks.
https://github.com/cloudposse/github-action-atmos-affected-trigger-spacelift
I wonder if this would make it easier to implement, since there's a documented way to do it with GHA.
on:
  merge_group:
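i.e. something like this end to end, as a rough sketch (the job body is just a placeholder, not the real action inputs; check the action's README):

name: atmos-affected
on:
  pull_request:
    branches: [main]
  merge_group:

jobs:
  affected:
    runs-on: ubuntu-latest
    steps:
      - uses: actions/checkout@v4
      # placeholder step: run the affected-stacks detection / Spacelift trigger
      # here, e.g. the cloudposse/github-action-atmos-affected-trigger-spacelift
      # action linked above
      - run: echo "trigger Spacelift for affected stacks"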
and what if two prs step on the same resource? what is actually getting reviewed/approved?
Hmm… so it would kick it back if the Spacelift plan fails (negative status check), but you're right, if it succeeds you're not necessarily applying what was approved.
will definitely check into the merge queue
If you do, let us know how it goes!