#geodesic (2019-08)

geodesic https://github.com/cloudposse/geodesic

Discussions related to https://github.com/cloudposse/geodesic

Archive: https://archive.sweetops.com/geodesic/

2019-08-01

Tega McKinney avatar
Tega McKinney

Is it possible to override the terraform backend s3 key? I created the following top-level folder structure:

root
-- vpc
-- -- terraform.envrc

I would like the key to be : root/vpc/terraform.tfstate so I set TF_CLI_INIT_BACKEND_CONFIG_KEY=root/vpc/terraform.tfstate.

This does not override the default, though; the key still ends up being vpc/terraform.tfstate. Is it possible to override the key?

Tega McKinney avatar
Tega McKinney

Got it. I needed to remove ENV TF_BUCKET_PREFIX_FORMAT="basename-pwd" from my Dockerfile so it would use the full directory path after /conf.
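For reference, a minimal sketch of that Dockerfile change, assuming the default behavior derives the state key from the full path under /conf:

# Before: only the basename of the working directory is used,
# so /conf/root/vpc produces the key vpc/terraform.tfstate
ENV TF_BUCKET_PREFIX_FORMAT="basename-pwd"

# After: remove (or comment out) the override so the full path under /conf
# is used, producing the key root/vpc/terraform.tfstate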

Erik Osterman (Cloud Posse) avatar
Erik Osterman (Cloud Posse)

You got it!

Tega McKinney avatar
Tega McKinney

Ran into another issue: I have created additional root-dns entries that have a hyphen in the name. The label module is removing the hyphen from the stage, even though I assumed regex_replace_chars should not replace hyphens. Is there something else that could be stripping the hyphen?

Tega McKinney avatar
Tega McKinney

Got it. Took some digging to realize I was on an older version of terraform-null-label from reference-architecture. Updated to 0.11.1 and it works perfectly.
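A hedged sketch of pinning the label module to that release (the inputs shown are illustrative):

module "label" {
  source    = "git::https://github.com/cloudposse/terraform-null-label.git?ref=tags/0.11.1"
  namespace = "acme"        # illustrative
  stage     = "prod"
  name      = "root-dns"    # the hyphen should be preserved by the default regex_replace_chars
}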

Erik Osterman (Cloud Posse) avatar
Erik Osterman (Cloud Posse)

ah sweet!

Erik Osterman (Cloud Posse) avatar
Erik Osterman (Cloud Posse)

yes, we had a bug I think related to that

2019-08-02

2019-08-04

Tega McKinney avatar
Tega McKinney

Regarding helmfiles, are multi-stage Dockerfiles the current approach? How does that relate to /templates/conf/helmfiles/... in reference-architectures?

Erik Osterman (Cloud Posse) avatar
Erik Osterman (Cloud Posse)

great you ask

Erik Osterman (Cloud Posse) avatar
Erik Osterman (Cloud Posse)

no - we’re using remote helmfiles pinned to releases now

Erik Osterman (Cloud Posse) avatar
Erik Osterman (Cloud Posse)

It looks like this:

Erik Osterman (Cloud Posse) avatar
Erik Osterman (Cloud Posse)
# Ordered list of releases.
# Terraform-module-like URLs for importing a remote directory and use a file in it as a nested-state file.
# The nested-state file is locally checked-out along with the remote directory containing it.
# Therefore all the local paths in the file are resolved relative to the file.

helmfiles:
helmfiles:
  - path: git::https://github.com/cloudposse/helmfiles.git@releases/reloader.yaml?ref=0.47.0
  - path: git::https://github.com/cloudposse/helmfiles.git@releases/cert-manager.yaml?ref=0.47.0
  - path: git::https://github.com/cloudposse/helmfiles.git@releases/prometheus-operator.yaml?ref=0.47.0
  - path: git::https://github.com/cloudposse/helmfiles.git@releases/cluster-autoscaler.yaml?ref=0.47.0
  - path: git::https://github.com/cloudposse/helmfiles.git@releases/kiam.yaml?ref=0.51.2
  - path: git::https://github.com/cloudposse/helmfiles.git@releases/external-dns.yaml?ref=0.47.0
  - path: git::https://github.com/cloudposse/helmfiles.git@releases/teleport-ent-auth.yaml?ref=0.47.0
  - path: git::https://github.com/cloudposse/helmfiles.git@releases/teleport-ent-proxy.yaml?ref=0.47.0
  - path: git::https://github.com/cloudposse/helmfiles.git@releases/aws-alb-ingress-controller.yaml?ref=0.47.0
  - path: git::https://github.com/cloudposse/helmfiles.git@releases/kube-lego.yaml?ref=0.47.0
  - path: git::https://github.com/cloudposse/helmfiles.git@releases/nginx-ingress.yaml?ref=0.47.0
  - path: git::https://github.com/cloudposse/helmfiles.git@releases/heapster.yaml?ref=0.47.0
  - path: git::https://github.com/cloudposse/helmfiles.git@releases/dashboard.yaml?ref=0.47.0
  - path: git::https://github.com/cloudposse/helmfiles.git@releases/forecastle.yaml?ref=0.47.0
  - path: git::https://github.com/cloudposse/helmfiles.git@releases/keycloak-gatekeeper.yaml?ref=0.47.0
  - path: git::https://github.com/cloudposse/helmfiles.git@releases/fluentd-elasticsearch-aws.yaml?ref=0.47.0
  - path: git::https://github.com/cloudposse/helmfiles.git@releases/kubecost.yaml?ref=0.50.0

2019-08-05

Tega McKinney avatar
Tega McKinney

Okay; the reason I was asking is that I attempted to add the /helmfiles/ templates to configs/prod.tf in reference-architecture. It failed to render the templates because it was looking for a few env vars that are not set up (KOPS_CLUSTER_NAME and STAGE). That got me thinking those templates are just reference files for after the initial architecture is set up. Is that the general idea?

Erik Osterman (Cloud Posse) avatar
Erik Osterman (Cloud Posse)

So the Helmfiles will require a ton of settings

Erik Osterman (Cloud Posse) avatar
Erik Osterman (Cloud Posse)

We do not have those documented. But if you share specific ones I can help.

Erik Osterman (Cloud Posse) avatar
Erik Osterman (Cloud Posse)

Usually we set STAGE in the Dockerfile. If you only have one cluster in a stage, then you can also set KOPS_CLUSTER_NAME in the Dockerfile so it’s available globally.
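For example, a hedged Dockerfile snippet (the values shown are illustrative):

# Set globally so helmfiles and other tooling can rely on them
ENV STAGE="prod"
ENV KOPS_CLUSTER_NAME="us-east-1.prod.example.com"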

Tega McKinney avatar
Tega McKinney

Yeah, I’m planning to do that. It was more of a reference to that value not being set in the ref-architecture when attempting to add the templates in prod.tfvars. Helmfiles would be a post-bootstrap step, I’m assuming.

SweetOps #geodesic avatar
SweetOps #geodesic
04:00:07 PM

There are no events this week

Cloud Posse avatar
Cloud Posse
04:02:33 PM

:zoom: Join us for “Office Hours” every Wednesday 11:30AM (PST, GMT-7) via Zoom.

This is an opportunity to ask us questions about geodesic, get live demos and learn from others using it. Next one is Aug 14, 2019 11:30AM.
Register for Webinar
slack #office-hours (our channel)

Tim Jones avatar
Tim Jones

Hi! I’ve been reviewing terraform-root-modules in order to help bring a little order to the AWS chaos I’ve inherited, but I’m having trouble understanding where real human users are managed. I see that aws/users seems to be set up for this, but it seems incomplete or something: the welcome.txt references a username var that doesn’t exist, and isn’t used as a template source as far as I can see anyway.

Tega McKinney avatar
Tega McKinney

@Tim Jones Have you seen the reference-architectures repo? It leverages the root-modules. It creates your admin users using the aws/users root module.

Tim Jones avatar
Tim Jones

@Tega McKinney yes but it’s been very much broken with the release of Terraform v0.12

daveyu avatar

I’m having trouble building from geodesic with terraform 0.11 since 0.117.0

daveyu avatar

is there a better way to use both 0.11 and 0.12?

Erik Osterman (Cloud Posse) avatar
Erik Osterman (Cloud Posse)

Yes, but I am afk

Erik Osterman (Cloud Posse) avatar
Erik Osterman (Cloud Posse)

@Andriy Knysh (Cloud Posse) can you show Dave

Erik Osterman (Cloud Posse) avatar
Erik Osterman (Cloud Posse)

Basically install terraform_0.11@cloudposse

Erik Osterman (Cloud Posse) avatar
Erik Osterman (Cloud Posse)

And write “use terraform 0.11”

Erik Osterman (Cloud Posse) avatar
Erik Osterman (Cloud Posse)

In your .envrc

daveyu avatar

where/when do I install 0.11? before geodesic 0.117.0, apk add terraform_0.11@cloudposse worked in the Dockerfile

Erik Osterman (Cloud Posse) avatar
Erik Osterman (Cloud Posse)

Yep, add that to your Dockerfile

Erik Osterman (Cloud Posse) avatar
Erik Osterman (Cloud Posse)

This way we can support multiple concurrent major/minor versions

daveyu avatar

Yes, but during docker build:

ERROR: unsatisfiable constraints:
  terraform-0.12.0-r0:
    breaks: world[terraform=0.11.14-r0]
The command '/bin/sh -c apk add terraform_0.11@cloudposse terraform@cloudposse==0.11.14-r0' returned a non-zero
Erik Osterman (Cloud Posse) avatar
Erik Osterman (Cloud Posse)

Show me your Dockerfile

Erik Osterman (Cloud Posse) avatar
Erik Osterman (Cloud Posse)

You need to remove the second package there

daveyu avatar

dockerfile

Erik Osterman (Cloud Posse) avatar
Erik Osterman (Cloud Posse)

RUN apk add terraform_0.11@cloudposse==0.11.14-r0 is what you want

Erik Osterman (Cloud Posse) avatar
Erik Osterman (Cloud Posse)

the long & short of it is that since we upgraded to the alpine:3.10 series, there’s been no new 0.11 release, so no package for 0.11 was built under terraform.

Erik Osterman (Cloud Posse) avatar
Erik Osterman (Cloud Posse)

however, we explicitly build a terraform_0.11 package and a terraform_0.12 package

Erik Osterman (Cloud Posse) avatar
Erik Osterman (Cloud Posse)

like a python2 and python3 package

Erik Osterman (Cloud Posse) avatar
Erik Osterman (Cloud Posse)

and from alpine:3.10 , the terraform package will be 0.12.x

Erik Osterman (Cloud Posse) avatar
Erik Osterman (Cloud Posse)

behind the scenes, we’re installing a symlink to /usr/local/terraform/x.y/bin/terraform that points to /usr/local/bin/terraform-x.y

Erik Osterman (Cloud Posse) avatar
Erik Osterman (Cloud Posse)

that way when we write use terraform 0.11, we can set PATH=/usr/local/terraform/0.11/bin:$PATH and it will automatically find the correct version of terraform
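A minimal sketch of how such a direnv use function could be implemented (the actual geodesic helper may differ):

# direnv stdlib: writing `use terraform 0.11` in .envrc dispatches to use_terraform "0.11"
use_terraform() {
  local version="$1"
  # Prepend the version-specific bin directory so `terraform` resolves to that version
  PATH_add "/usr/local/terraform/${version}/bin"
}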

Erik Osterman (Cloud Posse) avatar
Erik Osterman (Cloud Posse)

without changing code

Erik Osterman (Cloud Posse) avatar
Erik Osterman (Cloud Posse)

and not using alias

Andriy Knysh (Cloud Posse) avatar
Andriy Knysh (Cloud Posse)

@dave.yu this is what we used in Dockerfile to install both TF 0.11 and 0.12 under geodesic 0.117

Andriy Knysh (Cloud Posse) avatar
Andriy Knysh (Cloud Posse)
# Install terraform 0.11 for backwards compatibility
RUN apk add terraform_0.11@cloudposse

# Install terraform 0.12
RUN apk add terraform_0.12@cloudposse terraform@cloudposse==0.12.3-r0
Andriy Knysh (Cloud Posse) avatar
Andriy Knysh (Cloud Posse)

then, for the modules that use TF 0.12, we use

use envrc
use terraform 0.12
use tfenv
Andriy Knysh (Cloud Posse) avatar
Andriy Knysh (Cloud Posse)

and for the modules that use TF 0.11

use envrc
use terraform 0.11
use tfenv

2019-08-06

daveyu avatar

thanks @Andriy Knysh (Cloud Posse) @Erik Osterman (Cloud Posse) RUN apk add terraform_0.11@cloudposse was the trick (instead of RUN apk add terraform_0.11@cloudposse terraform@cloudposse==0.11.14-r0)

2

2019-08-07

2019-08-08

Tega McKinney avatar
Tega McKinney

Has anyone configured kiam beyond the helmfiles defaults for cross-account resource access from kops pods? I’m running into a situation where my cross-account policies are not allowing me access, and I’m thinking maybe kiam is not properly set up to assume roles.

Tega McKinney avatar
Tega McKinney

I think it’s starting to make sense now. I hadn’t realized kiam-server roles were not set up and no sts:AssumeRole on the masters was configured either.

FYI - this TGIK by Joe Beda helped in that understanding. https://www.youtube.com/watch?v=vgs3Af_ew3c

Tega McKinney avatar
Tega McKinney

@Erik Osterman (Cloud Posse) Any reason why, with the kiam setup, it essentially uses the master nodes’ role vs creating a kiam-server-specific role and allowing it to establish the trust relationship with pod roles?

Erik Osterman (Cloud Posse) avatar
Erik Osterman (Cloud Posse)

no, but what you describe is a better practice

Erik Osterman (Cloud Posse) avatar
Erik Osterman (Cloud Posse)

basically, there should be a separate node pool strictly for kiam-server and assumed roles

Tega McKinney avatar
Tega McKinney

I did not go as far as a separate node pool; however, I did create the kiam-server assume role and set the assume-role arg on kiam-server instead of letting it detect the node role.
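A hedged sketch of that kiam-server configuration (the role ARN is illustrative; verify the exact flag name against the kiam docs for your version):

# kiam-server args (excerpt): have the server assume a dedicated role instead of using the node's role
--assume-role-arn=arn:aws:iam::111111111111:role/kiam-server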

Jeremy G (Cloud Posse) avatar
Jeremy G (Cloud Posse)

Right now we run the kiam-server on the masters, and we treat the master role created by kops as the kiam-server role. As far as I can see, there is not much added security or convenience in creating a separate kiam-server role until you get to the point of creating separate instances for the kiam servers and giving them instance roles for the kiam-server. In our configuration, anything on the master nodes can directly assume any pod role that kiam can authorize. With a separate kiam-server role, this is still the case, it’s just that there would be an extra intermediate step of assuming the kiam-server role.

Jeremy G (Cloud Posse) avatar
Jeremy G (Cloud Posse)

To answer your question @Tega McKinney, the reason we treat the master role like it is the kiam server role is because it is a lot easier. While we will likely do it eventually, it is going to be a lot of work to separate out the kiam server role from the master role in all of our Terraform.

1

2019-08-09

Erik Osterman (Cloud Posse) avatar
Erik Osterman (Cloud Posse)

have not tried to accomplish cross account kiam

Erik Osterman (Cloud Posse) avatar
Erik Osterman (Cloud Posse)

generally we try never to cross account boundaries

2019-08-11

2019-08-12

SweetOps #geodesic avatar
SweetOps #geodesic
04:00:02 PM

There are no events this week

Cloud Posse avatar
Cloud Posse
04:04:12 PM

:zoom: Join us for “Office Hours” every Wednesday 11:30AM (PST, GMT-7) via Zoom.

This is an opportunity to ask us questions about geodesic, get live demos and learn from others using it. Next one is Aug 21, 2019 11:30AM.
Register for Webinar
slack #office-hours (our channel)

2019-08-15

Ryan avatar

Hello all, I’m playing with geodesic and it doesn’t seem to be picking up the cluster name dynamically; I get output for example.foo.bar instead of my CLUSTER_NAME.

$ export CLUSTER_NAME=test.myco.com

$ docker run -e CLUSTER_NAME \
  -e DOCKER_IMAGE=cloudposse/${CLUSTER_NAME} \
  -e DOCKER_TAG=dev \
  cloudposse/geodesic:latest -c new-project | tar -xv -C .
Building project for example.foo.bar...
./
example.foo.bar/Dockerfile
example.foo.bar/Makefile
example.foo.bar/conf/
example.foo.bar/conf/.gitignore

I’m looking through the various sources; it’s pretty challenging to piece it all together (feels over-abstracted). So I’m wondering if I’m missing something.

Ryan avatar

Also new to Terraform, so the concepts for how it’s organized are a bit confusing atm.

2019-08-16

Erik Osterman (Cloud Posse) avatar
Erik Osterman (Cloud Posse)

Hey Ryan - sorry - our docs are really out of date.

Erik Osterman (Cloud Posse) avatar
Erik Osterman (Cloud Posse)

Here’s an example of how we use it: https://github.com/cloudposse/testing.cloudposse.co

cloudposse/testing.cloudposse.co

Example Terraform Reference Architecture that implements a Geodesic Module for an Automated Testing Organization in AWS - cloudposse/testing.cloudposse.co

2019-08-19

SweetOps #geodesic avatar
SweetOps #geodesic
04:00:07 PM

There are no events this week

Cloud Posse avatar
Cloud Posse
04:04:22 PM

:zoom: Join us for “Office Hours” every Wednesday 11:30AM (PST, GMT-7) via Zoom.

This is an opportunity to ask us questions about geodesic, get live demos and learn from others using it. Next one is Aug 28, 2019 11:30AM.
Register for Webinar
slack #office-hours (our channel)

2019-08-22

oscar avatar

Any thoughts on why I receive the following error:

 ✗ . (none) backend ⨠ terraform init
Copying configuration from "git::https://github.com/cloudposse/terraform-root-modules.git//aws/tfstate-backend?ref=tags/0.35.1"...

Error: Can't populate non-empty directory

The target directory . is not empty, so it cannot be initialized with the
-from-module=... option.

Latest Geodesic.

oscar avatar
 ✗ . (none) backend ⨠ ls -la
total 16
drwxr-xr-x 2 root root 4096 Aug 22 15:00 .
drwxr-xr-x 3 root root 4096 Aug 22 09:49 ..
-rw-r--r-- 1 root root  380 Aug 22 15:29 .envrc
-rw-r--r-- 1 root root  122 Aug 22 15:00 Makefile
 ⧉  oscar
 ✗ . (none) backend ⨠ cat .envrc
# Import the remote module
export TF_CLI_INIT_FROM_MODULE="git::https://github.com/cloudposse/terraform-root-modules.git//aws/tfstate-backend?ref=tags/0.35.1"
export TF_CLI_PLAN_PARALLELISM=2

source <(tfenv)

use terraform 0.12
use tfenv
Andriy Knysh (Cloud Posse) avatar
Andriy Knysh (Cloud Posse)

add export TF_MODULE_CACHE=.module
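i.e. a sketch of the resulting .envrc, combining the original above with Andriy’s suggestion:

# Import the remote module into a cache folder so `terraform init -from-module` has an empty target
export TF_CLI_INIT_FROM_MODULE="git::https://github.com/cloudposse/terraform-root-modules.git//aws/tfstate-backend?ref=tags/0.35.1"
export TF_CLI_PLAN_PARALLELISM=2
export TF_MODULE_CACHE=.module

source <(tfenv)

use terraform 0.12
use tfenv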

joshmyers avatar
joshmyers

@oscar It has been discussed here a while back, check through history, as @Andriy Knysh (Cloud Posse) says ^^

joshmyers avatar
joshmyers

hey @Andriy Knysh (Cloud Posse) wave

1
oscar avatar

Thanks both, I did actually have a scan through 0.12 and geodesic channel but couldn’t find it.

Andriy Knysh (Cloud Posse) avatar
Andriy Knysh (Cloud Posse)

and in Makefile.tasks, change it to:

Andriy Knysh (Cloud Posse) avatar
Andriy Knysh (Cloud Posse)
-include ${TF_MODULE_CACHE}/Makefile

deps:
	mkdir -p ${TF_MODULE_CACHE}
	terraform init

## Reset this project
reset:
	rm -rf ${TF_MODULE_CACHE}
oscar avatar

Ah you know I can see it now

oscar avatar

19th July

joshmyers avatar
joshmyers
Direnv with Terraform 0.12 by osterman · Pull Request #500 · cloudposse/geodesic

what: Use an empty cache folder to initialize module. why: Terraform 0.12 no longer allows initialization of folders with even dot files =( Example of how to use direnv with terraform 0.12. usage: e…

oscar avatar

Thanks just had a read of that

oscar avatar

Makes sense. I must have missed this this afternoon!

oscar avatar

is source <(tfenv) required?

oscar avatar

@Andriy Knysh (Cloud Posse) I get this from the new Makefile

 ✗ . (none) backend ⨠ make
Makefile:4: *** missing separator.  Stop.
 ⧉  oscar
 ✗ . (none) backend ⨠ make reset
Makefile:4: *** missing separator.  Stop.
 ⧉  oscar
 ✗ . (none) backend ⨠ make deps
Makefile:4: *** missing separator.  Stop.
Erik Osterman (Cloud Posse) avatar
Erik Osterman (Cloud Posse)

Makefile:4: *** missing separator. Stop. is usually caused by spaces (replace them with tabs)
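For illustration, recipe lines under a Makefile target must begin with a literal tab character:

deps:
	mkdir -p ${TF_MODULE_CACHE}   # this line starts with a tab, not spaces
	terraform init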

2019-08-23

oscar avatar

Thanks =] All working now

2019-08-26

SweetOps #geodesic avatar
SweetOps #geodesic
04:00:08 PM

There are no events this week

Cloud Posse avatar
Cloud Posse
04:01:56 PM

:zoom: Join us for “Office Hours” every Wednesday 11:30AM (PST, GMT-7) via Zoom.

This is an opportunity to ask us questions about geodesic, get live demos and learn from others using it. Next one is Sep 04, 2019 11:30AM.
Register for Webinar
slack #office-hours (our channel)

2019-08-27

joshmyers avatar
joshmyers

How are folks doing vanity domains using geodesic / ref arch ?

joshmyers avatar
joshmyers

Currently the parent dns name (foo.com) lives in the root account, and by default, users don’t have admin in root

joshmyers avatar
joshmyers

So no one can actually delegate example.foo.com down to example.prod.foo.com

joshmyers avatar
joshmyers

Easily solvable but wondering what folks are doing

Erik Osterman (Cloud Posse) avatar
Erik Osterman (Cloud Posse)

by design, the branded domain is provisioned in the root account, since it will contain references to the service discovery domain in any account.

Erik Osterman (Cloud Posse) avatar
Erik Osterman (Cloud Posse)

e.g. corp/shared account, prod account, data account, etc.

Erik Osterman (Cloud Posse) avatar
Erik Osterman (Cloud Posse)

branded = vanity

Erik Osterman (Cloud Posse) avatar
Erik Osterman (Cloud Posse)
terraform {
  required_version = "~> 0.12.0"

  backend "s3" {}
}

provider "aws" {
  version = "~> 2.17"

  assume_role {
    role_arn = var.aws_assume_role_arn
  }
}

variable "aws_assume_role_arn" {
  type = string
}

data "aws_route53_zone" "ourcompany_com" {
  # Note: The zone name is the domain name plus a final dot
  name = "ourcompany.com."
}

# allow_overwrite lets us take over managing entries that are already there
# use sparingly in live domains unless you know what's what
resource "aws_route53_record" "apex" {
  allow_overwrite = true
  zone_id         = data.aws_route53_zone.ourcompany_com.zone_id
  type            = "A"
  ttl             = 300
  records         = ["1.2.3.4"]
}

resource "aws_route53_record" "www" {
  allow_overwrite = true
  zone_id         = data.aws_route53_zone.ourcompany_com.zone_id
  type            = "CNAME"
  ttl             = 300
  records         = ["app.prod.ourcompany.org"]
joshmyers avatar
joshmyers

Right, but using the ref arch, even admin users in the root account don’t have access to do anything to route53 resources

Alex Siegman avatar
Alex Siegman

They should if they assume the right role.

joshmyers avatar
joshmyers

Which role is that?

Jeremy G (Cloud Posse) avatar
Jeremy G (Cloud Posse)

namespace-root-admin. In the ref arch you pick a namespace for all your accounts, like cpco, and then the role in your prod environment is cpco-prod-admin. So you also get a cpco-root-admin role that you can assume to do whatever you need to do in the root account.
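For example, assuming your profiles are wired up with aws-vault (the profile name follows the cpco example and is illustrative):

# assume the root-admin role just for the Route53 work
aws-vault exec cpco-root-admin -- aws route53 list-hosted-zones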

2019-08-28

oscar avatar

Yes, similar to above, I would create an R53 role in the root account, assume the role using a second provider block, and then for the r53 record resource use the root account provider.
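A hedged sketch of that pattern (the account ID, role name, zone data source, and record values are all illustrative):

# Second provider that assumes a Route53-capable role in the root account
provider "aws" {
  alias = "root"

  assume_role {
    role_arn = "arn:aws:iam::111111111111:role/acme-root-admin"
  }
}

# Delegate example.foo.com from the root account's foo.com zone
resource "aws_route53_record" "delegation" {
  provider = aws.root
  zone_id  = data.aws_route53_zone.parent.zone_id     # assumes a data source for foo.com defined elsewhere
  name     = "example.foo.com"
  type     = "NS"
  ttl      = 300
  records  = aws_route53_zone.example.name_servers    # assumes the delegated zone resource exists
}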

joshmyers avatar
joshmyers

Why not add route53 perms to the root admin group that already exists?

joshmyers avatar
joshmyers

Currently the users in root admin group, in the root account can only do IAM stuff for their own users

joshmyers avatar
joshmyers

Ah, nevermind me

oscar avatar

What was the issue/thing for future reference?

joshmyers avatar
joshmyers

Me being stupid and missing the admin role

Erik Osterman (Cloud Posse) avatar
Erik Osterman (Cloud Posse)

#office-hours starting now! join us here https://zoom.us/s/508587304

2019-08-30

oscar avatar

@Erik Osterman (Cloud Posse) what is to stop a team just having one Geodesic module and then having nested .envrc in /conf to simulate global variables? The only disadvantage I spot is the swapping of AWS credentials, which could be easily solved.

Erik Osterman (Cloud Posse) avatar
Erik Osterman (Cloud Posse)

It can work that way

Erik Osterman (Cloud Posse) avatar
Erik Osterman (Cloud Posse)

Nothing stopping anyone.

Erik Osterman (Cloud Posse) avatar
Erik Osterman (Cloud Posse)

basically it just comes down to our convention; how to manage multiple different tools pinned to different versions in different stages.

Erik Osterman (Cloud Posse) avatar
Erik Osterman (Cloud Posse)

so the problem with the one container is pretty much all tools have to be the same version

Erik Osterman (Cloud Posse) avatar
Erik Osterman (Cloud Posse)

or you need to use a version manager for every tool

Erik Osterman (Cloud Posse) avatar
Erik Osterman (Cloud Posse)

most OSes don’t make it easy to have multiple versions of software installed at the same time

Erik Osterman (Cloud Posse) avatar
Erik Osterman (Cloud Posse)

terraform is one tool that needs to be versioned

Erik Osterman (Cloud Posse) avatar
Erik Osterman (Cloud Posse)

but helm is very strict about versions of client/server matching. not pinning it means you force upgrading everywhere.

Erik Osterman (Cloud Posse) avatar
Erik Osterman (Cloud Posse)

recently upgraded alpine which upgraded git. then we saw the helm-git provider break because some flags it depends on were broken

Erik Osterman (Cloud Posse) avatar
Erik Osterman (Cloud Posse)

recently we upgraded variant and that broke some older scripts

Erik Osterman (Cloud Posse) avatar
Erik Osterman (Cloud Posse)

my point is just that having one shell for all environments makes it difficult to be experimental while not also breaking prod toolchain

Erik Osterman (Cloud Posse) avatar
Erik Osterman (Cloud Posse)

also, 99.9% of companies don’t worry about this. i basically see most companies operate infrastructure in a monorepo and don’t try to strictly version their toolchain the way we do in a way that allows versions of software to be promoted. we’ve just been really strict about it

oscar avatar

e.g. /conf/prod/.envrc (all the account specific vars)

BAU:
/conf/prod/eu-west-1/terraform/my_project/*
/conf/prod/eu-west-2/terraform/my_project/*

oscar avatar

Likewise an interesting conversation came up internally yesterday:

In the same way we would promote code in git dev -> master… how would one promote new variables etc. along the Geodesic /conf? Or new version pinning in the .envrc from-module line? The only solution that makes it easy is to have all the Geodesic modules in one repo, so it would be easily spotted if you missed updating one environment in the PR.

oscar avatar

I feel I have not seen the Cloudposse way

Erik Osterman (Cloud Posse) avatar
Erik Osterman (Cloud Posse)

it’s not the way we do it, true… but that doesn’t make it wrong

Erik Osterman (Cloud Posse) avatar
Erik Osterman (Cloud Posse)

most companies do structure it this way

Erik Osterman (Cloud Posse) avatar
Erik Osterman (Cloud Posse)

this is the way terraform recommends it

Erik Osterman (Cloud Posse) avatar
Erik Osterman (Cloud Posse)

this is the way terragrunt recommends it

Erik Osterman (Cloud Posse) avatar
Erik Osterman (Cloud Posse)

at cloudposse, we’ve taken the convention of “share nothing” down to the repo

Erik Osterman (Cloud Posse) avatar
Erik Osterman (Cloud Posse)

which opens up awesome potential

Erik Osterman (Cloud Posse) avatar
Erik Osterman (Cloud Posse)

it means the git history isn’t even shared between stages

Erik Osterman (Cloud Posse) avatar
Erik Osterman (Cloud Posse)

it means webhooks aren’t shared between stages

Erik Osterman (Cloud Posse) avatar
Erik Osterman (Cloud Posse)

it means github team permissions aren’t shared between stages

Erik Osterman (Cloud Posse) avatar
Erik Osterman (Cloud Posse)

it means one PR can never modify more than one stage at a time

Erik Osterman (Cloud Posse) avatar
Erik Osterman (Cloud Posse)

forcing the convention that you strictly test before rollout to production

Erik Osterman (Cloud Posse) avatar
Erik Osterman (Cloud Posse)

the mono repo convention requires self-discipline, but it doesn’t enforce it.

tamsky avatar

Nominating this thread (and Erik’s additional “we don’t share…” statements from later this same day) for pinned status.

2
joshmyers avatar
joshmyers

but in geodesic you have all your env vars because you are wrapping geodesic in your env specific container, no?

oscar avatar

Not sure I follow… but normally you would have acme.stage.com, but I wonder why not have: acme.com

and then:
/conf/dev/.envrc (with the equivalent variables of the Dockerfile)
/conf/prod/.envrc (with the equivalent variables of the Dockerfile)

oscar avatar

It’s a bit weird but something like that came up in a meeting when I was introducing Geodesic to another team and I didn’t really have an answer other than “that isn’t the way”, but it could work without losing any features so…?

Erik Osterman (Cloud Posse) avatar
Erik Osterman (Cloud Posse)

yep, you can totally do it this way

Erik Osterman (Cloud Posse) avatar
Erik Osterman (Cloud Posse)

where you have one folder per stage

Erik Osterman (Cloud Posse) avatar
Erik Osterman (Cloud Posse)

we do something very similar for regions

Erik Osterman (Cloud Posse) avatar
Erik Osterman (Cloud Posse)

/conf/us-east-1/eks and /conf/us-west-2/eks

Erik Osterman (Cloud Posse) avatar
Erik Osterman (Cloud Posse)

where the us-east-1 folder has an .envrc file with export REGION=us-east-1

Erik Osterman (Cloud Posse) avatar
Erik Osterman (Cloud Posse)

this could just as well be modified to

Erik Osterman (Cloud Posse) avatar
Erik Osterman (Cloud Posse)

/conf/prod/us-east-1/eks

Erik Osterman (Cloud Posse) avatar
Erik Osterman (Cloud Posse)

where the /conf/prod/.envrc file has STAGE=prod

Jeremy G (Cloud Posse) avatar
Jeremy G (Cloud Posse)

It’s a matter of “what are you trying to solve for?” It is, in fact, tedious to roll out changes to 4 environments under the reference architecture, in that you need to checkout, branch, commit, pull request, merge, and release 4 times. With everything in 1 repo you can do less with Git, but you still need to apply changes in 4 folders, but now your PRs could be for any environment, so following the evolution of just one environment becomes much harder. Having a 1-to-1 mapping of Geodesic shells to environments just makes it a lot easier to keep everything straight.

Tega McKinney avatar
Tega McKinney

@Erik Osterman (Cloud Posse) Given the example above, do your REGION and AWS_REGION environment variables not get overwritten when you cd into eks directory?

I have a similar structure but my environment variables are being overwritten.

I have a structure with /conf/<tenant>/eu-west-1/platform-dns, where I set /conf/<tenant>/eu-west-1/.envrc to REGION=eu-west-1; however, the region is being overwritten back to eu-central-1 when I change directory into /conf/<tenant>/eu-west-1/platform-dns.

Jeremy G (Cloud Posse) avatar
Jeremy G (Cloud Posse)

@Tega McKinney You need to put source_up as the first line of your .envrc file in order to pick up settings in parent directories. Otherwise direnv only uses the settings in the current directory to override whatever is in your environment, which means you get different results if you

cd /conf
cd eu-west-1
cd platform-dns

than if you just

cd /conf
cd eu-west-1/platform-dns
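Concretely, a hedged sketch of the .envrc files for the layout Tega described (the tenant placeholder is kept as-is; the use lines are illustrative):

# /conf/<tenant>/eu-west-1/.envrc
source_up                 # chain up to parent .envrc files first
export REGION=eu-west-1
export AWS_REGION=eu-west-1

# /conf/<tenant>/eu-west-1/platform-dns/.envrc
source_up                 # without this, only the settings in this directory apply
use envrc
use tfenv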
oscar avatar

More of a probing question than ‘how to do this’. Curious to what other’s think, specifically Erik and Jeremy

Alex Siegman avatar
Alex Siegman

The first thing I know would be “broken” is role assumption to AWS. It’d be really easy to assume a role in dev, do some stuff, then not be able to apply a different stage because you’re still in the dev account.

I think the separation can be healthy even if it’s not DRY. It’s like the age-old “do we deploy centralized logging per stage or put all stages in one place” kind of argument. There’s pros and cons to both, but if you’re trying to prevent leakage between environments, why not prevent it at the tool level too

The separation between stages also allows you to test changes to your tool chain in safe environments, rather than break the production one.

1
Erik Osterman (Cloud Posse) avatar
Erik Osterman (Cloud Posse)

I think we could even get around the role assumption piece

Erik Osterman (Cloud Posse) avatar
Erik Osterman (Cloud Posse)

the role assumption works on ENVs as well

Erik Osterman (Cloud Posse) avatar
Erik Osterman (Cloud Posse)

so if you assume the role in the proper folder, it assumes the proper role

Erik Osterman (Cloud Posse) avatar
Erik Osterman (Cloud Posse)

but if the user then goes to /conf/dev while being assumed as prod, GOOD LUCK!!!

Erik Osterman (Cloud Posse) avatar
Erik Osterman (Cloud Posse)

again, most companies aren’t strict about enforcing this stuff. they get away with it. we just make sure it’s that much harder

oscar avatar

So this can be done easily and safely. You have the aws profile set in the conf .envrc and it just assumes the role based on that, but my company uses AD to authenticate to AWS, so aws-vault doesn’t work and it is done externally. Bit vague; I can go into more technical detail if anyone is curious.

Erik Osterman (Cloud Posse) avatar
Erik Osterman (Cloud Posse)

we’ve used it with SSO too (Okta)

Erik Osterman (Cloud Posse) avatar
Erik Osterman (Cloud Posse)


You have aws profile set in the conf .envrc and it just assumes the role based on that

Erik Osterman (Cloud Posse) avatar
Erik Osterman (Cloud Posse)

it’s just hard to enforce what happens if they leave a folder

Erik Osterman (Cloud Posse) avatar
Erik Osterman (Cloud Posse)

you need to ensure that in transition from /conf/prod to /conf/dev they still don’t have their prod role assumed
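For instance, a hedged sketch of per-stage .envrc files (profile names are illustrative); as noted above, you still have to make sure credentials obtained for prod aren’t reused after cd’ing into /conf/dev:

# /conf/prod/.envrc
source_up
export STAGE=prod
export AWS_PROFILE=acme-prod-admin   # illustrative profile name

# /conf/dev/.envrc
source_up
export STAGE=dev
export AWS_PROFILE=acme-dev-admin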

oscar avatar

That’s true.

oscar avatar

Oh really, how do you do SSO (Azure AD) and aws-vault???

Erik Osterman (Cloud Posse) avatar
Erik Osterman (Cloud Posse)

you don’t use aws-vault

Erik Osterman (Cloud Posse) avatar
Erik Osterman (Cloud Posse)

aws-vault is one way to do auth

Erik Osterman (Cloud Posse) avatar
Erik Osterman (Cloud Posse)

aws-okta is another

Erik Osterman (Cloud Posse) avatar
Erik Osterman (Cloud Posse)

aws-keycloak

Erik Osterman (Cloud Posse) avatar
Erik Osterman (Cloud Posse)

basically, a different cli is used

Erik Osterman (Cloud Posse) avatar
Erik Osterman (Cloud Posse)
Versent/saml2aws

CLI tool which enables you to login and retrieve AWS temporary credentials using a SAML IDP - Versent/saml2aws

Erik Osterman (Cloud Posse) avatar
Erik Osterman (Cloud Posse)
segmentio/aws-okta

aws-vault like tool for Okta authentication. Contribute to segmentio/aws-okta development by creating an account on GitHub.

oscar avatar

Ahhh yes. Thank you. I’ve been using aws azure sso login from npm

1
oscar avatar

Which of these do you use and prefer btw?

Erik Osterman (Cloud Posse) avatar
Erik Osterman (Cloud Posse)

I have only used aws-okta and it works really well

joshmyers avatar
joshmyers

Yup, makes CI/CD a bit cleaner too

Alex Siegman avatar
Alex Siegman

I also like that the toolchain follows the same “TRUST THE ENVIRONMENT” we yell to our ~devs~elves for 12-factor style apps, but maybe not everybody follows 12-factor app style.

1
1
Erik Osterman (Cloud Posse) avatar
Erik Osterman (Cloud Posse)

on the one hand, I’m jealous of companies that take the monorepo approach to infrastructure. you can easily bring up and tear down your entire environment. You can open PRs that fix problems across all environments in one fell swoop. You can easily setup cross account peering because you control all accounts in a single session. You can do all sorts of automation.

1
oscar avatar

And this is what my team encouraged me to do. They were saying “why have N conf directories with all the little components when you can have 3: application, aws account, and shared services”

My answer was: That’s not the way. This is very monolithic. We want flexibility.

Then they said: We don’t want flexibility. We want to ensure all environments are the same. What if someone forgets to update the dev project, etc.

Erik Osterman (Cloud Posse) avatar
Erik Osterman (Cloud Posse)

ya, guess they just want to optimize for different things.

Erik Osterman (Cloud Posse) avatar
Erik Osterman (Cloud Posse)

these are often opinions strongly held by an organization. they are often influenced by how they got to where they are today. these strongly held opinions are not easily changed because most of the people who were involved in getting the organization to where they are today were the ones who made them.

just like it’s difficult for us (cloudposse) to get our heads around how we would manage infrastructure in that way. It’s not an uncommon approach and many do it; we just see all the problems that go along with it too.

Erik Osterman (Cloud Posse) avatar
Erik Osterman (Cloud Posse)

one thing i struggle with is that a company’s “prod” account almost never equals their “staging” account

Erik Osterman (Cloud Posse) avatar
Erik Osterman (Cloud Posse)

they might run 4 clusters in 4 regions in prod

Erik Osterman (Cloud Posse) avatar
Erik Osterman (Cloud Posse)

but they run one region in staging

Erik Osterman (Cloud Posse) avatar
Erik Osterman (Cloud Posse)

they run multiple demo environments in staging, but none in prod

Erik Osterman (Cloud Posse) avatar
Erik Osterman (Cloud Posse)

they run shared services in one account (like Jenkins, Keycloak, etc), yet don’t have another “staging” account for those shared services

Erik Osterman (Cloud Posse) avatar
Erik Osterman (Cloud Posse)

so I instead argue we want to ensure the same code pinned at some version runs in some account

Erik Osterman (Cloud Posse) avatar
Erik Osterman (Cloud Posse)

we want some assurances that that was tested

Erik Osterman (Cloud Posse) avatar
Erik Osterman (Cloud Posse)

but the way it runs in a particular stage isn’t the same

oscar avatar

Mmm it makes sense. This is deffo a topic for next Wednesday. I’ll try to prep some more specific examples and file structures so we can all cross-examine.

Erik Osterman (Cloud Posse) avatar
Erik Osterman (Cloud Posse)

yea for sure

Erik Osterman (Cloud Posse) avatar
Erik Osterman (Cloud Posse)

also, willing to explore this in a deeper session

Erik Osterman (Cloud Posse) avatar
Erik Osterman (Cloud Posse)

i’d like to offer this as an alternative strategy for companies who prefer it

Erik Osterman (Cloud Posse) avatar
Erik Osterman (Cloud Posse)

it’ll definitely appeal more to the terragrunt crowd (as well)

Erik Osterman (Cloud Posse) avatar
Erik Osterman (Cloud Posse)

but this also is freggin scary. i think it’s optimizing for the wrong use-case where you start from scratch. i think it’s better to optimize for day to day operations and stability.

Erik Osterman (Cloud Posse) avatar
Erik Osterman (Cloud Posse)

also, what i struggle with is where do you draw the line?

Erik Osterman (Cloud Posse) avatar
Erik Osterman (Cloud Posse)

i’m sure most of the engineers would agree “share nothing” is the right approach

Erik Osterman (Cloud Posse) avatar
Erik Osterman (Cloud Posse)

but despite that, tools, state buckets, repos, dns zones, accounts, webhooks, ci/cd, etc are all shared.

Erik Osterman (Cloud Posse) avatar
Erik Osterman (Cloud Posse)

we’ve taken the (relatively speaking) extreme position to truly share nothing.

Erik Osterman (Cloud Posse) avatar
Erik Osterman (Cloud Posse)

we don’t share the tools, they are in different containers.

Erik Osterman (Cloud Posse) avatar
Erik Osterman (Cloud Posse)

we don’t share the state buckets, they are in different accounts.

Erik Osterman (Cloud Posse) avatar
Erik Osterman (Cloud Posse)

we don’t share the repos, each account corresponds to a repo

Erik Osterman (Cloud Posse) avatar
Erik Osterman (Cloud Posse)

we don’t share DNS zones. each account has its own service discovery domain

Erik Osterman (Cloud Posse) avatar
Erik Osterman (Cloud Posse)

we don’t share webhooks, because each account has its own repo

Erik Osterman (Cloud Posse) avatar
Erik Osterman (Cloud Posse)

we don’t share CI/CD (for infra) because each account has its own Atlantis, and each Atlantis receives its own webhooks

Erik Osterman (Cloud Posse) avatar
Erik Osterman (Cloud Posse)

etc…

Andriy Knysh (Cloud Posse) avatar
Andriy Knysh (Cloud Posse)

yes, all of that looks very difficult to set up and maintain at the start, but in the end it’s much easier to manage security and access without jumping through many hoops (and still having holes)

Andriy Knysh (Cloud Posse) avatar
Andriy Knysh (Cloud Posse)

in that share nothing architecture, the only point of access to an account is to add a user to the identity account and allow it to assume a role with permissions to access resources in the other accounts

Andriy Knysh (Cloud Posse) avatar
Andriy Knysh (Cloud Posse)

note that we still share all the code (from terraform modules, helmfiles, and the root-modules catalog) in all account repos, so no repetition there (we load them dynamically using tags with semantic versioning)

Andriy Knysh (Cloud Posse) avatar
Andriy Knysh (Cloud Posse)

we just have different settings (vars, ENvs, etc.) for each account

Andriy Knysh (Cloud Posse) avatar
Andriy Knysh (Cloud Posse)

as you can see for example here https://github.com/cloudposse/testing.cloudposse.co/tree/master/conf/ecs, there is no code in the account repos, just settings (not secrets)

cloudposse/testing.cloudposse.co

Example Terraform Reference Architecture that implements a Geodesic Module for an Automated Testing Organization in AWS - cloudposse/testing.cloudposse.co

Andriy Knysh (Cloud Posse) avatar
Andriy Knysh (Cloud Posse)

all code (logic) is shared and open (if it’s your company secret, make the repo private)

oscar avatar

Yes, that example is similar to how we’re doing it now. We have a few levels of abstraction going on. I can go over this next Wednesday for 10-15 minutes.
