#kubernetes (2020-02)
Archive: https://archive.sweetops.com/kubernetes/
2020-02-02

So I had a thought on the train this morning when I was thinking of writing some yeoman generators for templating out gitops repos for flux, so that teams don’t need to know exactly where to put what in which file.

And then I thought “you know what’s really good at templating yaml? helm!”

Is it crazy to use helm template to generate repos for flux (which will create flux helm CRDs)? I can’t see anything really stupid at face value - but feels like the kind of idea that requires a sanity check
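
For what it’s worth, the thing such a chart would render is just the flux helm-operator’s HelmRelease CRD. A minimal sketch of the output, assuming the helm.fluxcd.io/v1 schema (names and chart coordinates here are hypothetical):

apiVersion: helm.fluxcd.io/v1
kind: HelmRelease
metadata:
  name: myapp
  namespace: default
spec:
  releaseName: myapp
  chart:
    repository: https://charts.example.com/
    name: myapp
    version: 1.0.0
  values:
    replicaCount: 2

So helm template would render files like this into the gitops repo, and flux reconciles them from there.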

I wouldn’t think so. Jenkins-x does exactly that if you opt to not use tiller for deployments I believe.
2020-02-03

Writing helm charts and wanting some instant feedback, I ended up with this to check my rendered templates: watch -d 'helm template my-chart-dir | kubeval && helm template my-chart-dir'

nice trick
2020-02-04

I’m a bit stuck with GKE persistent disks that cannot be found:
Type     Reason              Age  From                     Message
----     ------              ---- ----                     -------
Normal   Scheduled           73s  default-scheduler        Successfully assigned coral-firefly/coral-firefly-ipfs-ipfs-0 to gke-shared-europe-main-3d862732-mq45
Warning  FailedAttachVolume  73s  attachdetach-controller  AttachVolume.Attach failed for volume "pvc-c018de57-476a-11ea-af39-42010a8400a9" : GCE persistent disk not found: diskName="gke-shared-europe-9030-pvc-c018de57-476a-11ea-af39-42010a8400a9" zone="europe-west1-b"

while the disk is just there in the google console


Not sure where to look for an error message I can actually use

The PVC even says that it is bound

ipfs-storage-coral-firefly-ipfs-ipfs-0 Bound pvc-c018de57-476a-11ea-af39-42010a8400a9 10Gi RWO standard 27m

anyone have an idea where to look?

Hmmm, seems like provisioning too many disks at the same time is the culprit. Is there a way to configure that in k8s, helm, or helmfile?

Are you running helmfiles serially?

I recently moved to running a lot more in parallel to speed it up (we deploy clusters + services as a service, so it needs to be as fast as possible)
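
If the parallel helmfile runs are what’s saturating the GCE attach API (an assumption), helmfile’s global --concurrency flag caps how many releases it processes at once - e.g. helmfile --concurrency 2 apply - which would throttle disk provisioning without going fully serial.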
2020-02-06

https://www.reddit.com/r/Terraform/comments/axp3tv/run_terraform_under_kubernetes_using_an_operator/ <— nice!!!!
@Erik Osterman (Cloud Posse) do you have a recommendation for using a k8s operator to manage aws resources? https://github.com/amazon-archives/aws-service-operator has been archived and uses cfn under the covers. Lol. Have you had luck with this or anything else?

@rms1000watt @Erik Osterman (Cloud Posse) I’m late to the convo but:
- I think aws-service-operator was just renamed (old one archived), not that the project was canceled: https://github.com/aws/aws-service-operator-k8s
- Crossplane looks interesting: https://crossplane.io/

crossplane looks interesting for sure

You see this too? https://github.com/hashicorp/terraform-k8s/ https://www.hashicorp.com/blog/creating-workspaces-with-the-hashicorp-terraform-operator-for-kubernetes/

We are pleased to announce the alpha release of HashiCorp Terraform Operator for Kubernetes. The new Operator lets you define and create infrastructure as code natively in Kubernetes by making calls to Terraform Cloud.

(official one by hashicorp - but only works in combination with terraform cloud)

@rms1000watt not from first hand accounts, however, there have been a number of new operators to come out to address this lately

I did see the aws-service-operator was deprecated the other day when I was checking something else out.

sec - I think we talked about it recently in office hours

https://github.com/kubeform/kubeform - Kubernetes CRDs for Terraform providers

What’s interesting, outwardly speaking, about kubeform is that it’s by the people behind appscode

appscode is building operators for managing all kinds of durable/persistent services under kubernetes

like https://kubedb.com/
KubeDB by AppsCode simplifies and automates routine database tasks such as provisioning, patching, backup, recovery, failure detection, and repair for various popular databases on private and public clouds

I don’t have any first hand accounts though of using it or other appscode services

Ah, very nice reference

Rancher has 2 projects they are working on.

Both are alpha grade and I don’t know how practical they are

Use K8s to Run Terraform - rancher/terraform-controller

ohhhhh

they renamed it

never mind.

My feelings were hurt by rancher’s product in 2016. I’m sure they’ve improved vastly since then though, lol.

Yeah, I’m curious about the kubeform one

will probably give that a try

very nice references man

https://github.com/danisla/terraform-operator - Kubernetes custom controller for operating terraform


Yeah, I saw that one too

Just have no clue what people are actually using

Please report back if you get a chance to prototype/poc one of them.

I’ve wanted to do this for some time.

I think it could simplify deployments in many ways by using the standard deployment process with kubernetes

this is actually a part of a bigger discussion around ephemeral environments

ya

all the helmfile stuff is super straightforward

it’s the infrastructure beyond it tho

so while I love terraform, terraform is like a database migration tool that doesn’t support transactions

so when stuff breaks, no rollbacks

heh, yea

while webapps are usually trivial to rollback

so coupling these two might cause more instability where before there was none.

also, this is what was kind of nice (perhaps) with the aws-service-operator approach deploying cloudformation since cloudformation fails more gracefully

did you listen to this week’s office hours recording?

I haven’t

I think it’s relevant to the ephemeral preview environments conversation, which is also one of the reasons we’re exploring terraform workspaces more and more as one of the tricks to help solve it.

yeah, I was considering terraform workspaces
2020-02-07

Anyone address kubelet-extra-args with eks node groups?


In short, I’m trying to build docker images on our new jenkins stack, built 100% with node groups now, and the docker builds fail due to networking

Err:1 http://security.debian.org/debian-security buster/updates InRelease
Temporary failure resolving 'security.debian.org'


@Erik Osterman (Cloud Posse) @Andriy Knysh (Cloud Posse) I thought we could set the extra args for enabling the docker bridge, but it seems like that’s not how https://github.com/awslabs/amazon-eks-ami/blob/master/files/bootstrap.sh#L341 is set up

Are you doing docker builds under jenkins on kubernetes?

yuppers

planning to look at img, etc again, but the new jenkins needs to match the old for now

aha

yea, b/c I think getting away from dind or mounting the docker sock is a necessity

there are seemingly a dozen ways now to build images without the docker daemon.

are you mounting the host docker socket?

agreed. that’s the goal in an upcoming sprint

yuppers

they build fine. they can’t communicate externally

Err:1 http://security.debian.org/debian-security buster/updates InRelease
Temporary failure resolving 'security.debian.org'

simple pipeline:
pipeline {
  agent {
    label 'xolvci'
  }
  stages {
    stage('docker') {
      steps {
        container('jnlp-slave') {
          writeFile file: 'Dockerfile', text: '''FROM openjdk:8-jre
RUN apt update && apt upgrade -y && apt install -y libtcnative-1'''
          script {
            docker.build('jbtest:latest')
          }
        }
      }
    }
  }
}

hrmmm but they can talk to other containers in the cluster?

seems so

do you also have artifactory?

(thinking you could set the HTTP_PROXY env)

and get caching to boot!

yes on artifactory

artifactory running in the cluster?

not yet

what would I set HTTP_PROXY to? cluster endpoint?

some other service in the cluster (assuming that the docker build can talk to services in the cluster)

e.g. deploy squid proxy or artifactory
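
(If it comes to that: docker supports predefined proxy build args, so something like docker build --build-arg HTTP_PROXY=http://squid.default.svc.cluster.local:3128 . would route apt through an in-cluster proxy - the squid service name here is hypothetical.)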

that said, I did not have this problem when I did my prototype on digital ocean, so I am guessing whatever you are encountering is due to the EKS networking

yeah, it is def’ eks. it isn’t an issue when using rancher either

ok

is rancher using ENIs?

not sure. just tinkered w/ it mostly

might be able to downgrade to 0.10.0 since it supports inline in userdata.tpl

worst day to fool w/ getting disparate nodes on a cluster.

lol

yea, trying to get this working with managed node groups might be wishful thinking

yeah, I ditched that. For this group I’m back to a custom one with https://github.com/cloudposse/terraform-aws-eks-cluster

AWS support said it literally is not possible with node groups

are you back to using terraform-aws-eks-workers?

for this one group of workers, yes

it isn’t connecting, though

I say that and it connects

lol

it wouldn’t work with the --kubelet-extra-args='--node-labels=node_type=cloudbees-jobs', though

specifically:
bootstrap_extra_args = "--enable-docker-bridge=true --kubelet-extra-args='--node-labels=node_type=cloudbees-jobs'"

manually tweaked the launch template to not use = and that seemed to work for the bridge

trying again w/ the extra args

ugh…yup. frickin’ =

bootstrap_extra_args = "--enable-docker-bridge true --kubelet-extra-args '--node-labels=node_type=cloudbees-jobs'"

^ this works (no = between the bootstrap flags and their values)

bootstrap_extra_args = "--enable-docker-bridge=true --kubelet-extra-args='--node-labels=node_type=cloudbees-jobs'"

^ booooo (this fails)

have you seen this https://kubedex.com/90-days-of-aws-eks-in-production/


and yes, you can connect managed nodes (terraform-aws-eks-node-group), unmanaged nodes (terraform-aws-eks-workers) and Fargate nodes (terraform-aws-eks-fargate-profile) to the same cluster (terraform-aws-eks-cluster)

an example here https://github.com/cloudposse/terraform-aws-eks-fargate-profile/blob/master/examples/complete/main.tf

yeah, got managed and unmanaged now. waiting on fargate in us-west-2

I saw a link to that 90 days post earlier. gonna peep it


phew. come on saturday!

2020-02-10

Thought experiment..
Is there a fundamental difference between slow+smart rolling deployment vs. canary deployment?

Curious if I can hack the health checks of a rolling deployment to convert the behavior into a canary deployment

Canary can be restricted (e.g. with feature flags) to a segment of users.

of course, it could be defined that that segment is just some arbitrary % of users based on the number of updated pods

however, that can lead to inconsistent behavior

I think it’s hard to generalize for all use-cases, but some kind of “canary” style rolling deployment can be achieved using “max unavailable”
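
A sketch of what that could look like on a Deployment (values are illustrative): surge one pod at a time and let the readiness probe gate progression, which approximates a crude canary.

spec:
  minReadySeconds: 60       # a new pod must stay ready this long before the rollout continues
  strategy:
    type: RollingUpdate
    rollingUpdate:
      maxUnavailable: 0     # never drop below desired capacity
      maxSurge: 1           # introduce one "canary" pod at a time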

people always harp on canaries around “well you gotta make sure you’re running health checks on the right stuff.. p99, queue depth, DD metrics”

ya, btw, you’ve seen flagger, right?

Progressive delivery Kubernetes operator (Canary, A/B Testing and Blue/Green deployments) - weaveworks/flagger

Flagger is the first controller to automate the way canaries “should” be done

Oh yeah, this is the one istio integrates with

with prometheus metrics

it supports nginx-ingress too


hmm.. still requires 2 services. Not a deal breaker. Just interesting.

in a perfect world, I can still just helmfile apply and win


fwiw, we’re doing some blue/green deployments with helmfile

and using hooks to query kube api to determine the color
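
For the curious, the shape of that in helmfile (a sketch - the script name is hypothetical, but release-level hooks with presync events are a real helmfile feature):

releases:
  - name: myapp
    chart: ./charts/myapp
    hooks:
      - events: ["presync"]
        showlogs: true
        command: "./scripts/determine-color.sh"   # queries the kube api, decides blue vs green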

Ah, nice

that is pretty interesting

sounds familiar, but I’ll just say no
2020-02-11

Has anyone run into issues with downtime when nginx pods are terminated? I am testing our downtime during rollouts and am noticing a small blip with our nginx pods. Our normal API pods cause no downtime when rolled. Anyone have experience with this?

Ingress-nginx? That project has recently changed termination behavior. Details in change log.

So not actually ingress-nginx, we are just using an nginx container behind ambassador. This is a legacy port to Kubernetes from an ECS system

Our ingress comes through Ambassador

Okay. Hard to say, having a preStop sleep and termination grace should allow single requests enough time to finish. Websocket clients will need to reconnect though (onto new pod). So just server side isn’t enough.
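
Something like this on the pod spec (numbers are illustrative) - the preStop sleep keeps nginx serving while ambassador/endpoints deregister the pod, and the grace period has to exceed the sleep plus the longest in-flight request:

spec:
  terminationGracePeriodSeconds: 60
  containers:
    - name: nginx
      lifecycle:
        preStop:
          exec:
            command: ["sh", "-c", "sleep 15"]   # keep serving while the endpoint is removed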

Yeah I’ll have to play around with termination grace periods a bit more I think


Also make sure to tune maxUnavailable
2020-02-12
2020-02-13

Hey all, I have run into this problem before and wanted to know if someone has a good solution for it. I run kube clusters using kops. For infrastructure (VPC), I create it using cloudposse terraform scripts, route 53 stuff and all using tf as well. Then I create the kube cluster using kops. While everything works fine, I always run into trouble when deleting kube clusters

never mind, I found the solution to my own problem

I use kops, but haven’t been in the habit of deleting clusters; out of curiosity was it just some setting or config somewhere?

so, when I set up my infra (vpc, route 53 etc) using old tf versions, there were some documented issues in github where kops delete cluster was trying to delete everything, but then would fail because resources like the VPC were created using Terraform. In an ideal scenario, kops delete should not even look at those resources

Whatever new stuff I put in (vpc etc) using Cloudposse is working well (kops delete only deletes resources such as security groups, nodes, masters and other instance groups) and doesn’t touch the VPC config, Elastic IPs and all

My guess is, whatever we put up using standard terraform earlier has incorrect tags or something attached, because those resources, like the vpc, do come up in the list of resources to delete when I do kops delete cluster without the --yes flag to validate what will be deleted

Has anyone seen a pod not be able to talk to itself?
Context: Jenkins on K8s runs perfectly fine for building images, but the rtDockerPush call does not allow a connection.
java.net.ConnectException: connect(..) failed: Permission denied
Also: hudson.remoting.Channel$CallSiteStackTrace: Remote call to JNLP4-connect connection from ip-10-60-177-129.us-west-2.compute.internal/10.60.177.129:34956
at hudson.remoting.Channel.attachCallSiteStackTrace(Channel.java:1741)
at hudson.remoting.UserRequest$ExceptionResponse.retrieve(UserRequest.java:356)
at hudson.remoting.Channel.call(Channel.java:955)
at org.jfrog.hudson.pipeline.common.docker.utils.DockerAgentUtils.getImageIdFromAgent(DockerAgentUtils.java:292)
at org.jfrog.hudson.pipeline.common.executors.DockerExecutor.execute(DockerExecutor.java:64)
at org.jfrog.hudson.pipeline.declarative.steps.docker.DockerPushStep$Execution.run(DockerPushStep.java:96)
at org.jfrog.hudson.pipeline.declarative.steps.docker.DockerPushStep$Execution.run(DockerPushStep.java:71)
at org.jenkinsci.plugins.workflow.steps.AbstractSynchronousNonBlockingStepExecution$1$1.call(AbstractSynchronousNonBlockingStepExecution.java:47)
at hudson.security.ACL.impersonate(ACL.java:290)
at org.jenkinsci.plugins.workflow.steps.AbstractSynchronousNonBlockingStepExecution$1.run(AbstractSynchronousNonBlockingStepExecution.java:44)
at java.util.concurrent.Executors$RunnableAdapter.call(Executors.java:511)
at java.util.concurrent.FutureTask.run(FutureTask.java:266)
at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1149)
at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:624)
Caused: io.netty.channel.AbstractChannel$AnnotatedConnectException: connect(..) failed: Permission denied: /var/run/docker.sock

Ok…had a permission issue. We set fsGroup to 995 on our old cluster. It is 994 on the new one. All is well.
Now to get away from using the docker socket altogether.
2020-02-14
2020-02-16

I got tired of reading the stupid guide every time I wanted to spin up Rancher on my local machine so I made a thing!
https://github.com/RothAndrew/k8s-cac/tree/master/docker-desktop/rancher

Nifty. I’ll give it a whirl (as I’ve been looking for an excuse to load me up some rancher). Thanks for sharing

got it installed the other night in minikube but the rancher chart defaults to using a loadbalancer IP that doesn’t work for me. I’ll tinker with it more. Either way, thx for sharing.

hmm. For me (Docker Desktop on Mac) it created an Ingress mapped to localhost, which worked great.

I don’t think I did anything special for it to do that.
2020-02-17
2020-02-19

What’s your approach to memory and cpu requests vs limits? I set requests equal to limits for both cpu and mem and forget about it. I’d like to hear from people who use more elaborate strategies.

@Karoline Pauls You’re trying to determine what those values should be initially, or how much headroom to leave between requests and limits?

@dalekurt I generally run a single pod, which I give 1 CPU, and do yes | xargs -n1 -P <number of worker threads/processes per pod + some num just to be mean> -I {UNUSED} curl https://servicename.clustername.companyname.com/something_heavy and see how much memory it takes, then give it 1 CPU and that much memory per pod, in both limits and requests.

Was just wondering if people do something smart with these settings.

That’s pretty good. I would be interested in hearing other methods

I’d give https://github.com/FairwindsOps/goldilocks a whirl. It uses a controller to deploy the VPA in recommendation mode on labeled namespaces’ deployments to attempt to calculate required resources for pods.

Neat! Hadn’t seen goldilocks before

@Zachary Loeber If you are in NY or ever in NY I owe you a beer

ha, maybe if I actually authored the thing. I’m just a toolbag good sir. One tip with the chart since we are talking about it, don’t sweat setting up VPA ahead of time (there isn’t a good chart for it anyway), just use the InstallVPA: true value and it will run the deployment for the VPA for you. Also, the github repo docs are a bit out of date. The actual helm repo for goldilocks is stable and at https://charts.fairwinds.com/stable

We always set the same value for memory requests and limits, since memory is not compressible or shareable. Headroom depends on app peak usage plus a healthy bit more, as memory is cheap. For CPU we moved everything to burst with a ~3x limit, because of the CFS issue causing high P99s: it limits unnecessary throttling and drives higher usage of worker nodes given the spikey nature of requests.
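
As a concrete sketch of that strategy (numbers are illustrative):

resources:
  requests:
    memory: 512Mi   # memory: request == limit, since memory is not compressible
    cpu: 250m
  limits:
    memory: 512Mi
    cpu: 750m       # ~3x the cpu request, so the pod can burst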
2020-02-20

What are you guys using for multi cluster monitoring?

Prometheus. 1x Cluster Internal Prometheus for each cluster -> External Prometheus capturing all other prometheus via federation.
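
The external Prometheus scrapes each cluster’s /federate endpoint - a minimal sketch of the scrape job (targets and the match[] selector are illustrative):

scrape_configs:
  - job_name: 'federate'
    honor_labels: true
    metrics_path: '/federate'
    params:
      'match[]':
        - '{job=~".+"}'    # pull everything; narrow this in practice
    static_configs:
      - targets:
          - 'prometheus.cluster-a.example.com:9090'
          - 'prometheus.cluster-b.example.com:9090'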

Interesting article on pros/cons of various kube cluster architecture strategies: https://learnk8s.io/how-many-clusters

Hi all. Any ideas on why we get basically a 503 trying to expose the port for WebStomp with RabbitMQ? It has us perplexed. Here is the YAML:
apiVersion: extensions/v1beta1
kind: Ingress
metadata:
  annotations:
    field.cattle.io/creatorId: user-vnwtv
    field.cattle.io/ingressState: '{"c3RvbXAvcmFiYml0bXEtaGEvc3RvbXAudGV4dHJhcy5jb20vL3N0b21wLzE1Njc0":""}'
    field.cattle.io/publicEndpoints: '[{"addresses":["xxx.xx.xxx.x"],"port":80,"protocol":"HTTP","serviceName":"rabbitmq-ha:rabbitmq-ha-prod","ingressName":"rabbitmq-ha:stomp","hostname":"stomp.ourdomain.com","path":"/stomp","allNodes":true}]'
    nginx.org/websocket-services: rabbitmq-ha-prod
  creationTimestamp: "2020-01-29T0540Z"
  generation: 2
  labels:
    cattle.io/creator: norman
  name: stomp
  namespace: rabbitmq-ha
  resourceVersion: "1653424"
  selfLink: /apis/extensions/v1beta1/namespaces/rabbitmq-ha/ingresses/stomp
  uid: f15178c6-3d10-4202-adbd-d9d1cc123bc5
spec:
  rules:
    - host: stomp.ourdomain.com
      http:
        paths:
          - backend:
              serviceName: rabbitmq-ha-prod
              servicePort: 15674
            path: /stomp
status:
  loadBalancer:
    ingress:
      - ip: xxx.xx.xxx.x
It looks as though the RMQ http API used by our NodeJS code is fine, as I see exchanges being created. So there is a service running on that port.

Can you make this markdown or a snippet instead?

Ya my bad. The Yaml went all wonky on mobile, which is where I posted this from. I’ll post the Yaml from my Mac in a snippet

I was looking at https://www.rabbitmq.com/web-stomp.html and noticed it says to use port 15674 if enabling the webstomp plugin; also it seems with TLS and SSL they use 15673, not 15674. But maybe I am confused about whether this plugin is required

Using Rancher btw

How long have you been using Rancher? How do you like it?

Using Rancher for over 18 months. Mixed reviews. Lots of open issues. Support is scattered. The product is getting better now, though I feel they try to focus on their own products like k3s

Getting SSL was like pulling teeth.
2020-02-21

https://v1-14.docs.kubernetes.io/docs/concepts/workloads/controllers/statefulset/#parallel-pod-management wow, this makes statefulsets almost like deployments-but-with-provisioned-storage
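
It’s a single field on the StatefulSet spec - each pod still gets its own PVC from volumeClaimTemplates, but pods launch and terminate in parallel instead of one by one:

spec:
  podManagementPolicy: Parallel   # default is OrderedReady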

Anyone managed to get nginx-ingress on GKE to work when exposing both custom TCP and UDP ports? It complains about the LoadBalancer not being able to mix protocols

Alright, time to level up. Inviting @scorebot !

@scorebot has joined the channel

Thanks for adding me! Emojis used in this channel are now worth points.

Wondering what I can do? try @scorebot help


You can ask me things like:
@scorebot my score - Shows your points
@scorebot winning - Shows Leaderboard
@scorebot medals - Shows all Slack reactions with values
@scorebot = 40pts - Sets value of reaction

anyone using kudo?

No, but I did look into it out of curiosity. It has a krew plugin and only a few actual implementations, but looks fairly promising for scaffolding out new operators (or that’s what I think it’s supposed to be used for….)

last time I checked it out there were only 2 apps in the repo

looks like they may have added a few though. You looking to make an operator?

we’re looking into defining a CRD to describe an opinionated “microservice” that meets a specific customer’s needs

we’re looking at it less as a way to share existing operators - so don’t mind if there’s a limited selection.

That’s sort of what drove me towards it before. Sorry for my non-answer but I definitely have interest in it if anyone else pipes up on this one.

hey - if it’s on your radar, it’s added validation in my

It just seems that there should be an easier way to create CRDs, right?

I’ll revisit and try to deploy the spark operator one for the heck of it, I have a partial need for such a thing anyway

(spark on kube is awful migraine inducing yuckiness of technology….)

it’s really early in our exploration. basically, we want to be able to define something like
buildpack: microservice        # the type of application lifecycle
deployment: { strategy: rolling }
public: true
healthchecks:
  liveness: /healthz
  readiness: /readyz
replicas: [2, 10]
port: 8080
resources: { limits: { memory: 2g, cpu: 300m }, requests: { memory: … } }
variables:                     # standard environment variables
  DEBUG: false
environments: [ preview, prod, staging ]
addons:
  - redis      # or memcache
  - postgres   # or mysql
  - s3-bucket
and it will automatically configure our pipelines for a type of microservice and set up the backing services.

we want to be untethered to the CI/CD technology (e.g. use codefresh if we want to), use helm or helmfile if we want to, and not be vendor locked into some rancherish solution - which, albeit nice, locks us into technology decisions which we don’t get to make.

we’d like to be able to use Elasticache Redis in production, while using a simple redis container in preview environments.

we’d like to have the settings to connect to these automatically populated in the namespace so the service can just “mount” them as envs

so will the addons then be deployed via service catalog (or will you abstract calls to sc rather)?

something like that. so we can deploy the operator in dev (for preview environments for example), and it would use containers for postgres and redis

or we deploy the same operator in staging and then it uses RDS and elasticache

etc

really just riffing out loud right now. very early stages.

right right, I’m following. I so wish I were in a position to offer actual help on this but it’s been on my mind as well. It seems that complex helm charts simply aren’t enough for a holistic solution

it’s a building block. I still stand by helm.

I don’t want something (that we didn’t write, so to say) that tells me how to do everything

crossplane does some abstraction of elements for cluster deployments but doesn’t address the microservice/team needs

because the minute some customer says they want X instead of Y the entire solution is null and void

I don’t NOT stand by helm, it serves a purpose for certain

so the operator-as-a-pattern is interesting for these reasons

but then maybe not as powerful?

I don’t know about less powerful, but anything that requires an operator to function inherently makes it less maneuverable? Maybe not the right word.

That being said, have you done anything with OLM at all?

A management framework for extending Kubernetes with Operators - operator-framework/operator-lifecycle-manager

Hey, so right off the top, kubernetes 1.15.x seems to be expected to install an operator that I want to run:

though it seems operator specific, zookeeper installed without issue (because of course dumb ol’ zookeeper will work….)


I imagine it as a more contextually aware version of it but I’m probably not seeing the whole picture

I’ve been wanting to build a kubernetes operator just as a learning experience, but I haven’t had a proper idea that also seems feasible for my level of programming knowledge~

every time I think I have an idea, I find a better way to do it with built-in stuff

yea, we’re always finding something else that works well enough.

I’m always looking for more glue though


we’re never going to get away from go-template+sprig though are we



erik: 110 scorebot: 40 briantai35: 40 marco803: 25 eshepelyuk: 25 zloeber: 5
2020-02-23
2020-02-25

Hello Kubernetes Experts,
We have a K8s Deployment that scales up in response to the number of messages in a queue using HPA. There is a requirement to have the HPA scale the pods up, however the scaling down must be handled by some custom code. Is it possible to configure the HPA to only handle the scale-up?

https://cloud.google.com/kubernetes-engine/docs/best-practices/enterprise-multitenancy#checklist <- A worthy checklist regardless of the cloud vendor

@curious deviant I’m reading it that you don’t want autoscaling, just scale-up. If you have the scale-down being handled via custom code, what is preventing you from wrapping it all into a controller of your own to handle both? I know that you can disable cluster autoscaler scale-down with an annotation now, not certain about HPA though

Thank you for your response. Writing out custom code to handle both is definitely an option. I just wanted to be sure I wasn’t missing out on anything that’s already available OOB today. This helps.

it’s a great question either way. I’m still digging into it a little out of personal curiosity

How does HPA get/read the metric to scale up on queue messages?

Is there a way to more deterministically say something like “For every 50 messages in the queue, spin up a pod”?

That way when it checks, it can spin down to an appropriate amount as the backlog is handled

The prometheus adapter deployment allows for feeding custom metrics into HPA

If your deployment emits the right metrics, you should be able to scale on it I’d think
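
And the “one pod per 50 messages” behavior falls out of an AverageValue target, since the HPA computes roughly ceil(total metric / averageValue) replicas. A sketch, assuming the prometheus adapter exposes a queue-depth metric (the metric name here is hypothetical):

apiVersion: autoscaling/v2beta2
kind: HorizontalPodAutoscaler
metadata:
  name: worker
spec:
  scaleTargetRef:
    apiVersion: apps/v1
    kind: Deployment
    name: worker
  minReplicas: 1
  maxReplicas: 20
  metrics:
    - type: External
      external:
        metric:
          name: queue_messages_ready   # hypothetical metric exposed via the adapter
        target:
          type: AverageValue
          averageValue: "50"           # desired replicas ≈ ceil(total messages / 50)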

Anyone read through this nifty guide yet? https://kubernetes-on-aws.readthedocs.io/en/latest/index.html

I understand the nature of k8s is to be able to deploy onto different underlying platforms with ease, but any gotchas for peeps that made the move from kops on AWS to EKS?

@btai Technically, if the hype is to be believed, you just went to an easier-to-manage cluster. I’m not going to profess to be an aws eks master, but I’d say that managing your own storage may be simplified by using ebs or efs within eks for your persistent volumes. On the other hand, you may see those storage costs go up if you were using local ec2 instance stores in your kops clusters (presented using something like minio) for high disk-IO-driven workloads and are now using EKS EBS-presented storage instead.

I may be way off though and welcome corrections by smarter people in this channel

anyone running kubernetes on raspberry pis here? just have a few newbie questions - my intention is just to run it for fun/kicks on weekends

There’s a regular attendee of office hours who plays with this a lot, I can’t definitively remember his name though. I think maybe @dalekurt?

Hey

And @roth.andy

Sup

Cool, how many pis do I need? raspberry pi 3, or should I go for the raspberry pi 4?

I have 6 3B’s. I wish they were 4s

The extra RAM is really nice. Once etcd and kubelet and whatever else you need is running there isn’t much room for anything real


Noted

Power Over Ethernet (PoE) is worth it. Much less hassle than having to figure out USB C power distribution

I went with kubeadm so I could have multi-master, but now that k3s supports multi-master I would use that

https://github.com/alexellis/k3sup - from Zero to KUBECONFIG in < 1 min

Thanks for the link

I’ve got 7 Rock64 4GB boards and in the end went back to using 3 as masters, with the workers as VMs on KVM (x86). I would just VM the lot next time (I tend to go in circles from full virt to metal and back)

The same price as the original when it launched in 2012.
2020-02-26

Hello all, I would like to set nologin in one of our deploy agent Pods so that no one can do kubectl exec into it. Is there a way we can achieve this without having to look at k8s RBAC? I want to avoid k8s RBAC because we use the EKS platform and are not using RBAC extensively yet.
2020-02-27

Does anyone know how cluster-autoscaler finds out about the expected taints of specific instance groups?
The terraform examples I know of just add --register-with-taints=ondemand=true:NoSchedule to the kubelet args, and in order to know about that, cluster-autoscaler would have to scrape the instance setup script.
Does cluster-autoscaler even take node taints and pod tolerations into account?

I don’t know the answer to this, but maybe you could share what you want to accomplish?

sry, have a hard time remembering to start threads

afaik autoscaler does not auto-apply node taints from the last time I looked into it on aks clusters.
but I was not being extremely exhaustive on what I was doing either, I found that using taints for steering deployments was not super intuitive and leaned into using node affinities to force preferences for workload node pools/groups instead.
annotations seemed to carry over to autoscaled nodes just fine


answers there

The reason I want to understand the use-case is that it sounds like it might be related to working with spot instances.

Tries to move K8s Pods from on-demand to spot instances - pusher/k8s-spot-rescheduler

there’s another controller to manage the node taints - just struggling to remember what it was called

https://github.com/aws/aws-node-termination-handler is the other one I was thinking of

@Erik Osterman (Cloud Posse) So, since I didn’t want to play with webhooks (however they’re called), I decided to:
• add a taint and a label to ondemand nodes
• add a label to spot nodes (unused but there for completeness)
Now, pods that must run on ondemand have (see the sketch below):
• a toleration for the ondemand taint
• a nodeSelector targeting the ondemand label (except for daemonsets)
Pods that do not have these won’t schedule on ondemand.
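
In pod-spec terms that’s roughly the following (the label/taint names follow the --register-with-taints=ondemand=true:NoSchedule example above):

spec:
  nodeSelector:
    ondemand: "true"          # the label added to ondemand nodes
  tolerations:
    - key: ondemand
      operator: Equal
      value: "true"
      effect: NoSchedule      # matches the taint registered on ondemand nodes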

(the “webhooks” I meant are called admission control)

I know about base ondemand capacity but I think that making sure critical workloads are scheduled on the ondemand nodes with a rescheduler would be “eventually consistent”.

also my ondemand nodes are 4 times smaller than the spot ones, they’re just to run critical things like cluster-autoscaler, alb ingress controller, metrics server, ingress merge (is it still necessary even?), job schedulers…

That’s a nice solution @Karoline Pauls! I like the approach and it makes a lot of sense the way you did it.

thanks
2020-02-28


Is anyone out there building up managed kube clusters, running a workload, then tearing them down as part of a deployment pipeline?

@btai

@Zachary Loeber I’m doing something similar, but not just running a workload. Our applications live on ephemeral clusters; when we are upgrading to a new k8s version we spin up a new cluster (i.e. 1.11 -> 1.12, CVEs), and for other things like helm3 migrations

@btai So you spin up the cluster, baseline config it, then kick off some deployment pipelines to shove code into it?

yep, spin up the cluster with baseline “cluster services” which include logging daemonset, monitoring, etc., then kick off deployment pipelines to deploy our stateless applications to the new cluster (and we do a route53 cutover once that is all done)

so do you prestage the vpcs and such?

thanks for your time btw

@Zachary Loeber the cluster and its vpc get provisioned together. During the provisioning stage, it gets vpc-peered with other VPCs (the rds vpc for example). I don’t mind chatting about this at all. We have a unique setup and I sometimes wish I had more peeps to bounce ideas off of. While it’s been great for doing things like CVE patches and cluster version upgrades (esp. the ones that aren’t exactly push-button, i.e. 1.11 -> 1.12), it definitely requires more creativity sometimes because many turnkey solutions for things we want to implement don’t usually take into account k8s clusters that aren’t long-lived.