#kubernetes (2020-2)

kubernetes

Archive: https://archive.sweetops.com/kubernetes/

2020-02-16

roth.andy avatar
roth.andy

I got tired of reading the stupid guide every time I wanted to spin up Rancher on my local machine so I made a thing!

https://github.com/RothAndrew/k8s-cac/tree/master/docker-desktop/rancher

RothAndrew/k8s-cac

Configuration As Code for various kubernetes setups - RothAndrew/k8s-cac

Zachary Loeber avatar
Zachary Loeber

Nifty. I’ll give it a whirl (as I’ve been looking for an excuse to load me up some rancher). Thanks for sharing


2020-02-14

2020-02-13

grv avatar

Hey all, I have run into this problem before and wanted to know if someone has a good solution for it. I run kube clusters using KOPS. For infrastructure (VPC), I create it using cloudposse terraform scripts, and the Route 53 stuff is all in tf as well. Then I create the kube cluster using KOPS. While everything works fine, I always run into trouble when deleting kube clusters

grv avatar

never mind, I found a solution to my own problem

Alex Siegman avatar
Alex Siegman

I use kops, but haven’t been in the habit of deleting clusters; out of curiosity was it just some setting or config somewhere?

grv avatar

so, when I set up my infra (vpc, route 53, etc.) using old tf versions, there were some documented issues on github where kops delete cluster was trying to delete everything, but then it would fail because resources like the VPC were created using Terraform. In an ideal scenario, kops delete should not even look at those resources

grv avatar

Whatever new stuff I put in (vpc etc.) using Cloudposse is working well (kops delete only deletes resources such as security groups, nodes, masters and other instance groups) and doesn't touch the VPC config, Elastic IPs and all

grv avatar

My guess is that whatever we put up using standard terraform earlier has incorrect tags or something attached, because those resources, like the vpc, do come up in the list of resources to delete when I run kops delete cluster without the --yes flag to validate what will be deleted
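
For anyone else hitting this, a quick way to check is to preview the deletion and look at the ownership tags kops keys off of. A rough sketch, where the cluster name, state bucket, and VPC id are all hypothetical:

# Dry run: list everything kops thinks it owns (no --yes, nothing is deleted)
kops delete cluster --name dev.example.com --state s3://example-kops-state

# kops discovers resources via cluster ownership tags (e.g. KubernetesCluster,
# kubernetes.io/cluster/<name>), so a Terraform-managed VPC should not carry them
aws ec2 describe-tags \
  --filters "Name=resource-id,Values=vpc-0123456789abcdef0" \
  --query "Tags[?contains(Key, 'dev.example.com') || Key=='KubernetesCluster']"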

johncblandii avatar
johncblandii

Has anyone seen a pod not be able to talk to itself?

Context: Jenkins on K8s runs perfectly fine for building images, but the rtDockerPush call does not allow a connection.

java.net.ConnectException: connect(..) failed: Permission denied
Also:   hudson.remoting.Channel$CallSiteStackTrace: Remote call to JNLP4-connect connection from ip-10-60-177-129.us-west-2.compute.internal/10.60.177.129:34956
		at hudson.remoting.Channel.attachCallSiteStackTrace(Channel.java:1741)
		at hudson.remoting.UserRequest$ExceptionResponse.retrieve(UserRequest.java:356)
		at hudson.remoting.Channel.call(Channel.java:955)
		at org.jfrog.hudson.pipeline.common.docker.utils.DockerAgentUtils.getImageIdFromAgent(DockerAgentUtils.java:292)
		at org.jfrog.hudson.pipeline.common.executors.DockerExecutor.execute(DockerExecutor.java:64)
		at org.jfrog.hudson.pipeline.declarative.steps.docker.DockerPushStep$Execution.run(DockerPushStep.java:96)
		at org.jfrog.hudson.pipeline.declarative.steps.docker.DockerPushStep$Execution.run(DockerPushStep.java:71)
		at org.jenkinsci.plugins.workflow.steps.AbstractSynchronousNonBlockingStepExecution$1$1.call(AbstractSynchronousNonBlockingStepExecution.java:47)
		at hudson.security.ACL.impersonate(ACL.java:290)
		at org.jenkinsci.plugins.workflow.steps.AbstractSynchronousNonBlockingStepExecution$1.run(AbstractSynchronousNonBlockingStepExecution.java:44)
		at java.util.concurrent.Executors$RunnableAdapter.call(Executors.java:511)
		at java.util.concurrent.FutureTask.run(FutureTask.java:266)
		at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1149)
		at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:624)
Caused: io.netty.channel.AbstractChannel$AnnotatedConnectException: connect(..) failed: Permission denied: /var/run/docker.sock
johncblandii avatar
johncblandii

Ok…had a permission issue. We set fsGroup to 995 on our old cluster. It is 994 on the new one. All is well.

Now to get away from using the docker socket altogether.
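
For anyone hitting the same thing: fsGroup is added to every container's supplementary groups, so it has to match the GID of the docker group that owns /var/run/docker.sock on the node, and that GID can differ between AMIs/clusters. A minimal sketch with hypothetical names (the pod name and image are illustrative):

# On a node: find the GID owning the socket (995 on the old cluster here, 994 on the new one)
stat -c '%g' /var/run/docker.sock

# Hypothetical agent pod fragment; hostPath volumes are not re-chowned by fsGroup,
# so the value must equal the GID that already owns the socket on the host
kubectl apply -f - <<'EOF'
apiVersion: v1
kind: Pod
metadata:
  name: jenkins-agent-example
spec:
  securityContext:
    fsGroup: 994
  containers:
    - name: jnlp-slave
      image: jenkins/jnlp-slave
      volumeMounts:
        - name: docker-sock
          mountPath: /var/run/docker.sock
  volumes:
    - name: docker-sock
      hostPath:
        path: /var/run/docker.sock
        type: Socket
EOF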

2020-02-12

2020-02-11

Julian Gindi avatar
Julian Gindi

Has anyone run into issues with downtime when nginx pods are terminated? I am testing for downtime during rollouts and am noticing a small blip with our nginx pods. Our normal API pods cause no downtime when rolled. Anyone have experience with this?

kskewes avatar
kskewes

Ingress-nginx? That project has recently changed termination behavior. Details in change log.

Julian Gindi avatar
Julian Gindi

So not actually ingress-nginx, we are just using an nginx container behind ambassador. This is a legacy port to Kubernetes from an ECS system

Julian Gindi avatar
Julian Gindi

Our ingress comes through Ambassador

kskewes avatar
kskewes

Okay. Hard to say; a preStop sleep and a termination grace period should allow in-flight requests enough time to finish. Websocket clients will need to reconnect though (onto a new pod), so just the server side isn't enough.

Julian Gindi avatar
Julian Gindi

Yeah I’ll have to play around with termination grace periods a bit more I think

Julian Gindi avatar
Julian Gindi

Thanks!

Erik Osterman avatar
Erik Osterman

Also make sure to tune maxUnavailable
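
Putting the suggestions above together, a rough sketch (the values are illustrative, not tuned recommendations): a preStop sleep so endpoint removal has time to propagate before nginx gets SIGTERM, a termination grace period longer than the sleep plus the slowest request, and maxUnavailable: 0 so capacity never dips during the roll.

kubectl apply -f - <<'EOF'
apiVersion: apps/v1
kind: Deployment
metadata:
  name: nginx-example
spec:
  replicas: 3
  selector:
    matchLabels:
      app: nginx-example
  strategy:
    type: RollingUpdate
    rollingUpdate:
      maxUnavailable: 0   # never drop below desired capacity during the rollout
      maxSurge: 1
  template:
    metadata:
      labels:
        app: nginx-example
    spec:
      terminationGracePeriodSeconds: 60   # must exceed the preStop sleep plus the slowest request
      containers:
        - name: nginx
          image: nginx:1.17
          lifecycle:
            preStop:
              exec:
                command: ["sleep", "15"]   # let the endpoint drop out of rotation first
EOF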

2020-02-10

rms1000watt avatar
rms1000watt

Thought experiment..

Is there a fundamental difference between slow+smart rolling deployment vs. canary deployment?

rms1000watt avatar
rms1000watt

Curious if I can hack the health checks of a rolling deployment to convert the behavior into a canary deployment

Erik Osterman avatar
Erik Osterman

Canary can be restricted (e.g. with feature flags) to a segment of users.

Erik Osterman avatar
Erik Osterman

of course, it could be defined that that segment is just some arbitrary % of users based on the number of updated pods

Erik Osterman avatar
Erik Osterman

however, that can lead to inconsistent behavior

Erik Osterman avatar
Erik Osterman

I think it’s hard to generalize for all use-cases, but some kind of “canary” style rolling deployment can be achieved using “max unavailable”

rms1000watt avatar
rms1000watt

people always harp on canaries around “well you gotta make sure you’re running health checks on the right stuff.. p99, queue depth, DD metrics”

Erik Osterman avatar
Erik Osterman

ya, btw, you’ve seen flagger, right?

Erik Osterman avatar
Erik Osterman
weaveworks/flagger

Progressive delivery Kubernetes operator (Canary, A/B Testing and Blue/Green deployments) - weaveworks/flagger

Erik Osterman avatar
Erik Osterman

Flagger is the first controller to automate the way canaries “should” be done

rms1000watt avatar
rms1000watt

Oh yeah, this is the one istio integrates with

Erik Osterman avatar
Erik Osterman

with prometheus metrics

Erik Osterman avatar
Erik Osterman

it supports nginx-ingress too

rms1000watt avatar
rms1000watt

hmm.. still requires 2 services. Not a deal breaker. Just interesting.

rms1000watt avatar
rms1000watt

in a perfect world, I can still just helmfile apply and win

Erik Osterman avatar
Erik Osterman

fwiw, we’re doing some blue/green deployments with helmfile

Erik Osterman avatar
Erik Osterman

and using hooks to query kube api to determine the color
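
As a tiny illustration of what such a hook can do (the service name and label key are hypothetical): read which color the live Service currently selects, deploy the other color, then flip the selector once it is healthy.

# Returns e.g. "blue" or "green"; the hook then templates the opposite color
kubectl get service myapp -o jsonpath='{.spec.selector.color}'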

rms1000watt avatar
rms1000watt

Ah, nice

rms1000watt avatar
rms1000watt

that is pretty interesting

rms1000watt avatar
rms1000watt

sounds familiar, but I’ll just say no

2020-02-07

johncblandii avatar
johncblandii

Anyone address kubelet-extra-args with eks node groups?

johncblandii avatar
johncblandii
[EKS] [request]: Managed Node Groups Custom Userdata support · Issue #596 · aws/containers-roadmap

Community Note Please vote on this issue by adding a 👍 reaction to the original issue to help the community and maintainers prioritize this request. Please do not leave "+1" or "me to…

johncblandii avatar
johncblandii

in short, I'm trying to build docker images on our new jenkins stack, which is built 100% with node groups now, and the docker builds fail due to networking

johncblandii avatar
johncblandii
Err:1 http://security.debian.org/debian-security buster/updates InRelease
  Temporary failure resolving 'security.debian.org'
johncblandii avatar
johncblandii

@Erik Osterman @aknysh I thought we could set the extra args for enabling the docker bridge, but it seems like that's not how https://github.com/awslabs/amazon-eks-ami/blob/master/files/bootstrap.sh#L341 is set up

awslabs/amazon-eks-ami

Packer configuration for building a custom EKS AMI - awslabs/amazon-eks-ami

Erik Osterman avatar
Erik Osterman

Are you doing docker builds under jenkins on kubernetes?

johncblandii avatar
johncblandii

yuppers

johncblandii avatar
johncblandii

planning to look at img, etc again, but the new jenkins needs to match the old for now

Erik Osterman avatar
Erik Osterman

aha

Erik Osterman avatar
Erik Osterman

yea, b/c I think getting away from dind or mounting the docker sock is a necessity

Erik Osterman avatar
Erik Osterman

there are seemingly a dozen ways now to build images without a docker daemon.

Erik Osterman avatar
Erik Osterman

are you mounting the host docker socket?

johncblandii avatar
johncblandii

agreed. that’s the goal in an upcoming sprint

johncblandii avatar
johncblandii

yuppers

johncblandii avatar
johncblandii

they build fine. they can’t communicate externally

johncblandii avatar
johncblandii
Err:1 http://security.debian.org/debian-security buster/updates InRelease
  Temporary failure resolving 'security.debian.org'
johncblandii avatar
johncblandii

simple pipeline:

pipeline {
   agent {
       label 'xolvci'
   }

   stages {
      stage('docker') {
         steps {
            container('jnlp-slave') {
                writeFile file: 'Dockerfile', text: '''FROM openjdk:8-jre
                    RUN apt update && apt upgrade -y && apt install -y libtcnative-1'''
                script {
                    docker.build('jbtest:latest')
                }
            }
         }
      }
   }
}
Erik Osterman avatar
Erik Osterman

hrmmm but they can talk to other containers in the cluster?

johncblandii avatar
johncblandii

seems so

Erik Osterman avatar
Erik Osterman

do you also have artifactory?

Erik Osterman avatar
Erik Osterman

(thinking you could set HTTP_PROXY env)

Erik Osterman avatar
Erik Osterman

and get caching to boot!

johncblandii avatar
johncblandii

yes on artifactory

Erik Osterman avatar
Erik Osterman

artifactory running in the cluster?

johncblandii avatar
johncblandii

not yet

johncblandii avatar
johncblandii

what would i set HTTP_PROXY to? cluster endpoint?

Erik Osterman avatar
Erik Osterman

some other service in the cluster (assuming that the docker build can talk to services in the cluster)

Erik Osterman avatar
Erik Osterman

e.g. deploy squid proxy or artifactory
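
A hedged sketch of that idea, with a made-up squid Service name and port: build containers can usually reach in-cluster Services even when the docker bridge cannot reach the internet directly, so pointing the build at a proxy running in the cluster sidesteps the egress problem and adds caching.

# http_proxy/https_proxy are predefined build args, so no Dockerfile changes are needed
docker build \
  --build-arg http_proxy=http://squid.proxy.svc.cluster.local:3128 \
  --build-arg https_proxy=http://squid.proxy.svc.cluster.local:3128 \
  -t jbtest:latest .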

Erik Osterman avatar
Erik Osterman

that said, I did not have this problem when I did my prototype on digital ocean, so I am guessing whatever you are encountering is due to the EKS networking

johncblandii avatar
johncblandii

yeah, it is def’ eks. it isn’t an issue when using rancher either

Erik Osterman avatar
Erik Osterman

ok

Erik Osterman avatar
Erik Osterman

is rancher using ENIs?

johncblandii avatar
johncblandii

not sure. just tinkered w/ it mostly

johncblandii avatar
johncblandii

might be able to downgrade to 0.10.0 since it supports inline in userdata.tpl

johncblandii avatar
johncblandii

worst day to fool w/ getting disparate nodes on a cluster.

johncblandii avatar
johncblandii

lol

Erik Osterman avatar
Erik Osterman

yea, trying to get this working with managed node groups might be wishful thinking

johncblandii avatar
johncblandii

yeah, I ditched that. For this group I'm back to a custom one with https://github.com/cloudposse/terraform-aws-eks-cluster

cloudposse/terraform-aws-eks-cluster

Terraform module for provisioning an EKS cluster. Contribute to cloudposse/terraform-aws-eks-cluster development by creating an account on GitHub.

johncblandii avatar
johncblandii

AWS support said it literally is not possible with node groups

aknysh avatar
aknysh

are you back to using terraform-aws-eks-workers?

johncblandii avatar
johncblandii

for this one group of workers, yes

johncblandii avatar
johncblandii

it isn’t connecting, though

johncblandii avatar
johncblandii

i say that and it connects

johncblandii avatar
johncblandii

lol

johncblandii avatar
johncblandii

it wouldn’t work with the --kubelet-extra-args='--node-labels=node_type=cloudbees-jobs', though

johncblandii avatar
johncblandii

specifically:

bootstrap_extra_args          = "--enable-docker-bridge=true --kubelet-extra-args='--node-labels=node_type=cloudbees-jobs'"
johncblandii avatar
johncblandii

manually tweaked the launch template to not use = and that seemed to work for the bridge

johncblandii avatar
johncblandii

trying again w/ the extra args

johncblandii avatar
johncblandii

ugh…yup. frickin’ =

johncblandii avatar
johncblandii
bootstrap_extra_args          = "--enable-docker-bridge true --kubelet-extra-args '--node-labels=node_type=cloudbees-jobs'"
johncblandii avatar
johncblandii

^ bananadance

johncblandii avatar
johncblandii
bootstrap_extra_args          = "--enable-docker-bridge=true --kubelet-extra-args='--node-labels=node_type=cloudbees-jobs'"
johncblandii avatar
johncblandii

^ booooo
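
For context, bootstrap_extra_args gets appended to the EKS AMI bootstrap call in the instance user data, which is why the space-separated form survives and the = form did not here. Roughly what the working variant renders to (the cluster name is hypothetical):

#!/bin/bash
# user data excerpt: the extra args follow the cluster name verbatim, and this
# bootstrap.sh parses "--flag value" pairs rather than "--flag=value"
/etc/eks/bootstrap.sh my-eks-cluster \
  --enable-docker-bridge true \
  --kubelet-extra-args '--node-labels=node_type=cloudbees-jobs'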

aknysh avatar
aknysh
90 days of AWS EKS in Production - kubedex.com

Come and read 90 days of AWS EKS in Production on Kubedex.com. The number one site to Discover, Compare and Share Kubernetes Applications.

aknysh avatar
aknysh

and yes, you can connect managed nodes (terraform-aws-eks-node-group), unmanaged nodes (terraform-aws-eks-workers) and Fargate nodes (terraform-aws-eks-fargate-profile) to the same cluster (terraform-aws-eks-cluster)

aknysh avatar
aknysh
cloudposse/terraform-aws-eks-fargate-profile

Terraform module to provision an EKS Fargate Profile - cloudposse/terraform-aws-eks-fargate-profile

johncblandii avatar
johncblandii

yeah, got managed and unmanaged now. waiting on fargate in us-west-2

johncblandii avatar
johncblandii

i saw a link to that 90 days earlier. gonna peep

johncblandii avatar
johncblandii

annnnnnnnd the builds are working

johncblandii avatar
johncblandii

phew. come on saturday!


2020-02-06

rms1000watt avatar
rms1000watt

https://www.reddit.com/r/Terraform/comments/axp3tv/run_terraform_under_kubernetes_using_an_operator/ <— nice!!!!

@Erik Osterman do you have a recommendation for using a k8s operator to manage aws resources? https://github.com/amazon-archives/aws-service-operator has been archived and uses cfn under the covers. Lol. Have you had luck with this or anything else?

Cameron Boulton avatar
Cameron Boulton

@rms1000watt @Erik Osterman I’m late to the convo but:

  1. I think aws-service-operator was just renamed (old one archived), not that the project was canceled: https://github.com/aws/aws-service-operator-k8s
  2. Crossplane looks interesting: https://crossplane.io/
aws/aws-service-operator-k8s

The AWS Service Operator (ASO) manages AWS services from Kubernetes - aws/aws-service-operator-k8s

Crossplane

The open source multicloud control plane.

Erik Osterman avatar
Erik Osterman

@rms1000watt not from first hand accounts, however, there have been a number of new operators to come out to address this lately

Erik Osterman avatar
Erik Osterman

I did see the aws-service-operator was deprecated the other day when I was checking something else out.

Erik Osterman avatar
Erik Osterman

sec - I think we talked about it recently in office hours

Erik Osterman avatar
Erik Osterman
kubeform/kubeform

Kubernetes CRDs for Terraform providers. Contribute to kubeform/kubeform development by creating an account on GitHub.

Erik Osterman avatar
Erik Osterman

What’s interesting “outwardly speaking” regarding kubeform is that it is by the people behind appscode

Erik Osterman avatar
Erik Osterman

appscode is building operators for managing all kinds of durable/persistent services under kubernetes

Erik Osterman avatar
Erik Osterman
KubeDB

KubeDB by AppsCode simplifies and automates routine database tasks such as provisioning, patching, backup, recovery, failure detection, and repair for various popular databases on private and public clouds

Erik Osterman avatar
Erik Osterman

I don’t have any first hand accounts though of using it or other appscode services

rms1000watt avatar
rms1000watt

Ah, very nice reference

Erik Osterman avatar
Erik Osterman

Rancher has 2 projects they are working on.

Erik Osterman avatar
Erik Osterman

Both are alpha grade, and I don't know how practical they are

Erik Osterman avatar
Erik Osterman
rancher/terraform-controller

Use K8s to Run Terraform. Contribute to rancher/terraform-controller development by creating an account on GitHub.

Erik Osterman avatar
Erik Osterman
rancher/terraform-controller

Use K8s to Run Terraform. Contribute to rancher/terraform-controller development by creating an account on GitHub.

Erik Osterman avatar
Erik Osterman

ohhhhh

Erik Osterman avatar
Erik Osterman

they renamed it

Erik Osterman avatar
Erik Osterman

never mind.

rms1000watt avatar
rms1000watt

My feelings were hurt by rancher’s product in 2016. I’m sure they’ve improved vastly since then though, lol.

rms1000watt avatar
rms1000watt

Yeah, I'm curious about the kubeform one

rms1000watt avatar
rms1000watt

will probably give that a try

rms1000watt avatar
rms1000watt

very nice references man

Erik Osterman avatar
Erik Osterman
danisla/terraform-operator

Kubernetes custom controller for operating terraform - danisla/terraform-operator

rms1000watt avatar
rms1000watt

this is why cloudposse slack is best-in-class

rms1000watt avatar
rms1000watt

Yeah, I saw that one too

rms1000watt avatar
rms1000watt

Just have no clue what people are actually using

Erik Osterman avatar
Erik Osterman

Please report back if you get a chance to prototype/poc one of them.

Erik Osterman avatar
Erik Osterman

I’ve wanted to do this for some time.

Erik Osterman avatar
Erik Osterman

I think it could simplify deployments in many ways by using the standard deployment process with kubernetes

rms1000watt avatar
rms1000watt

this is actually a part of a bigger discussion around ephemeral environments

Erik Osterman avatar
Erik Osterman

ya

rms1000watt avatar
rms1000watt

all the helmfile stuff is super straightforward

rms1000watt avatar
rms1000watt

it’s the infrastructure beyond it tho

Erik Osterman avatar
Erik Osterman

so while I love terraform, terraform is like a database migration tool that doesn’t support transactions

Erik Osterman avatar
Erik Osterman

so when stuff breaks, no rollbacks

rms1000watt avatar
rms1000watt

heh, yea

Erik Osterman avatar
Erik Osterman

while webapps are usually trivial to rollback

Erik Osterman avatar
Erik Osterman

so coupling these two might cause more instability where before there was none.

Erik Osterman avatar
Erik Osterman

also, this is what was kind of nice (perhaps) with the aws-service-operator approach deploying cloudformation since cloudformation fails more gracefully

Erik Osterman avatar
Erik Osterman

did you listen to this week's office hours recording?

rms1000watt avatar
rms1000watt

i havent

Erik Osterman avatar
Erik Osterman

I think it's relevant to the ephemeral preview environments conversation, which is also one of the reasons we're exploring terraform workspaces more and more as one of the tricks to help solve it.

rms1000watt avatar
rms1000watt

yeah, i was considering terraform workspaces

2020-02-04

Roderik van der Veer avatar
Roderik van der Veer

I’m a bit stuck with GKE persistent disks that cannot be found:

Type     Reason              Age   From                     Message
----     ------              ----  ----                     -------
Normal   Scheduled           73s   default-scheduler        Successfully assigned coral-firefly/coral-firefly-ipfs-ipfs-0 to gke-shared-europe-main-3d862732-mq45
Warning  FailedAttachVolume  73s   attachdetach-controller  AttachVolume.Attach failed for volume "pvc-c018de57-476a-11ea-af39-42010a8400a9" : GCE persistent disk not found: diskName="gke-shared-europe-9030-pvc-c018de57-476a-11ea-af39-42010a8400a9" zone="europe-west1-b"
Roderik van der Veer avatar
Roderik van der Veer

while the disk is just there in the google console

Roderik van der Veer avatar
Roderik van der Veer

Not sure where to look for an error message i can actually use

Roderik van der Veer avatar
Roderik van der Veer

the pvc even says that it is bound

Roderik van der Veer avatar
Roderik van der Veer
ipfs-storage-coral-firefly-ipfs-ipfs-0               Bound    pvc-c018de57-476a-11ea-af39-42010a8400a9   10Gi       RWO            standard       27m
Roderik van der Veer avatar
Roderik van der Veer

anyone have an idea where to look?

Roderik van der Veer avatar
Roderik van der Veer

hmmm, seems like provisioning too many disks at the same time is the culprit. is there a way to configure that in k8s, helm or helmfile?

Erik Osterman avatar
Erik Osterman

Are you running helmfiles serially?

Roderik van der Veer avatar
Roderik van der Veer

I recently moved to a lot more parallelism to speed it up (we deploy clusters + services as a service, so it needs to be as fast as possible)
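
If the burst of simultaneous PVC creations is the culprit, one hedged option is to throttle helmfile rather than go fully serial (the value is illustrative):

# Limit how many releases are applied at once; 1 is fully serial, and a small
# number may be enough to avoid the GCE disk-attach races
helmfile apply --concurrency 2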

2020-02-03

Chris Fowles avatar
Chris Fowles

writing helm charts and wanting some instant feedback, I ended up with this to check my rendered templates: watch -d 'helm template my-chart-dir | kubeval && helm template my-chart-dir'

Erik Osterman avatar
Erik Osterman

nice trick

2020-02-02

Chris Fowles avatar
Chris Fowles

So I had a thought on the train this morning when I was thinking of writing some yeoman generators for templating out gitops repos for flux, so that teams don't need to know exactly where to put what in which file.

Chris Fowles avatar
Chris Fowles

And then i thought “you know what’s really good at templating yaml? helm!”

Chris Fowles avatar
Chris Fowles

Is it crazy to use helm template to generate repos for flux (which will create flux helm crds)? I can’t see anything really stupid at face value - but feels like the kind of idea that requires a sanity check

Zachary Loeber avatar
Zachary Loeber

I wouldn’t think so. Jenkins-x does exactly that if you opt to not use tiller for deployments I believe.
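
A minimal sketch of the idea (the chart path and output directory are made up): render a scaffolding chart of flux HelmRelease manifests straight into the directory layout the gitops repo expects, then commit the result.

# Write one rendered file per template into the repo that flux watches
helm template ./team-scaffold-chart --output-dir ./gitops/clusters/team-a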
