SweetOps #kubernetes for November, 2020

Archive: https://archive.sweetops.com/kubernetes/

2020-11-01

2020-11-02

Matt Gowie

Anyone ever have a zombie k8s resource problem? We have a resource that we cannot delete (the delete processes, but the resource doesn’t go anywhere). Any debugging / troubleshooting tips to get at the internals of k8s to address that? More info in thread.

Matt Gowie

04:48:53 PM

I have an ingress resource which was deployed via Helm. The helm chart has been uninstalled, but the ingress resource is still around. It was connected to the v1 alb-ingress-controller + external-dns.

A team member has been banging her head against this for a day or so now and she hasn’t been able to delete the resource. We’ve uninstalled our ingress controller and external-dns and neither of them seem to be the culprit that is leaving it around.

When we try to destroy the ingress using kubectl, the command succeeds, but the resource isn’t actually removed.

Matt Gowie

04:49:41 PM

Definitely a problem specific to this single cluster as a helm uninstall of a similar chart in another cluster has no problems.

Matt Gowie

04:51:31 PM

Really looking for advice on how folks would investigate the internals here — Not looking for a bullet proof solution to this specific issue.

kskewes

04:54:56 PM

Sometimes there is a finalizer that gets stuck, can happen for namespaces etc.

Matt Gowie

04:58:15 PM

That ingress does have a finalizers entry in the metadata. How do you typically deal with that? I’ll start googling around that.

Matt Gowie

04:59:24 PM

Ah I delete the finalizers block and that should do the trick?

Matt Gowie

04:59:29 PM

Giving that a shot now.

Matt Gowie

05:04:47 PM

And that did the trick — Awesome. Thank you @kskewes!

kskewes

05:08:25 PM

Oh awesome good job. Sorry was on mobile and juggling stuff.

Andrew Nazarov

07:42:11 AM

We had some troubles with finalazers as well. Couldn’t remove the resource - it was recreated automatically right after the deletion. Cool, that you’ve already found a solution.

Andrew Nazarov

07:45:47 AM

It was a third-party resource which troubled us and the solution was to patch it like:

kubectl patch crd/restoresessions.stash.appscode.com -p '{"metadata":{"finalizers":[]}}' --type=merge

Matt Gowie

02:58:31 PM

Yeah, I did a very similar patch. It’s impressive because I had removed the underlying CRD which was responsible for finalizing it… and yet it still was sticking around.

2020-11-03

Roderik van der Veer

07:22:00 AM

What is the goto PVC -> S3 backup solution (helm chart) nowadays?

Erik Osterman (Cloud Posse)

08:33:31 PM

is s3 strictly the requirement? or is the requirement to backup the PV

Erik Osterman (Cloud Posse)

08:34:07 PM

The reason I ask is native snapshots is a feature (beta) of 1.17

https://kubernetes.io/blog/2019/12/09/kubernetes-1-17-feature-cis-volume-snapshot-beta/

Roderik van der Veer

08:34:27 PM

Not a requirement at all, just need a backup of the files off the cluster

Erik Osterman (Cloud Posse)

08:35:31 PM

There’s also Velero by VMware. https://github.com/vmware-tanzu/velero

vmware-tanzu/velero

Backup and migrate Kubernetes applications and their persistent volumes - vmware-tanzu/velero

Erik Osterman (Cloud Posse)

08:35:36 PM

(I haven’t tried either one)

Roderik van der Veer

08:35:43 PM

Found stash but it had some weird licensing. We deploy clusters on demand and need some “I messed up” backups

2020-11-04

11:03:47 AM

Qovery Engine – open-source multi-cloud deployment - https://github.com/Qovery/engine

Qovery/engine

Deploy your apps on any Cloud providers in just a few seconds - Qovery/engine

11:04:44 AM

So this will deploy kubernetes in your clouds, which deploy worker nodes, which deploy containers on nodes

Qovery/engine

Deploy your apps on any Cloud providers in just a few seconds - Qovery/engine

11:05:13 AM

Now we just need an automated deploy tool to deploy quovery lol

Shreyank Sharma

03:50:38 PM

Hello Experts.

we are running Kubernetes in AWS deployed using Kops, for backup purpose we are using Velero with Restic integration, I am new to Velero. And I wanted to know under what condition Velero will take EBS snapshot and and under what condition Velero will backup using Restic repo. for PV’s. Because we are having multiple PV’s and all the PV’s are annotated with kubectl annotate pod/<pod-name> [backup.Velero.io/backup-volumes=](http://backup.Velero.io/backup-volumes=)<pvcname>, but some PV’s are backed-up using EBS Snapshot and some are backed up using Restic.

Thank you.

Erik Osterman (Cloud Posse)

05:13:06 PM

fwiw, kops is constantly snapshotting etcd and backing it up to S3

Erik Osterman (Cloud Posse)

05:13:35 PM

aha, you want to do PV snapshots, so ignore my comment.

Erik Osterman (Cloud Posse)

08:30:18 PM

https://kubernetes.io/blog/2019/12/09/kubernetes-1-17-feature-cis-volume-snapshot-beta/

Kubernetes 1.17 Feature: Kubernetes Volume Snapshot Moves to Beta

Authors: Xing Yang, VMware & Xiangqian Yu, Google The Kubernetes Volume Snapshot feature is now beta in Kubernetes v1.17. It was introduced as alpha in Kubernetes v1.12, with a second alpha with breaking changes in Kubernetes v1.13. This post summarizes the changes in the beta release. What is a Volume Snapshot? Many storage systems (like Google Cloud Persistent Disks, Amazon Elastic Block Storage, and many on-premise storage systems) provide the ability to create a “snapshot” of a persistent volume.

Matt Gowie

04:54:03 PM

Hey @Andriy Knysh (Cloud Posse) @Erik Osterman (Cloud Posse) — Low priority, so take your time in getting back, but I see you guys use {{event.tags.cluster_name}} a bunch in your DD monitors.yaml. I’m not finding that variable available in the message content for my monitor, but my metrics/events do have that tag. Did you folks have to do something specific to enable more variables in scope of that message content? I’m struggling with that right now.

Erik Osterman (Cloud Posse)

05:10:45 PM

See the PR from yesterday

Erik Osterman (Cloud Posse)

05:11:33 PM

https://github.com/cloudposse/terraform-datadog-monitor/pull/20

Update Datadog monitors by aknysh · Pull Request #20 · cloudposse/terraform-datadog-monitor

what Update datadog monitors In the example. refactor the monitors into separate YAML files per category why When monitor type = query alert, Datadog does not look into the event tags, so we can…

Matt Gowie

05:12:18 PM

Ha got it. Thanks man.

Andriy Knysh (Cloud Posse)

05:13:12 PM

the tags are only available when monitors types are metrics or event

Andriy Knysh (Cloud Posse)

05:13:31 PM

for query type monitors, not available

Matt Gowie

05:35:03 PM

That’s what I gathered as well, but wasn’t sure if ya’ll were enabling something I was missing. Glad I caught right after you learned the same thing.

Eric Berg

10:12:44 PM

Both of our ingresses are set up like this: ELB -> service -> pods. Does the ingress just pass the request to the service and let the service determine which node the pod will run on? I’m trying to get requests that come in to a node to pass the request to a service pod on the same node.

mfridh

10:14:58 PM

This is new actually, finally coming to Kubernetes; https://kubernetes.io/docs/concepts/services-networking/service-topology/

Service Topology

FEATURE STATE: Kubernetes v1.17 [alpha] Service Topology enables a service to route traffic based upon the Node topology of the cluster. For example, a service can specify that traffic be preferentially routed to endpoints that are on the same Node as the client, or in the same availability zone. Introduction By default, traffic sent to a ClusterIP or NodePort Service may be routed to any backend address for the Service.

Eric Berg

10:16:04 PM

That suggests that it’s not (easily) possible, currently.

mfridh

10:16:54 PM

Nah, it does seem a little cryptic but actually read on… https://kubernetes.io/docs/concepts/services-networking/service-topology/#prefer-node-local-zonal-then-regional-endpoints

Service Topology

mfridh

10:22:54 PM

The EndpointSlice Controller: This controller maintains EndpointSlices for Services and the Pods they reference. This is controlled by the EndpointSlice feature gate. It has been enabled by default since Kubernetes 1.18. So… 1.18 ? or maybe 1.19 is required … damn, now I’m unsure.
Kube-Proxy: When kube-proxy is configured to use EndpointSlices, it can support higher numbers of Service endpoints. This is controlled by the EndpointSliceProxying feature gate on Linux and WindowsEndpointSliceProxying on Windows. It has been enabled by default on Linux since Kubernetes 1.19

mfridh

10:23:14 PM

Have this on TODO, hadn’t yet tried it.

Eric Berg

10:25:26 PM

Well, this gives me good info to go on. Thanks. We’re at EKS 1.17 and I noticed that there are a few changes to get to 1.18 that I have to take a deeper look at. I’ll add this to the pile.

mfridh

10:26:38 PM

this seems like 1.18 is good to go

mfridh

10:26:44 PM

To enable service topology, enable the ServiceTopology and EndpointSlice feature gate for all Kubernetes components:

mfridh

10:26:50 PM

both of those are enabled default on 1.18

mfridh

10:27:23 PM

the second one I mentioned for kube-proxy (EndpointSliceProxying) isn’t mentioned as needed as far as I can tell.

Eric Berg

03:38:15 PM

A little more research led me to understand that, in my current config (load balancer -> service), the path to the service pod that handles the request is already optimized, since the ingress-nginx controller populates the LB with the service pod IP:PORT configs, so leaving the LB, the requests go right to the service pods on whatever node the ELB chooses. If my understanding is correct, then I don’t need to implement any additional affinity or routing. Does that sound right to yoU?

mfridh

10:22:35 PM

If the pods are immediately in the LB then I guess you’re fine :).

mfridh

02:10:12 AM

I recall way back a really nifty example kubernetes service someone built… it was a simple http server responding back with the request headers, content etc, including extra detailed info about the current pod which responded…. anyone have a clue which one it was?.. podinfo of course… my brain suddenly woke up from being teased enough

rei

06:01:07 AM

Take a look at this chart: https://gitlab.com/nicosingh/medium-deploy-eks-cluster-using-terraform-sample-app/-/tree/master/helm

helm · master · Nico Singh / medium-deploy-eks-cluster-using-terraform-sample-app

GitLab.com

rei

06:01:37 AM

It is a simple deployment with the whois docker container

Eric Berg

03:46:49 PM

Is it called echo-server?

rei

03:47:31 PM

It’s the whoami container: https://hub.docker.com/r/containous/whoami

rei

03:50:21 PM

You will something like this:

Hostname: sample-app-6c66c9c587-ssknb
IP: 127.0.0.1
IP: 10.aaa.aaa.aaa
RemoteAddr: 10.bbb.bbb.bbb:51158
GET / HTTP/1.1
Host: <insert-domain-here.com>
User-Agent: Mozilla/5.0 (X11; Linux x86_64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/86.0.4240.111 Safari/537.36
Accept: text/html,application/xhtml+xml,application/xml;q=0.9,image/avif,image/webp,image/apng,*/*;q=0.8,application/signed-exchange;v=b3;q=0.9
Accept-Encoding: gzip, deflate, br
Accept-Language: en-US,en;q=0.9,de;q=0.8
Sec-Fetch-Dest: document
Sec-Fetch-Mode: navigate
Sec-Fetch-Site: none
Sec-Fetch-User: ?1
Upgrade-Insecure-Requests: 1
X-Amzn-Trace-Id: xxxxxxxxxxxxxxxxxxxxxxxxxxxxxx
X-Forwarded-For: 10.ccc.ccc.ccc, 10.ddd.ddd.ddd
X-Forwarded-Host: <insert-domain-here.com>
X-Forwarded-Port: 443
X-Forwarded-Proto: https
X-Original-Forwarded-For: 10.ccc.ccc.ccc
X-Real-Ip: 10.ccc.ccc.ccc
X-Request-Id: <some-id>
X-Scheme: https

mfridh

10:20:30 PM

Yeah that’s a good one, I remember that one too from way back. Thanks. Helps when you want to see how middleware, ingress or proxies behave.

2020-11-05

2020-11-07

2020-11-09

rei

09:35:57 PM

https://sweetops.slack.com/archives/CB6GHNLG0/p1604957717165500

Hi, I am interested in knowing how do you organize your IaaC. looking for ideas. Currently we are building our new k8s based infrastructure, thus requiring Terraform, helm, helmfiles and gitlab ci. which is a good pattern to combine all this elements? monorepo? repo with submodules? script/makefile magic? what if the helmfiles and charts repos also contain stuff for the infra and main application?

2020-11-12

mfridh

08:38:38 PM

crossposting as might be relevant to kube too. Hope it’s useful to someone.

Sort of related to the Docker pull rate limit recently, although that wasn’t the primary reason I did it, I gathered some info and ideas from all over and set this sucker up to make my local k3d and argocd workflow bareable…. 1 minute 45 seconds instead of 12 minutes now for tearing down and bringing back up my full experimental localhost Kube stack.

https://github.com/frimik/local-docker-registry-proxy

Padarn

02:26:56 AM

Is anyone using https://github.com/kubernetes-sigs/external-dns with a private cluster?

If so, I assume this will only work using AWS CNI? So that the DNS can resolve to the private vpc IP, which (some) other places can then route too

kubernetes-sigs/external-dns

Configure external DNS servers (AWS Route53, Google CloudDNS and others) for Kubernetes Ingresses and Services - kubernetes-sigs/external-dns

Erik Osterman (Cloud Posse)

02:28:00 AM

We have used extrnal-dns with calico too

kubernetes-sigs/external-dns

Configure external DNS servers (AWS Route53, Google CloudDNS and others) for Kubernetes Ingresses and Services - kubernetes-sigs/external-dns

Erik Osterman (Cloud Posse)

02:28:11 AM

Here’s our #helmfile for it https://github.com/cloudposse/helmfiles/tree/master/releases/external-dns

cloudposse/helmfiles

Comprehensive Distribution of Helmfiles for Kubernetes - cloudposse/helmfiles

Padarn

02:29:29 AM

thanks - how does this work? what IPs does the DNS point to?

Erik Osterman (Cloud Posse)

03:24:44 AM

it’s the IPs of the Service

Erik Osterman (Cloud Posse)

03:25:13 AM

and a service can be of type load balancer which is internal or external

Erik Osterman (Cloud Posse)

03:25:21 AM

the details really depend a lot on your setup

Padarn

03:36:37 AM

ahh got it, thank you

rei

07:38:20 AM

We have this type of setup running. Private eks cluster with internal NLB. NLB connects to ingress-nginx which forwards to the corresponding application ingress object. Cert-manager with dns01 and external-dns for pointing domains to the NLB ALIAS record

rei

07:39:16 AM

Based mostly on cloudposse Terraform modules and helmfiles. Although the helmfiles do need tweaking

rei

07:44:07 AM

The last module solves the chicken-egg problem to map serviceAccounts with IAM service roles and the corresponding policies

cloudposse/terraform-aws-vpc-peering

Terraform module to create a peering connection between two VPCs in the same AWS account. - cloudposse/terraform-aws-vpc-peering

cloudposse/terraform-aws-eks-cluster

Terraform module for provisioning an EKS cluster. Contribute to cloudposse/terraform-aws-eks-cluster development by creating an account on GitHub.

cloudposse/terraform-aws-tfstate-backend

Terraform module that provision an S3 bucket to store the terraform.tfstate file and a DynamoDB table to lock the state file to prevent concurrent modifications and state corruption. - cloudposse…

Padarn

01:12:50 AM

Wow thanks a lot. Really useful

2020-11-13

Shreyank Sharma

05:57:40 PM

Am trying cluster migration in AWS, Both k8s clusters are in same region. Cluster 1 : Deployed 2 Application with PV reclaim policy one as Delete and another as Retain, and annotated so it will take Restic backup. Cluster 2: Restored those 2 applications, worked fine.

again Cluster 1: Deployed same 2 application with Reclaim policy as Delete and Retain but not annotated so it took snapshot when i backup. Cluster 2: Restore did not work as PV volume is failed to attach with the following Warning FailedAttachVolume pod/<pod-name> AttachVolume.Attach failed for volume "pvc-<id>" : Error attaching EBS volume "vol-<id>" to instance "i-<instance-id>": "UnauthorizedOperation: You are not authorized to perform this operation.

So, Snapshot restore feature will work in the same AWS region or am only getting this error????

2020-11-16

rei

10:25:52 AM

Are there any advantages on placing stuff like cert-manager, cluster-autoscaler , external-dns , aws-load-balancer-controller into the kube-system namespace, or isolate all this stuff into it’s own namespace? Almost all tutorials and example code use as default for this services/controllers the kube-system namespace, however there is are advantages on splitting everything into namespaces.

btai

06:47:10 PM

I think for quality of life/debugging purposes, a query via namespaces will be a tad bit faster than query via label within the kube-system. depending on the cluster size, the initial amount of pods running in kube-system can already be pretty high. for example, I like that our logging provider I can just query via namespace:cert-manager as opposed to namespace:kube-system label:app=?? or service=?? where you possibly have to remember how each 3rd party helm chart labels their deployments etc.

rei

09:14:42 PM

This is also my thinking. I like more the idea of grouping everything into its own namespace and there is even no limit in the number .

rei

09:16:38 PM

What make think about this was the following descriptios for the cluster-autoscaler extra argument:

extraArgs:
  [...]
  # If true cluster autoscaler will never delete nodes with pods from kube-system (except for DaemonSet or mirror pods)
  skip-nodes-with-system-pods: "false"

Apparently the kube-system pods are actually handled differently…

2020-11-17

Amit Karpe

03:00:01 PM

https://awscontainerdayk8s.splashthat.com/

AWS Container Day: Kubernetes Edition

Join us for AWS Container Day, a fully live, virtual day of sessions all about Amazon EKS and Kubernetes at AWS, hosted by Containers from the Couch. At this Day Zero KubeCon event, the AWS Kubernetes team will be discussing new launches, demoing products and features, covering best practices, and answering your questions live on Twitch. If you have a question before the event, send us a note at [email protected]!

joey

07:03:52 PM

https://github.com/hashicorp/terraform-provider-aws/issues/13643 anyone have any workarounds for this?

EKS Node Group Resource Tag Inheritance · Issue #13643 · hashicorp/terraform-provider-aws

Community Note Please vote on this issue by adding a reaction to the original issue to help the community and maintainers prioritize this request Please do not leave "+1" or other comme…

Erik Osterman (Cloud Posse)

05:50:34 AM

@Jeremy G (Cloud Posse) what did we end up doing for this?

EKS Node Group Resource Tag Inheritance · Issue #13643 · hashicorp/terraform-provider-aws

Community Note Please vote on this issue by adding a reaction to the original issue to help the community and maintainers prioritize this request Please do not leave "+1" or other comme…

Jeremy G (Cloud Posse)

06:06:36 AM

@Erik Osterman (Cloud Posse) We have not really found a solution, but we do what we can. We manually tag everything we create, and we sort of got around this related issue by specifying a custom launch template that tags everything it can.

[EKS] [request]: Nodegroup should support tagging ASGs · Issue #608 · aws/containers-roadmap

Community Note Please vote on this issue by adding a reaction to the original issue to help the community and maintainers prioritize this request Please do not leave "+1" or "me to…

joey

06:30:14 AM

doh, i was trying to avoid using a custom launch template but i suppose i could not be lazy and do that

joey

06:30:16 AM

thanks Jeremy and Erik

Jeremy G (Cloud Posse)

09:02:04 PM

@joey If you use our terraform-aws-eks-node-group module to create your node group, it will make a launch template which propagates tags if you set the resources_to_tag variable.

cloudposse/terraform-aws-eks-node-group

Terraform module to provision an EKS Node Group. Contribute to cloudposse/terraform-aws-eks-node-group development by creating an account on GitHub.

joey

09:19:42 PM

thank you i got aws_launch_template working

joey

09:20:12 PM

although i’m peeved that you can’t use instance_market_options and spot with aws_launch_template

joey

09:37:15 PM

conveniently i’m primarily using this in a one-off use case where i have to use ondemand instances for now anyways, but i wanted to keep a dev environment that was nearly identical running with spot instances

2020-11-18

06:26:19 PM

Best way to manage EKS?

2020-11-19

organicnz

12:18:26 PM

Hi guys, how could I fix this base64 decoding? It spews out this gibberish from Jenkins’ pod :)

printf $(kubectl get secret --namespace default jenkins-160573443 -o jsonpath="{.data.jenkins-admin-password}" | base64 --decode);echo
�K��ly�޷�jg���u�ں"�ϵ�N{߯5��#��

Matt Gowie

03:08:31 PM

You can try this:

export SECRET_NAME=$(kubectl get secrets -n namespace | grep pod-name- | cut -d ' ' -f 1)
kubectl describe secrets $SECRET_NAME -n namespace 

organicnz

05:56:01 PM

Drops bunch of tokens and jenkins-password: 10 bytes in bytes

https://pastebin.com/ZfnGaPkm

export SECRET_NAME=$(kubectl get secrets -n default | grep jenkins-1605xxxxxx-54 - Pastebin.com

Pastebin.com is the number one paste tool since 2002. Pastebin is a website where you can store text online for a set period of time.

organicnz

06:27:26 PM

kubectl get secret shows up jenkins-password in plain text, however, Jenkins doesn’t digest password with the user profile name :) pastebin.com/65bMbuED

organicnz

06:36:17 PM

Got help from Bitnami folks, turns out base64 decoding is buggy on macOS 11.0.1 - have to fix it then. For the testing purposes, I used that suggested website for decoding a password string. Thanks for your help mate :)

“Hi @organicnz, I’m not really sure what is happening there. When the chart is deployed, some notes are shown with information about the commands that you need to execute to obtain the credentials. You can obtain those notes again by executing: helm get notes YOUR_INSTALLATION_NAME You will find a command like the following one: echo Password: $(kubectl get secret –namespace default YOUR_INSTALLATION_NAME -o jsonpath=“{.data.jenkins-password}” | base64 –decode) If you want to do what the command above is doing in a more manual way, you can also get the secret and output as yaml and you will get a plain-text string that is base64-encoded. You need to decode it to get the actual password. If the implementation of your base64 utility is buggy, just for testing purposes, you can use an online tool like this one: https://emn178.github.io/online-tools/base64_decode.html Again, this is just for testing purposes, you should never share your password on the internet.”

kubectl get secret spews out a gibberish from Jenkins · Issue #4444 · bitnami/charts

Hey guys, how to fix this base64 decoding? It spews out this gibberish from bitnami/jenkins pod :) printf $(kubectl get secret –namespace default jenkins-1605736819 -o jsonpath="{.data.jenkin…

Base64 Decode Online

Base64 online decode function

Matt Gowie

08:59:05 PM

Hey folks — I want to switch a project off of DataDog log management to ELK due to cost. I’m looking for the best resources to do that — Any recommendations?

I’ve asked this here lightly before and I know the Cloud Posse approach is FluentD => Firehose => ElasticSearch. I’d like to implement something similar with FluentBit > FluentD (project is running Fargate so smaller sidecar containers + aws-for-fluentbit is attractive), but before I dive into implementing all of that I figured I should ask what’re the best resources / OSS / possible terraform modules I should pick up to accomplish this with the least amount of pain.

Erik Osterman (Cloud Posse)

04:45:11 PM

Make sure to evaluate the cost of ES. The amount of storage is limited by the instance size and it’s easy to get up to 500/mo for a cluster that barely stores a week of data and has HA. Then consider how many you want? You don’t want all logs going to the same cluster. Let’s say you have 4 accounts and 4 small clusters with HA. My point is it’s very easy to end up paying thousands of dollars a month for ES and the tight integration with APM and Metrics you get with Datadog is lacking. You’re also on the hook for developing and keeping it all up to date. I am curious if the cost/benefit really works out in favor of ES

Matt Gowie

02:01:38 AM

Yeah, valid point. I believe I’m only going to run HA and more than a day of log retention in production environments. The client is single tenant environment per customer (enterprise customers), so really it’ll be expensive anyway we run it AFAICT.

The problem I / the client had with DataDog logs was that we have 3 clusters up right now and the logs cost alone for the month is $1K / cluster, which is pretty insane. I think anyway we do it should be cheaper than that.

Vugar

11:21:09 AM

I guess as long as you can make sure that indexes are granular it should make it easy to snapshot indexes to S3 buckets… and remove these afterwards… But then logs are not available for Kibana… so not sure if these will be of any use at all?

Vugar

11:22:05 AM

AWS has ultrawarm storage of ES now… but as far as I know you can’t snapshot your indexes while on utlrawarm - but you better to doublecheck I could be wrong on this one

Vugar

11:24:45 AM

I wonder if sending these directly to S3 and performing searches via Athena will make it a bit more reasonable in terms of pricing…? But I am not sure if QA and Dev teams will like it… Dear @Erik Osterman (Cloud Posse) have you had any experience with Athena being used for log queries instead of indexing in ES and having all nice features of Kibana?

2020-11-20

2020-11-21

2020-11-22

Amit Karpe

06:18:32 AM

While upgrading EKS from 1.15 to 1.18 using eks module, do we have to upgrade step by step like 1.15 ==> 1.16 ==> 1.17 ==> 1.18 or I can directly modify cluster_version = "1.18" and tf apply will do all magic?

aaratn

06:25:01 AM

I dont think so

aaratn

06:26:04 AM

What I usually do is update on console and bump terraform to that version

Amit Karpe

06:26:59 AM

What is will pros/corn to use tf to build EKS and use EKS console to upgrade? Any unexpected side effect?

Aleksandr Fofanov

06:27:29 AM

I’m not sure if EKS will allow to do that, but you should upgrade one minor version at a time, because control plain components are only backward-compatible with the previous minor version, so you can’t really jump two versions ahead. https://kubernetes.io/docs/setup/release/version-skew-policy/ Even if EKS allows to do that, they will for sure upgrade one minor version at a time under the hood.

Aleksandr Fofanov

06:28:02 AM

Also, I would check if all of your cluster resources are compatible with the new version before upgrading.

Amit Karpe

06:29:25 AM

What kind of cluster resources? How you check? Do your create new EKS or sandbox EKS to test upgrade work or not and then you do it on the main eks cluster?

Aleksandr Fofanov

06:32:20 AM

Update: EKS won’t allow that.
Because Amazon EKS runs a highly available control plane, you can update only one minor version at a time. See Kubernetes Version and Version Skew Support Policy for the rationale behind this requirement. Therefore, if your current version is 1.16 and you want to upgrade to 1.18, then you must first upgrade your cluster to 1.17 and then upgrade it from 1.17 to 1.18. If you try to update directly from 1.16 to 1.18, then the update version command throws an error. from here: https://docs.aws.amazon.com/eks/latest/userguide/update-cluster.html

Updating an Amazon EKS cluster Kubernetes version - Amazon EKS

When a new Kubernetes version is available in Amazon EKS, you can update your cluster to the latest version.

Amit Karpe

06:35:04 AM

I have impression that EKS module will manage upgrade for us. i.e. Instead of running tf apply for 3 times. If we declare cluster_version = "1.18" then EKS module will do it same as 1.15 ==> 1.16 ==> 1.17 ==> 1.18 Do you think my assumption is correct?

Aleksandr Fofanov

06:35:35 AM

Nope, I don’t think that it will do that for you.

aaratn

06:36:02 AM

You can always check source of module if there auto upgrade magic in it ( which I am mostly sure isn’t the case)

Amit Karpe

06:36:59 AM

Thank you @aaratn & @Aleksandr Fofanov I think my assumption is wrong!!!

Aleksandr Fofanov

06:38:24 AM

@Amit Karpe Regarding how to find resources which uses deprecated APIs before upgrading.

Aleksandr Fofanov

06:38:26 AM

https://blog.doit-intl.com/kubernetes-how-to-automatically-detect-and-deal-with-deprecated-apis-f9a8fc23444c

Kubernetes: How to automatically detect and deal with deprecated APIs attachment image

Instead of doing this manually, we built Kube-No-Trouble to do it for you.

Aleksandr Fofanov

06:39:08 AM

Since you are jumping from 1.15 to 1.16 it’s worth checking if your cluster have any

Aleksandr Fofanov

06:40:50 AM

And also there is pluto tool too

Aleksandr Fofanov

06:40:51 AM

https://www.fairwinds.com/blog/kubernetes-easily-find-deprecated-api-versions-with-pluto

Kubernetes Solved: Easily Find Deprecated API Versions with Pluto attachment image

Pluto is an open source utility to help users easily find deprecated Kubernetes API versions in the Infrastructure-as-Code repositories and Helm releases.

tim.j.birkett

11:50:41 AM

Pluto helped us in the 1.15 -> 1.16 migration. With Helm, this plugin helped too: https://github.com/hickeyma/helm-mapkubeapis - When you upgrade to 1.16 your running Kubernetes deployments will be transposed in-place and everything keeps running. The problem comes when you try to make a deployment (or rollback) and it fails halting that applications pipeline, which is of course, the end of the world.

hickeyma/helm-mapkubeapis

This is a Helm plugin which map deprecated or removed Kubernetes APIs in a release to supported APIs - hickeyma/helm-mapkubeapis

2020-11-23

tim.j.birkett

11:52:47 AM

Did anybody else get hit by the EKS AMI - https://github.com/awslabs/amazon-eks-ami/releases/tag/v20201112 - that was fun

Release AMI Release v20201112 · awslabs/amazon-eks-ami

[RECALLED] AMI Release v20201112 amazon-eks-gpu-node-1.18-v20201112 amazon-eks-gpu-node-1.17-v20201112 amazon-eks-gpu-node-1.16-v20201112 amazon-eks-gpu-node-1.15-v20201112 amazon-eks-arm64-node-1…

pjaudiomv

12:43:48 PM

im totally using that ami

Release AMI Release v20201112 · awslabs/amazon-eks-ami

[RECALLED] AMI Release v20201112 amazon-eks-gpu-node-1.18-v20201112 amazon-eks-gpu-node-1.17-v20201112 amazon-eks-gpu-node-1.16-v20201112 amazon-eks-gpu-node-1.15-v20201112 amazon-eks-arm64-node-1…

tim.j.birkett

01:30:53 PM

So, it doesn’t impact the service of applications if you run mutiple instances on different nodes… But, what you’ll see is application restarts, nodes randomly become NotReady . If you have a fair bit of monitoring of things like replica counts, kube node status you’ll get a bit of noise. Along with perhaps, developers claiming that they should move back to Beanstalk, Fargate, Cloud Provider X as AWS should not have let a buggy AMI get out…

tim.j.birkett

01:32:02 PM

The latest AMI works fine: https://github.com/awslabs/amazon-eks-ami/releases/tag/v20201117

Release AMI Release v20201117 · awslabs/amazon-eks-ami

AMI Release v20201117 amazon-eks-gpu-node-1.18-v20201117 amazon-eks-gpu-node-1.17-v20201117 amazon-eks-gpu-node-1.16-v20201117 amazon-eks-gpu-node-1.15-v20201117 amazon-eks-arm64-node-1.18-v202011…

btai

10:34:27 PM

yeah i was bit by that bug but the 20201117 amis are fine though

2020-11-24

Vugar

11:27:27 AM

Greetings! I was wondering if anyone had any chance to play with crossplane? Would you know if it is somewhat comparable to TF cloud operator?

Crossplane

Manage any infrastructure your applications need directly from Kubernetes

tim.j.birkett

04:26:23 PM

Do you have a link to TF cloud operator?

Crossplane

Manage any infrastructure your applications need directly from Kubernetes

tim.j.birkett

04:26:56 PM

I’m thinking of looking at crossplane. It looks pretty good.

Vugar

06:58:14 PM

Yep here is TF cloud operator link: https://github.com/hashicorp/terraform-k8s

hashicorp/terraform-k8s

Terraform Cloud Operator for Kubernetes. Contribute to hashicorp/terraform-k8s development by creating an account on GitHub.

Vugar

06:58:57 PM

I wonder if crossplane has anything similar to TF state… haven’t had a chance to look into documentation yet

Hao Wang

10:09:48 PM

pretty neat project, thanks for sharing

Aumkar Prajapati

07:03:15 PM

Hey all, working with creating a new cluster, basically running into an issue where basically the alb-ingress-controller can’t see any subnets on an ingress being created despite those subnets existing, any ideas?

aws-load-balancer-controller-5d96f6c4f6-vq86z controller {"level":"error","ts":1606243773.951048,"logger":"controller","msg":"Reconciler error","controller":"ingress","name":"grafana","namespace":"monitoring","error":"couldn't auto-discover subnets: unable to discover at least one subnet"}

Erik Osterman (Cloud Posse)

07:03:53 PM

(Note the alb-ingress-controller is now the https://github.com/kubernetes-sigs/aws-load-balancer-controller)

kubernetes-sigs/aws-load-balancer-controller

A Kubernetes controller for Elastic Load Balancers - kubernetes-sigs/aws-load-balancer-controller

Aumkar Prajapati

07:04:10 PM

Yup, that’s the one I’m using ^^

Erik Osterman (Cloud Posse)

07:04:27 PM

I think this usually happens if the subnets aren’t tagged properly

Aumkar Prajapati

07:04:36 PM

I used that combined with this guide to set it up https://kubernetes-sigs.github.io/aws-load-balancer-controller/latest/guide/controller/installation/

Aumkar Prajapati

07:05:01 PM

What should the subnets be tagged with? I have 3 public and 3 private subnets, attached to a single vpc that the eks cluster is a part of

Aumkar Prajapati

07:05:51 PM

Here’s the terraform used to create the vpcs / subnets

module "us-prod-eks-vpc" {
  source = "terraform-aws-modules/vpc/aws"

  providers = { aws = aws.us }
  name = "us-prod-eks-vpc"
  cidr = "10.0.0.0/16"

  azs             = ["us-east-1a", "us-east-1b", "us-east-1c"]
  private_subnets = ["10.0.1.0/24", "10.0.2.0/24", "10.0.3.0/24"]
  public_subnets  = ["10.0.101.0/24", "10.0.102.0/24", "10.0.103.0/24"]

  enable_nat_gateway = true
  enable_vpn_gateway = false

  tags = {
    Name = "us-prod-eks-vpc"
    Environment = "prod-us"
    Region = "us-east-1"
  }
}

Erik Osterman (Cloud Posse)

07:05:54 PM

See https://docs.aws.amazon.com/eks/latest/userguide/alb-ingress.html

Application load balancing on Amazon EKS - Amazon EKS

You can load balance application traffic across pods using the AWS Application Load Balancer (ALB). To learn more, see What is an Application Load Balancer? in the Application Load Balancers User Guide . You can share an ALB across multiple applications in your Kubernetes cluster using Ingress groups. In the past, you needed to use a separate ALB for each application. The controller automatically provisions AWS ALBs in response to Kubernetes Ingress objects. ALBs can be used with pods deployed to nodes or to AWS Fargate. You can deploy an ALB to public or private subnets.

Aumkar Prajapati

07:06:46 PM

I’ll take a look, thanks

Aumkar Prajapati

07:31:53 PM

Okay so I did the tags and they indeed were missing from the subnets, added those in, still results in the same error :s

Aumkar Prajapati

07:32:17 PM

  public_subnet_tags = {
    "kubernetes.io/role/elb" = "1"
  }

  private_subnet_tags = {
    "kubernetes.io/role/internal-elb" = "1"
  }

basically were added in

btai

10:29:08 PM

try adding this tag to your VPC

locals {
  tags = {   "kubernetes.io/cluster/${local.name}" = "shared"
  }
}

Emmanuel Gelati

09:03:22 PM

I confirm, using the above tag solved the issue for me too

Erik Osterman (Cloud Posse)

05:14:57 PM

If anyone wants to open up a PR to document this - that would probably save many others grief :-)

Padarn

04:01:36 AM

Hi guys - can someone hep explain to me how port-forward works under the hood. I got a bit confused when I saw that the rbac permissions required need “create” permissions:

 rule {
    apiGroups = [""]
    resources = ["pods/portforward"]
    verbs     = ["get", "list", "create"]
  }

Dhrumil Patel

06:27:14 AM

Hi all, is anyone know any tools that can provide SAML authentication in Kubernetes EKS cluster It may possible using HashiCorp Boundary but I want to explore other tools…

Padarn

06:56:04 AM

does something like https://github.com/heptiolabs/gangway help?

heptiolabs/gangway

An application that can be used to easily enable authentication flows via OIDC for a kubernetes cluster. - heptiolabs/gangway

Padarn

06:57:14 AM

its OpenID based but maybe it can help with your use case

Dhrumil Patel

08:30:03 AM

Thanks I will check…

loren

02:16:00 PM

teleport? https://goteleport.com/blog/kubernetes-sso-saml/

How to Set Up Kubernetes SSO with SAML attachment image

Kubernetes doesn’t support native SAML integration. Learn how to configure SAML single sign on (SSO) for Kubernetes clusters with user impersonation.

Padarn

05:28:12 AM

oh thats nice @loren

Padarn

05:28:16 AM

thanks foor sharing

2020-11-25

Amit Karpe

07:33:29 AM

Any idea, if we corrupt the configmap aws-auth, then how to recover it? Once configmap aws-auth settings got corrupted no one can access EKS Cluster. Any workaround?

Issif

08:23:35 AM

one user can still access to your cluster, it’s the user you used for creation

Issif

08:23:59 AM

for our part, it’s a dedicated user for terraform, as we apply from a CI/CD

Amit Karpe

09:22:39 AM

Thank you. We able to revert the bad changes by EKS creator user

Issif

09:34:18 AM

perfect

2020-11-26

mfridh

08:46:55 PM

Do you guys (iptables or network policy) block the EC2 metadata api or redirect to a metadata proxy for containers to remain “sane” when providing iam roles via the “native” eks service role method?

Erik Osterman (Cloud Posse)

11:03:49 PM

When we were using kiam on kops, we would do this, but haven’t carried it over to EKS.

Erik Osterman (Cloud Posse)

11:04:17 PM

If it would help, I can dig up the rules we used

mfridh

08:48:07 AM

No need Erik. Thanks.

I think the cleanest way is a metadata proxy.

If blocking metadata completely some apps might complain if they expect or need certain metadata. However. In most cases the needs are simple and providing an explicit AWS_REGION environment variable does the trick.

I think if I was to provide some form of multi tenant cluster with EC2 like promises (which I’m not really in the business of doing right now) I might be obliged to provide some emulation of the regular EC2 metadata.

2020-11-27

2020-11-28

Padarn

09:15:37 AM

Hi all - what are some approaches for application config management in kubernetes? A few topics I’m interested in

• dynamic configuration (say for example configuration a feature flag in an app)

• deploying applications that share configuration (or secrets) Just looking for some projects to look into to get a feel for what people are doing

Erik Osterman (Cloud Posse)

04:55:49 PM

Good question for office hours

Erik Osterman (Cloud Posse)

04:56:27 PM

Some ideas are the various operators for managing secrets like external secrets from SSM, ASM, Vault

Erik Osterman (Cloud Posse)

04:57:06 PM

The Reloader project from stakater to automatically roll replica sets when secrets or config maps change

Erik Osterman (Cloud Posse)

04:57:24 PM

#helmfile for managing configuration of helm releases

Vlad Ionescu (he/him)

05:18:49 PM

Dynamic config: LaunchDarkly, Split, or whatever hosted provider. I saw many re-implementations fail spectacularly

Deploying apps sharing the same config: beware creating a distributed monolith! I’d vote for helmfile / ParameterStore / shared ConfigMap.

Erik Osterman (Cloud Posse)

05:21:09 PM

Flagger is an open source alternative for launchdarkly https://github.com/weaveworks/flagger

weaveworks/flagger

Progressive delivery Kubernetes operator (Canary, A/B Testing and Blue/Green deployments) - weaveworks/flagger

Vlad Ionescu (he/him)

05:25:56 PM

Alternative-ish.

Flagger does Blue/Green in all its forms (canary, A/B, mirroring). Let’s take an example with 1% seeing the new feature. That is a random 1% of requests.

With LaunchDarkly (or whatever else feature flagging tool) you get to chose the 1%: instead of random you could show the feature to users whose email ends in [mycorp.com](http://mycorp.com), or the marketing team, or that specific QA team that will test the thing, or users who are on the free plan, or users who opt in to seeing beta features, or the client who this feature is developed for, and so on. You get a lot more control on risk allocation.

Padarn

11:31:54 PM

Thanks guys, a lot of good reading to look into

Padarn

11:32:08 PM

I’ll move it to #office-hours for any follow up

tim.j.birkett

09:17:25 AM

https://github.com/Unleash/unleash - has caught the attention of some developers I work with for feature flagging. I’d be keen to hear from anyone that might have used it? Or, share experience when I have gained it…

I used to work for the transport side of booking.com (rentalcars.com at the time) and they had an in-house A/B experiments framework in use on the frontend (horribly conditional JSPs). I think Optimizely (SaaS) is fairly popular in the front-end “which colour blue” arena.

Unleash/unleash

Unleash is the open source feature toggle service. - Unleash/unleash

Andy Miguel

07:06:13 PM

https://www.youtube.com/watch?v=luZgWGhUqkU

Cloud Posse Explains: Application Config Management in Kubernetes

Padarn

02:21:08 AM

nice!

2020-11-29

loren

03:11:34 AM

Has me nodding along, kube already feels like legacy to me, it will be replaced by something sooner than later (by the founder of tailscale, which just came up in #random)… https://blog.dave.tf/post/new-kubernetes/

A better Kubernetes, from the ground up · blog.dave.tf

If we were to start over from first principles, what would I do differently than k8s?

Matt Gowie

02:48:46 PM

This is what led me to checking out tailscale

A better Kubernetes, from the ground up · blog.dave.tf

If we were to start over from first principles, what would I do differently than k8s?

Matt Gowie

02:49:18 PM

Some interesting ideas in there for sure. But don’t think that they’ll be implemented too quickly. Maybe we’ll see something competitive in a couple years.

loren

03:07:12 PM

i consider that reasonably soon

Matt Gowie

03:07:39 PM

Yeah solid point

#kubernetes (2020-11)

2020-11-01

2020-11-02

2020-11-03

2020-11-04

2020-11-05

2020-11-07

2020-11-09

2020-11-12

2020-11-13

2020-11-16

2020-11-17

2020-11-18

2020-11-19

2020-11-20

2020-11-21

2020-11-22

2020-11-23

2020-11-24

2020-11-25

2020-11-26

2020-11-27

2020-11-28

2020-11-29

2020-11-30