#kubernetes (2021-09)
Archive: https://archive.sweetops.com/kubernetes/
2021-09-01
2021-09-02
Hi guys, I am looking for a “user friendly” solution to manage multiple clusters for a customer. In the end I’m between Rancher and Kubesphere; has anyone here used either of these solutions in production? They are using EKS (AWS). Thanks
Probably you’ll find this video interesting. It’s Viktor Farcic’s review of Kubesphere
2021-09-03
Hi People, anyone ever had this issue with the AWS ALB Ingress controller:
failed to build LoadBalancer configuration due to failed to resolve 2 qualified subnet with at least 8 free IP Addresses for ALB. Subnets must contains these tags: 'kubernetes.io/cluster/my-cluster-name': ['shared' or 'owned'] and 'kubernetes.io/role/elb': ['' or '1']. See https://kubernetes-sigs.github.io/aws-alb-ingress-controller/guide/controller/config/#subnet-auto-discovery for more details.
So there are three subnets with the appropriate tagging and plenty of free IPs, but I could not yet find the reason why it is complaining about the subnets
Perhaps there aren’t enough free IPs in those tagged subnets?
there are only a few nodes running, and there are thousands of IPs. I added another tag on the subnets so they look like this now:
kubernetes.io/cluster/my-cluster-name shared
kubernetes.io/role/internal-elb 1
kubernetes.io/role/elb
now the AWS ALB Ingress controller starts successfully and registers the targets in the target group, but all my requests to any application in the cluster are timing out
Sounds like the first problem was solved. Nice job!
New problem seems like a misconfiguration in the pod that utilizes this new controller?
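For anyone who hits the original subnet-discovery error later, a minimal sketch of the tagging the controller’s auto-discovery expects (subnet IDs hypothetical): public subnets for internet-facing ALBs carry kubernetes.io/role/elb, private subnets for internal ALBs carry kubernetes.io/role/internal-elb:
aws ec2 create-tags --resources subnet-0abc123 \
  --tags Key=kubernetes.io/cluster/my-cluster-name,Value=shared \
         Key=kubernetes.io/role/elb,Value=1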
2021-09-05
2021-09-07
:helm: New to k8s and helm.
Need to define multiple pieces of my internal app, some based on public helm charts, others just internal containers.
I started with kompose
and converted Docker Compose files to give me a head start on what might be contained in the k8s yaml schema, but it’s not clear if I need to create my own helm charts or not. Since I’m not going to reuse these pieces in other projects, I’m assuming I don’t need helm charts.
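For context, the kompose step is a one-liner; a sketch assuming default file names:
kompose convert -f docker-compose.yml -o k8s/   # writes the generated k8s yaml into k8s/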
If I draw some similarity to Terraform… would a helm chart be like a terraform module, and the k8s schema yaml be similar to a “root module”? If that parallel applies, then I’d only worry about helm charts when consuming a prebuilt resource or trying to reuse pieces in different places in the company. If it’s a standalone root application definition, I’m assuming I’ll just do this without helm.
How far off am I? #k8newbie
Update: I am reading more on this and see that there are benefits for internal use too, since it allows reusing the same deployment with an easier templating approach.
Example
helm install myapp ./chart --set env=dev --set replicaCount=1
with fewer templating configs required, as it would allow me to set my templating values dynamically. I’m guessing kubectl has this with the overrides file, but it’s perhaps a bit less flexible or easy to maintain long term.
Rollbacks also seem to be really smooth, though again I’m guessing kubectl has similar features just by referencing prior source.
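For reference, the rollback flow on the helm side is just (release name hypothetical):
helm history myapp     # list the release's revisions
helm rollback myapp 2  # roll back to revision 2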
Another pro is that you can go beyond the schema of the app and also handle application-level configuration. My guess is that that’s where k8s operators would be required to better handle application-level configuration actions.
any quick insight on approaching an internal deployment with helm? A lot to learn, so making sure I don’t focus on the wrong thing as I try to migrate from terraform ECS infra to kubernetes.
cc @Erik Osterman (Cloud Posse) would welcome any quick insight as this is all new to me from you or your team.
yes, a helm chart is a lot like a terraform module in the sense that you bundle up the “complexity” into a package and then expose some inputs as your declarative interface
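To make the parallel concrete, a minimal sketch (chart name hypothetical):
helm create myapp      # scaffolds a chart skeleton
# myapp/Chart.yaml   - chart metadata
# myapp/values.yaml  - default inputs, analogous to terraform variables
# myapp/templates/   - the templated k8s yaml, analogous to module internals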
also, we’ve relatively recently released https://github.com/cloudposse/terraform-aws-helm-release
Create helm release and common AWS resources like an EKS IAM role.
which we’re using to more easily deploy helm releases to EKS
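Under the hood the module wraps the Terraform helm provider’s helm_release resource; a minimal sketch of that underlying resource (names and repo URL hypothetical):
resource "helm_release" "myapp" {
  name       = "myapp"
  repository = "https://charts.example.com" # hypothetical chart repository
  chart      = "myapp"
  namespace  = "apps"

  set {
    name  = "replicaCount"
    value = "2"
  }
}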
So if I’m newer to this and I’m basically dealing with a root module/application deployment and need just env flexibility, but not a lot of other flexibility or sharing… do I still stick to using helm, or stick with k8s yaml instead? Where do I spend the effort?
A lot to learn
Gotta narrow the scope
there are 2 major camps right now: kustomize and helm
I would first master the raw resources, then learn/appreciate the alternative ways to manage them.
then look at tools like ArgoCD/Flux, not necessarily so that you will use them, but to understand how they fit into the picture.
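A sketch of what the kustomize camp looks like in practice (paths hypothetical); kustomize is built into kubectl:
# base/kustomization.yaml lists the raw resources (deployment.yaml, service.yaml)
# overlays/dev/kustomization.yaml references ../../base and patches replicas/env for dev
kubectl apply -k overlays/dev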
will bring up on #office-hours today
thank you. I’ll stick with the native k8s schema then, as I really have to dial in the basics first and then can dive into the others as I go. The fewer abstractions the better right now as I try to prepare the team for such a big jump. Doing my best to resist using Pulumi too, for now
lol, yes, resist the urge until you appreciate the fundamentals and the limitations.
All roads lead to jsonnet. Seems to for me, at least…
Grafana Tanka… is pretty awesome to be honest.
Reading https://github.com/cloudposse/terraform-aws-eks-node-group/blob/780163dacd9c892b64b988077a994f6675d8f56d/MIGRATION.md to be able to jump to the 0.25.0 version of the module (it had a recent overhaul).
Seems like remote_access_enabled was removed from the module, but that’s not documented in the migration guide to 0.25.0 …
Join us for a hands-on lab to implement Argo CD with ApplicationSets, the new way of bootstrapping your cluster in Kubernetes. Friday 8:30 AEST | Thursday 3:30 https://community.cncf.io/events/details/cncf-cloud-native-dojo-presents-hands-on-lab-getting-started-with-argocd/
I’m attending CNCF Cloud Native Dojo w/ Hands on Lab - Getting started with ArgoCD on Sep 10, 2021
2021-09-08
2021-09-10
Hi People, wanted to ask about experiences upgrading Kubernetes EKS versions. I recently did an upgrade from 1.19 to 1.20. After the upgrade some of my workloads are experiencing weird high CPU spikes. But correlation does not equal causation, so I wanted to ask if anyone here has experienced something similar.
2021-09-13
Hello all, can anyone help with Azure Kubernetes Service please? What if a namespace is accidentally deleted, is there any recovery process (disaster recovery)? Any inputs from the team please. ~ Thanks, much appreciated.
AFAIK there’s no way to revert this out of the box. What pipeline did you use to get the yamls into that namespace? Usually the expectation is that the pipeline is easily repeatable; that’s why there isn’t much talk about recoveries.
If you’re still going to approach the problem from the backup/recovery side, there are a couple of cloud-generic projects to achieve what you want:
pieterlange/kube-backup: Kubernetes resource state sync to git
use gitops and problem solved
2021-09-14
2021-09-18
Hey there!
I’m trying to go further with my multi-tenant cluster and want to show my teams only their namespaces. I did not find a way to reduce the number of namespaces shown when I do a k get ns. Any idea how I can get this done?
Author: Adrian Ludwin (Google) Safely hosting large numbers of users on a single Kubernetes cluster has always been a troublesome task. One key reason for this is that different organizations use Kubernetes in different ways, and so no one tenancy model is likely to suit everyone. Instead, Kubernetes offers you building blocks to create your own tenancy solution, such as Role Based Access Control (RBAC) and NetworkPolicies; the better these building blocks, the easier it is to safely build a multitenant cluster.
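One caveat: namespaces are cluster-scoped, so RBAC can’t filter what kubectl get ns returns per user; you can only deny listing namespaces entirely and instead grant access inside each team’s namespace. A hedged sketch using kubectl’s imperative commands (role, namespace, and group names hypothetical):
kubectl create role team-a-dev --verb=get,list,watch \
  --resource=pods,deployments,services -n team-a
kubectl create rolebinding team-a-dev --role=team-a-dev \
  --group=team-a -n team-a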
2021-09-19
2021-09-24
Hello all,
We are using Kubernetes in AWS, deployed using kops. We are using Nginx as our ingress controller; it was working fine for almost 2 years, but recently we started getting 502 Bad Gateway errors in multiple pods at random.
ingress log shows 502
[23/Sep/2021:10:53:43 +0000] "GET /service HTTP/2.0" 502 559 "https://mydomain/" "Mozilla/5.0 (X11; Linux x86_64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/93.0.4577.82 Safari/537.36" 4691 0.040 [default-myservice-80] 100.96.13.157:80, 100.96.13.157:80, 100.96.13.157:80 0, 0, 0 0.000, 0.000, 0.000 502, 502, 502 258a09eaaddef85cae2a0c2f706ce06b
..
[error] 1050#1050: *1352377 connect() failed (111: Connection refused) while connecting to upstream, client: CLIENT_IP_HERE, server: my.domain.com, request: "GET /index.html HTTP/2.0", upstream: "http://POD_IP:8080/index.html", host: "my.domain.com", referrer: "https://my.domain/index.html"
We tried connecting from the ingress pod to the pod IP that returned 502
www-data@nginx-ingress-controller-664f488479-7cp57:/etc/nginx$ curl 100.96.13.157
curl: (7) Failed to connect to 100.96.13.157 port 80: Connection refused
it showed connection refused
We monitored tcpdump traffic from the node where the pod gave 502
root@node-ip:/home/admin# tcpdump -i cbr0 dst 100.96.13.157
tcpdump: verbose output suppressed, use -v or -vv for full protocol decode
listening on cbr0, link-type EN10MB (Ethernet), capture size 262144 bytes
17:39:16.779950 ARP, Request who-has 100.96.13.157 tell 100.96.13.22, length 28
17:39:16.780207 IP 100.96.13.22.57610 > 100.96.13.157.http: Flags [S], seq 2263585697, win 26883, options [mss 8961,sackOK,TS val 1581767928 ecr 0,nop,wscale 9], length 0
17:39:21.932839 ARP, Reply 100.96.13.22 is-at 0a:58:64:60:0d:16 (oui Unknown), length 28
root@node-ip:/home/admin# ping 100.96.13.157
PING 100.96.13.157 (100.96.13.157) 56(84) bytes of data.
64 bytes from 100.96.13.157: icmp_seq=1 ttl=64 time=0.309 ms
64 bytes from 100.96.13.157: icmp_seq=2 ttl=64 time=0.042 ms
64 bytes from 100.96.13.157: icmp_seq=3 ttl=64 time=0.044 ms
it looks like pods can reach each other, and ping is working,
root@node-ip:/home/admin# tcpdump -i cbr0 src 100.96.13.157
tcpdump: verbose output suppressed, use -v or -vv for full protocol decode
listening on cbr0, link-type EN10MB (Ethernet), capture size 262144 bytes
17:39:16.780076 ARP, Reply 100.96.13.157 is-at 0a:58:64:60:0d:9d (oui Unknown), length 28
17:39:16.780175 ARP, Reply 100.96.13.157 is-at 0a:58:64:60:0d:9d (oui Unknown), length 28
17:39:16.780238 IP 100.96.13.157.http > 100.96.13.22.57610: Flags [R.], seq 0, ack 2263585698, win 0, length 0
17:39:21.932808 ARP, Request who-has 100.96.13.22 tell 100.96.13.157, length 28
Here the ingress is sending the request but it’s being reset (flag [R.] = RST-ACK in the tcpdump), and the HTTP request is lost.
We don’t know where this connection is getting lost. We checked our service and pod labels; everything is configured properly. Also, most of the time my.domain.com is accessible and the issue looks intermittent. Is there any other place we need to check for logs, or has anyone experienced the same issue?
Thanks in advance
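Since the TCP RST says nothing was listening on port 80 at that instant, a few hedged checks worth running (service/pod names hypothetical); stale endpoints, a restarting container, or a readiness probe on the wrong port would all fit the intermittent pattern:
kubectl get endpoints myservice -o wide            # do the endpoint IPs match ready pods?
kubectl get pods -o wide | grep 100.96.13.157      # which pod owns the failing IP?
kubectl describe pod <pod-name> | grep -i -A3 readiness   # probe port vs container port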
Wondering if anyone knows how I can let pods spun up by Jenkins on EKS assume a role at the pod level, so that I can give that role a cross-account trust to another AWS account B (where that role will have access to account B’s ECR to pull its images)
I don’t want to use service accounts, because my setup is such that one Jenkins serves multiple projects, and creating a service account at the node level is not something I want
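Without service accounts (IRSA is tied to them), the usual alternative at the time was kube2iam or kiam, which assign a role per pod via an annotation; note that image pulls themselves are done by the kubelet under the node role, so this helps in-pod AWS calls rather than the pull itself. A hedged pod-spec fragment (role ARN hypothetical, assumes kube2iam/kiam is deployed):
metadata:
  annotations:
    iam.amazonaws.com/role: arn:aws:iam::<ACCOUNT_B_ID>:role/ecr-pull  # hypothetical role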
Hello friends!
I am chasing down how to configure the clusterDNS: <VALUE> setting for a Kubernetes node-group that is deployed using https://github.com/cloudposse/terraform-aws-eks-node-group/
Particularly this blob: https://github.com/weaveworks/eksctl/pull/550#issuecomment-464623865
TL;DR: This is the smallest change to allow enabling node-local DNS cache on eksctl-created nodes. What: Add a new field named clusterDNS that accepts the IP address of the DNS server used for all t…
Anyone who might know this?
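If the end goal is node-local DNS cache, the setting ultimately lands as a kubelet flag; a hedged sketch of the standard EKS bootstrap invocation in node userdata (cluster name assumed; 169.254.20.10 is the conventional node-local-dns address):
/etc/eks/bootstrap.sh my-cluster-name \
  --kubelet-extra-args '--cluster-dns=169.254.20.10'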
2021-09-27
Hi there! We are using the cloudposse module for node-groups on Kubernetes EKS 1.21. We started noticing that a few hours after provisioning node-groups, as well as the corresponding worker IAM roles, these three AWS managed IAM policies start disappearing from the IAM roles:
AmazonEKSWorkerNodePolicy
AmazonEC2ContainerRegistryReadOnly
AmazonEKS_CNI_Policy
Wonder if anyone has noticed similar behavior?
Specifically we are using 0.25.0 version of this module: https://github.com/cloudposse/terraform-aws-eks-node-group/tree/0.25.0
We are using the create_before_destroy flag set to true …
2021-09-28
It turned out to be our ignore_tags configuration of the AWS provider that was triggering some unexpected effects on the node-group resources, including the IAM roles for worker nodes.
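For anyone hitting the same thing, the provider configuration in question looks roughly like this (prefix hypothetical); tags matched here are excluded from Terraform’s view, which can ripple into resources the module manages:
provider "aws" {
  ignore_tags {
    key_prefixes = ["kubernetes.io/"]
  }
}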
2021-09-29
Morning everyone! This is the 2nd part of my K8S blog post: “Implementing Kubernetes: The Hidden Part of the Iceberg”. I hope you enjoy it! https://medium.com/gumgum-tech/implementing-kubernetes-the-hidden-part-of-the-iceberg-part-2-d76d21759de0
Kubernetes Scheduling, Resource Management and Autoscaling.
nice write up!