#kubernetes (2021-12)
Archive: https://archive.sweetops.com/kubernetes/
2021-12-07
Octopus Deploy is a Deployment and Operations tool for AWS, Azure, .NET, Java, Kubernetes, Windows and Linux, and a Kubernetes YAML generator
2021-12-10
Cross-posting this here… probably should have started in this channel
Has anyone had any success configuring persistent storage when running EKS on Fargate? I’ve been banging my head on this problem for a few days now, and it seems EFS is the only supported persistent storage, but I’ve been unable to get it to work; every attempt results in a “SetUp failed for volume… …Could not mount”
Can’t help you with getting it mounted, but yes EFS is the only persistent storage supported for Fargate
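For anyone hitting the same wall: EFS on Fargate goes through the EFS CSI driver with static provisioning (on Fargate the driver itself is built into the platform), and “Could not mount” errors are often down to the filesystem’s mount targets or security group not allowing NFS (port 2049) from the pod’s subnets. A minimal sketch of the objects involved, with a placeholder filesystem ID:
apiVersion: storage.k8s.io/v1
kind: StorageClass
metadata:
  name: efs-sc                           # placeholder name
provisioner: efs.csi.aws.com
---
apiVersion: v1
kind: PersistentVolume
metadata:
  name: efs-pv
spec:
  capacity:
    storage: 5Gi                         # required field; EFS itself is elastic
  accessModes:
    - ReadWriteMany
  persistentVolumeReclaimPolicy: Retain
  storageClassName: efs-sc
  csi:
    driver: efs.csi.aws.com
    volumeHandle: fs-0123456789abcdef0   # placeholder EFS filesystem ID
---
apiVersion: v1
kind: PersistentVolumeClaim
metadata:
  name: efs-claim
spec:
  accessModes:
    - ReadWriteMany
  storageClassName: efs-sc
  resources:
    requests:
      storage: 5Gi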
2021-12-11
Is kustomize the default go-to when all I need is a minor change to a deployment across staging/local/prod?
I think Helm might be overkill. If I needed that much, I’d probably want to try Pulumi as well (still might if I need to combine k8s + external resources like databases).
I’ve known about 12-Factor apps for a while. When working AWS-native, my favorite would be cloud-based configuration settings pulled from SSM.
I recently ran into someone who seemed to think that using environment variables is far inferior to config files. Is it more of a Kubernetes-standard approach to mount config files and avoid environment variables? To me, environment variables versus mounting a config file are just different flavors and not really that important. The bigger question is whether you use an environment var/config file OR load directly into your app from a parameter-store-style service (etcd, Consul, others?).
So I’m all in favor of envs, but will play devil’s advocate:
• envs are exposed via the /proc filesystem; any process on the system can read those settings
• envs are harder to validate (e.g. typo an ENV, you won’t get a warning)
• env sprawl: over time, you may end up with hundreds of ENVs as some of our customers have. they have products that have been around for a decade or longer and gone through generations of engineers
• what updates the envs? do you have CD for envs?
• if your app still consumes config files but you are parameterizing it with ENVs, it’s tedious to update both the envs and the config-file templating every time you add an env (a minimal sketch of both flavors follows this list)
• envs are really only convenient for scalars. serializing structures in YAML/JSON is ugly
• ECS is limited on the number of ENVs supported (I don’t remember exactly, but it’s around 250)
• ECS task definitions are capped at 64K, meaning if you use a lot of ENVs (or long ENVs), you will hit this limit when you least expect it
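To make the trade-off concrete, here is a minimal sketch of the two flavors being debated: the same ConfigMap consumed as an env var and as a mounted config file. All names here (app-config, APP_LOG_LEVEL, the mount path) are placeholders, not anything from this thread.
apiVersion: v1
kind: ConfigMap
metadata:
  name: app-config                 # placeholder name
data:
  APP_LOG_LEVEL: "info"            # scalar: fine as an env var
  settings.yaml: |                 # structured config: nicer as a mounted file
    log:
      level: info
      format: json
---
apiVersion: v1
kind: Pod
metadata:
  name: app
spec:
  containers:
    - name: app
      image: example/app:1.0       # placeholder image
      env:
        - name: APP_LOG_LEVEL      # env-var flavor
          valueFrom:
            configMapKeyRef:
              name: app-config
              key: APP_LOG_LEVEL
      volumeMounts:
        - name: config             # config-file flavor, appears at /etc/app/settings.yaml
          mountPath: /etc/app
  volumes:
    - name: config
      configMap:
        name: app-config
        items:
          - key: settings.yaml
            path: settings.yaml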
nice. great point esp on the scalars. …if I ever see env vars containing JSON… Thank you for the good insight. I prefer using stuff like SSM Parameter Store and such, but learning more about k8s has had me revisiting assumptions. Cheers
2021-12-13
Hi people, did anyone ever have this problem with a pod?
Status: Terminating (lasts 3d21h)
Termination Grace Period: 120s
The “Termination Grace Period” is 120s, yet the pod has been terminating for 3d21h
Possibly related: How to fix — Kubernetes namespace deleting stuck in Terminating state
So AWS launched their hosted Kubernetes called EKS (Elastic Kubernetes Service) last year and we merrily jumped onboard and deployed most…
actually it was related to a warm pool of nodes
Just out of curiosity, what did you end up doing to fix it?
For the moment I just deleted the warm pools, as they are not that useful right now. I still have to look further into it. I guess the warm-pool nodes need more configuration to work properly with k8s
you might have a stuck finalizer
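For reference, a pod (or namespace) stuck in Terminating usually looks like the sketch below in kubectl get -o yaml: deletionTimestamp is set, but a finalizer never gets removed because whatever controller owns it is gone. Names and the finalizer string are placeholders; the kubectl patch in the comment is a last resort, since it skips whatever cleanup the finalizer was guarding.
apiVersion: v1
kind: Pod
metadata:
  name: example-pod                          # placeholder
  namespace: example-ns                      # placeholder
  deletionTimestamp: "2021-12-09T10:00:00Z"  # deletion was requested days ago
  finalizers:
    - example.com/cleanup-hook               # whatever should remove this never did
# Last-resort manual cleanup, only if you know the cleanup is safe to skip:
#   kubectl patch pod example-pod -n example-ns --type=merge -p '{"metadata":{"finalizers":null}}'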
2021-12-14
Does anyone use JSON/YAML transformation in a pipeline for Kubernetes, i.e. modifying the YAML directly as a build action? I saw this once; I normally see kustomize/Helm and other tools do the transformation rather than a build step. I’m thinking that’s more typical in stuff like dotnet/aspnetcore-style web settings transformations, but that’s an assumption.
I’m currently assuming the “easy” start for transforming straight YAML, without the complexity of Helm, is just kustomize (from the conversation above) and the overlays/patches it produces. Sound about right?
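As a concrete sketch of that overlay/patch style (the directory layout, names, and replica tweak are placeholders, not from this thread):
# overlays/staging/kustomization.yaml
apiVersion: kustomize.config.k8s.io/v1beta1
kind: Kustomization
resources:
  - ../../base                    # the shared Deployment/Service manifests
images:
  - name: example/app             # retag the image per environment
    newTag: staging-latest
patchesStrategicMerge:
  - replica-patch.yaml
---
# overlays/staging/replica-patch.yaml
apiVersion: apps/v1
kind: Deployment
metadata:
  name: app
spec:
  replicas: 1                     # staging runs fewer replicas than prod
Rendering with kustomize build overlays/staging (or kubectl apply -k) produces the transformed YAML with no templating involved.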
aside from kustomize, we’ve also used yq (jq for YAML). note there are a few projects called yq
nice. I love yq. Used it with PowerShell to generate datadog stuff dynamically and blogged on it https://www.sheldonhull.com/working-with-powershell-objects-to-create-yaml/. Was super powerful
Here’s a walk-through on using PowerShell objects to dynamically generate yaml configuration files.
fyi I found Helm to be the perfect fit despite trying to avoid it initially. Turns out my Hugo blog templating knowledge was directly transferable, and it was barely a blip to get up and running and generating full deployment renderings that way, even if I’m only using it to render for this step.
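For anyone wondering why the Hugo knowledge transfers: Helm templates are the same Go text/template syntax rendered over a values file. A minimal sketch with placeholder chart values:
# values.yaml
replicaCount: 2
image:
  repository: example/app
  tag: "1.0.0"
---
# templates/deployment.yaml
apiVersion: apps/v1
kind: Deployment
metadata:
  name: {{ .Release.Name }}-app
spec:
  replicas: {{ .Values.replicaCount }}
  selector:
    matchLabels:
      app: {{ .Release.Name }}-app
  template:
    metadata:
      labels:
        app: {{ .Release.Name }}-app
    spec:
      containers:
        - name: app
          image: "{{ .Values.image.repository }}:{{ .Values.image.tag }}"
Running helm template . renders the chart locally, which matches the “only using it to render for this step” workflow described above.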
Woot woot
haha, yes, true that.
2021-12-15
2021-12-17
Hey y’all, for all of you running AWS EKS workloads, this just came out: https://github.com/aws-samples/kubernetes-log4j-cve-2021-44228-node-agent
Contribute to aws-samples/kubernetes-log4j-cve-2021-44228-node-agent development by creating an account on GitHub.
Thanks for sharing. Unfortunately it only seems to support Amazon Linux, and you can’t disable the patching to run it in monitoring-only mode.
2021-12-19
I want to start a discussion regarding various Ingress Controllers and how to choose between them. I have tried a few, but I haven’t had problems with any yet, maybe due to the current scale I am operating at and the limited use cases (yet).
AWS ALB Ingress Controller makes perfect sense when you think about latency and reliability; however, it is not as robust and feature-rich. For example, other Ingress Controllers can act as full-fledged “API Gateways” (e.g. solo.io’s or Ambassador’s), and some integrate with other components like Argo Rollouts and Flagger to do advanced rollout strategies. An Ingress Controller like Nginx’s is so “hackable” with the snippet annotations that it lets me do lots of workarounds (also exploitable).
However, with an in-cluster ingress (unlike ALB), you have to carefully spread your workloads across the cluster to keep latency down.
My problem is that I am unable to form an opinion without hands-on experience on larger clusters (500+ nodes), and I can’t rely on my lab benchmarks because they are too sensitive. Also, I lack the knowledge to correctly benchmark and profile the network latency/overhead of different ingresses (but I am working on that; your help is very welcome!).
So, what’s your decision flowchart when deciding on an Ingress Controller / API Gateway to handle north-south traffic?
I was “triggered” to post these questions after I read @Vlad Ionescu (he/him)’s tweet btw
Interviewing with a company that cares deeply about latency, runs on k8s, and where the tiny “DevOps team” can barely handle the maintenance effort, and I got this gem of wisdom:
“Why would I use AWS’ ALB when I can do that in HAProxy?”
Oh, you sweet soul
@Vlad Ionescu (he/him) I hope that was a trick question!
@Erik Osterman (Cloud Posse) nope, they were totally serious
On the topic… I don’t have the capacity to write all my thoughts on the subject. Some short notes:
• Ingress != API Gateway, in the same way a bike is different from a car. I would not ride a bike from Amsterdam to London, just like how I would not use a car on a mountain trail.
• Integrations, advanced rollout strategies, and hackability are bonuses, not goals. I want an Ingress, so I want to get traffic into my app. If I want to use Argo Rollouts, well, then I have a different problem and I will be looking at different solutions. Also, the latest trend of doing all releases at the network level is… not good. App-level and feature flagging, dammit! Yes, network level makes sense, but in very specific use cases.
• Maintenance is a huge deal. ALB Ingress Controller (well, AWS Load Balancer Controller using ALB in the ip mode :sweat_smile:) is debatably easy to maintain and upskill people into. If I have AWS engineers, they already know ALBs, or it’s easier to teach them ALB than HAProxy configs. If I have HAProxy engineers and I am optimizing for that… wtf am I doing on AWS? There are valid use cases (many even!) but they only appear at huge scale or for very specific workloads.
• Traffic path is a huge deal. AWS Load Balancer Controller using ALB in the ip mode goes Internet -> ALB -> Pod. Most other options go Internet -> random node -> ingress or router pod -> maybe some more hops through the cluster -> target pod. Even for the shortest alternative path, you add hops: latency, chance of failure, chance of things going down. Yes, there are alternatives to AWS Load Balancer Controller using ALB in the ip mode that also route directly to the pod, but they are less common and they have other constraints (a minimal annotation sketch of the ip mode follows this list).
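For reference, the ip target mode described above is selected with annotations on the Ingress; a minimal sketch, assuming the AWS Load Balancer Controller is already installed (hostname and Service name are placeholders):
apiVersion: networking.k8s.io/v1
kind: Ingress
metadata:
  name: app
  annotations:
    kubernetes.io/ingress.class: alb
    alb.ingress.kubernetes.io/scheme: internet-facing
    alb.ingress.kubernetes.io/target-type: ip   # register pod IPs directly: Internet -> ALB -> Pod
spec:
  rules:
    - host: app.example.com                     # placeholder host
      http:
        paths:
          - path: /
            pathType: Prefix
            backend:
              service:
                name: app                       # placeholder Service
                port:
                  number: 80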
TL;DR: shit is complex and hard. Choosing the easiest and least-headache-inducing option is a pretty good idea! It’s a requirement for small teams and companies that want to move fast: oh, you spent 2 days configuring nginx while your competitor spent 2 days building a new feature that is a tiny bit slower and nobody notices the slowness? Guess who wins.
I rarely run API Gateways in k8s. It makes clusters huge, it makes them complex, it makes migrations and upgrades harder, and it’s another piece to maintain. I do not want everything in my cluster, I want to take advantage of the hundreds of engineers AWS has working on say API Gateway! Or S3. Why would I want to run S3 in my k8s cluster? It’s kind of like that for small teams or companies that want to move fast. Building products makes money, managing an API Gateway brings GitHub Issues.
That said, sometimes you want some JWT auth with some fancy tooling for 1 app. Perfect fit for something like Ambassador, Traefik, and a bunch of other tools like that!
profile the network latency/overhead of different ingresses (but I am working on that, your help is very welcome!)
Do not forget about the docs! Most Ingresses or API Gateways should have a network path or pretty good docs about how routing happens and how they work.
I’d love to build a flowchart for this too, but that requires a bunch of time and work
^very short thoughts, some are very abstracted. I also just may be wrong on all this, I am known to be a dum-dum
These are super helpful points! I started with nginx ingress, migrated to traefik, and have been mulling over whether the AWS LB controller just made more sense
Anyone using Loki for logs?
yup
Any drawbacks I need to think of? Got it running for 2 environments with dynamo and s3 as backends, but not sure yet
Don’t use DDB. That was a first-gen solution they had for the indexes. Now you are encouraged to use boltdb-shipper, and everything goes to S3
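For anyone making that switch, a minimal sketch of the relevant Loki storage settings, assuming a Loki 2.x config; the dates, paths, region, and bucket name are placeholders:
schema_config:
  configs:
    - from: 2021-12-01              # placeholder start date for this schema
      store: boltdb-shipper         # index kept in BoltDB files shipped to object storage
      object_store: aws
      schema: v11
      index:
        prefix: loki_index_
        period: 24h
storage_config:
  boltdb_shipper:
    active_index_directory: /loki/index
    cache_location: /loki/index_cache
    shared_store: s3
  aws:
    s3: s3://us-east-1/example-loki-bucket   # placeholder bucket; no DynamoDB table needed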
2021-12-20
Anyone use the topologySpreadConstraints key to evenly distribute across nodes?
I am specifically trying to use this in argo-workflows
topologySpreadConstraints: # https://kubernetes.io/docs/concepts/workloads/pods/pod-topology-spread-constraints/
- maxSkew: 1
  topologyKey: zone
  whenUnsatisfiable: ScheduleAnyway
  labelSelector:
    matchLabels:
      foo: bar

Pod:
- name: foo
  inputs:
    parameters:
    - name: foo
  container:
    image: my_container
  metadata:
    labels:
      foo: bar
  command: [sh, -c]
Weird thing is, it gets scheduled but doesn’t actually run. Wondering if anyone else had experience or successfully done it
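Not an answer to why the workflow pod doesn’t run, but two things worth double-checking, stated as assumptions about the setup: spreading evenly across nodes usually uses the built-in kubernetes.io/hostname label as topologyKey, and a bare zone key only matches if the nodes actually carry a label literally named zone (the well-known label is topology.kubernetes.io/zone). A sketch reusing the same placeholder foo: bar labels:
topologySpreadConstraints:
  - maxSkew: 1
    topologyKey: kubernetes.io/hostname        # one bucket per node -> spread across nodes
    whenUnsatisfiable: ScheduleAnyway          # prefer spreading, never block scheduling
    labelSelector:
      matchLabels:
        foo: bar                               # must match the pod's own labels
  - maxSkew: 1
    topologyKey: topology.kubernetes.io/zone   # optional second constraint for zone balance
    whenUnsatisfiable: ScheduleAnyway
    labelSelector:
      matchLabels:
        foo: bar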
2021-12-22
Hi, any suggestions for ‘easy to use’ WAF integrations? Nginx ingress + ModSecurity is hard to debug and maintain, AWS WAF demands ACM certificates (so no LetsEncrypt), or any other?
cloudflare?
I looked into their docs before, but how does that work in terms of provisioning? We use the nginx-ingress controller and external-dns & cert-manager. I can’t figure out if these components can manage all the configuration in Cloudflare.
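Not a full answer, but both tools do have Cloudflare support: external-dns has a cloudflare provider with a per-Ingress annotation to toggle proxying (which is what puts traffic behind Cloudflare’s WAF), and cert-manager can solve DNS-01 challenges through the Cloudflare API. A minimal sketch, with placeholder names, email, and secret; not a drop-in config:
# Annotation understood by external-dns when running with the cloudflare provider:
apiVersion: networking.k8s.io/v1
kind: Ingress
metadata:
  name: app
  annotations:
    external-dns.alpha.kubernetes.io/cloudflare-proxied: "true"   # route the record through Cloudflare's proxy/WAF
spec:
  defaultBackend:
    service:
      name: app                        # placeholder Service
      port:
        number: 80
---
# cert-manager issuer solving DNS-01 via Cloudflare:
apiVersion: cert-manager.io/v1
kind: ClusterIssuer
metadata:
  name: letsencrypt-cloudflare
spec:
  acme:
    server: https://acme-v02.api.letsencrypt.org/directory
    email: ops@example.com             # placeholder email
    privateKeySecretRef:
      name: letsencrypt-cloudflare-key
    solvers:
      - dns01:
          cloudflare:
            apiTokenSecretRef:
              name: cloudflare-api-token   # Secret holding a scoped Cloudflare API token
              key: api-token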
2021-12-23
A simple-yet-powerful API traffic viewer for Kubernetes to help you troubleshoot and debug your microservices. Think TCPDump and Chrome Dev Tools combined.
2021-12-24
Hello. We redeployed kiam to our EKS cluster and some pods are still failing to get their tokens. The kiam server logs keep saying the role doesn’t have the ability to assume the role assigned to the kiam role, even though it does. We’ve tried restarting the application pods and the kiam pods with no luck; any ideas?
Kiam server keeps showing errors like
{"generation.metadata":0,"level":"error","msg":"error retrieving credentials: AccessDenied: User: arn:aws:sts::…:assumed-role/dev-appsystem-eks-worker-role/i-03694ff305127dad7 is not authorized to perform: sts:AssumeRole on resource: arn:aws:iam::…:role/dev-appkiam-server\n\tstatus code: 403, request id: 004dbe53-5c3b-4f67-9594-0860da0cec21","pod.iam.requestedRole":"dev-resourcemanager-role","pod.iam.role":"dev-resourcemanager-role","pod.name":"hhh-resourcemanager-service-5cc7c8bcdc-kdgbh","pod.namespace":"qa-hhh","pod.status.ip":"10.212.36.21","pod.status.phase":"Running","resource.version":"413214545","time":"2021-12-24T1733Z"}
doh, turned out the wrong ARN was passed in the trust policy