#kubernetes (2021-12)
Archive: https://archive.sweetops.com/kubernetes/
2021-12-07
Octopus Deploy is a Deployment and Operations tool for AWS, Azure, .NET, Java, Kubernetes, Windows and Linux, and a Kubernetes YAML generator
2021-12-10
Cross-posting this here… probably should have started in this channel
Has anyone had any success configuring persistent storage when running EKS on Fargate? I’ve been banging my head on this problem for a few days now, and it seems EFS is the only supported persistent storage, but I’ve been unable to get it to work; every attempt results in a “SetUp failed for volume… …Could not mount”
Can’t help you with getting it mounted, but yes EFS is the only persistent storage supported for Fargate
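For anyone hitting the same wall: EFS on Fargate goes through the EFS CSI driver with static provisioning (on Fargate the driver itself is built into the platform), and “Could not mount” errors are often down to the filesystem’s mount targets or security group not allowing NFS (port 2049) from the pod’s subnets. A minimal sketch of the objects involved, with a placeholder filesystem ID:
apiVersion: storage.k8s.io/v1
kind: StorageClass
metadata:
  name: efs-sc                           # placeholder name
provisioner: efs.csi.aws.com
---
apiVersion: v1
kind: PersistentVolume
metadata:
  name: efs-pv
spec:
  capacity:
    storage: 5Gi                         # required field; EFS itself is elastic
  accessModes:
    - ReadWriteMany
  persistentVolumeReclaimPolicy: Retain
  storageClassName: efs-sc
  csi:
    driver: efs.csi.aws.com
    volumeHandle: fs-0123456789abcdef0   # placeholder EFS filesystem ID
---
apiVersion: v1
kind: PersistentVolumeClaim
metadata:
  name: efs-claim
spec:
  accessModes:
    - ReadWriteMany
  storageClassName: efs-sc
  resources:
    requests:
      storage: 5Gi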
2021-12-11
Is kustomize the default go-to when all I need is a minor change to a deployment across staging/local/prod?
I think Helm might be overkill. If I needed that much, I’d probably want to try Pulumi as well (still might if I need to combine k8s + external resources like databases).
I’ve known about 12-Factor apps for a while. When working AWS-native, my favorite would be cloud-based configuration settings pulled from SSM.
I recently ran into someone who seemed to think that using environment variables is far inferior to config files. Is it more of a Kubernetes-standard approach to mount config files and avoid environment variables? To me, environment variables versus mounting a config file are just different flavors and not really that important. The bigger question is whether you use an environment var/config file OR load directly into your app from a parameter-store-style service (etcd, Consul, others?).
So I’m all in favor of envs, but will play devil’s advocate:
• envs are exposed via the /proc filesystem; any process on the system can read those settings
• envs are harder to validate (e.g. typo an ENV, you won’t get a warning)
• env sprawl: over time, you may end up with hundreds of ENVs as some of our customers have. they have products that have been around for a decade or longer and gone through generations of engineers
• what updates the envs? do you have CD for envs?
• if your app still consumes config files but you are parameterizing it with ENVs, it’s tedious to update both the envs and the config-file templating every time you add an env (a minimal sketch of both flavors follows this list)
• envs are really only convenient for scalars. serializing structures in YAML/JSON is ugly
• ECS is limited on the number of ENVs supported (I don’t remember exactly, but it’s around 250)
• ECS task definitions are capped at 64K, meaning if you use a lot of ENVs (or long ENVs), you will hit this limit when you least expect it
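To make the trade-off concrete, here is a minimal sketch of the two flavors being debated: the same ConfigMap consumed as an env var and as a mounted config file. All names here (app-config, APP_LOG_LEVEL, the mount path) are placeholders, not anything from this thread.
apiVersion: v1
kind: ConfigMap
metadata:
  name: app-config                 # placeholder name
data:
  APP_LOG_LEVEL: "info"            # scalar: fine as an env var
  settings.yaml: |                 # structured config: nicer as a mounted file
    log:
      level: info
      format: json
---
apiVersion: v1
kind: Pod
metadata:
  name: app
spec:
  containers:
    - name: app
      image: example/app:1.0       # placeholder image
      env:
        - name: APP_LOG_LEVEL      # env-var flavor
          valueFrom:
            configMapKeyRef:
              name: app-config
              key: APP_LOG_LEVEL
      volumeMounts:
        - name: config             # config-file flavor, appears at /etc/app/settings.yaml
          mountPath: /etc/app
  volumes:
    - name: config
      configMap:
        name: app-config
        items:
          - key: settings.yaml
            path: settings.yaml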
nice. great point esp on the scalars. …if I ever see env vars containing JSON… Thank you for the good insight. I prefer using stuff like SSM Parameter Store and such, but learning more about k8s has had me revisiting assumptions. Cheers
2021-12-13
Hi people, did anyone ever have this problem with a pod?
Status: Terminating (lasts 3d21h)
Termination Grace Period: 120s
The “Termination Grace Period” is 120s, yet the pod has been terminating for 3d21h
Possibly related: How to fix — Kubernetes namespace deleting stuck in Terminating state
So AWS launched their hosted Kubernetes called EKS (Elastic Kubernetes Service) last year and we merrily jumped onboard and deployed most…
actually it was related to a warm pool of nodes
Just out of curiosity, what did you end up doing to fix it?
For the moment I just deleted the warm pools, as they are not that useful right now. I still have to look further into it. I guess the warm-pool nodes need more configuration to work properly with k8s
you might have a stuck finalizer
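For reference, a pod (or namespace) stuck in Terminating usually looks like the sketch below in kubectl get -o yaml: deletionTimestamp is set, but a finalizer never gets removed because whatever controller owns it is gone. Names and the finalizer string are placeholders; the kubectl patch in the comment is a last resort, since it skips whatever cleanup the finalizer was guarding.
apiVersion: v1
kind: Pod
metadata:
  name: example-pod                          # placeholder
  namespace: example-ns                      # placeholder
  deletionTimestamp: "2021-12-09T10:00:00Z"  # deletion was requested days ago
  finalizers:
    - example.com/cleanup-hook               # whatever should remove this never did
# Last-resort manual cleanup, only if you know the cleanup is safe to skip:
#   kubectl patch pod example-pod -n example-ns --type=merge -p '{"metadata":{"finalizers":null}}'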
2021-12-14
Does anyone use JSON/YAML transformation in a pipeline for Kubernetes, i.e. modifying the YAML directly as a build action? I saw this once; I normally see kustomize/Helm and other tools do the transformation rather than a build step. I’m thinking that’s more typical in stuff like dotnet/aspnetcore-style web settings transformations, but that’s an assumption.
I’m currently assuming the “easy” start for transforming straight YAML, without the complexity of Helm, is just kustomize (from the conversation above) and the overlays/patches it produces. Sound about right?
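As a concrete sketch of that overlay/patch style (the directory layout, names, and replica tweak are placeholders, not from this thread):
# overlays/staging/kustomization.yaml
apiVersion: kustomize.config.k8s.io/v1beta1
kind: Kustomization
resources:
  - ../../base                    # the shared Deployment/Service manifests
images:
  - name: example/app             # retag the image per environment
    newTag: staging-latest
patchesStrategicMerge:
  - replica-patch.yaml
---
# overlays/staging/replica-patch.yaml
apiVersion: apps/v1
kind: Deployment
metadata:
  name: app
spec:
  replicas: 1                     # staging runs fewer replicas than prod
Rendering with kustomize build overlays/staging (or kubectl apply -k) produces the transformed YAML with no templating involved.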
aside from kustomize, we’ve also used yq (jq for YAML). note there are a few projects called yq
nice. I love yq. Used it with PowerShell to generate datadog stuff dynamically and blogged on it https://www.sheldonhull.com/working-with-powershell-objects-to-create-yaml/. Was super powerful
Here’s a walk-through on using PowerShell objects to dynamically generate yaml configuration files.
fyi I found Helm to be the perfect fit despite trying to avoid it initially. Turns out my Hugo blog templating knowledge was directly transferable, and it was barely a blip to get up and running and generating full deployment renderings that way, even if I’m only using it to render for this step.
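For anyone wondering why the Hugo knowledge transfers: Helm templates are the same Go text/template syntax rendered over a values file. A minimal sketch with placeholder chart values:
# values.yaml
replicaCount: 2
image:
  repository: example/app
  tag: "1.0.0"
---
# templates/deployment.yaml
apiVersion: apps/v1
kind: Deployment
metadata:
  name: {{ .Release.Name }}-app
spec:
  replicas: {{ .Values.replicaCount }}
  selector:
    matchLabels:
      app: {{ .Release.Name }}-app
  template:
    metadata:
      labels:
        app: {{ .Release.Name }}-app
    spec:
      containers:
        - name: app
          image: "{{ .Values.image.repository }}:{{ .Values.image.tag }}"
Running helm template . renders the chart locally, which matches the “only using it to render for this step” workflow described above.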
Woot woot
haha, yes, true that.
2021-12-15
2021-12-17
Hey y’all, for all of you running AWS EKS workloads, this just came out: https://github.com/aws-samples/kubernetes-log4j-cve-2021-44228-node-agent
Contribute to aws-samples/kubernetes-log4j-cve-2021-44228-node-agent development by creating an account on GitHub.
Thanks for sharing. Unfortunately it only seems to support Amazon Linux, and you can’t disable the patching to run it in monitoring-only mode.
2021-12-19
I want to start a discussion regarding various Ingress Controllers and how to choose between them. I have tried a few, but I haven’t had problems with any yet, maybe due to the current scale I am operating at and the limited use cases (yet).
AWS ALB Ingress Controller makes perfect sense when you think about latency and reliability; however, it is not as robust and feature-rich. For example, other Ingress Controllers can act as full-fledged “API Gateways” (e.g. solo.io’s or Ambassador’s), and some integrate with other components like Argo Rollouts and Flagger to do advanced rollout strategies. An Ingress Controller like Nginx’s is so “hackable” with the snippet annotations that it lets me do lots of workarounds (also exploitable).
However, with an in-cluster ingress (unlike ALB), you have to carefully spread your workloads across the cluster to keep latency down.
My problem is that I am unable to form an opinion without hands-on experience on larger clusters (500+ nodes), and I can’t rely on my lab benchmarks because they are too sensitive. Also, I lack the knowledge to correctly benchmark and profile the network latency/overhead of different ingresses (but I am working on that; your help is very welcome!).
So, what’s your decision flowchart when deciding on an Ingress Controller / API Gateway to handle north-south traffic?
I was “triggered” to post these questions after I read @Vlad Ionescu (he/him)’s tweet btw
Interviewing with a company that cares deeply about latency, runs on k8s, and where the tiny “DevOps team” can barely handle the maintenance effort, and I got this gem of wisdom:
“Why would I use AWS’ ALB when I can do that in HAProxy?”
Oh, you sweet soul
@Vlad Ionescu (he/him) I hope that was a trick question!
@Erik Osterman (Cloud Posse) nope, they were totally serious
On the topic… I don’t have the capacity to write all my thoughts on the subject. Some short notes:
• Ingress != API Gateway, in the same way a bike is different from a car. I would not ride a bike from Amsterdam to London, just like how I would not use a car on a mountain trail.
• Integrations, advanced rollout strategies, and hackability are bonuses, not goals. I want an Ingress, so I want to get traffic into my app. If I want to use Argo Rollouts, well, then I have a different problem and I will be looking at different solutions. Also, the latest trend of doing all releases at the network level is… not good. App-level and feature flagging, dammit! Yes, network level makes sense, but in very specific use cases.
• Maintenance is a huge deal. ALB Ingress Controller (well, AWS Load Balancer Controller using ALB in the ip mode :sweat_smile:) is debatably easy to maintain and upskill people into. If I have AWS engineers, they already know ALBs, or it’s easier to teach them ALB than HAProxy configs. If I have HAProxy engineers and I am optimizing for that… wtf am I doing on AWS? There are valid use cases (many even!) but they only appear at huge scale or for very specific workloads.
• Traffic path is a huge deal. AWS Load Balancer Controller using ALB in the ip mode goes Internet -> ALB -> Pod. Most other options go Internet -> random node -> ingress or router pod -> maybe some more hops through the cluster -> target pod. Even for the shortest alternative path, you add hops: latency, chance of failure, chance of things going down. Yes, there are alternatives to AWS Load Balancer Controller using ALB in the ip mode that also route directly to the pod, but they are less common and they have other constraints (a minimal annotation sketch of the ip mode follows this list).
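For reference, the ip target mode described above is selected with annotations on the Ingress; a minimal sketch, assuming the AWS Load Balancer Controller is already installed (hostname and Service name are placeholders):
apiVersion: networking.k8s.io/v1
kind: Ingress
metadata:
  name: app
  annotations:
    kubernetes.io/ingress.class: alb
    alb.ingress.kubernetes.io/scheme: internet-facing
    alb.ingress.kubernetes.io/target-type: ip   # register pod IPs directly: Internet -> ALB -> Pod
spec:
  rules:
    - host: app.example.com                     # placeholder host
      http:
        paths:
          - path: /
            pathType: Prefix
            backend:
              service:
                name: app                       # placeholder Service
                port:
                  number: 80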
TL;DR: shit is complex and hard. Choosing the easiest and least-headache-inducing option is a pretty good idea! It’s a requirement for small teams and companies that want to move fast: oh, you spent 2 days configuring nginx while your competitor spent 2 days building a new feature that is a tiny bit slower and nobody notices the slowness? Guess who wins.
I rarely run API Gateways in k8s. It makes clusters huge, it makes them complex, it makes migrations and upgrades harder, and it’s another piece to maintain. I do not want everything in my cluster, I want to take advantage of the hundreds of engineers AWS has working on say API Gateway! Or S3. Why would I want to run S3 in my k8s cluster? It’s kind of like that for small teams or companies that want to move fast. Building products makes money, managing an API Gateway brings GitHub Issues.
That said, sometimes you want some JWT auth with some fancy tooling for 1 app. Perfect fit for something like Ambassador, Traefik, and a bunch of other tools like that!
profile the network latency/overhead of different ingresses (but I am working on that, your help is very welcome!)
Do not forget about the docs! Most Ingresses or API Gateways should have a network path or pretty good docs about how routing happens and how they work.
I’d love to build a flowchart for this too, but that requires a bunch of time and work
^very short thoughts, some are very abstracted. I also just may be wrong on all this, I am known to be a dum-dum
These are super helpful points! I started with nginx ingress, migrated to traefik, and have been mulling over whether the AWS LB controller just made more sense
Anyone using Loki for logs?
yup
Any drawbacks I need to think of? Got it running for 2 environments with dynamo and s3 as backends, but not sure yet
Don’t use DDB. That was a first-gen solution they had for the indexes. Now you are encouraged to use boltdb-shipper, and everything goes to S3
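For anyone making that switch, a minimal sketch of the relevant Loki storage settings, assuming a Loki 2.x config; the dates, paths, region, and bucket name are placeholders:
schema_config:
  configs:
    - from: 2021-12-01              # placeholder start date for this schema
      store: boltdb-shipper         # index kept in BoltDB files shipped to object storage
      object_store: aws
      schema: v11
      index:
        prefix: loki_index_
        period: 24h
storage_config:
  boltdb_shipper:
    active_index_directory: /loki/index
    cache_location: /loki/index_cache
    shared_store: s3
  aws:
    s3: s3://us-east-1/example-loki-bucket   # placeholder bucket; no DynamoDB table needed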
2021-12-20
Anyone use the topologySpreadConstraints key to evenly distribute across nodes?
I am specifically trying to use this in argo-workflows
topologySpreadConstraints: # https://kubernetes.io/docs/concepts/workloads/pods/pod-topology-spread-constraints/
- maxSkew: 1
  topologyKey: zone
  whenUnsatisfiable: ScheduleAnyway
  labelSelector:
    matchLabels:
      foo: bar

Pod:
- name: foo
  inputs:
    parameters:
    - name: foo
  container:
    image: my_container
  metadata:
    labels:
      foo: bar
  command: [sh, -c]
Weird thing is, it gets scheduled but doesn’t actually run. Wondering if anyone else had experience or successfully done it
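Not an answer to why the workflow pod doesn’t run, but two things worth double-checking, stated as assumptions about the setup: spreading evenly across nodes usually uses the built-in kubernetes.io/hostname label as topologyKey, and a bare zone key only matches if the nodes actually carry a label literally named zone (the well-known label is topology.kubernetes.io/zone). A sketch reusing the same placeholder foo: bar labels:
topologySpreadConstraints:
  - maxSkew: 1
    topologyKey: kubernetes.io/hostname        # one bucket per node -> spread across nodes
    whenUnsatisfiable: ScheduleAnyway          # prefer spreading, never block scheduling
    labelSelector:
      matchLabels:
        foo: bar                               # must match the pod's own labels
  - maxSkew: 1
    topologyKey: topology.kubernetes.io/zone   # optional second constraint for zone balance
    whenUnsatisfiable: ScheduleAnyway
    labelSelector:
      matchLabels:
        foo: bar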
2021-12-22
Hi, any suggestions for ‘easy to use’ WAF integrations? Nginx ingress + ModSecurity is hard to debug and maintain, AWS WAF demands ACM certificates (so no LetsEncrypt), or any other?
cloudflare?
I looked into their docs before, but how does that work in terms of provisioning? We use the nginx-ingress controller and external-dns & cert-manager. I can’t figure out if these components can manage all the configuration in Cloudflare.
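Not a full answer, but both tools do have Cloudflare support: external-dns has a cloudflare provider with a per-Ingress annotation to toggle proxying (which is what puts traffic behind Cloudflare’s WAF), and cert-manager can solve DNS-01 challenges through the Cloudflare API. A minimal sketch, with placeholder names, email, and secret; not a drop-in config:
# Annotation understood by external-dns when running with the cloudflare provider:
apiVersion: networking.k8s.io/v1
kind: Ingress
metadata:
  name: app
  annotations:
    external-dns.alpha.kubernetes.io/cloudflare-proxied: "true"   # route the record through Cloudflare's proxy/WAF
spec:
  defaultBackend:
    service:
      name: app                        # placeholder Service
      port:
        number: 80
---
# cert-manager issuer solving DNS-01 via Cloudflare:
apiVersion: cert-manager.io/v1
kind: ClusterIssuer
metadata:
  name: letsencrypt-cloudflare
spec:
  acme:
    server: https://acme-v02.api.letsencrypt.org/directory
    email: ops@example.com             # placeholder email
    privateKeySecretRef:
      name: letsencrypt-cloudflare-key
    solvers:
      - dns01:
          cloudflare:
            apiTokenSecretRef:
              name: cloudflare-api-token   # Secret holding a scoped Cloudflare API token
              key: api-token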
2021-12-23
A simple-yet-powerful API traffic viewer for Kubernetes to help you troubleshoot and debug your microservices. Think TCPDump and Chrome Dev Tools combined.
2021-12-24
Hello. We redeployed kiam to our EKS cluster and some pods are still failing to get their tokens. The kiam server logs keep saying the role doesn’t have the ability to assume the role assigned to the kiam role, even though it does. We’ve tried restarting the application pods and the kiam pods with no luck; any ideas?
Kiam server keeps showing errors like
{"generation.metadata":0,"level":"error","msg":"error retrieving credentials: AccessDenied: User: arn:aws:sts::…:assumed-role/dev-appsystem-eks-worker-role/i-03694ff305127dad7 is not authorized to perform: sts:AssumeRole on resource: arn:aws:iam::…:role/dev-appkiam-server\n\tstatus code: 403, request id: 004dbe53-5c3b-4f67-9594-0860da0cec21","pod.iam.requestedRole":"dev-resourcemanager-role","pod.iam.role":"dev-resourcemanager-role","pod.name":"hhh-resourcemanager-service-5cc7c8bcdc-kdgbh","pod.namespace":"qa-hhh","pod.status.ip":"10.212.36.21","pod.status.phase":"Running","resource.version":"413214545","time":"2021-12-24T1733Z"}
doh, turned out the wrong ARN was passed in the trust policy