#kubernetes (2024-02)

kubernetes

Archive: https://archive.sweetops.com/kubernetes/

2024-02-02

rohit

Quick question: I am implementing EKS and some applications (aws-lb-controller, cert-manager, and some apps). To support mTLS, do I need to use a service mesh? I am still trying to understand service meshes, but is one absolutely necessary to have mTLS communication between pods?

Erik Osterman (Cloud Posse)

It depends. If the requirement is mTLS, then you’ll need another layer, such as a mesh. However, sometimes “mTLS” is used loosely to mean “end-to-end” TLS, which is not really the same thing. If the requirement is just to provide end-to-end TLS, then it can be a simple TLS sidecar terminating TLS on the pod. The sidecar can use certificates generated by cert-manager, and you can still use the ALB, etc.

Erik Osterman (Cloud Posse)

(mTLS is where both the client and server authenticate each other using digital certificates.)
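
To make the TLS vs. mTLS distinction concrete, here is a minimal sketch using Python’s standard `ssl` module (chosen purely for illustration; nothing in the thread uses Python). On the server side, the only difference is whether the server demands and verifies a client certificate. The certificate and CA file paths are hypothetical placeholders and are optional, so the sketch runs without any certs.

```python
import ssl
from typing import Optional


def build_server_context(
    require_client_cert: bool,
    cert_file: Optional[str] = None,
    key_file: Optional[str] = None,
    client_ca_file: Optional[str] = None,
) -> ssl.SSLContext:
    """Build a server-side TLS context.

    require_client_cert=False: one-way TLS, only the server proves its identity.
    require_client_cert=True:  mTLS, the server also requires a client certificate
                               and verifies it against client_ca_file.
    All paths are hypothetical placeholders and optional, so this runs without certs.
    """
    ctx = ssl.SSLContext(ssl.PROTOCOL_TLS_SERVER)
    ctx.minimum_version = ssl.TLSVersion.TLSv1_2
    if cert_file and key_file:
        # The server's own certificate, e.g. files mounted from a cert-manager Secret.
        ctx.load_cert_chain(certfile=cert_file, keyfile=key_file)
    if require_client_cert:
        # The "mutual" part: handshakes without a valid client certificate fail.
        ctx.verify_mode = ssl.CERT_REQUIRED
        if client_ca_file:
            ctx.load_verify_locations(cafile=client_ca_file)
    else:
        ctx.verify_mode = ssl.CERT_NONE
    return ctx


tls_only = build_server_context(require_client_cert=False)
mutual = build_server_context(require_client_cert=True)
```

With `require_client_cert=False` this is the “end-to-end TLS” case a sidecar gives you; with `True`, every caller also needs a certificate from a CA the server trusts, and issuing and rotating those client certificates is roughly the bookkeeping a mesh takes off your hands.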

Erik Osterman (Cloud Posse)

also a good question for #office-hours

rohit

Thank you for the response. What does a TLS sidecar terminating TLS on the pod look like?

Erik Osterman (Cloud Posse)

You can just run nginx (or similar) as a sidecar container. This is the gist of it. https://eric-price.net/posts/2023-12-22-nginx-sidecar/

Nginx Sidecar for TLS/SSL Termination on Kubernetes

If you’re not ready for a service mesh like Istio, Cilium, or Nginx’s own service mesh, an easy way to implement end-to-end encryption from the application LB to your pod, or TLS/SSL termination for your pods behind a network LB, is a proxy sidecar using Nginx. I’m going to describe how to set this up using cert-manager to fetch a Let’s Encrypt certificate and mount it in the Nginx base image.

Erik Osterman (Cloud Posse)

Recommend having your service listen on 127.0.0.1:8080 (or similar), and the only exposed port being the nginx sidecar. Since it’s a sidecar, it can talk over localhost to 8080.
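
A rough sketch of that pod layout, written as a Python function that emits a Pod manifest as a dict (Kubernetes accepts the same structure as JSON). Everything named here is an assumption for illustration: the `<name>-tls` Secret is assumed to be produced by a cert-manager Certificate, the `<name>-nginx-conf` ConfigMap is assumed to hold an nginx server block that listens with TLS on 8443 and proxies to `http://127.0.0.1:8080`, and `LISTEN_ADDR` is a hypothetical setting your app would have to honor.

```python
import json


def app_pod_with_tls_sidecar(
    name: str,
    app_image: str,
    app_port: int = 8080,
    tls_port: int = 8443,
    nginx_image: str = "nginx:1.25",
) -> dict:
    """Return a Pod manifest (as a dict) for the TLS-sidecar pattern.

    Assumptions (all names hypothetical):
      * the app binds to 127.0.0.1:<app_port>, so it is unreachable from outside the pod;
      * cert-manager writes the certificate into a kubernetes.io/tls Secret "<name>-tls"
        (keys tls.crt / tls.key);
      * a ConfigMap "<name>-nginx-conf" holds an nginx config that terminates TLS on
        <tls_port> and proxies to http://127.0.0.1:<app_port>.
    """
    return {
        "apiVersion": "v1",
        "kind": "Pod",
        "metadata": {"name": name, "labels": {"app": name}},
        "spec": {
            "containers": [
                {
                    "name": "app",
                    "image": app_image,
                    # Hypothetical env var telling the app to bind to localhost only.
                    "env": [{"name": "LISTEN_ADDR", "value": f"127.0.0.1:{app_port}"}],
                },
                {
                    "name": "tls-proxy",
                    "image": nginx_image,
                    # The only port meant to be reached from outside the pod.
                    "ports": [{"name": "https", "containerPort": tls_port}],
                    "volumeMounts": [
                        {"name": "tls", "mountPath": "/etc/nginx/tls", "readOnly": True},
                        {"name": "nginx-conf", "mountPath": "/etc/nginx/conf.d", "readOnly": True},
                    ],
                },
            ],
            "volumes": [
                {"name": "tls", "secret": {"secretName": f"{name}-tls"}},
                {"name": "nginx-conf", "configMap": {"name": f"{name}-nginx-conf"}},
            ],
        },
    }


if __name__ == "__main__":
    print(json.dumps(app_pod_with_tls_sidecar("myapp", "example/myapp:1.0"), indent=2))
```

Leaving `containerPort` off the app container is only documentation; what actually keeps the app private is that it binds to 127.0.0.1, since containers in a pod share a network namespace and anything bound to the pod IP is reachable.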

Erik Osterman (Cloud Posse)

If you’re using a helm chart, just embed this into the templates, that way multiple apps can use the same pattern. That’s assuming you use what we call the “monochart” pattern for apps.

rohit

monochart being a “helm umbrella” of sorts?

rohit

like instead of deploying aws-lb-controller, cert-manager, and the application via individual helm charts, using a helm umbrella chart to do them all?

Erik Osterman (Cloud Posse)

yes/no. Of course, use vendor-maintained charts if/when available. But if you’re developing services at your corp, consider whether each service truly needs its own dedicated chart with 98% boilerplate, or if a generic chart for your organization would suffice. This way, charts become like an interface for deploying new services, and you don’t have an explosion of architectures for k8s.
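
One way to picture the “generic chart as an interface” idea: the organization keeps one set of defaults, and each service supplies only its overrides, which are deep-merged on top. The sketch below models that in Python; the default values and the `billing` service are hypothetical, and the merge rule (nested maps merge, everything else including lists is replaced) is a simplified approximation of how Helm coalesces values files (it ignores, for example, Helm’s use of `null` to delete a key).

```python
from copy import deepcopy
from typing import Any, Dict

# Hypothetical organization-wide defaults: the "98% boilerplate" lives here, once.
ORG_DEFAULTS: Dict[str, Any] = {
    "replicaCount": 2,
    "image": {"repository": None, "tag": "latest", "pullPolicy": "IfNotPresent"},
    "tlsSidecar": {"enabled": True, "port": 8443},
    "resources": {"requests": {"cpu": "100m", "memory": "128Mi"}},
}


def deep_merge(base: Dict[str, Any], override: Dict[str, Any]) -> Dict[str, Any]:
    """Recursively merge `override` into a copy of `base` without mutating either.

    Nested dicts are merged key by key; any other value (including lists) in
    `override` replaces the value from `base`.
    """
    merged = deepcopy(base)
    for key, value in override.items():
        if isinstance(value, dict) and isinstance(merged.get(key), dict):
            merged[key] = deep_merge(merged[key], value)
        else:
            merged[key] = deepcopy(value)
    return merged


# A new (hypothetical) service states only what is specific to it.
billing_values = {"image": {"repository": "example/billing", "tag": "1.4.2"}, "replicaCount": 3}
effective = deep_merge(ORG_DEFAULTS, billing_values)
```

The per-service values file stays a few lines long, and a change to the shared defaults (say, enabling the TLS sidecar everywhere) reaches every service on its next deploy.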

rohit

Ah gotcha. Understood. I will also look into terminating TLS at the nginx sidecar. I need to see if there are any limitations with Fargate when using this.

2024-02-05

2024-02-08

Corky

Hi y’all. In a meeting with @Erik Osterman (Cloud Posse), he mentioned that running Karpenter controllers on Fargate was problematic, but we did not have time to find out why. It’s working for us so far, and seems to be a recommendation in the AWS EKS Best Practices documentation as well. Does anyone have any related experience they can share?

Erik Osterman (Cloud Posse)

@Jeremy G (Cloud Posse) @Jonathan Eunice

Erik Osterman (Cloud Posse)

They were the ones mostly involved in this.

Jeremy G (Cloud Posse)

Running Karpenter on Fargate is not in itself a problem, but it leads to problematic cluster configurations in the end. I will elaborate below.

Erik Osterman (Cloud Posse)

One of the other major lessons learned was also related to CoreDNS as configured out-of-the-box on EKS, if I recall correctly.

Jonathan Eunice

We (3Play Media) are running 5 EKS clusters with this configuration (Karpenter on Fargate nodes, as of today on 0.31.3, but we’ve run it this way since the 0.2x days). It is working well for us. No real problems encountered across multiple EKS versions. On cluster upgrade, you sometimes have to rollout restart Karpenter a few times before it “catches” the new version (race condition?), but that takes maybe 5 mins. We are considering moving Karpenter to Managed Node Group nodes in some clusters where we run MNGs, just to reduce the number of nodes and the corresponding Datadog monitoring costs. But that is optional and a slight optimization. At present I am unmotivated to prioritize any change away from the Karpenter on Fargate strategy.

Jonathan Eunice

The CoreDNS settings/configuration for EKS is in my opinion almost criminally bad. Truly, truly suboptimal. Not good for performance, for latency, and worst of all, not well-configured for high availability. It is, of all the things we buy from AWS, one of the few that makes me question “Does AWS know what it’s doing??”

Jonathan Eunice

But, with a few customizations tucked into our Terraform, it’s running fine for us (on normal EC2 nodes, not Fargate).

Jonathan Eunice

It was a big debugging project and “OMG why did they do this??” some months back; but once we resolved it, no further issues. We’d be happy to share our CoreDNS settings if you’d like.

Corky

Thanks for your experience, @Jonathan Eunice. I’d definitely be interested in your CoreDNS setup. We’re running CoreDNS (as a cluster add-on) on Fargate as well. Aside from the annoying compute-type patching stuff, we haven’t noticed anything amiss. CC @E-Love

Jeremy G (Cloud Posse)

Update: Karpenter, as of the v1beta migration in v0.33.0, by default deploys the Karpenter controller to an EKS Managed Node Group and recommends deploying the controller into the kube-system namespace.

Original message: I believe the idea of running Karpenter on Fargate was put forward by the Karpenter team because:
• They were proud of their accomplishment and wanted to show it off
• If you don’t run Karpenter on Fargate, then you have to deploy some nodes some other way, which makes using Karpenter a lot more complicated and the benefits less obvious
• In particular, for people setting up an EKS cluster without the help of Terraform and modules like Cloud Posse supplies, creating a managed node pool is a lot of work that can be avoided if you use Karpenter on Fargate
• Karpenter’s ability to interact with managed node pools was limited or non-existent when it first launched

Karpenter works well with managed node pools now, and the complications of setting up a node pool are greatly mitigated by using Cloud Posse’s Terraform modules and components. So the above motivations are not that powerful.

Our evolution of the decision against running Karpenter on Fargate went like this:

• If you do not run at least one node pool outside of Karpenter, then you cannot create, via Terraform, an EKS cluster with certain EKS-managed add-ons, such as CoreDNS, because they require nodes to be running before they will report they are active. Terraform will wait for the add-ons to be active before declaring the cluster completely created, so the whole cluster creation fails.
• To support high availability, Cloud Posse and AWS recommend running nodes in 3 separate availability zones. Implied in this recommendation is that any cluster-critical services should be deployed to 3 separate nodes in the 3 separate availability zones. This includes EKS add-ons like CoreDNS and the EBS CSI driver (controller).
• Last time I checked (and this may have been fixed by now), if you specified 3 replicas of an add-on, Karpenter was not sufficiently motivated by the anti-affinity of the add-ons to create 3 nodes, one in each AZ. It just creates 1 node big enough to handle 3 replicas. What is worse, anti-affinity is only considered during scheduling, so once you have all 3 replicas on 1 node, they stay there, even as your cluster grows to dozens of nodes. Your HA and even your basic scalability (due to network bandwidth constraints on the node and cross-AZ traffic) are undermined because Karpenter put all your replicas on 1 node.

So to get around all of the above, we recommend deploying EKS with a normal managed node pool with 3 node groups (one per AZ). This allows the add-ons to deploy and initialize during cluster creation (satisfying Terraform that the cluster was properly created), and also ensures that the add-ons are deployed to different nodes in different AZs. (While you are at it, you can set up these nodes to provide a floor on available compute resources that ensures all your HA replicas have 3 nodes to run on at all times.) You do not need to use auto-scaling on this node pool, just one node in each AZ, refreshed occasionally.

There is another advantage: you can now have Karpenter provision only Spot instances, and run the managed node pool with On Demand or Reserved instances. This gives you a stable platform for your HA resources and the price savings of Spot instances elsewhere in a relatively simple configuration.

Now that you have a basic node pool to support HA, you can run Karpenter on that node pool, without the risk that Karpenter will kill the node it is running on. Karpenter now (it didn’t for a long time) properly includes the capacity available in the managed node pool when calculating cluster capacity and scaling the nodes it manages, and can scale to zero if the managed node pool is sufficient.

(The caveat here is that we are focusing on clusters that are in constant use and where paying a small premium for extra reliability is worth it. For a cluster where you don’t care if it crashes or hangs, the argument for having 3 static nodes is less compelling.)

Regarding costs, Fargate has a premium in both cost per vCPU and GiB of RAM, and in the quantization of requests. If you are concerned about the cost of running the static node pool, especially for non-production clusters, you can run t3 instances, and/or run 2 instead of 3 nodes.

Pricing comparison in us-west-2: A c6a.large, which is what we recommend as a default for the static node pool in a production cluster, has 2 vCPUs and 4 GiB of memory, and costs $0.0765 per hour. A Fargate pod requesting 2 vCPUs and 4 GiB would cost $0.09874 per hour. A minimal Fargate pod (0.25 vCPU and 0.5 GB, which is sufficient to run Karpenter) costs $0.0123425 per hour (1/8 the cost of the larger Fargate pod, about 1/6 the cost of the c6a.large, a little more than the cost of a t3.micro with 2 vCPUs and 1 GiB memory). If you have workloads that can run on Graviton (all the Kubernetes infrastructure does), then you can use the relatively cheaper c6g or c7g mediums at around $0.035 per hour, or $25/month.

So our recommendation is to run a minimal managed node pool and run Karpenter on it. The exception might be for a tiny unimportant cluster where a baseline cost of $120/month is unacceptably high.

Getting Started with Karpenter

Set up a cluster and add Karpenter

Upgrade Guide

Learn about upgrading Karpenter

Fargate Pod configuration - Amazon EKS

This section describes some of the unique Pod configuration details for running Kubernetes Pods on AWS Fargate.
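
The Fargate side of the us-west-2 pricing comparison above can be reproduced with a small calculator. This is a sketch under stated assumptions: the per-vCPU-hour and per-GB-hour rates are the us-west-2 Linux/x86 rates that yield the $0.09874 and $0.0123425 figures quoted above (verify current pricing), the size table is the set of supported Fargate vCPU/memory combinations truncated at 4 vCPU, GiB and GB are treated as interchangeable, and the 256 MB that EKS adds to each Fargate pod for Kubernetes components is modeled as 0.25 GB.

```python
from typing import List, Tuple

# Assumed us-west-2 Linux/x86 Fargate rates (USD, early 2024). Check current pricing.
FARGATE_VCPU_HOUR = 0.04048
FARGATE_GB_HOUR = 0.004445
# Memory EKS adds to each Fargate pod for kubelet, kube-proxy and containerd.
EKS_OVERHEAD_GB = 0.25

# Supported Fargate (vCPU, memory GB) combinations, truncated at 4 vCPU for brevity.
FARGATE_SIZES: List[Tuple[float, float]] = (
    [(0.25, m) for m in (0.5, 1, 2)]
    + [(0.5, m) for m in (1, 2, 3, 4)]
    + [(1, m) for m in range(2, 9)]
    + [(2, m) for m in range(4, 17)]
    + [(4, m) for m in range(8, 31)]
)


def fargate_hourly(vcpu: float, mem_gb: float) -> float:
    """Hourly price of one Fargate pod of the given (already valid) size."""
    return vcpu * FARGATE_VCPU_HOUR + mem_gb * FARGATE_GB_HOUR


def fargate_size_for(req_vcpu: float, req_mem_gb: float) -> Tuple[float, float]:
    """Round a pod's requests (plus EKS overhead) up to the cheapest supported size."""
    need_mem = req_mem_gb + EKS_OVERHEAD_GB
    candidates = [(c, m) for c, m in FARGATE_SIZES if c >= req_vcpu and m >= need_mem]
    if not candidates:
        raise ValueError("request exceeds the sizes modeled here")
    return min(candidates, key=lambda size: fargate_hourly(*size))
```

Under these assumptions, a pod requesting exactly 2 vCPU and 4 GiB is rounded up to 2 vCPU / 5 GB once the overhead is added, which is one concrete form of the “quantization” premium mentioned above.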

Jonathan Eunice

That’s an impressive design argument @Jeremy G (Cloud Posse)! Much appreciated! We started before MNGs were realistic, so our structure reflects “MNGs aren’t available, how else can we make this work?” If we were starting now, we’d probably go the MNG route and nix Fargate. It isn’t a super-priority for us to change, but we might evolve there anyway, even with Fargate working fine for us.

Jonathan Eunice

In the clusters that we care about (prod first and foremost, but dev and automation as well), we haven’t seen much trouble with Karpenter allocating a nice spread across AZs. If we needed to guarantee all three AZs, we’d probably need MNG, but Karpenter seems stochastically to get there on its own given sufficient load. (That Karpenter doesn’t have any “at a minimum” configs like “at a minimum, use 3 AZs” or “by default leave X ‘headroom’ for new pods to consume” is one of our complaints / feature requests / hopes for the future.)

Jeremy G (Cloud Posse)

Karpenter will try to keep a nice balance across AZs, but at cluster creation time, it’s only going to create 1 node. That leads to all your add-on replicas being run on the same 1 node, and since they are cluster critical, they do not get killed and moved to other nodes as the cluster grows. You can be stuck with this non-HA setup for a very long time.

Jonathan Eunice

We configure a PodDisruptionBudget and topologySpreadConstraints per-cluster. The topologySpreadConstraints in particular seem to pretty strongly encourage multi-AZ spread. But at primordial spin-up (cluster creation time) that might not be enough, and even afterwards it doesn’t seem to be a strict guarantee, so anyone who needs assured AZ distribution needs the stricter MNG approach.

"topologySpreadConstraints": [
  {
    "maxSkew": 1,
    "topologyKey": "topology.kubernetes.io/zone",
    "whenUnsatisfiable": "DoNotSchedule",
    "labelSelector": {
      "matchExpressions": [{"key": "k8s-app", "operator": "In", "values": ["kube-dns"]}]
    }
  },
  {
    "maxSkew": 2,
    "topologyKey": "kubernetes.io/hostname",
    "whenUnsatisfiable": "ScheduleAnyway",
    "labelSelector": {
      "matchExpressions": [{"key": "k8s-app", "operator": "In", "values": ["kube-dns"]}]
    }
  }
]
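
To see why spread constraints help less at cluster creation, here is a simplified Python model of the `maxSkew` check kube-scheduler applies when `whenUnsatisfiable` is `DoNotSchedule`. It is an approximation: the real filter also accounts for node affinity, taints, and `minDomains`, and it only counts zones that currently have eligible nodes. How Karpenter reacts to pods left pending is a separate question this sketch does not model.

```python
from typing import Dict, List


def allowed_zones(pods_per_zone: Dict[str, int], max_skew: int = 1) -> List[str]:
    """Zones where one more matching pod may land under whenUnsatisfiable=DoNotSchedule.

    Simplified model: skew = (matching pods already in the zone + 1)
                             - (minimum matching pods in any eligible zone).
    Only zones present in pods_per_zone count as eligible domains, standing in
    for "zones that currently have nodes this pod could run on".
    """
    min_count = min(pods_per_zone.values())
    return sorted(z for z, n in pods_per_zone.items() if n + 1 - min_count <= max_skew)


# Three zones with nodes, one replica already in us-west-2a: the next replica must go elsewhere.
spread = allowed_zones({"us-west-2a": 1, "us-west-2b": 0, "us-west-2c": 0})

# Cluster-creation case: a single node in a single zone is the only domain.
single_node = allowed_zones({"us-west-2a": 2})
```

The second case is the cluster-creation situation described above: with a single node in a single zone there is only one domain, so the constraint is always satisfied and every replica is allowed onto that node.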

Corky

Those are some very detailed explanations, and I appreciate it. We’ve struggled with some of the same things - for instance, the difficulty in spinning up a functional EKS cluster in one shot via Terraform. The HA considerations are also pretty compelling. Thank you both for filling me in!

Jonathan Eunice

@Corky Do you use Atmos workflows? Not sure we’re at the point where we can “spin up a whole AWS account” or “spin up an entire EKS cluster,” soup to nuts, with one command—but we’re closing on it. We can do large swaths of the setup in single workflow commands. Major time/effort/worry saver.

Corky

My colleague and I were recently made aware of Atmos and are considering it. I watched the live demo and did some reading the other day, and it seems like it may be able to alleviate some pain for us. Config management is a huge PITA right now, with multiple Terraform projects all with their own tfvars, and we’re doing very rudimentary sharing of outputs via remote state. The one thing that’s a bigger pain at the moment, however, is spiraling AWS costs - hence the questions on Karpenter, which we’ve deployed to the cluster with our internal tooling but not yet to production clusters.

Jonathan Eunice

Yeah, managing AWS costs is a bear. Tags and Cost Explorer are your friends there, but it’s a constant fight, keeping the node counts modest, container images as thin as possible, as little on EFS as possible, the NAT Gateways few & with as little traffic as possible, etc. etc. Did not realize I was signing up for my PhD in AWScostology, but here we are, part way through my dissertation.

z0rc3r

“so that you have a voting majority if one node crashes”

Karpenter doesn’t use voting. Leader election uses Kubernetes leases, which implement simple locking. There is no technical requirement to have more than 2 pods unless you have stricter HA requirements.
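
A toy model of lease-based leader election, to illustrate that no quorum is involved. The `Lease` class is an in-memory stand-in whose fields mirror a `coordination.k8s.io/v1` Lease (`holderIdentity`, `renewTime`, `leaseDurationSeconds`); the 15-second duration is an assumed value, not Karpenter’s actual setting. In a real cluster the API server provides the locking via optimistic concurrency on the object’s `resourceVersion`.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class Lease:
    """In-memory stand-in for a coordination.k8s.io/v1 Lease (fields mirror its spec)."""
    holder_identity: Optional[str] = None
    renew_time: float = 0.0
    lease_duration_seconds: float = 15.0  # assumed value for illustration


def try_acquire_or_renew(lease: Lease, candidate: str, now: float) -> bool:
    """Return True if `candidate` holds the lease after this attempt.

    A replica may take the lease if nobody holds it, if it already holds it (renewal),
    or if the holder's last renewal is older than lease_duration_seconds.
    There is no voting: whoever successfully writes the lease is the leader.
    """
    expired = now - lease.renew_time > lease.lease_duration_seconds
    if lease.holder_identity in (None, candidate) or expired:
        lease.holder_identity = candidate
        lease.renew_time = now
        return True
    return False


lease = Lease()
a_leads = try_acquire_or_renew(lease, "karpenter-a", now=0.0)
b_blocked = try_acquire_or_renew(lease, "karpenter-b", now=5.0)
```

If `karpenter-a` stops renewing (for example, its node dies), `karpenter-b` takes over as soon as the lease expires, which is why two replicas are enough for failover.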

2024-02-09

E-Love
03:00:46 PM

Thought this might be relevant here as well

Anyone else get bitten by the new charge for all public IPv4 addresses as of Feb 1? Turns out running a SaaS with a single-tenant architecture on EKS (one namespace per customer) with the AWS LB controller (without using IngressGroups) is a recipe for a ton of public IPs (number of AZs times number of ALBs).
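
The back-of-the-envelope math behind that surprise, as a Python sketch. It assumes the $0.005 per public IPv4 address per hour rate AWS introduced on 2024-02-01, a 730-hour month, and at least one public IP per enabled AZ for each internet-facing ALB (ALBs can hold more addresses as they scale, so treat the result as a lower bound). The `ingresses_per_group` value is hypothetical; in practice, how many Ingresses can share one ALB via IngressGroups is bounded by ALB rule and certificate limits.

```python
import math

IPV4_HOURLY_USD = 0.005  # assumed: AWS public IPv4 charge per address per hour (from 2024-02-01)
HOURS_PER_MONTH = 730


def public_ipv4_monthly_cost(num_albs: int, num_azs: int) -> float:
    """Lower-bound monthly IPv4 cost: at least one public IP per enabled AZ per ALB."""
    return num_albs * num_azs * IPV4_HOURLY_USD * HOURS_PER_MONTH


def albs_needed(num_ingresses: int, ingresses_per_group: int) -> int:
    """ALBs needed if Ingresses are packed into IngressGroups of a given (hypothetical) size."""
    return math.ceil(num_ingresses / ingresses_per_group)
```

For example, 50 tenants across 3 AZs with one ALB each works out to about $547.50/month in IPv4 charges alone, versus about $32.85 if the same Ingresses are packed into 3 grouped ALBs.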

2024-02-13

Hila F

Hi, does anyone here have experience deploying Apache Ranger in production (on EC2 or EKS)? All the references I find online seem geared toward dev environments.

2024-02-16

Henrique Cavarsan

hello everyone!

i just wanted to share kftray, an open-source project for kubernetes users. it’s a cross-platform menu bar app that simplifies managing multiple kubectl port forwards. also, i was super happy to see kftray mentioned during the DevOps “Office Hours” (2024-01-10).

since that episode, i’ve made some significant bug fixes and added new features!

check it out, and any feedback or contributions are more than welcome! here are some links about the project:

i hope you find it useful!

KFtray: Manage Multiple kubectl Port Forwards with Ease from Your Menu Bar 🚀

KFtray, the new open-source for Kubernetes users. It simplifies managing your kubectl port…

venkata.mutyala

Saw this on Reddit. Any updates on GitHub Codespaces support? :)

Henrique Cavarsan

hey!

sure thing, checked it out the last few days and honestly have no clue where to start

but one thing i noticed: kftray is built with tauri (rust and typescript) and doesn’t have native codespace plugin support, so looks like we’d need to start from scratch to get something like that

but i’ll look more into it and let u know

2024-02-21

rohit

has anyone implemented a service mesh (any recommendations) to support authn/authz for oauth2 tokens that supports custom scopes? was looking at istio but fairly new to this. did not want to implement something like aws cognito+api gateway to support this. wanted to see if it was feasible to do this all within kubernetes. thanks!

Fizz

Did this with istio. It puts the jwt claims into http headers so then you can use them in the routing decision config by inspecting the headers and denying or allowing a route based on the value of the header. We also didn’t use istio as a service mesh but instead used it as a control plane for envoy, i.e. we configured istio to act as an ingress controller only
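
A rough Python model of the flow described here: once Istio has validated the token, selected claims are copied into request headers, and routing or authorization decisions look at those headers. In Istio itself this is declarative (a `RequestAuthentication` that verifies the JWT; in recent Istio versions it can copy claims into headers via `outputClaimToHeaders`; plus routing or authorization rules), not code you write. The header names and the `orders:*` scopes below are hypothetical, and the decode step deliberately skips signature verification because Istio has already done it.

```python
import base64
import json
from typing import Dict


def jwt_claims_unverified(token: str) -> Dict:
    """Decode a JWT's payload WITHOUT checking its signature.

    Only acceptable behind something that has already verified the token
    (here, Istio's RequestAuthentication); never use this on untrusted input.
    """
    payload_b64 = token.split(".")[1]
    padded = payload_b64 + "=" * (-len(payload_b64) % 4)  # restore base64url padding
    return json.loads(base64.urlsafe_b64decode(padded))


def claims_to_headers(claims: Dict, mapping: Dict[str, str]) -> Dict[str, str]:
    """Copy selected claims into headers, e.g. {"x-jwt-scope": "scope"} (hypothetical names)."""
    return {header: str(claims[claim]) for header, claim in mapping.items() if claim in claims}


def has_scope(headers: Dict[str, str], header: str, required: str) -> bool:
    """OAuth2 scopes are a space-delimited string; allow only if `required` is present."""
    return required in headers.get(header, "").split()
```

A route for, say, order updates would then be allowed only when `has_scope(headers, "x-jwt-scope", "orders:write")` holds; in Istio that check would live in an authorization or routing rule rather than in application code.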

rohit

Ah thank you for that information. I would like to try this, do you know any public projects that have implemented something like this?

Fizz

No. I followed the istio docs to get it done. First get it deployed and acting as an ingress controller. Next enable jwt validation - https://istio.io/latest/docs/tasks/security/authorization/authz-jwt/.

JWT Token

Shows how to set up access control for JWT token.

Fizz

This is necessary as without it istio will not process the jwt and put it in the headers

Fizz

After that you should get access to the jwt claims in the header. https://istio.io/latest/docs/tasks/security/authentication/jwt-route/

JWT claim based routing

Shows you how to use Istio authentication policy to route requests based on JWT claims.

2024-02-22

Alex Atkinson
#2280 [EKS]: eksctl support status update

AWS is aware of the recent announcement from Weaveworks about shutting down commercial operations. In addition to their significant contributions to the CNCF and open source, Weaveworks has been a great partner to EKS and we are continuously grateful to Weaveworks for their leadership and development of the eksctl project.

In 2019, AWS and Weaveworks formed a partnership to designate the open-source eksctl CLI tool as the officially supported CLI for Amazon EKS, driven by significant customer demand and the tool’s capability to streamline and automate numerous EKS cluster management tasks. Since then, EKS as a service has evolved, implementing features that closed some of those early automation gaps. eksctl has always stayed a step ahead, adding usability enhancements before similar features are built into EKS service APIs.

Last year, AWS and Weaveworks announced a renewed partnership to co-develop eksctl. As part of that announcement, the eksctl project was moved under the shared /eksctl-io GitHub organization, and AWS engineers were added as project maintainers. Moving forward, AWS will take over full responsibility and ownership of eksctl. We plan to continue development to meet the needs of customers with continued frequent eksctl releases. We are fully committed to the ongoing development and maintenance of eksctl. Later in 2024, we will move the eksctl project repo to the /aws GitHub organization. We will leave this issue open in case of any questions, comments, or concerns.

2024-02-27

z0rc3r

@Jeremy G (Cloud Posse) could you reopen https://github.com/kubernetes-sigs/aws-efs-csi-driver/issues/1100 please?

Erik Osterman (Cloud Posse)

cc @Gabriela Campana (Cloud Posse)

Jeremy G (Cloud Posse)

Reopened

z0rc3r

Thanks!

2024-02-28

Jonathan Eunice

Dropped in on the Bottlerocket community meeting today. Main topic was their deprecation of metal nodes. In Q&A I asked about my pet desire, container motion a la vMotion or CRIU. Expected to get a “not our gig” brush-off, but they were surprisingly open. They’re actively considering some quasi-related hot kernel update/patching work, and they immediately understood my frustration with Karpenter scheduling and the stranding of long-running workloads. They requested I post an issue in their GitHub for more visibility, which I did. Still a long-shot, but appreciated their receptiveness.

venkata.mutyala

Just curious, does Karpenter skip the node entirely or does it wait until the job completes (e.g., 12 hours)?

Jonathan Eunice

Karpenter waits until workload on a node completes, up to a settable maximum. We were running as long as 24h max wait for a job to complete.

2024-02-29
