#kubernetes (2024-05)

kubernetes

Archive: https://archive.sweetops.com/kubernetes/

2024-05-03

Hao Wang avatar
Hao Wang
The Time Linkerd Erased My Load Balancer

Due to a combination of issues with GKE and Linkerd, I ended up deleting my load balancer routes when I removed the Linkerd helm chart.

2

2024-05-07

rohit avatar

Is there a way to run a “virtual load balancer” inside kubernetes so one would not have to manage a cloud-native load balancer (ALB: aws, whatever gcp or azure has)?

Erik Osterman (Cloud Posse) avatar
Erik Osterman (Cloud Posse)

There are a few options

Erik Osterman (Cloud Posse) avatar
Erik Osterman (Cloud Posse)

First, understand how a Service works: it does not need to be of type LoadBalancer. It can be a ClusterIP

Erik Osterman (Cloud Posse) avatar
Erik Osterman (Cloud Posse)

That covers you for anything TCP related.
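A minimal sketch of that pattern (names and ports are placeholders, not from the thread): a plain ClusterIP Service fronting an app, with no cloud load balancer created for it.

apiVersion: v1
kind: Service
metadata:
  name: my-app          # placeholder name
spec:
  type: ClusterIP
  selector:
    app: my-app
  ports:
    - port: 80
      targetPort: 8080
      protocol: TCP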

Erik Osterman (Cloud Posse) avatar
Erik Osterman (Cloud Posse)

Now, if you need Ingress functionality, use ingress-nginx with a Service of type ClusterIP, and you have a solution which does not depend on the IaaS.
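A hedged sketch of what that could look like with the ingress-nginx Helm chart (values path from the upstream chart; verify against the chart version you use):

# Expose the ingress-nginx controller with a ClusterIP Service
# instead of provisioning a cloud load balancer.
controller:
  service:
    type: ClusterIP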

1
rohit avatar

So with DNS would this work by pointing a new record to the cluster and making requests to the service that way?

Erik Osterman (Cloud Posse) avatar
Erik Osterman (Cloud Posse)

Well, provided the ClusterIP is routable

Erik Osterman (Cloud Posse) avatar
Erik Osterman (Cloud Posse)

Typically clusters are deployed on private CIDR ranges, so the Service will have a private ClusterIP

Erik Osterman (Cloud Posse) avatar
Erik Osterman (Cloud Posse)

If you just want the DNS to work internally, that’s enough. Otherwise, you’ll need to ensure the ClusterIP is on a routable network, or configure port forwarding from some other service, like a firewall

Erik Osterman (Cloud Posse) avatar
Erik Osterman (Cloud Posse)

Would have to know more about what your goal is - the business objective.

rohit avatar

Gotcha, seems easy enough.

rohit avatar

i have a helm chart now that uses aws-load-balancer-controller, and in order for this helm chart to be deployable "anywhere" (any cloud k8s) it's just a matter of removing the IaaS deps

rohit avatar

we were only targeting aws customers, but apparently a few customers in the pipeline have/use GCP - which i know nothing about

rohit avatar

not even mentioning the cert-manager + istio work effort we’re pulling up on these next few sprints (in our org)

rohit avatar

very limited experience with istio - but it too has an ingress, so likely we will go that way + cert-manager for x509 certs to support mTLS internally. we would give the customer options: deploy cert-manager if you don't have one, just give us a CA to use, or we will generate a self-signed one. and istio to support it all
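For the "generate self-signed" option, a minimal hypothetical sketch using cert-manager's self-signed issuer type (resource name is a placeholder; verify against your cert-manager version):

# A self-signed ClusterIssuer that can bootstrap an internal CA
# for issuing x509 certs used for mTLS.
apiVersion: cert-manager.io/v1
kind: ClusterIssuer
metadata:
  name: selfsigned-bootstrap
spec:
  selfSigned: {}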

2024-05-08

miko avatar

Status: CLOSED Heroes: :crown:@Hao Wang, @Adi

Hey guys! In AWS EKS I have deployed a StatefulSet Apache Kafka; it's working well except when I upgrade the docker image or all the nodes go to 0, it loses the volume data despite having volumeClaimTemplates defined. The full config is in the first reply!

miko avatar
apiVersion: apps/v1
kind: StatefulSet
metadata:
  name: kafka
  labels:
    app: kafka-app
  namespace: kafka-kraft
spec:
  serviceName: kafka-svc
  replicas: 3
  selector:
    matchLabels:
      app: kafka-app
  template:
    metadata:
      labels:
        app: kafka-app
    spec:
      containers:
        - name: kafka-container
          image: {{placeholder}}/kafka-kraft:6
          ports:
            - containerPort: 9092
            - containerPort: 9093
          env:
            - name: REPLICAS
              value: '3'
            - name: SERVICE
              value: kafka-svc
            - name: NAMESPACE
              value: kafka-kraft
            - name: SHARE_DIR
              value: /mnt/kafka
            - name: CLUSTER_ID
              value: gXh3X8A_SGCgyuF_lBqweA
          volumeMounts:
            - name: data
              mountPath: /mnt/kafka
      imagePullSecrets:
      - name: docker-reg-cred
  volumeClaimTemplates:
    - metadata:
        name: data
      spec:
        accessModes:
          - "ReadWriteOnce"
        resources:
          requests:
            storage: "1Gi"
miko avatar

CAPACITY, ACCESS MODES, RECLAIM POLICY, STATUS, CLAIM, STORAGECLASS, VOLUMEATTRIBUTESCLASS, REASON

• 1Gi, RWO, Delete, Bound, kafka-kraft/data-kafka-2, gp2,

• 1Gi, RWO, Delete, Bound, kafka-kraft/data-kafka-0, gp2,

• 1Gi, RWO, Delete, Bound, kafka-kraft/data-kafka-1, gp2,

Adi avatar

@miko can you check what the reclaim policy is

Adi avatar
Persistent Volumes

This document describes persistent volumes in Kubernetes. Familiarity with volumes, StorageClasses and VolumeAttributesClasses is suggested. Introduction Managing storage is a distinct problem from managing compute instances. The PersistentVolume subsystem provides an API for users and administrators that abstracts details of how storage is provided from how it is consumed. To do this, we introduce two new API resources: PersistentVolume and PersistentVolumeClaim. A PersistentVolume (PV) is a piece of storage in the cluster that has been provisioned by an administrator or dynamically provisioned using Storage Classes.
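A couple of hypothetical commands for that check, using the names from this thread, looking at the reclaim policy on both the StorageClass and the PVs:

kubectl get storageclass gp2 -o jsonpath='{.reclaimPolicy}{"\n"}'
kubectl get pv -o custom-columns=NAME:.metadata.name,RECLAIM:.spec.persistentVolumeReclaimPolicy,CLAIM:.spec.claimRef.name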

miko avatar

@Adi I ran the following command kubectl get statefulset kafka -o yaml and I received this:

apiVersion: apps/v1
kind: StatefulSet
metadata:
  annotations:
    kubectl.kubernetes.io/last-applied-configuration: |
      {"apiVersion":"apps/v1","kind":"StatefulSet","metadata":{"annotations":{},"labels":{"app":"kafka-app"},"name":"kafka","namespace":"kafka-kraft"},"spec":{"replicas":3,"selector":{"matchLabels":{"app":"kafka-app"}},"serviceName":"kafka-svc","template":{"metadata":{"labels":{"app":"kafka-app"}},"spec":{"containers":[{"env":[{"name":"REPLICAS","value":"3"},{"name":"SERVICE","value":"kafka-svc"},{"name":"NAMESPACE","value":"kafka-kraft"},{"name":"SHARE_DIR","value":"/mnt/kafka"},{"name":"CLUSTER_ID","value":"gXh3X8A_SGCgyuF_lBqweA"}],"image":"<redacted>","name":"kafka-container","ports":[{"containerPort":9092},{"containerPort":9093}],"volumeMounts":[{"mountPath":"/mnt/kafka","name":"data"}]}],"imagePullSecrets":[{"name":"docker-reg-cred"}]}},"volumeClaimTemplates":[{"metadata":{"name":"data"},"spec":{"accessModes":["ReadWriteOnce"],"resources":{"requests":{"storage":"1Gi"}}}}]}}
  creationTimestamp: "2024-04-18T07:01:03Z"
  generation: 4
  labels:
    app: kafka-app
  name: kafka
  namespace: kafka-kraft
  resourceVersion: "8108458"
  uid: b4179eec-d36d-41ce-aa07-d6bad10d884a
spec:
  persistentVolumeClaimRetentionPolicy:
    whenDeleted: Retain
    whenScaled: Retain
  podManagementPolicy: OrderedReady
  replicas: 3
  revisionHistoryLimit: 10
  selector:
    matchLabels:
      app: kafka-app
  serviceName: kafka-svc
  template:
    metadata:
      creationTimestamp: null
      labels:
        app: kafka-app
    spec:
      containers:
      - env:
        - name: REPLICAS
          value: "3"
        - name: SERVICE
          value: kafka-svc
        - name: NAMESPACE
          value: kafka-kraft
        - name: SHARE_DIR
          value: /mnt/kafka
        - name: CLUSTER_ID
          value: gXh3X8A_SGCgyuF_lBqweA
        image: <redacted>
        imagePullPolicy: IfNotPresent
        name: kafka-container
        ports:
        - containerPort: 9092
          protocol: TCP
        - containerPort: 9093
          protocol: TCP
        resources: {}
        terminationMessagePath: /dev/termination-log
        terminationMessagePolicy: File
        volumeMounts:
        - mountPath: /mnt/kafka
          name: data
      dnsPolicy: ClusterFirst
      imagePullSecrets:
      - name: docker-reg-cred
      restartPolicy: Always
      schedulerName: default-scheduler
      securityContext: {}
      terminationGracePeriodSeconds: 30
  updateStrategy:
    rollingUpdate:
      partition: 0
    type: RollingUpdate
  volumeClaimTemplates:
  - apiVersion: v1
    kind: PersistentVolumeClaim
    metadata:
      creationTimestamp: null
      name: data
    spec:
      accessModes:
      - ReadWriteOnce
      resources:
        requests:
          storage: 1Gi
      volumeMode: Filesystem
    status:
      phase: Pending
status:
  availableReplicas: 3
  collisionCount: 0
  currentReplicas: 3
  currentRevision: kafka-8dd896cf4
  observedGeneration: 4
  readyReplicas: 3
  replicas: 3
  updateRevision: kafka-8dd896cf4
  updatedReplicas: 3

It seems that the retention policy is Retain

spec:
  persistentVolumeClaimRetentionPolicy:
    whenDeleted: Retain
    whenScaled: Retain
miko avatar

I retrieved all the PVCs as well with kubectl get pvc and ran kubectl describe pvc data-kafka-0, for which I received:

Name:          data-kafka-0
Namespace:     kafka-kraft
StorageClass:  gp2
Status:        Bound
Volume:        pvc-c9f45d3a-e1a7-4d3a-8e51-79a411411d43
Labels:        app=kafka-app
Annotations:   pv.kubernetes.io/bind-completed: yes
               pv.kubernetes.io/bound-by-controller: yes
               volume.beta.kubernetes.io/storage-provisioner: ebs.csi.aws.com
               volume.kubernetes.io/selected-node: <redacted>.<redacted>.compute.internal
               volume.kubernetes.io/storage-provisioner: ebs.csi.aws.com
Finalizers:    [kubernetes.io/pvc-protection]
Capacity:      1Gi
Access Modes:  RWO
VolumeMode:    Filesystem
Used By:       kafka-0
Events:        <none>
miko avatar

@Adi idk what I'm missing but it seems to be saying Retain as its policy; perhaps it's because my Dockerfile path isn't the same as the volumeMounts' mountPath in my StatefulSet?

volumeMounts:
  - name: data
    mountPath: /mnt/kafka
Hao Wang avatar
Hao Wang

how about “kubectl get pv”?

Hao Wang avatar
Hao Wang

volumeMode: Filesystem is set, so when the node is gone, the volume is gone with the node

miko avatar

Hi @Hao Wang ohh I’ll check how to change that, but I also lose the data when I upgrade the image version (I’m away from my laptop but when I get back I’ll run “kubectl get pv”)

Hao Wang avatar
Hao Wang

oh I am wrong, filesystem is not file in docker

Hao Wang avatar
Hao Wang

should be the reclaim policy in the StorageClass and PV

Hao Wang avatar
Hao Wang

may need to create a new storageclass with reclaimpolicy as Retain
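A minimal sketch of such a class (the name gp3-retain and the parameters are assumptions, not from the thread), using the EBS CSI provisioner already shown in the PVC output above:

apiVersion: storage.k8s.io/v1
kind: StorageClass
metadata:
  name: gp3-retain          # assumed name
provisioner: ebs.csi.aws.com
parameters:
  type: gp3
reclaimPolicy: Retain        # keep the EBS volume when the PVC is deleted
volumeBindingMode: WaitForFirstConsumer
allowVolumeExpansion: true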

miko avatar

Oooh like change my EBS CSI driver?

miko avatar

Or explicitly define the storage class to use in my statefulset?
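If going that route, a hypothetical volumeClaimTemplates snippet pinning the class (the class name is an assumption):

volumeClaimTemplates:
  - metadata:
      name: data
    spec:
      storageClassName: gp3-retain   # assumed class name
      accessModes:
        - ReadWriteOnce
      resources:
        requests:
          storage: 1Gi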

Hao Wang avatar
Hao Wang

can try edit gp2 and set reclaim policy to Retain

miko avatar

@Hao Wang running kubectl get pv and I get this:

Hao Wang avatar
Hao Wang

they were created 22 days ago

Hao Wang avatar
Hao Wang

are they reused?

miko avatar

sorry dumb question but what do you mean by reused? what I've done so far is run my StatefulSet, which I thought should be enough to get a persistent volume :C and yes I've run my StatefulSet many days ago, but it's just recently that I realized that whenever I upgrade my StatefulSet's Kafka image I lose the messages inside of it

Hao Wang avatar
Hao Wang

got it :+1: we can change the storageclass to Retain and it should fix the issue

miko avatar

Omg I checked the gp2 description and its policy is Delete

miko avatar

but how come :CCC why is the policy Delete? :C

Hao Wang avatar
Hao Wang

it is the default setting

1
miko avatar

kubectl describe storageclass gp2

Name:            gp2
IsDefaultClass:  Yes
Annotations:     kubectl.kubernetes.io/last-applied-configuration={"apiVersion":"storage.k8s.io/v1","kind":"StorageClass","metadata":{"annotations":{"storageclass.kubernetes.io/is-default-class":"true"},"name":"gp2"},"parameters":{"fsType":"ext4","type":"gp2"},"provisioner":"kubernetes.io/aws-ebs","volumeBindingMode":"WaitForFirstConsumer"}
,storageclass.kubernetes.io/is-default-class=true
Provisioner:           kubernetes.io/aws-ebs
Parameters:            fsType=ext4,type=gp2
AllowVolumeExpansion:  <unset>
MountOptions:          <none>
ReclaimPolicy:         Delete
VolumeBindingMode:     WaitForFirstConsumer
Events:                <none>
Hao Wang avatar
Hao Wang

this may be an old EKS default, AWS has gp3 now

miko avatar

my cluster version is 1.29 but I deployed my AWS EKS via terraform along with the EBS CSI driver

miko avatar

ooh wait my EBS CSI driver I manually installed via ~Helm~ eksctl

Hao Wang avatar
Hao Wang

eksctl may not touch storageclass

miko avatar

@Hao Wang I found the command to update gp2 reclaim policy, I’ll give it a try!

kubectl patch pv <your-pv-name> -p '{"spec":{"persistentVolumeReclaimPolicy":"Retain"}}'

1
Hao Wang avatar
Hao Wang

yeah, it is the one

Hao Wang avatar
Hao Wang

oh this is for pv

Hao Wang avatar
Hao Wang

kubectl edit sc gp2

miko avatar

Ohh I have to edit gp2 directly?

miko avatar

Ooowkie thank you!

Hao Wang avatar
Hao Wang

np

miko avatar

wow I can't edit the reclaim policy: The StorageClass "gp2" is invalid: reclaimPolicy: Forbidden: updates to reclaimPolicy are forbidden. would it be better if I try to migrate to gp3 and set the default storageclass to gp3 so my future statefulsets always use gp3?

Hao Wang avatar
Hao Wang

yeah, let us give it a try
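A hedged sketch of swapping the default class (class names are assumptions; the annotation key is the standard default-class annotation):

# Demote gp2 and mark a gp3-backed class as the cluster default.
kubectl patch storageclass gp2 -p '{"metadata":{"annotations":{"storageclass.kubernetes.io/is-default-class":"false"}}}'
kubectl patch storageclass gp3-retain -p '{"metadata":{"annotations":{"storageclass.kubernetes.io/is-default-class":"true"}}}'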

miko avatar

Holy smokes… migrating to GP3 is a pain in the butt:

https://aws.amazon.com/blogs/containers/migrating-amazon-eks-clusters-from-gp2-to-gp3-ebs-volumes/

Since this cluster is not production I will just delete the resources in the following order: StatefulSet, PVC, PV and re-apply my StatefulSet again.

The quickest solution that doesn't require deletion is to change the reclaim policy at the PV level, not the StorageClass level:

• DON'T: kubectl edit sc gp2 and set the reclaim policy to Retain -> The StorageClass "gp2" is invalid: reclaimPolicy: Forbidden: updates to reclaimPolicy are forbidden.
• DO: kubectl patch pv pvc-646fef81-c677-46f4-8f27-9d394618f236 -p '{"spec":{"persistentVolumeReclaimPolicy":"Retain"}}'

lapc20081996 avatar
lapc20081996

Hello folks, good evening from Costa Rica. I was wondering if someone here has ever had a similar question to this.

How do you usually monitor performance for an app running inside a Kubernetes/OpenShift cluster?

I just found a tool bundled within OpenShift called Performance Profile Creator, but I don't know whether there are any Kubernetes-native solutions.

https://docs.openshift.com/container-platform/4.15/scalability_and_performance/cnf-create-performance-profiles.html

venkata.mutyala avatar
venkata.mutyala

What are you trying to measure/improve exactly? Is it a web app where client requests are slower than you would like? Are you trying to ensure nothing like CPU throttling is happening to the app? or that the node/worker is not overloaded?

lapc20081996 avatar
lapc20081996

Mostly that there are no CPU throttling issues or overloads

venkata.mutyala avatar
venkata.mutyala

So I’ve been using kube prometheus stack https://artifacthub.io/packages/helm/prometheus-community/kube-prometheus-stack

It has built-in CPU throttling alerts already, as I've seen them when my pods didn't have enough requests set.

I've noticed the default charts/graphs don't cover steal time. So I've been meaning to do a contribution for that. But if steal time is a concern for you (ex. you aren't using dedicated/metal nodes), you may want to use this query:

sum by (instance, cpu) (rate(node_cpu_seconds_total{mode="steal"}[2m]))

ref: https://stackoverflow.com/questions/76742560/how-to-measure-cpu-steal-time-with-prometheus-node-exporter-metrics

^ Assumes you install kube prometheus stack first.

kube-prometheus-stack 58.5.0 · prometheus/prometheus-community

kube-prometheus-stack collects Kubernetes manifests, Grafana dashboards, and Prometheus rules combined with documentation and scripts to provide easy to operate end-to-end Kubernetes cluster monitoring with Prometheus using the Prometheus Operator.

How to measure CPU Steal Time with Prometheus Node-exporter metrics

How CPU Steal Time can be measured by Prometheus Node-exporter CPU metrics? We have an OpenStack/KVM environment and we want to measure/Know how much CPU steal happens (Percent) in our Computes/Hosts/
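For the CPU throttling side mentioned above, a hypothetical PromQL query using the standard cAdvisor metrics (label grouping and window are assumptions); it gives the fraction of CFS periods in which each container was throttled:

sum by (namespace, pod, container) (rate(container_cpu_cfs_throttled_periods_total[5m])) / sum by (namespace, pod, container) (rate(container_cpu_cfs_periods_total[5m]))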

venkata.mutyala avatar
venkata.mutyala

Also, i’m using EKS but that chart should work on any k8s distro

2024-05-15

venkata.mutyala avatar
venkata.mutyala

Anyone here have recommendations on ML-based tools that can help recommend or even automatically set things like requests, limits, affinity, and anti-affinity scheduling policies?

rohit avatar

are there good projects to inject "chaos" (pod termination/node termination, network failures) into kubernetes? in order to test our applications' resiliency

1
Erik Osterman (Cloud Posse) avatar
Erik Osterman (Cloud Posse)
asobti/kube-monkey

An implementation of Netflix’s Chaos Monkey for Kubernetes clusters
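A rough sketch of opting a Deployment into kube-monkey via labels (label keys recalled from the kube-monkey README; names and values here are placeholders, so verify against the repo before relying on them):

apiVersion: apps/v1
kind: Deployment
metadata:
  name: my-app
  labels:
    kube-monkey/enabled: enabled
    kube-monkey/identifier: my-app
    kube-monkey/mtbf: "2"          # mean time between pod kills, in days
    kube-monkey/kill-mode: "fixed"
    kube-monkey/kill-value: "1"    # how many pods to kill per run
spec:
  replicas: 3
  selector:
    matchLabels:
      app: my-app
  template:
    metadata:
      labels:
        app: my-app
        kube-monkey/enabled: enabled
        kube-monkey/identifier: my-app
    spec:
      containers:
        - name: my-app
          image: nginx:1.25        # placeholder workload
          ports:
            - containerPort: 80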
