SweetOps #kubernetes for September, 2022

Archive: https://archive.sweetops.com/kubernetes/

2022-09-01

Erik Osterman (Cloud Posse)

https://github.com/aws/containers-roadmap/issues/185#issuecomment-1234850299

Can’t give anything too specific in this forum, but this is in active development. Will move to Coming Soon once I feel it aligns with the roadmap categories defined here: https://github.com/aws/containers-roadmap#faqs. And yes, this will be an AWS API, and it will include CloudFormation support.

Erik Osterman (Cloud Posse)

10:49:20 PM

@Jeremy G (Cloud Posse)

2022-09-05

Steve Chernyak

02:41:27 PM

Does anybody know where I can find details on how memory is throttled when k8s runs under cgroups v2 with the feature enabled? I’m trying to wrap my head around what it means for memory to be treated as a “compressible” resource.
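Assuming “the feature” here is the MemoryQoS feature gate (alpha since Kubernetes 1.22), the KEP-2570 alpha design maps container memory settings onto cgroup v2 files roughly as shown below. `memoryThrottlingFactor` is a kubelet configuration field. Later releases revised the `memory.high` formula, so treat this as a sketch of the 2022-era alpha behavior, not a spec:

```latex
\begin{aligned}
\texttt{memory.min}  &= \texttt{requests.memory} \\
\texttt{memory.max}  &= \texttt{limits.memory} \\
\texttt{memory.high} &= \texttt{memoryThrottlingFactor} \times \left(\texttt{limits.memory} \text{ or node allocatable memory}\right),
  \quad \texttt{memoryThrottlingFactor} = 0.8 \text{ by default} \\
\text{e.g. } \texttt{limits.memory} = 1\,\text{GiB} &\Rightarrow \texttt{memory.high} = 0.8 \times 1024\,\text{MiB} \approx 819\,\text{MiB}
\end{aligned}
```

When usage crosses `memory.high`, the kernel throttles the cgroup's allocations and puts it under heavy reclaim pressure. This slows the workload down instead of killing it, which is the “compressible” part. Only at `memory.max` does the OOM killer still step in.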

2022-09-07

2022-09-08

akhan4u

02:58:28 PM

Hey guys,

I’m facing an issue running jenkins-operator on a Kubernetes cluster (v1.22.11-eks) with the Datadog monitoring agent. The DD agent injects some ENV vars into the jenkins-instance when it is created, e.g. DD_AGENT_HOST and DD_ENTITY_ID. These ENV vars cause the operator to restart the jenkins-instance pod in a loop.

Has anyone used jenkins-operator alongside a monitoring agent like Datadog, New Relic, etc.?

akhan4u

03:02:26 PM

I found a suggestion on this page to use a different ENV var, DATADOG_JENKINS_PLUGIN_TARGET_HOST, but that didn’t help either.

Environment mutation causing pod to restart - Jenkinsci/Kubernetes-Operator

akhan4u

03:07:08 PM

Jenkins Operator Chart details

NAME                        CHART VERSION  APP VERSION  DESCRIPTION
jenkinsci/jenkins-operator  0.6.2          0.7.1        Kubernetes native operator which fully manages ...

Datadog Chart details

NAME             CHART VERSION  APP VERSION  DESCRIPTION
datadog/datadog  2.36.6         7            Datadog Agent

Erik Osterman (Cloud Posse)

10:20:48 PM

Hrmmm… could it be running out of memory when Jenkins starts, due to pod limits?

akhan4u

07:00:14 AM

I’ve updated the pod limits, but the problem seems to be related only to the ENV var injection by DD.
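For context: DD_AGENT_HOST and DD_ENTITY_ID are the variables injected by the Datadog Cluster Agent's admission controller. It mutates pods labelled `admission.datadoghq.com/enabled: "true"`, or every pod when the chart's `clusterAgent.admissionController.mutateUnlabelled` is set. A pod can opt out with the same label set to `"false"`. Below is a minimal sketch of that opt-out, assuming the injection comes from the admission controller. Where the label has to go depends on how jenkins-operator exposes labels for the Jenkins master pod, and that is not confirmed here:

```yaml
# Sketch: pod metadata for the Jenkins master pod (placement via the operator is an assumption)
metadata:
  labels:
    # Asks the Datadog admission controller to skip this pod, so it stops
    # adding DD_AGENT_HOST / DD_ENTITY_ID to the pod spec
    admission.datadoghq.com/enabled: "false"
```

The trade-off is that the pod no longer receives the injected agent address. Anything relying on DD_AGENT_HOST would need the host configured some other way.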

Vinícius Azevedo

04:23:48 PM

Can someone help me understand an anti-affinity policy I am creating for an application? I’m trying to guarantee that each new replica will be deployed in a different AZ, so I came up with the following block:

affinity:
  podAntiAffinity:
    preferredDuringSchedulingIgnoredDuringExecution:
    # each preferred term is a list item with a required weight (1-100); 100 is illustrative
    - weight: 100
      podAffinityTerm:
        labelSelector:
          matchExpressions:
          - key: app.kubernetes.io/name
            operator: In
            values:
            - ${app-name}
        topologyKey: topology.kubernetes.io/zone

So, my thinking is:

• I am creating an anti-affinity policy, meaning I want to avoid, if possible (hence the preferred element), assigning the pod to a node according to certain rules.
• I am using podAffinityTerm and labelSelector, meaning I am looking at labels on pods that are already scheduled in my cluster.
• I am looking for pods with a label common to all my application pods (the scheduler will use the information in matchExpressions: key, operator, and values).
• I am telling the scheduler to look for a different zone (as defined in topologyKey) if the criteria are met.
• I know that if all the zones already have pods of my application, the scheduler will be free to assign the pod to any node (because no other rules are defined, and I defined a soft rule for the anti-affinity).

So, is my understanding correct?
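The soft-rule reading above matches how preferred terms behave. The scheduler only lowers the score of nodes in zones that already run a matching pod, so once every zone has one, extra replicas can still land anywhere. If the goal really is to guarantee one replica per AZ, the hard variant below is a minimal sketch using the same `${app-name}` placeholder. Required terms are a plain list of pod affinity terms with no `weight`. With this rule, a replica beyond the number of zones stays Pending instead of doubling up in a zone:

```yaml
affinity:
  podAntiAffinity:
    # Hard rule: enforced at scheduling time, never relaxed
    # ("IgnoredDuringExecution": running pods are not evicted if labels change later)
    requiredDuringSchedulingIgnoredDuringExecution:
    - labelSelector:
        matchExpressions:
        - key: app.kubernetes.io/name
          operator: In
          values:
          - ${app-name}
      topologyKey: topology.kubernetes.io/zone
```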

Anirudh Ramanathan

11:00:02 PM

Hi folks, I’m Anirudh. I used to work on K8s core controllers in the past, and for the past 2y I’ve been working on a platform called Signadot to test microservices in K8s at scale. Rather than stamping out new copies of infrastructure, the approach we took is to make use of request-level tenancy and dynamic request routing to isolate environments. This makes it possible to get lightweight environments which can share resources with each other while isolating at the request level, similar to how the “copy-on-write” model works for memory. Just launched on Product Hunt and would love to get feedback if you have a few minutes to spare. TIA!

2022-09-09

2022-09-12

mr.shayv

01:13:20 PM

Does anyone know how to troubleshoot a pod stuck in ContainerCreating? Logs and events don’t show anything telling... really weird.

venkata.mutyala

02:16:52 PM

What does your manifest contain? Have you checked other related objects for any details? Maybe check the deployment object for any issues/events?

mr.shayv

02:44:57 PM

I have a Keycloak Gatekeeper configured in a ConfigMap, but I checked and there’s nothing useful in the logs or events.

venkata.mutyala

02:50:24 PM

Huh. Just did a quick google search and saw this: https://serverfault.com/questions/728727/kubernetes-stuck-on-containercreating

It might be a little noisy but the command they share to get all events might be useful here.


mr.shayv

02:53:10 PM

Thanks, I’ve seen it and tried it, but unfortunately nothing helped. I should mention that my containers run on containerd (crictl), not Docker.

venkata.mutyala

05:00:31 PM

Have you checked API-level logs? Maybe the scheduler logs?

venkata.mutyala

05:01:18 PM

Also, is this cloud-managed? AWS/EKS/GCP/etc. support might be able to assist here.

venkata.mutyala

05:01:39 PM

Though I have a feeling the answer is just tucked away in logs you already have access to. Just not sure which ones…

venkata.mutyala

05:01:48 PM

Please share when you figure it out.

2022-09-13

Adnan

10:10:28 AM

I am trying to get the aws-ebs-csi-driver Helm chart working on an EKS 1.23 cluster.

The message I am getting from the PVC events:

failed to provision volume with StorageClass "gp2": error generating accessibility requirements: no topology key found on CSINode

The CSI topology feature docs say that:

• The PluginCapability must support VOLUME_ACCESSIBILITY_CONSTRAINTS.
• The plugin must fill in accessible_topology in NodeGetInfoResponse. This information will be used to populate the Kubernetes CSINode object and add the topology labels to the Node object.
• During CreateVolume, the topology information will get passed in through CreateVolumeRequest.accessibility_requirements.

I am not sure how to configure these points.

Adnan

11:16:04 AM

I looked at the worker nodes’ (EC2) launch template and user data. The kubelet root path was not the standard /var/lib/kubelet but a custom one. I fixed the missing CSINode driver information by updating the volume host paths with the correct kubelet root path.
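Here is a sketch of what that fix amounts to, assuming a hypothetical custom kubelet root of /data/kubelet (set with kubelet's `--root-dir`). The kubelet discovers CSI node plugins through registration sockets under `<root-dir>/plugins_registry/`. If the node DaemonSet mounts the default /var/lib/kubelet paths instead, the kubelet never sees the driver. The CSINode object then gets no driver entry and no topology key, which is consistent with the error above. Volume names here are illustrative and should be matched to the chart's node DaemonSet. The kubelet-dir `mountPath` in the plugin container generally has to use the same root, because the kubelet passes target paths under it:

```yaml
# Hypothetical root: kubelet started with --root-dir=/data/kubelet
volumes:
- name: kubelet-dir
  hostPath:
    path: /data/kubelet
    type: Directory
- name: plugin-dir            # where the driver's csi.sock lives on the host
  hostPath:
    path: /data/kubelet/plugins/ebs.csi.aws.com/
    type: DirectoryOrCreate
- name: registration-dir      # watched by the kubelet for plugin registration
  hostPath:
    path: /data/kubelet/plugins_registry/
    type: Directory
containers:
- name: node-driver-registrar
  args:
  - --csi-address=/csi/csi.sock   # socket path inside the container (illustrative)
  # host path the kubelet uses to reach the driver; must use the same root
  - --kubelet-registration-path=/data/kubelet/plugins/ebs.csi.aws.com/csi.sock
```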

2022-09-15

2022-09-26

tamsky

04:23:56 AM

If anyone here has kicked the tires on https://acorn.io/, I’d be interested to hear your thoughts. edit: “Acorn is a containerized application packaging framework that simplifies deployment on Kubernetes”

2022-09-29

Sean Turner

09:02:26 PM

EKS question: how does one use pod security groups to allow traffic between the ALB SG and the pod SG? I’m using the ALB Ingress.

I’ve got the traffic between the pod and RDS SGs working fine, but the traffic between the ALB and the Pod is only permitted when I do the following:

• open TCP 4200 on the pod security group from the VPC CIDR

• open TCP 30141 on the pod security group from the VPC CIDR

No combination of allowing those ports from the ALB SG works, and that includes the ALB Ingress shared SG. Here, 4200 is the container port and 30141 is the service NodePort.

edit: Got this working. I needed to open the following:

• open TCP 4200 on the pod security group from the node security group ID

• open TCP 30141 on the pod security group from the node security group ID
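Pod security groups are attached through a SecurityGroupPolicy custom resource. A minimal sketch with hypothetical names and group ID is shown below. The fix above is consistent with the ALB's default `instance` target type: the ALB sends traffic to the NodePort on the nodes and kube-proxy forwards it to the pod, so the pod likely sees the node, not the ALB, as the source. Setting `alb.ingress.kubernetes.io/target-type: ip` on the Ingress sends traffic straight to pod IPs, which might let an ALB-SG-sourced rule work instead. That alternative is untested here:

```yaml
# Hypothetical names and IDs throughout
apiVersion: vpcresources.k8s.aws/v1beta1
kind: SecurityGroupPolicy
metadata:
  name: my-app-sgp
  namespace: my-namespace
spec:
  podSelector:               # pods matching these labels get the groups below
    matchLabels:
      app: my-app
  securityGroups:
    groupIds:
    - sg-0123456789abcdef0   # pod SG: allow TCP 4200 (and the NodePort) from the node SG
```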

Joaquin Menchaca

10:12:13 PM

Anyone have experience with different service meshes? I have gotten Istio, Linkerd, and NGINX Service Mesh working, but when I tried Consul, I couldn’t get it off the ground. Their community is sadly not very responsive.

William Morgan

11:19:43 PM

How was your Linkerd experience?

sthapaprabesh2020

01:30:34 AM

I have used Istio before; my stack was Istio + Envoy + ExternalDNS + cert-manager.