SweetOps #kubernetes for July, 2022

Archive: https://archive.sweetops.com/kubernetes/

2022-07-11

Adnan

Did anybody ever experience response time lagging between an nginx and an application pod? I have some strange intermittent issues where the application responds in e.g. 300ms but the nginx in front of it responds in 24s.

2022-07-12

Shreyank Sharma

05:00:51 PM

Hello all, we are running ELK in Kubernetes, installed via Helm charts. It was installed using the stable Helm repo (which is now deprecated, https://github.com/helm/charts/tree/master/stable, and all charts have moved to the elastic repo). Elasticsearch, Logstash and Kibana are all at version 6.7.0, and we now want to upgrade to the latest version, or at least to 7. The latest version of Elasticsearch in the charts from the stable repo is 6.8.6, so I am assuming I cannot just upgrade to version 7 or 8 using the “helm upgrade” command. So my questions are:

Do we have to recreate the whole ELK cluster if we want to upgrade to 7 or 8, downloading the chart from the elastic repo?
Is there a way to upgrade to version 7 or 8 without changing the repo (stable to elastic)? Thanks in advance.

2022-07-18

Sean Turner

07:08:30 PM

Hey all, running EKS. Is there a way to get certain pods to scale spot nodes, and additionally fall back to on-demand when there’s no capacity?

Alternatively, is there a way to run 10% of a workload on demand, and 90% on spot?

Context: I’m trying to use tolerations and affinity to make EMR on EKS pods run on spot. When these cause a scale-up, they sometimes autoscale the spot nodes and sometimes autoscale the on-demand nodes. Ideally, I would like the spot nodes to get autoscaled until that’s no longer possible, and only then fall back to on-demand nodes.

      affinity:
        nodeAffinity:
          preferredDuringSchedulingIgnoredDuringExecution:
            - weight: 1
              preference:
                matchExpressions:
                  - key: executor-emr
                    operator: In
                    values:
                      - "true"
      tolerations:
        - key: executor-emr
          operator: Equal
          value: "true"
          effect: NoSchedule

Matt McLane

12:13:45 PM

I would highly recommend you look into Karpenter. https://karpenter.sh/ It’s an alternative to your traditional autoscaler but with a LOT more smarts. It can do what you’re looking for easily.

Sean Turner

12:18:48 PM

Yeah, I actually started a PoC to implement it and got 85-90% of the way there. I’m leaving in two weeks, however, so I really want to get solid autoscaling implemented first. The cluster autoscaler seemed like it was going to work, but now I don’t think so, at least not perfectly.

Matt McLane

12:25:33 PM

The karpenter devs are active in the kubernetes.slack.com slack on the #karpenter channel if that helps. They may be able to answer any questions you have. They were very helpful for me in the past.

2022-07-19

Adnan

09:35:28 AM

I have a weird issue with my nginx-ingress-controller.

Sometimes I see the following log values from the nginx-ingress-controller:

upstream_duration: 4.560, 0.328

Where did 4 seconds go?

Upstream is another nginx with a duration of 328ms

Did anybody experience something like this before? How could I debug this?
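A note on reading that value: if `upstream_duration` is populated from nginx's `$upstream_response_time` (an assumption, since the log format isn't shown in the thread), then a comma-separated value means more than one upstream was contacted for the same request, with one time per attempt. Under that reading, 4.560 would be a first attempt and 0.328 a retry against another upstream, rather than 4 seconds disappearing between the two nginx layers. A minimal Python sketch for splitting such a field (the helper name `parse_upstream_times` is hypothetical):

```python
def parse_upstream_times(value: str) -> list[float]:
    """Split an nginx-style upstream time field into one float per attempt.

    Assumes the nginx convention: commas separate successive upstream
    attempts, colons separate attempts across an internal redirect,
    and "-" marks an attempt with no recorded time.
    """
    attempts = []
    for part in value.replace(":", ",").split(","):
        part = part.strip()
        if part and part != "-":
            attempts.append(float(part))
    return attempts


# The value from the log line above: two attempts, not one slow one.
times = parse_upstream_times("4.560, 0.328")
print(len(times), "attempts:", times)
```

If the log format also records the upstream address and status per attempt, lining those up with the parsed times would show which endpoint accounted for the slow first attempt.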

2022-07-20

09:22:20 PM

What’s the normal number of namespaces you’ve seen in clusters?

sheldonh

09:23:14 PM

Curious as I had a discussion about this with someone and realized the types of companies you’ve worked with probably bias this a lot.

sheldonh

09:25:35 PM

Second question, more thread oriented

If you install a helm chart that is doing automation with Kubernetes such as secrets and other things…. And you want to install to a namespace….

Would you expect the app to only function internally in that namespace even if it was doing kubernetes automation? My gut is that I’d want a namespaced app to be defaulted to only running against secrets and resources in the namespace explicitly, regardless of RBAC allowing more. However, that’s my assumption. Curious if anyone else thinks automation for k8s with a namespace should be opt in to all namespace automation or opt out to all namespace automation as a rule.

Zachary Loeber

06:52:44 PM

Well, secret access is scoped to Pods running in that namespace

Zachary Loeber

06:53:04 PM

But I think I follow either way

Zachary Loeber

06:57:07 PM

If you are using kube deployments and such via helm charts to do broader configuration against the cluster I don’t know that namespace scope matters so much as having the rbac rules of its SA approved in version control somewhere.

Zachary Loeber

06:57:23 PM

Maybe I misunderstand ya though

2022-07-21

2022-07-22

Adnan

08:07:33 AM

Has anybody reliably used CRON_TZ in CronJobs with version 1.22?

tennisonyu

09:43:42 PM

Hello, is anyone here familiar with Istio? I’m a beginner trying to get started but running into some issues.

roth.andy

09:44:31 PM

what’s your question?

tennisonyu

09:48:20 PM

Hey @roth.andy, so I’m a complete noob at this. I’m trying to just get it up and running, to be honest. I’ve installed 1.14.1 using helm on EKS but the ingressgateway (I’ve just named it gateway) is not coming up. I keep seeing the errors below and I have no idea what they mean. Do you have any insight?

2022-07-22T21:45:05.292287Z	warning	envoy config	StreamAggregatedResources gRPC config stream to xds-grpc closed since 6457s ago: 14, connection error: desc = "transport: Error while dialing dial tcp: lookup istiod.istio-system.svc: i/o timeout"
2022-07-22T21:45:28.893251Z	warning	envoy config	StreamAggregatedResources gRPC config stream to xds-grpc closed since 6481s ago: 14, connection error: desc = "transport: Error while dialing dial tcp: lookup istiod.istio-system.svc: i/o timeout"
2022-07-22T21:45:34.294627Z	warn	ca	ca request failed, starting attempt 1 in 105.763472ms
2022-07-22T21:45:34.401066Z	warn	ca	ca request failed, starting attempt 2 in 189.64364ms
2022-07-22T21:45:34.591468Z	warn	ca	ca request failed, starting attempt 3 in 377.860141ms
2022-07-22T21:45:34.970056Z	warn	ca	ca request failed, starting attempt 4 in 732.207437ms
2022-07-22T21:45:35.702977Z	warn	sds	failed to warm certificate: failed to generate workload certificate: create certificate: rpc error: code = Unavailable desc = connection error: desc = "transport: Error while dialing dial tcp: lookup istiod.istio-system.svc on 172.20.0.10:53: read udp 10.253.25.91:45906->172.20.0.10:53: i/o timeout"

tennisonyu

09:48:51 PM

Thank you for responding as well btw

tennisonyu

09:51:13 PM

This is what my resources look like. It’s weird because my deployment is a bit unstable. Sometimes it works and sometimes it doesn’t

NAME                                 READY   STATUS    RESTARTS   AGE
pod/istio-gateway-7f885db475-jl74c   0/1     Running   0          112m
pod/istiod-57c86cdbd7-cgx42          1/1     Running   0          3h43m

NAME                    TYPE           CLUSTER-IP       EXTERNAL-IP                                                                     PORT(S)                                      AGE
service/istio-gateway   LoadBalancer   172.20.192.132   <commented_out>   15021:31706/TCP,80:32429/TCP,443:31799/TCP   112m
service/istiod          ClusterIP      172.20.9.46      <none>                                                                          15010/TCP,15012/TCP,443/TCP,15014/TCP        3h43m

NAME                            READY   UP-TO-DATE   AVAILABLE   AGE
deployment.apps/istio-gateway   0/1     1            0           112m
deployment.apps/istiod          1/1     1            1           3h43m

NAME                                       DESIRED   CURRENT   READY   AGE
replicaset.apps/istio-gateway-7f885db475   1         1         0       112m
replicaset.apps/istiod-57c86cdbd7          1         1         1       3h43m

NAME                                                REFERENCE                  TARGETS         MINPODS   MAXPODS   REPLICAS   AGE
horizontalpodautoscaler.autoscaling/istio-gateway   Deployment/istio-gateway   <unknown>/80%   1         5         1          112m
horizontalpodautoscaler.autoscaling/istiod          Deployment/istiod          <unknown>/80%   1         5         1          3h43m

tennisonyu

10:34:54 PM

I’m a little confused how the certificate process works

roth.andy

01:48:34 AM

How are you deploying Istio? The operator? Or something else

roth.andy

01:49:26 AM

What happens if you deploy the “demo” configuration profile?

roth.andy

01:52:02 AM

Start from a configuration that works, then modify iteratively

tennisonyu

01:46:10 PM

Hey Andrew, thanks for the reply. I was deploying the operator using helm per here actually: https://istio.io/latest/docs/setup/install/helm/

tennisonyu

01:46:53 PM

but over the weekend, I actually realized this might be more of a kubernetes issue so I’m currently looking into that

2022-07-25

mr.shayv

07:29:58 AM

Hey, is there an alternative to matchLabels? For example, if I want to match on at least 2 labels and not all of them?

Christian

05:35:24 PM

What do you mean? matchLabels is a map of key value pairs

selector:
  matchLabels:
    app: nginx
    tier: frontend
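For context on the question: every entry under matchLabels, and every entry under matchExpressions, has to hold at the same time (they are ANDed), so a selector cannot directly say "any 2 of these N labels". The sketch below mimics that matching rule in Python so the behaviour can be tried out. It is an illustration only, not the Kubernetes implementation, and the function name `matches` is hypothetical.

```python
def matches(selector: dict, labels: dict) -> bool:
    """Return True if `labels` satisfy a Kubernetes-style label selector."""
    # matchLabels: every key/value pair must be present (logical AND).
    for key, value in selector.get("matchLabels", {}).items():
        if labels.get(key) != value:
            return False
    # matchExpressions: every expression must hold as well.
    for expr in selector.get("matchExpressions", []):
        key, op = expr["key"], expr["operator"]
        values = expr.get("values", [])
        if op == "In" and labels.get(key) not in values:
            return False
        if op == "NotIn" and labels.get(key) in values:
            return False
        if op == "Exists" and key not in labels:
            return False
        if op == "DoesNotExist" and key in labels:
            return False
    return True


selector = {"matchLabels": {"app": "nginx", "tier": "frontend"}}
print(matches(selector, {"app": "nginx", "tier": "frontend", "env": "prod"}))
print(matches(selector, {"app": "nginx"}))
```

Extra labels on the object do not prevent a match, but a missing or different value for any listed key does.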

Adnan

07:50:47 AM

Is it possible to run pods only on nodes within a specified subnet? If for some reason, nodes cannot be started in that subnet, start the pods on the other nodes?

Adnan

08:23:21 AM

I guess the solution would be something like this:

apiVersion: v1
kind: Pod
metadata:
  name: with-node-affinity
spec:
  affinity:
    nodeAffinity:
      # Soft preference: the scheduler can still place the pod on other
      # nodes if no node in the preferred zone is available.
      preferredDuringSchedulingIgnoredDuringExecution:
      - weight: 1
        preference:
          matchExpressions:
          - key: topology.kubernetes.io/zone
            operator: In
            values:
            - antarctica-east1

Adnan

11:00:41 AM

nodeAffinity:
  preferredDuringSchedulingIgnoredDuringExecution:
    - weight: 10
      preference:
        matchExpressions:
          - key: topology.kubernetes.io/zone
            operator: In
            values:
              - eu-central-1a
              - eu-central-1c
    - weight: 90
      preference:
        matchExpressions:
          - key: topology.kubernetes.io/zone
            operator: In
            values:
              - eu-central-1b

With this nodeAffinity configuration, Kubernetes will try to schedule
• 90% of the pods on nodes with the label topology.kubernetes.io/zone=eu-central-1b, and
• 10% of the pods on nodes with the label topology.kubernetes.io/zone=eu-central-1a or topology.kubernetes.io/zone=eu-central-1c

Do I understand this correctly?
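A note on the weights: in preferredDuringSchedulingIgnoredDuringExecution, a weight is not a percentage of pods. For each candidate node, the scheduler adds up the weights of the preferred terms that node matches, and that sum feeds into the node's score alongside other scoring factors. With the configuration above, a node in eu-central-1b scores higher than nodes in the other two zones for every pod, so this is a ranking of zones rather than a 90/10 split. The Python sketch below reproduces only that summing step; it is a simplified illustration that handles only the `In` operator, and `preference_score` is a hypothetical name.

```python
ZONE = "topology.kubernetes.io/zone"

# The two preferred terms from the configuration above.
preferred_terms = [
    {"weight": 10, "matchExpressions": [
        {"key": ZONE, "operator": "In", "values": ["eu-central-1a", "eu-central-1c"]}]},
    {"weight": 90, "matchExpressions": [
        {"key": ZONE, "operator": "In", "values": ["eu-central-1b"]}]},
]


def preference_score(node_labels: dict, terms: list) -> int:
    """Sum the weights of all preferred terms the node matches (In only)."""
    total = 0
    for term in terms:
        if all(node_labels.get(e["key"]) in e["values"]
               for e in term["matchExpressions"]):
            total += term["weight"]
    return total


for zone in ("eu-central-1a", "eu-central-1b", "eu-central-1c"):
    print(zone, preference_score({ZONE: zone}, preferred_terms))
```

Every pod using this affinity gets the same ranking, so as long as nodes in eu-central-1b have room and the other scoring factors are roughly equal, pods will tend to land there first.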
