#kubernetes (2022-02)
Archive: https://archive.sweetops.com/kubernetes/
2022-02-01
Is it possible to trigger a job when another job completes? Or would you need something like argo workflows to orchestrate the jobs?
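As far as I know, plain Kubernetes Jobs can't trigger each other natively, so a workflow engine is the usual route. A minimal Argo Workflows sketch of running one step after another (names and images are illustrative):
apiVersion: argoproj.io/v1alpha1
kind: Workflow
metadata:
  generateName: chained-jobs-
spec:
  entrypoint: main
  templates:
    - name: main
      steps:
        - - name: first-job
            template: first
        # the second step only starts once first-job completes successfully
        - - name: second-job
            template: second
    - name: first
      container:
        image: busybox
        command: [sh, -c, "echo running first job"]
    - name: second
      container:
        image: busybox
        command: [sh, -c, "echo running second job"]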
@Leo Przybylski has joined the channel
@Lucky has joined the channel
2022-02-02
Hi there. Has anyone seen this error after adding Datadog as a sidecar on EKS Fargate?
Your pod's cpu/memory requirements exceed the max Fargate configuration
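For what it's worth, Fargate sizes the pod from the sum of all container requests, and the largest Fargate configuration tops out around 4 vCPU / 30 GB, so the usual fix is trimming the sidecar's requests. A rough sketch (names, image tags and values are illustrative):
apiVersion: v1
kind: Pod
metadata:
  name: app-with-datadog        # illustrative
spec:
  containers:
    - name: app
      image: myorg/app:latest   # illustrative
      resources:
        requests:
          cpu: "1"
          memory: 2Gi
    - name: datadog-agent
      image: gcr.io/datadoghq/agent:7
      resources:
        requests:               # keep the sidecar's requests small so the
          cpu: 200m             # pod total stays within a Fargate profile
          memory: 256Mi
        limits:
          cpu: 500m
          memory: 512Mi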
WTH? you can’t delete messages anymore????????
I will check
try now to delete
now it works
my english is getting worse……
2022-02-03
2022-02-04
Hi folks, I'm currently having issues deploying metrics-server on EKS. I've installed the latest version via Terraform using the helm_release resource; however, I'm getting the following error:
$ kubectl top nodes
Error from server (ServiceUnavailable): the server is currently unable to handle the request (get nodes.metrics.k8s.io)
$ kubectl describe apiservice v1beta1.metrics.k8s.io
Message: failing or missing response from https://10.0.5.36:4443/apis/metrics.k8s.io/v1beta1: Get "https://10.0.5.36:4443/apis/metrics.k8s.io/v1beta1": context deadline exceeded
Reason: FailedDiscoveryCheck
$ kubectl logs -n kube-system deploy/metrics-server
1 round_trippers.go:454] GET https://10.0.4.76:10250/stats/summary?only_cpu_and_memory=true 200 OK in 6 milliseconds
1 round_trippers.go:454] GET https://10.0.5.211:10250/stats/summary?only_cpu_and_memory=true 200 OK in 6 milliseconds
1 scraper.go:157] "Scrape finished" duration="20.409888ms" nodeCount=3 podCount=9
1 server.go:139] "Storing metrics"
1 server.go:144] "Scraping cycle complete"
1 handler.go:153] metrics-server: GET "/readyz" satisfied by nonGoRestful
EKS Cluster version: 1.21 Metrics Server version: 3.7.0
Does anybody know what could be wrong?
I’ve solved it by adding an additional rule to my security group:
node_security_group_additional_rules = {
  ms_4443_ing = {
    description                   = "Cluster API to metrics server 4443 ingress port"
    protocol                      = "tcp"
    from_port                     = 4443
    to_port                       = 4443
    type                          = "ingress"
    source_cluster_security_group = true
  }
  ...
}
2022-02-09
_Check out our upcoming Webinar:_ Managed Kubernetes Clusters: Avoiding risky defaults, K8s threat modeling and securing EKS clusters
Click here to register. Learn how to navigate the creation of a secure-by-default K8s cluster, avoid risky default settings and permissions, and listen to some live threat modeling of EKS clusters. Join Lightspin CISO Jonathan Rau and Director of Security Research Gafnit Amiga to discuss hot topics and tips for leveling up your Kubernetes security knowledge. Questions and topics covered include:
- Avoiding risky default settings in your Kubernetes clusters
- Creating a secure by default Kubernetes cluster
- Unique supply chain risks for Kubernetes
2022-02-13
Hi there,
I have a rolling update deployment strategy with max unavailable set to 0. Unfortunately, when I deploy, many pods get terminated and I'm not sure why.
Example:
Normal ScalingReplicaSet 30m deployment-controller Scaled down replica set deployment-xyz to 27
Normal ScalingReplicaSet 2m20s deployment-controller Scaled down replica set deployment-xyz to 10
It went from 27 to 10
Strategy:
StrategyType: RollingUpdate
MinReadySeconds: 0
RollingUpdateStrategy: 0 max unavailable, 25% max surge
Any ideas where I could look at to maybe understand why this is happening?
Try a PodDisruptionBudget. Or increase your max surge
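For reference, a minimal PodDisruptionBudget sketch (name and selector are illustrative and must match the deployment's pod labels):
apiVersion: policy/v1
kind: PodDisruptionBudget
metadata:
  name: xyz-pdb          # illustrative
spec:
  minAvailable: "90%"    # keep at least 90% of pods up during voluntary disruptions
  selector:
    matchLabels:
      app: xyz           # must match the deployment's pod labels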
The problem was the replicas property of the deployment, which overrides the HPA every time something is deployed.
I guess I'll skip/remove the replicas property from the deployment if the HPA is enabled.
Oh, yes, you're supposed to do that.
It causes the deployment and HPA controllers to fight over the replica count.
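A minimal sketch of that combination, with replicas left out of the Deployment so the HPA owns the count (names, image, and thresholds are illustrative):
apiVersion: apps/v1
kind: Deployment
metadata:
  name: xyz
spec:
  # no replicas field: the HPA manages the replica count
  selector:
    matchLabels:
      app: xyz
  strategy:
    type: RollingUpdate
    rollingUpdate:
      maxUnavailable: 0
      maxSurge: 25%
  template:
    metadata:
      labels:
        app: xyz
    spec:
      containers:
        - name: app
          image: myorg/app:latest   # illustrative
---
apiVersion: autoscaling/v2   # use autoscaling/v2beta2 on clusters older than 1.23
kind: HorizontalPodAutoscaler
metadata:
  name: xyz
spec:
  scaleTargetRef:
    apiVersion: apps/v1
    kind: Deployment
    name: xyz
  minReplicas: 10
  maxReplicas: 30
  metrics:
    - type: Resource
      resource:
        name: cpu
        target:
          type: Utilization
          averageUtilization: 70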
2022-02-14
2022-02-17
brew for kubernetes –> https://github.com/kbrew-dev/kbrew
kbrew is homebrew for Kubernetes
Not that it would be something I’d personally throw on a pipeline but could be useful for some rapid local testing perhaps
Hi, is anyone using https://github.com/terraform-aws-modules/terraform-aws-eks?
I'm trying to add some additional tags to the ELB that gets created from this module, but I'm having a hard time locating it. If anyone can point me to the right place, thanks!
i don’t believe this module creates a load balancer
in the document it shows all the resources created and none of them are a load balancer
https://github.com/terraform-aws-modules/terraform-aws-eks#resources
Terraform module to create an Elastic Kubernetes (EKS) cluster and associated worker instances on AWS
wouldn’t the ASG perform that?
i don’t believe the ASG would create a load balancer
I mean for instances to be attached to an ELB when the ASG resources are created.
I'm probably confusing things with Kubernetes external-dns (Helm).
external-dns creates the route53 record in the r53 hosted zone
perhaps the LB was created with an ingress resource from one of your helm charts
Ah yes, it looks like external-dns works with the ALB ingress controller to perform this: https://github.com/kubernetes-sigs/external-dns/blob/master/docs/tutorials/alb-ingress.md#ingress-examples
Using ExternalDNS with alb-ingress-controller
This tutorial describes how to use ExternalDNS with the aws-alb-ingress-controller.
Setting up ExternalDNS and aws-alb-ingress-controller
Follow the AWS tutorial to setup ExternalDNS for use in Kubernetes clusters running in AWS. Specify the source=ingress argument so that ExternalDNS will look for hostnames in Ingress objects. In addition, you may wish to limit which Ingress objects are used as an ExternalDNS source via the ingress-class argument, but this is not required.
For help setting up the ALB Ingress Controller, follow the Setup Guide.
Note that the ALB ingress controller uses the same tags for subnet auto-discovery as Kubernetes does with the AWS cloud provider.
In the examples that follow, it is assumed that you configured the ALB Ingress Controller with the ingress-class=alb argument (not to be confused with the same argument to ExternalDNS) so that the controller will only respect Ingress objects with the kubernetes.io/ingress.class annotation set to "alb".
Deploy an example application
Create the following sample "echoserver" application to demonstrate how ExternalDNS works with ALB ingress objects.
apiVersion: apps/v1
kind: Deployment
metadata:
  name: echoserver
spec:
  replicas: 1
  selector:
    matchLabels:
      app: echoserver
  template:
    metadata:
      labels:
        app: echoserver
    spec:
      containers:
      - image: gcr.io/google_containers/echoserver:1.4
        imagePullPolicy: Always
        name: echoserver
        ports:
        - containerPort: 8080
---
apiVersion: v1
kind: Service
metadata:
  name: echoserver
spec:
  ports:
  - port: 80
    targetPort: 8080
    protocol: TCP
  type: NodePort
  selector:
    app: echoserver
Note that the Service object is of type NodePort. We don't need a Service of type LoadBalancer here, since we will be using an Ingress to create an ALB.
Ingress examples
Create the following Ingress to expose the echoserver application to the Internet.
apiVersion: networking.k8s.io/v1
kind: Ingress
metadata:
  annotations:
    alb.ingress.kubernetes.io/scheme: internet-facing
    kubernetes.io/ingress.class: alb
  name: echoserver
spec:
  ingressClassName: alb
  rules:
  - host: echoserver.mycluster.example.org
    http: &echoserver_root
      paths:
      - path: /
        backend:
          service:
            name: echoserver
            port:
              number: 80
        pathType: Prefix
  - host: echoserver.example.org
    http: *echoserver_root
The above should result in the creation of an (ipv4) ALB in AWS which will forward traffic to the echoserver application.
If the source=ingress argument is specified, then ExternalDNS will create DNS records based on the hosts specified in ingress objects. The above example would result in two alias records being created, echoserver.mycluster.example.org and echoserver.example.org, which both alias the ALB that is associated with the Ingress object.
Note that the above example makes use of the YAML anchor feature to avoid having to repeat the http section for multiple hosts that use the exact same paths. If this Ingress object will only be fronting one backend Service, we might instead create the following:
apiVersion: networking.k8s.io/v1
kind: Ingress
metadata:
  annotations:
    alb.ingress.kubernetes.io/scheme: internet-facing
    external-dns.alpha.kubernetes.io/hostname: echoserver.mycluster.example.org, echoserver.example.org
    kubernetes.io/ingress.class: alb
  name: echoserver
spec:
  ingressClassName: alb
  rules:
  - http:
      paths:
      - path: /
        backend:
          service:
            name: echoserver
            port:
              number: 80
        pathType: Prefix
In the above example we create a default path that works for any hostname, and make use of the external-dns.alpha.kubernetes.io/hostname annotation to create multiple aliases for the resulting ALB.
Dualstack ALBs
AWS supports both IPv4 and "dualstack" (both IPv4 and IPv6) interfaces for ALBs. The ALB ingress controller uses the alb.ingress.kubernetes.io/ip-address-type annotation (which defaults to ipv4) to determine this. If this annotation is set to dualstack then ExternalDNS will create two alias records (one A record and one AAAA record) for each hostname associated with the Ingress object.
Example:
apiVersion: networking.k8s.io/v1
kind: Ingress
metadata:
  annotations:
    alb.ingress.kubernetes.io/scheme: internet-facing
    alb.ingress.kubernetes.io/ip-address-type: dualstack
    kubernetes.io/ingress.class: alb
  name: echoserver
spec:
  ingressClassName: alb
  rules:
  - host: echoserver.example.org
    http:
      paths:
      - path: /
        backend:
          service:
            name: echoserver
            port:
              number: 80
        pathType: Prefix
The above Ingress object will result in the creation of an ALB with a dualstack interface. ExternalDNS will create both an A echoserver.example.org record and an AAAA record of the same name, which are both aliases for the same ALB.
Ignore that I brought up the Terraform module, I was looking in the wrong place.
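Since the load balancer here comes from an ingress controller rather than the EKS module, one way to add tags is the controller's tags annotation on the Ingress. A sketch, assuming the AWS ALB ingress / load balancer controller (tag keys and values are illustrative):
apiVersion: networking.k8s.io/v1
kind: Ingress
metadata:
  name: echoserver
  annotations:
    kubernetes.io/ingress.class: alb
    alb.ingress.kubernetes.io/scheme: internet-facing
    # tags the controller applies to the ALB it creates
    alb.ingress.kubernetes.io/tags: Environment=dev,Team=platform
spec:
  ingressClassName: alb
  rules:
  - host: echoserver.example.org
    http:
      paths:
      - path: /
        pathType: Prefix
        backend:
          service:
            name: echoserver
            port:
              number: 80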
2022-02-19
Collection of gadgets for debugging and introspecting Kubernetes applications using BPF
Hey, for the platform engineers over here, what do you think about this PR? https://github.com/kubernetes/kube-state-metrics/pull/1689
What this PR does / why we need it:
This PR introduces a new flag, --per-metric-labels-allowlist, whose syntax works like --metric-labels-allowlist, but instead of being a filter that adds labels to kube_X_labels, it is a filter for Kubernetes labels that will be added to each metric time series of a resource.
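For context, the existing flag takes a resource=[label,...] list; a hypothetical kube-state-metrics args snippet, assuming the proposed flag mirrors that syntax (the PR is still a draft, so the exact flag shape may change):
containers:
  - name: kube-state-metrics
    image: k8s.gcr.io/kube-state-metrics/kube-state-metrics:v2.3.0
    args:
      # existing behaviour: the label is exposed on kube_deployment_labels only
      - --metric-labels-allowlist=deployments=[app.kubernetes.io/name]
      # proposed (hypothetical syntax): the label is added to every deployment metric series
      - --per-metric-labels-allowlist=deployments=[app.kubernetes.io/name]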
Motivation
Motivation for this change can be better described from a Platform Team POV, who is responsible for the observability stack and provides it to multiple teams and/or multiple tenants.
The goal is to make it easier to define queries, alerts, and rules without the need for complex joins, lowering the barrier for smaller, less experienced teams, as well as alleviating pressure on the Prometheus server, which is constantly doing joins for each alert rule.
Use Case 1
- A Development Team wants to create alerts for the multiple components of their applications.
- Different Components have different alerts, severities, thresholds. (example, web pods, background consumers, different kinds of jobs) and as components live in the same namespace; filtering with a namespace is not feasible.
- You'll now have to use joins with kube_X_labels to filter for the specific resources. Complex queries become more complex, especially ones that had joins already.
Use Case 2
- Platform Team defining general, default rules for every namespace.
- Platform team expects resources to have a set of standard labels that define something like alerting.xxx/severity and alerting.xxx/slack-channel alongside the app.kubernetes.io/name and app.kubernetes.io/component ones.
- Complex rules are defined to join with these labels and generate alerts with variables based on labels defined by teams. Queries become even more complex as you want to join with more than one label.
Complex Queries Example
Deployment has not matched the expected number of replicas.
• Using only 1 label for joins
(
kube_deployment_spec_replicas{} * on (deployment) group_left(label_app_kubernetes_io_name) kube_deployment_labels{ label_app_kubernetes_io_name="emojiapp", namespace=~"emoji"}
!=
kube_deployment_status_replicas_available{} * on (deployment) group_left(label_app_kubernetes_io_name) kube_deployment_labels{ label_app_kubernetes_io_name="emojiapp", namespace=~"emoji"}
)
and
(
changes(kube_deployment_status_replicas_updated{}[10m] ) * on (deployment) group_left(label_app_kubernetes_io_name) kube_deployment_labels{ label_app_kubernetes_io_name="emojiapp", namespace=~"emoji"}
== 0
)
• Using 2 labels for joins
(
kube_deployment_spec_replicas{} * on (deployment) group_left(label_app_kubernetes_io_name, label_alerting_severity) kube_deployment_labels{ label_app_kubernetes_io_name="emojiapp", label_alerting_severity="critical", namespace=~"emoji"}
!=
kube_deployment_status_replicas_available{} * on (deployment) group_left(label_app_kubernetes_io_name, label_alerting_severity) kube_deployment_labels{ label_app_kubernetes_io_name="emojiapp", label_alerting_severity="critical", namespace=~"emoji"}
)
and
(
changes(kube_deployment_status_replicas_updated{}[10m] ) * on (deployment) group_left(label_app_kubernetes_io_name, label_alerting_severity) kube_deployment_labels{ label_app_kubernetes_io_name="emojiapp", label_alerting_severity="critical", namespace=~"emoji"}
== 0
)
Same query but with labels as part of metric series:
•
(
kube_deployment_spec_replicas{label_app_kubernetes_io_name="emojiapp", namespace=~"emoji"}
!=
kube_deployment_status_replicas_available{label_app_kubernetes_io_name="emojiapp", namespace=~"emoji"}
)
and
(
changes(kube_deployment_status_replicas_updated{label_app_kubernetes_io_name="emojiapp", namespace=~"emoji"}[10m] )
)
•
(
kube_deployment_spec_replicas{label_app_kubernetes_io_name="emojiapp", label_alerting_severity="critical", namespace=~"emoji"}
!=
kube_deployment_status_replicas_available{label_app_kubernetes_io_name="emojiapp", label_alerting_severity="critical", namespace=~"emoji"}
)
and
(
changes(kube_deployment_status_replicas_updated{label_app_kubernetes_io_name="emojiapp", label_alerting_severity="critical", namespace=~"emoji"}[10m] )
)
Goal
• Making it easier for Platform and Development teams to create queries, rules, alerts, and dashboards for resources running in Kubernetes without the need for complex joins to filter resources.
• Alleviate some pressure from the Prometheus Servers that are constantly running joins for each alert rule.
Alternatives we tried
• Run recording rules to pre-compute the joins:
Having to have a recording rule for each metric series generated by KSM is cumbersome, and adds a dependency between teams and platform teams to add any recording rule. Not ideal especially in a multi-tenant environment.
How does this change affect the cardinality of KSM
(increases, decreases, or does not change cardinality)
• Unless explicitly using the new flag, this PR has no change to generated metrics cardinality.
• Using the flag will add new labels for each metric series of each resource that has a label key that's whitelisted.
• Cardinality of the label values depends on how often these labels change for the same resource.
• For the use-case behind this new feature, whitelisted labels are often labels that don't change often. Admins should be cautious of what labels to whitelist.
Performance
• KSM already fetches each resource label during metric collection.
• Prometheus performance shouldn't be affected as long as whitelisted labels are not changing constantly.
Misc
Which issue(s) this PR fixes:
• Fixes #1415
Relevant:
• Dynamic alert routing with Prometheus and Alertmanager
Notes:
• :warning: This is a Draft PR, the implementation is not final, the PR is a working POC for the pods resource, and is yet to be discussed.
2022-02-22
Hi all! Our Managed Kubernetes Clusters: Avoiding risky defaults, K8s threat modeling and securing EKS clusters webinar starts in less than an hour; there's still time to register.
Learn how to navigate the creation of a secure-by-default K8s cluster, avoid risky default settings and permissions, and listen to some live threat modeling of EKS clusters. Join Lightspin CISO Jonathan Rau and Director of Security Research Gafnit Amiga to discuss hot topics and tips for leveling up your Kubernetes security knowledge. Questions and topics covered include:
- Avoiding risky default settings in your Kubernetes clusters
- Creating a secure by default Kubernetes cluster
- Unique supply chain risks for Kubernetes
Bring your questions and notepads to this live webinar!
2022-02-24
Hi, we're using the Kubernetes provider here and I'm wondering how others are using it when they have multiple clusters/contexts in play? Thanks!
Something as simple as the below would only allow a single context to be accessed; what if I have multiple contexts?
provider "kubernetes" {
  config_paths = [
    "~/.kube/config",
    "~/.kube_another_path/config"
  ]
  config_context = "cluster01"
}
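One common pattern is a provider alias per cluster/context and pinning each resource or module to an alias; a sketch (paths and context names are illustrative):
provider "kubernetes" {
  alias          = "cluster01"
  config_path    = "~/.kube/config"
  config_context = "cluster01"
}

provider "kubernetes" {
  alias          = "cluster02"
  config_path    = "~/.kube_another_path/config"
  config_context = "cluster02"
}

# each resource or module picks the cluster it targets
resource "kubernetes_namespace" "example" {
  provider = kubernetes.cluster02

  metadata {
    name = "example"
  }
}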
2022-02-25
Regarding an EKS upgrade from 1.18 to 1.21, one option that was proposed was to leave kube-proxy and the AMIs at the 1.18 versions to reduce the amount of change and expedite the process, following up with these upgrades (and probably more) in the future. I’m not comfortable with that one. Anybody have any thoughts on the matter? Thanks for any input.
@Eric Berg you'd be outside the recommended version skew policy, since the kubelet will be 3 minor versions older. Per https://kubernetes.io/releases/version-skew-policy/: kubelet must not be newer than kube-apiserver, and may be up to two minor versions older.
The maximum version skew supported between various Kubernetes components.
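A quick way to check the actual skew is to compare the API server and kubelet versions with standard kubectl:
$ kubectl version --short    # server version = control plane
$ kubectl get nodes          # the VERSION column shows each node's kubelet version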
Thanks for the feedback, @rafa_d. We are doing a fast-follow round of updates to core-dns and kube-proxy. Not optimal, but the logs are clean and so far so good.