#kubernetes (2021-06)
Archive: https://archive.sweetops.com/kubernetes/
2021-06-01
What policy do you folks have on modifying the K8s YAML of running services directly? If you deploy with Helm, for example, is it allowed in your org to make one-off direct changes to the YAML of running services in some special cases?
2021-06-06
Hey, does anyone have a reference for deploying the AWS Load Balancer Controller with Terraform (the new one, not the aws-alb-ingress-controller)? Every resource I come across uses eksctl, and the create-service-account step confuses me because I don’t know what the role and its trust relationship look like.
2021-06-07
@Sean Turner how about https://registry.terraform.io/modules/iplabs/alb-ingress-controller/kubernetes/latest
I think I borrowed the service account from that, which is declared here: https://github.com/seanturner026/kube-playground/blob/509175b8d4fc1128b96a06a92a6c41b8c5fe1d73/terraform/resources.tf#L13-L30
And then I used the module in this code to create the role. For some reason I wasn’t able to get the role arn from the output though https://github.com/seanturner026/kube-playground/blob/509175b8d4fc1128b96a06a92a6c41b8c5fe1d73/terraform/modules.tf#L81-L90
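For reference, here’s a minimal sketch of what the controller’s service account ends up looking like once IRSA is wired up — the role ARN, name, and namespace are placeholders, not taken from the repos above:
# Hypothetical example: service account for the AWS Load Balancer Controller,
# annotated with an IAM role that trusts the cluster's OIDC provider.
apiVersion: v1
kind: ServiceAccount
metadata:
  name: aws-load-balancer-controller
  namespace: kube-system
  annotations:
    # Placeholder ARN. The role's trust policy must allow sts:AssumeRoleWithWebIdentity
    # from the cluster's OIDC provider, with a condition scoping the "sub" claim to
    # system:serviceaccount:kube-system:aws-load-balancer-controller.
    eks.amazonaws.com/role-arn: arn:aws:iam::111122223333:role/aws-load-balancer-controller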
2021-06-09
Hi All, Does anybody know a good guide/book/video that explains how you auto scale pods. More specifically, how to choose the best metric ?
I just came across this article yesterday: https://learnk8s.io/kubernetes-autoscaling-strategies
I haven’t read it yet myself, just added it to my “to read” list so far.
Learn how to size your cluster nodes, configure the Horizontal and Cluster Autoscaler, and overprovision your cluster for faster pod scaling.
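If it’s of any use, the usual starting point is a plain CPU-utilization HPA, and only moving to custom metrics once that proves insufficient. A minimal sketch (the deployment name and numbers are made up):
# Hypothetical HPA scaling a Deployment named "api" on average CPU utilization.
apiVersion: autoscaling/v2beta2
kind: HorizontalPodAutoscaler
metadata:
  name: api
spec:
  scaleTargetRef:
    apiVersion: apps/v1
    kind: Deployment
    name: api
  minReplicas: 2
  maxReplicas: 10
  metrics:
    - type: Resource
      resource:
        name: cpu
        target:
          type: Utilization
          # scale out when average CPU across pods exceeds ~70% of requests
          averageUtilization: 70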
2021-06-14
Hi folks, I am starting to migrate my Terraform state and stuff to Terraform Cloud. So far so good, however now I encountered the following error when migrating the module using the cloudposse eks module.
│ Error: Get "http://localhost/api/v1/namespaces/kube-system/configmaps/aws-auth": dial tcp 127.0.0.1:80: connect: connection refused
│
│ with module.eks_cluster.kubernetes_config_map.aws_auth_ignore_changes[0],
│ on .terraform/modules/eks_cluster/auth.tf line 83, in resource "kubernetes_config_map" "aws_auth_ignore_changes":
│ 83: resource "kubernetes_config_map" "aws_auth_ignore_changes" {
│
Any ideas/hints?
I have tried to change the kubernetes provider, checked the IAM credentials. Still no clue
I’ve seen this before — you need to disable exporting the k8s settings to your local machine, or in your case, TF Cloud. I’m not sure which variable you need to set to false; the eks module from Anton’s work does the same thing.
Uhm, that sounds good. Any hint where to look for the flag?
2021-06-15
anyone have a good way to limit the number of pods of a given type actively deploying on a node? For example, if 6 api pods are all scheduled on a node at the same time and the limit is 3, we only want 3 api pods actively spinning up on that node
topologySpreadConstraints https://kubernetes.io/blog/2020/05/introducing-podtopologyspread/
Author: Wei Huang (IBM), Aldo Culquicondor (Google) Managing Pods distribution across a cluster is hard. The well-known Kubernetes features for Pod affinity and anti-affinity, allow some control of Pod placement in different topologies. However, these features only resolve part of Pods distribution use cases: either place unlimited Pods to a single topology, or disallow two Pods to co-locate in the same topology. In between these two extreme cases, there is a common need to distribute the Pods evenly across the topologies, so as to achieve better cluster utilization and high availability of applications.
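A minimal sketch of what that looks like in the pod template — the `app: api` label and the hostname topology key here are assumptions, not from your manifests:
# Hypothetical pod spec snippet: spread "api" pods across nodes so no node ends up
# with more than one extra api pod compared to the least-loaded node.
spec:
  topologySpreadConstraints:
    - maxSkew: 1
      topologyKey: kubernetes.io/hostname
      whenUnsatisfiable: DoNotSchedule   # use ScheduleAnyway for a soft constraint
      labelSelector:
        matchLabels:
          app: api
Note this evens out placement rather than enforcing a hard per-node cap.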
from what I’ve read about pod topology spread, this just schedules that type of pod evenly across all nodes? There could be a scenario where we see more of the same type of pod deployed on a node at the same time (i.e. when a new k8s worker node is added to the cluster)
I rarely see talk about AKS. Wondering if anyone is using it and has it prod ready?
2021-06-16
2021-06-17
- EKS Managed Node Groups … I just realized (if I realized correctly?) there’s no way to assign load balancer Target Groups when using managed node groups…
I guess this is the case because the “amazon way” would mainly be to use the AWS Load Balancer controller rather than relying on registering nodes in target groups…
but hang on… surely this is available via the Launch Template though… I must’ve completely forgotten this
ok. this is the roadmap item: https://github.com/aws/containers-roadmap/issues/709
Community Note Please vote on this issue by adding a reaction to the original issue to help the community and maintainers prioritize this request Please do not leave "+1" or "me to…
I love how talking to myself on the CloudPosse Slack always makes me look in that last place where the true clues are
this is the exact reason why we chose to use this module https://github.com/cloudposse/terraform-aws-eks-workers instead of managed node groups by AWS
Terraform module to provision an AWS AutoScaling Group, IAM Role, and Security Group for EKS Workers - cloudposse/terraform-aws-eks-workers
2021-06-21
What are folks’ average HPA CPU settings across your deployments for serious Kubernetes clusters? Have anything interesting to share?
Hi all, we are running a dev Kubernetes cluster, version 1.11.4 (installed using kops in AWS), with a PVC (EBS volume) of size 3 TB, and our app is not using much data, so we want to shrink it to 1 TB. After going through various links I’ve learned that a PVC cannot be shrunk (it can only be extended). Is there any other way to achieve shrinking the PVC?
Thanks
I can’t think of a way except copying the data over to a smaller volume and file system…
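Something like this might work as a one-off, assuming the app can be scaled down while the copy runs — the claim names and Job name are made up:
# Hypothetical one-off Job: copy data from the old 3 TB PVC to a new, smaller PVC,
# then repoint the workload at the new claim and delete the old one.
apiVersion: batch/v1
kind: Job
metadata:
  name: pvc-shrink-copy
spec:
  template:
    spec:
      restartPolicy: Never
      containers:
        - name: copy
          image: alpine:3.14
          command: ["sh", "-c", "apk add --no-cache rsync && rsync -a /old/ /new/"]
          volumeMounts:
            - name: old-data
              mountPath: /old
            - name: new-data
              mountPath: /new
      volumes:
        - name: old-data
          persistentVolumeClaim:
            claimName: app-data-3tb   # existing claim
        - name: new-data
          persistentVolumeClaim:
            claimName: app-data-1tb   # new, smaller claim created beforehand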
Thank you
2021-06-22
nice!
export KUBECTL_EXTERNAL_DIFF="git-diff"
~/bin/git-diff:
#!/bin/bash
exec git diff --no-index --color --color-moved -- "$@"
…
kubectl diff -f foo.yaml
Anyone know how to get Let’s Encrypt certs to work w/ cert-manager so that sites are trusted?
How do you mean? It pretty much works out of the box :-)
Make sure you issue them in production mode and not staging
I think I was missing the step for creating a
apiVersion: cert-manager.io/v1alpha2
kind: Certificate
before I start referencing them in an ingress. I created the issuer.
Docs I was following (Azure) only documented private certs.
@Erik Osterman (Cloud Posse) Actually, honestly, I don’t know. I tried ingress-shim to automatically create the Certificate; I am able to get a certificate, but it is private, not trusted.
I created an issuer like this:
apiVersion: cert-manager.io/v1
kind: Issuer
metadata:
  name: letsencrypt
spec:
  acme:
    server: https://acme-v02.api.letsencrypt.org/directory
    email: $ACME_ISSUER_EMAIL
    privateKeySecretRef:
      name: letsencrypt
    solvers:
      - dns01:
          azureDNS:
            subscriptionID: $AZ_SUBSCRIPTION_ID
            resourceGroupName: $AZ_RESOURCE_GROUP
            hostedZoneName: $AZ_DNS_DOMAIN
            # Azure Cloud Environment, default to AzurePublicCloud
            environment: AzurePublicCloud
and Helmfile snippet with:
repositories:
  ...
  - name: jetstack
    url: https://charts.jetstack.io

releases:
  ...
  - name: cert-manager
    namespace: kube-addons
    chart: jetstack/cert-manager
    version: 1.4.0
    values:
      - installCRDs: true
And I only get private certificates, not ones signed by a publicly trusted CA.
oh, i’ve been bit by this before.
You need to delete the certificates that were issued previously.
It won’t reissue a trusted certificate, even if an untrusted one was already generated
Also, I suggest checking out our helmfile https://github.com/cloudposse/helmfiles/blob/master/releases/cert-manager/helmfile.yaml#L135-L156
Comprehensive Distribution of Helmfiles for Kubernetes - cloudposse/helmfiles
It’s nice to create 2 issuers, one for staging, one for prod.
That way the service can decide which to use.
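Roughly like this — a minimal sketch reusing the azureDNS solver from your issuer above; only the names, the secret ref, and the ACME server URL differ between the two:
# Hypothetical staging counterpart to the "letsencrypt" issuer above; a second,
# otherwise identical resource pointed at the production ACME URL covers prod.
apiVersion: cert-manager.io/v1
kind: Issuer
metadata:
  name: letsencrypt-staging
spec:
  acme:
    server: https://acme-staging-v02.api.letsencrypt.org/directory
    email: $ACME_ISSUER_EMAIL
    privateKeySecretRef:
      name: letsencrypt-staging
    solvers:
      - dns01:
          azureDNS:
            subscriptionID: $AZ_SUBSCRIPTION_ID
            resourceGroupName: $AZ_RESOURCE_GROUP
            hostedZoneName: $AZ_DNS_DOMAIN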
Don’t know your setup, nonetheless I wonder if you are using Issuer (not ClusterIssuer) intentionally
I was using an Issuer, following the docs; I didn’t know the difference between them, but now it looks like I want a ClusterIssuer.
Inspired by the SweetOps helmfile example, I created these two; I’ll test them today. I pray that it works.
• https://gist.github.com/darkn3rd/594e5ddcf27fe577e04e356884cf7e54
• https://gist.github.com/darkn3rd/66f065846da654c70a8e194957fef839
I am a little fuzzy on the strategy. If I specify hello.example.com, ratel.example.com, alpha.example.com in the ingress tls.hosts, will this be 3 certificate requests? Should I specify *.example.com instead? Would I do this through the Certificate (cert-manager.io/v1) instead, and somehow reference this in the ingress?
I published results: https://joachim8675309.medium.com/aks-with-cert-manager-f24786e87b20
Using cert-manager add-on with AKS
2021-06-23
2021-06-25
hello, can you help me customize my ingress to allow a port other than 80? Here I need port 5044. I have looked here and there but did not find a solution to change the source port.
$ kubectl get ing -n elk
NAME CLASS HOSTS ADDRESS PORTS AGE
ingress-logstash <none> logstash.mydomain 172.22.54.200 80 18m
my ingress:
apiVersion: networking.k8s.io/v1beta1
kind: Ingress
metadata:
  name: ingress-logstash
  namespace: elk
  annotations:
    # use the shared ingress-nginx
    kubernetes.io/ingress.class: "nginx"
spec:
  rules:
    - host: logstash.mydomain
      http:
        paths:
          - path: /
            pathType: ImplementationSpecific # default
            backend:
              serviceName: logstash-main
              servicePort: 5044
the ingress describe:
$ kubectl describe ingress ingress-logstash -n elk
Name:             ingress-logstash
Namespace:        elk
Address:          172.22.54.200
Default backend:  default-http-backend:80 (<error: endpoints "default-http-backend" not found>)
Rules:
  Host                   Path  Backends
  ----                   ----  --------
  logstash.s-05.saas-fr
                         /     logstash-main:5044 (10.244.4.118:5044)
Annotations:             kubernetes.io/ingress.class: nginx
Events:
  Type    Reason  Age                From                      Message
  ----    ------  ----               ----                      -------
  Normal  Sync    18m (x3 over 22m)  nginx-ingress-controller  Scheduled for sync
you probably need a second ingress-nginx controller with a different http port config
there is also a tcp service option which may work for you with a single controller, but it’s fairly dumb (just tcp, no knowledge of http)
external service ports in the helm chart: https://github.com/kubernetes/ingress-nginx/blob/4b95eefab026f871a1d1793c5527524ddaa06fcf/charts/ingress-nginx/values.yaml#L428
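For the TCP-service route, the exposure is configured in the chart values rather than in an Ingress resource — a minimal sketch, assuming the Service is logstash-main in the elk namespace as above:
# Hypothetical ingress-nginx Helm values: forward TCP 5044 on the controller
# straight to the logstash Service (plain TCP, no HTTP routing or host matching).
tcp:
  "5044": elk/logstash-main:5044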
2021-06-27
Follow-up newbie cert-manager question. I would like to use a wildcard cert for all of the web traffic, e.g. *.example.com. So my question is: when I specify the tls map in the ingress, do I need to include the full list of domain names for the certificate, or can I just put the wildcard there? The examples out there lead one to believe you put all the domains you want to use in there. Or should I use the Certificate CRD and then somehow reference that in the ingress? Really fuzzy on this.
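In case a sketch helps: with ingress-shim annotations, putting just the wildcard in tls.hosts produces a single wildcard certificate request (which needs a dns01 solver, like the azureDNS one earlier in the channel). The hostnames, issuer name, and secret name below are placeholders:
# Hypothetical ingress using one wildcard certificate via ingress-shim.
apiVersion: networking.k8s.io/v1
kind: Ingress
metadata:
  name: hello
  annotations:
    cert-manager.io/cluster-issuer: letsencrypt   # issuer name is an assumption
spec:
  tls:
    - hosts:
        - "*.example.com"                 # one cert request covering hello/ratel/alpha
      secretName: wildcard-example-com-tls
  rules:
    - host: hello.example.com
      http:
        paths:
          - path: /
            pathType: Prefix
            backend:
              service:
                name: hello
                port:
                  number: 80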
2021-06-28
Anyone know of a way to force ArgoCD to do a sync that updates the helm revision so I can roll deployments?
2021-06-29
Hey all, I had a Kubernetes autoscaling question. We have this old k8s cluster that uses kops; basically we’ve outgrown our max node count, so we adjusted the scaling count from max 35 to max 40. I also adjusted the max size in the cluster-autoscaler config. We decided to fire off a job to test it, but when we describe the pod it still reports a max of 35 nodes. Any ideas on what’s up? Configs as follows:
command:
- ./cluster-autoscaler
- -v=4
- --cloud-provider=aws
- --namespace=kube-system
- --logtostderr=true
- --stderrthreshold=info
- --expander=least-waste
- --balance-similar-node-groups=true
- --skip-nodes-with-local-storage=false
# Repeat the below line for new ASG
- --nodes=5:40:nodes.k8s
The autoscaling group in AWS was also bumped to a max of 40 but this error persists
Type Reason Age From Message
---- ------ ---- ---- -------
Warning FailedScheduling 28s (x4 over 57s) default-scheduler 0/35 nodes are available: 35 Insufficient cpu, 35 Insufficient memory.
Normal NotTriggerScaleUp 6s (x5 over 47s) cluster-autoscaler pod didn't trigger scale-up (it wouldn't fit if a new node is added)
2021-06-30
Hello,
we are using Nginx as our ingress controller.
we have an ingress with host my.domain.com, and it was working fine for years, but recently accessing my.domain.com randomly gives “502 Bad Gateway (nginx/1.15.5)”
• we tried some of the quick debugging steps for 502 issues
• when the issue occurs, we logged into the nodes and did a curl request to POD_IP:8080/index.html and it worked
• we don’t have any other ingress configured with the same host and path that might conflict
• there are no recent pod restarts or events in the ingress-controller pod
also, when the “502 Bad Gateway (nginx/1.15.5)” occurred, the ingress-controller pod shows:
2021/06/30 08:59:50 [error] 1050#1050: *1352377 connect() failed (111: Connection refused) while connecting to upstream, client: CLIENT_IP_HERE, server: my.domain.com, request: "GET /index.html HTTP/2.0", upstream: "http://POD_IP:8080/index.html", host: "my.domain.com", referrer: "https://my.domain/index.html"
so, according to a link: “The 502 HTTP status code returned by Nginx means that Nginx was unable to contact your application at the time of the request.”
According to that statement, is the issue with the pod or with the ingress-controller?
But most of the time my.domain.com is accessible, and THE ISSUE LOOKS INTERMITTENT.
Is there any other place we need to check for logs? Or has anyone experienced the same issue?
Thanks in advance.