#kubernetes (2021-06)

kubernetes

Archive: https://archive.sweetops.com/kubernetes/

2021-06-01

Andrew Nazarov avatar
Andrew Nazarov

What policy do you folks have for modifying the K8s YAMLs of running services directly? If you deploy with Helm, for example, is it allowed in your org to make one-off direct changes to the YAMLs of running services in some special cases?

2021-06-06

Sean Turner avatar
Sean Turner

Hey, does anyone have a reference for deploying the AWS Load Balancer Controller with Terraform (the new one, not the aws-alb-ingress-controller)? Every resource I come across uses eksctl, and the create-service-account step confuses me because I don’t know what the role and its trust relationship look like.

2021-06-07

Brian A. avatar
Brian A.
Sean Turner avatar
Sean Turner

I think I borrowed the service account from that, which is declared here: https://github.com/seanturner026/kube-playground/blob/509175b8d4fc1128b96a06a92a6c41b8c5fe1d73/terraform/resources.tf#L13-L30

And then I used the module in this code to create the role. For some reason I wasn’t able to get the role arn from the output though https://github.com/seanturner026/kube-playground/blob/509175b8d4fc1128b96a06a92a6c41b8c5fe1d73/terraform/modules.tf#L81-L90
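
For anyone landing here later, here is a rough, illustrative sketch of what the IRSA role and trust relationship behind `eksctl create iamserviceaccount` boil down to in Terraform. This is not taken from the repos above; the OIDC provider resource name and the service account namespace/name are assumptions.

# Illustrative IRSA trust policy for the AWS Load Balancer Controller.
# Assumes an aws_iam_openid_connect_provider named "eks" already exists for
# the cluster's OIDC issuer; all names are hypothetical.
data "aws_iam_policy_document" "alb_controller_assume" {
  statement {
    effect  = "Allow"
    actions = ["sts:AssumeRoleWithWebIdentity"]

    principals {
      type        = "Federated"
      identifiers = [aws_iam_openid_connect_provider.eks.arn]
    }

    # Only the controller's service account may assume this role.
    condition {
      test     = "StringEquals"
      variable = "${replace(aws_iam_openid_connect_provider.eks.url, "https://", "")}:sub"
      values   = ["system:serviceaccount:kube-system:aws-load-balancer-controller"]
    }
  }
}

resource "aws_iam_role" "alb_controller" {
  name               = "aws-load-balancer-controller"
  assume_role_policy = data.aws_iam_policy_document.alb_controller_assume.json
}

The role then needs the controller’s IAM policy attached, and the Kubernetes service account gets annotated with eks.amazonaws.com/role-arn pointing at the role.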

2021-06-09

Adnan avatar

Hi all, does anybody know a good guide/book/video that explains how to autoscale pods? More specifically, how do you choose the best metric?

bradym avatar

I just came across this article yesterday: https://learnk8s.io/kubernetes-autoscaling-strategies

I haven’t read it yet myself, just added it to my “to read” list so far.

Architecting Kubernetes clusters — choosing the best autoscaling strategy

Learn how to size your cluster nodes, configure the Horizontal and Cluster Autoscaler, and overprovision your cluster for faster pod scaling.
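
As a concrete starting point, here is a minimal CPU-based HPA manifest (autoscaling/v2beta2, the current API at the time). The deployment name and the 70% target are illustrative, not a recommendation from the article.

apiVersion: autoscaling/v2beta2
kind: HorizontalPodAutoscaler
metadata:
  name: api
spec:
  scaleTargetRef:
    apiVersion: apps/v1
    kind: Deployment
    name: api
  minReplicas: 2
  maxReplicas: 10
  metrics:
  # Scale on average CPU utilization across the deployment's pods.
  - type: Resource
    resource:
      name: cpu
      target:
        type: Utilization
        averageUtilization: 70

Custom or external metrics (e.g. requests per second or queue depth) follow the same shape but require a metrics adapter to be installed.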

2021-06-14

rei avatar

Hi folks, I am starting to migrate my Terraform state and related config to Terraform Cloud. So far so good; however, I have now encountered the following error when migrating the configuration that uses the cloudposse EKS module.

│ Error: Get "http://localhost/api/v1/namespaces/kube-system/configmaps/aws-auth": dial tcp 127.0.0.1:80: connect: connection refused
│ 
│   with module.eks_cluster.kubernetes_config_map.aws_auth_ignore_changes[0],
│   on .terraform/modules/eks_cluster/auth.tf line 83, in resource "kubernetes_config_map" "aws_auth_ignore_changes":
│   83: resource "kubernetes_config_map" "aws_auth_ignore_changes" {
│ 

Any ideas/hints?

I have tried changing the kubernetes provider and checked the IAM credentials. Still no clue.

Mohammed Yahya avatar
Mohammed Yahya

I’ve seen this before. You need to disable exporting the k8s settings to the local machine, which in your case is Terraform Cloud. I’m not sure which variable you need to set to false; the EKS module from Anton’s work does the same thing.

rei avatar

Uhm, that sounds good. Any hint where to look for the flag?
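
Not a definitive fix, but the localhost error usually means the kubernetes provider ended up with no configuration on the Terraform Cloud runner (there is no local kubeconfig there) and fell back to its default endpoint. One common workaround is to configure the provider explicitly from EKS data sources instead of a kubeconfig file; the module output name below is illustrative.

data "aws_eks_cluster" "this" {
  name = module.eks_cluster.eks_cluster_id   # illustrative output name
}

data "aws_eks_cluster_auth" "this" {
  name = module.eks_cluster.eks_cluster_id
}

# Configure the kubernetes provider from the cluster itself instead of a
# kubeconfig file, so remote runs have a real endpoint instead of localhost.
provider "kubernetes" {
  host                   = data.aws_eks_cluster.this.endpoint
  cluster_ca_certificate = base64decode(data.aws_eks_cluster.this.certificate_authority[0].data)
  token                  = data.aws_eks_cluster_auth.this.token
}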

2021-06-15

btai avatar

Anyone have a good way to limit the number of pods of a given type actively deploying on a node? For example, if 6 api pods are all scheduled on a node at the same time and the limit is 3, we only want 3 api pods actively spinning up on that node.

Kyle Johnson avatar
Kyle Johnson
Introducing PodTopologySpread

Author: Wei Huang (IBM), Aldo Culquicondor (Google) Managing Pods distribution across a cluster is hard. The well-known Kubernetes features for Pod affinity and anti-affinity, allow some control of Pod placement in different topologies. However, these features only resolve part of Pods distribution use cases: either place unlimited Pods to a single topology, or disallow two Pods to co-locate in the same topology. In between these two extreme cases, there is a common need to distribute the Pods evenly across the topologies, so as to achieve better cluster utilization and high availability of applications.
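
For illustration, a pod spec snippet using a topology spread constraint; the label and values are hypothetical.

# Added to the pod template spec; keeps the per-node count of pods matching
# the selector within maxSkew of each other, i.e. it evens pods out rather
# than enforcing an absolute per-node cap.
spec:
  topologySpreadConstraints:
  - maxSkew: 1
    topologyKey: kubernetes.io/hostname
    whenUnsatisfiable: DoNotSchedule
    labelSelector:
      matchLabels:
        app: api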

btai avatar

From what I’ve read about PodTopologySpread, it just schedules that type of pod evenly across all nodes? There could still be a scenario where more pods of the same type are deployed on a node at the same time (e.g. when a new k8s worker node is added to the cluster).

Mr.Devops avatar
Mr.Devops

I rarely see AKS discussed. Wondering if anyone is using it and has it production-ready?

2021-06-16

2021-06-17

mfridh avatar

aws - EKS Managed Node Groups … I just realized (if I realized correctly?) there’s no way to assign load balancer Target Groups when using managed node groups…

I guess this is the case because the “amazon way” would mainly be to use the AWS Load Balancer controller rather than relying on registering nodes in target groups…

mfridh avatar

but hang on… surely this is available via the Launch Template though… I must’ve completely forgotten this

mfridh avatar
[EKS] [request]: EKS managed node group support for ASG target group · Issue #709 · aws/containers-roadmap

Community Note Please vote on this issue by adding a reaction to the original issue to help the community and maintainers prioritize this request Please do not leave "+1" or "me to…

mfridh avatar

I love how talking to myself on the CloudPosse Slack always makes me look in that last place where the true clues are

Issif avatar

this is the exact reason why we chose to use this module https://github.com/cloudposse/terraform-aws-eks-workers instead of AWS managed node groups

cloudposse/terraform-aws-eks-workers

Terraform module to provision an AWS AutoScaling Group, IAM Role, and Security Group for EKS Workers - cloudposse/terraform-aws-eks-workers
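
For context, with self-managed workers the underlying autoscaling group can register its nodes in target groups directly, which is the piece managed node groups don’t expose. A minimal, illustrative sketch (names, subnets, and the launch template are assumptions):

resource "aws_autoscaling_group" "eks_workers" {
  name                = "eks-workers"
  min_size            = 1
  max_size            = 5
  vpc_zone_identifier = var.private_subnet_ids

  launch_template {
    id      = aws_launch_template.eks_workers.id
    version = "$Latest"
  }

  # Registers every node in the ASG with these load balancer target groups,
  # which EKS managed node groups don't let you set (see issue #709 above).
  target_group_arns = [aws_lb_target_group.ingress.arn]
}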

2021-06-21

mfridh avatar

What are folks’ average HPA CPU settings across your deployments for serious Kubernetes clusters? Anything interesting to share?

Shreyank Sharma avatar
Shreyank Sharma

Hi all, we are running a dev Kubernetes cluster, version 1.11.4 (installed using kops on AWS), with a 3 TB PVC (EBS volume). Our app is not using much data, so we want to shrink it to 1 TB. After reading various links I’ve learned that a PVC cannot be shrunk (it can only be extended). Is there any other way to achieve shrinking the PVC?

Thanks

mfridh avatar

I can’t think of a way except copying the data over to a smaller volume and file system…
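
A rough sketch of that approach, in case it helps: create a new, smaller PVC and run a throwaway pod that mounts both volumes and copies the data across. Storage class defaults, claim names, and the image are illustrative assumptions.

apiVersion: v1
kind: PersistentVolumeClaim
metadata:
  name: data-small          # new 1 TB claim to copy into
spec:
  accessModes: ["ReadWriteOnce"]
  resources:
    requests:
      storage: 1Ti
---
apiVersion: v1
kind: Pod
metadata:
  name: volume-copy
spec:
  restartPolicy: Never
  containers:
  - name: copy
    image: alpine:3.14
    # Copy everything from the old volume to the new one, preserving attributes.
    command: ["sh", "-c", "apk add --no-cache rsync && rsync -a /old/ /new/"]
    volumeMounts:
    - name: old
      mountPath: /old
    - name: new
      mountPath: /new
  volumes:
  - name: old
    persistentVolumeClaim:
      claimName: data-big    # existing 3 TB claim (name is hypothetical)
  - name: new
    persistentVolumeClaim:
      claimName: data-small

Once the copy finishes, point the workload at the new claim and delete the old PVC.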

Shreyank Sharma avatar
Shreyank Sharma

Thank you

2021-06-22

mfridh avatar

nice!

export KUBECTL_EXTERNAL_DIFF="git-diff"

~/bin/git-diff:

#!/bin/bash
exec git diff --no-index --color --color-moved -- "$@"

kubectl diff -f foo.yaml

Joaquin Menchaca avatar
Joaquin Menchaca

Anyone know how to get Let’s Encrypt certs to work w/ cert-manager so that sites are trusted?

Erik Osterman (Cloud Posse) avatar
Erik Osterman (Cloud Posse)

How do you mean? It pretty much works out of the box :-)

Erik Osterman (Cloud Posse) avatar
Erik Osterman (Cloud Posse)

Make sure you issue them in production mode and not staging

Joaquin Menchaca avatar
Joaquin Menchaca

I think I was missing the step of creating a

apiVersion: cert-manager.io/v1alpha2
kind: Certificate

before referencing it in an ingress. I created the issuer.

The docs I was following (Azure) only documented private certs.

Joaquin Menchaca avatar
Joaquin Menchaca

@Erik Osterman (Cloud Posse) Actually, honestly, I don’t know. I tried ingress-shim to automatically create the Certificate; I am able to get a certificate, but it is private, not trusted.

Joaquin Menchaca avatar
Joaquin Menchaca

I created an issuer like this:

apiVersion: cert-manager.io/v1
kind: Issuer
metadata:
  name: letsencrypt
spec:
  acme:
    server: https://acme-v02.api.letsencrypt.org/directory
    email: $ACME_ISSUER_EMAIL
    privateKeySecretRef:
      name: letsencrypt
    solvers:
    - dns01:
        azureDNS:
          subscriptionID: $AZ_SUBSCRIPTION_ID
          resourceGroupName: $AZ_RESOURCE_GROUP
          hostedZoneName: $AZ_DNS_DOMAIN
          # Azure Cloud Environment, default to AzurePublicCloud
          environment: AzurePublicCloud

and Helmfile snippet with:

repositories:
  ...
  - name: jetstack
    url: https://charts.jetstack.io

releases:
  ... 
  - name: cert-manager
    namespace: kube-addons
    chart: jetstack/cert-manager
    version: 1.4.0
    values:
      - installCRDs: true

And I only get private certificates, not ones issued by a publicly trusted CA.

Erik Osterman (Cloud Posse) avatar
Erik Osterman (Cloud Posse)

oh, i’ve been bit by this before.

Erik Osterman (Cloud Posse) avatar
Erik Osterman (Cloud Posse)

You need to delete the certificates that were issued previously.

Erik Osterman (Cloud Posse) avatar
Erik Osterman (Cloud Posse)

It won’t reissue a trusted certificate, even if an untrusted one was already generated

Erik Osterman (Cloud Posse) avatar
Erik Osterman (Cloud Posse)
cloudposse/helmfiles

Comprehensive Distribution of Helmfiles for Kubernetes - cloudposse/helmfiles

Erik Osterman (Cloud Posse) avatar
Erik Osterman (Cloud Posse)

It’s nice to create 2 issuers, one for staging, one for prod.

Erik Osterman (Cloud Posse) avatar
Erik Osterman (Cloud Posse)

That way the service can decide which to use.

Andrew Nazarov avatar
Andrew Nazarov

Don’t know your setup, nonetheless I wonder if you are using Issuer (not ClusterIssuer) intentionally
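
Pulling the two suggestions together, a hedged sketch of a staging/prod ClusterIssuer pair, reusing the same azureDNS dns01 solver and placeholder variables from the Issuer above; only the names and ACME server URLs differ.

apiVersion: cert-manager.io/v1
kind: ClusterIssuer
metadata:
  name: letsencrypt-staging
spec:
  acme:
    # Staging endpoint: untrusted certs, but generous rate limits for testing.
    server: https://acme-staging-v02.api.letsencrypt.org/directory
    email: $ACME_ISSUER_EMAIL
    privateKeySecretRef:
      name: letsencrypt-staging
    solvers:
    - dns01:
        azureDNS:
          subscriptionID: $AZ_SUBSCRIPTION_ID
          resourceGroupName: $AZ_RESOURCE_GROUP
          hostedZoneName: $AZ_DNS_DOMAIN
          environment: AzurePublicCloud
---
apiVersion: cert-manager.io/v1
kind: ClusterIssuer
metadata:
  name: letsencrypt-prod
spec:
  acme:
    # Production endpoint: publicly trusted certs, strict rate limits.
    server: https://acme-v02.api.letsencrypt.org/directory
    email: $ACME_ISSUER_EMAIL
    privateKeySecretRef:
      name: letsencrypt-prod
    solvers:
    - dns01:
        azureDNS:
          subscriptionID: $AZ_SUBSCRIPTION_ID
          resourceGroupName: $AZ_RESOURCE_GROUP
          hostedZoneName: $AZ_DNS_DOMAIN
          environment: AzurePublicCloud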

Joaquin Menchaca avatar
Joaquin Menchaca

I was using an Issuer following the docs; I didn’t know the difference between them, but now it looks like I want a ClusterIssuer.

Joaquin Menchaca avatar
Joaquin Menchaca

Inspired by the SweetOps helmfile example, I created these two and will test them today. I pray that it works.

https://gist.github.com/darkn3rd/594e5ddcf27fe577e04e356884cf7e54
https://gist.github.com/darkn3rd/66f065846da654c70a8e194957fef839

I am a little fuzzy on the strategy: if I specify hello.example.com, ratel.example.com, and alpha.example.com in the ingress tls.hosts, will this be 3 certificate requests? Should I specify *.example.com instead?

Or would I do this through the Certificate (cert-manager.io/v1) instead, and somehow reference it in the ingress?

Joaquin Menchaca avatar
Joaquin Menchaca

Got it working.

Joaquin Menchaca avatar
Joaquin Menchaca
AKS with Cert Manager

Using cert-manager add-on with AKS

2021-06-23

2021-06-25

Pierre-Yves avatar
Pierre-Yves

Hello, can you help me customize my ingress to allow a port other than 80? Here I need port 5044. I have looked here and there but did not find a solution for changing the source port.

$ kubectl get ing -n elk
NAME               CLASS    HOSTS                   ADDRESS         PORTS   AGE
ingress-logstash   <none>   logstash.mydomain   172.22.54.200       80      18m

my ingress:

apiVersion: networking.k8s.io/v1beta1
kind: Ingress
metadata:
  name: ingress-logstash
  namespace: elk
  annotations:
    # use the shared ingress-nginx
    kubernetes.io/ingress.class: "nginx"
spec:
  rules:
  - host: logstash.mydomain
    http:
      paths:
      - path: /
        pathType: ImplementationSpecific # default
        backend:
          serviceName: logstash-main
          servicePort: 5044

the ingress describe:

$ kubectl describe ingress ingress-logstash -n elk
Name:             ingress-logstash
Namespace:        elk
Address:          172.22.54.200
Default backend:  default-http-backend:80 (<error: endpoints "default-http-backend" not found>)
Rules:
  Host                   Path  Backends
  ----                   ----  --------
  logstash.s-05.saas-fr
                         /   logstash-main:5044 (10.244.4.118:5044)
Annotations:             kubernetes.io/ingress.class: nginx
Events:
  Type    Reason  Age                From                      Message
  ----    ------  ----               ----                      -------
  Normal  Sync    18m (x3 over 22m)  nginx-ingress-controller  Scheduled for sync

Kyle Johnson avatar
Kyle Johnson

you probably need a second ingress-nginx controller with a different http port config

Kyle Johnson avatar
Kyle Johnson

there is also a tcp service option which may work for you with a single controller, but it’s fairly dumb (just tcp, no knowledge of http)
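
For reference, the TCP option Kyle mentions is a ConfigMap the controller is pointed at via its --tcp-services-configmap flag. A minimal sketch for the logstash case (the namespace and ConfigMap name are assumptions based on the thread):

apiVersion: v1
kind: ConfigMap
metadata:
  name: tcp-services
  namespace: ingress-nginx
data:
  # external port: "namespace/service:port"
  "5044": "elk/logstash-main:5044"

The controller’s own Service (or its load balancer) also has to expose port 5044 for traffic to reach it.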

2021-06-27

Joaquin Menchaca avatar
Joaquin Menchaca

Follow-up newbie cert-manager question. I would like to use a wildcard cert for all of the web traffic, e.g. *.example.com. So my question is: when I specify the tls map in the ingress, do I need to include all the domain names for the certificate, or do I just put the wildcard there? The examples out there lead one to believe you put in all the domains you want to use. Or should I use the Certificate CRD and then somehow reference that in the ingress? Really fuzzy on this.
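
Not an authoritative answer, but a sketch of the Certificate-CRD route for the wildcard case: wildcards require a dns01 solver (http01 cannot validate them), and the resulting secret is what the ingress tls block references. Names below are illustrative.

apiVersion: cert-manager.io/v1
kind: Certificate
metadata:
  name: wildcard-example-com
spec:
  secretName: wildcard-example-com-tls   # referenced by ingress spec.tls[].secretName
  issuerRef:
    name: letsencrypt-prod
    kind: ClusterIssuer
  dnsNames:
  - "*.example.com"

Each ingress host that matches the wildcard can then point at the same secretName instead of requesting its own certificate.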

2021-06-28

tomv avatar

Anyone know of a way to force ArgoCD to do a sync that updates the helm revision so I can roll deployments?

2021-06-29

Aumkar Prajapati avatar
Aumkar Prajapati

Hey all, I had a Kubernetes autoscaling question. We have an old k8s cluster that’s using kops, and we’ve outgrown our max node count, so we adjusted the scaling count from max 35 to max 40. I also adjusted the cluster-autoscaler’s max to match. We fired off a job to test it, but when we describe the pod it still reports 35 nodes. Any ideas on what’s up? Configs as follows:

        command:
        - ./cluster-autoscaler
        - -v=4
        - --cloud-provider=aws
        - --namespace=kube-system
        - --logtostderr=true
        - --stderrthreshold=info
        - --expander=least-waste
        - --balance-similar-node-groups=true
        - --skip-nodes-with-local-storage=false
        # Repeat the below line for new ASG
        - --nodes=5:40:nodes.k8s

The autoscaling group in AWS was also bumped to a max of 40, but this error persists:

  Type     Reason             Age                From                Message
  ----     ------             ----               ----                -------
  Warning  FailedScheduling   28s (x4 over 57s)  default-scheduler   0/35 nodes are available: 35 Insufficient cpu, 35 Insufficient memory.
  Normal   NotTriggerScaleUp  6s (x5 over 47s)   cluster-autoscaler  pod didn't trigger scale-up (it wouldn't fit if a new node is added)

2021-06-30

Shreyank Sharma avatar
Shreyank Sharma

Hello,

we are using Nginx as our ingress controller.

We have an ingress with host my.domain.com, and it has worked fine for years, but recently accessing my.domain.com randomly gives “502 Bad Gateway (nginx/1.15.5)”.

• We tried some of the quick debugging steps for 502 issues.
• When the issue occurs, we logged into the nodes and did a curl request to POD_IP:8080/index.html, and it worked.
• We also don’t have any other ingress configured with the same host and path that might conflict.
• There are no recent pod restarts or events in the ingress controller pod.

Also, when the “502 Bad Gateway (nginx/1.15.5)” occurred, the ingress-controller pod showed: 2021/06/30 08:59:50 [error] 1050#1050: *1352377 connect() failed (111: Connection refused) while connecting to upstream, client: CLIENT_IP_HERE, server: my.domain.com, request: "GET /index.html HTTP/2.0", upstream: "http://POD_IP:8080/index.html", host: "my.domain.com", referrer: "https://my.domain/index.html"

So, according to one link, the 502 HTTP status code returned by Nginx means that Nginx was unable to contact your application at the time of the request.

According to that statement, is the issue with the pod or with the ingress-controller?

But most of the time my.domain.com is accessible, and the issue looks intermittent.

Is there any other place we should check for logs? Or has anyone experienced the same issue?

Thanks in advance.
