#kubernetes (2021-06)
Archive: https://archive.sweetops.com/kubernetes/
2021-06-01
What policy do you folks have on modifying the K8s YAML of running services directly? If you deploy with Helm, for example, is it allowed in your org to make one-off direct changes to the YAML of running services in some special cases?
2021-06-06
Hey, does anyone have a reference for deploying the AWS Load Balancer Controller with Terraform (the new one, not the aws-alb-ingress-controller)? Every resource I come across uses eksctl, and the create-service-account step confuses me because I don’t know what the role and its trust relationship look like.
2021-06-07
@Sean Turner how about https://registry.terraform.io/modules/iplabs/alb-ingress-controller/kubernetes/latest
I think I borrowed the service account from that, which is declared here: https://github.com/seanturner026/kube-playground/blob/509175b8d4fc1128b96a06a92a6c41b8c5fe1d73/terraform/resources.tf#L13-L30
And then I used the module in this code to create the role. For some reason I wasn’t able to get the role arn from the output though https://github.com/seanturner026/kube-playground/blob/509175b8d4fc1128b96a06a92a6c41b8c5fe1d73/terraform/modules.tf#L81-L90
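For reference, here’s a minimal sketch of what the controller’s service account ends up looking like once IRSA is wired up — the role ARN, name, and namespace are placeholders, not taken from the repos above:
# Hypothetical example: service account for the AWS Load Balancer Controller,
# annotated with an IAM role that trusts the cluster's OIDC provider.
apiVersion: v1
kind: ServiceAccount
metadata:
  name: aws-load-balancer-controller
  namespace: kube-system
  annotations:
    # Placeholder ARN. The role's trust policy must allow sts:AssumeRoleWithWebIdentity
    # from the cluster's OIDC provider, with a condition scoping the "sub" claim to
    # system:serviceaccount:kube-system:aws-load-balancer-controller.
    eks.amazonaws.com/role-arn: arn:aws:iam::111122223333:role/aws-load-balancer-controller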
2021-06-09
Hi All, Does anybody know a good guide/book/video that explains how you auto scale pods. More specifically, how to choose the best metric ?
I just came across this article yesterday: https://learnk8s.io/kubernetes-autoscaling-strategies
I haven’t read it yet myself, just added it to my “to read” list so far.
Learn how to size your cluster nodes, configure the Horizontal and Cluster Autoscaler, and overprovision your cluster for faster pod scaling.
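If it’s of any use, the usual starting point is a plain CPU-utilization HPA, and only moving to custom metrics once that proves insufficient. A minimal sketch (the deployment name and numbers are made up):
# Hypothetical HPA scaling a Deployment named "api" on average CPU utilization.
apiVersion: autoscaling/v2beta2
kind: HorizontalPodAutoscaler
metadata:
  name: api
spec:
  scaleTargetRef:
    apiVersion: apps/v1
    kind: Deployment
    name: api
  minReplicas: 2
  maxReplicas: 10
  metrics:
    - type: Resource
      resource:
        name: cpu
        target:
          type: Utilization
          # scale out when average CPU across pods exceeds ~70% of requests
          averageUtilization: 70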
2021-06-14
Hi folks, I am starting to migrate my Terraform state and stuff to Terraform Cloud. So far so good, however now I encountered the following error when migrating the module using the cloudposse eks module.
│ Error: Get "http://localhost/api/v1/namespaces/kube-system/configmaps/aws-auth": dial tcp 127.0.0.1:80: connect: connection refused
│
│ with module.eks_cluster.kubernetes_config_map.aws_auth_ignore_changes[0],
│ on .terraform/modules/eks_cluster/auth.tf line 83, in resource "kubernetes_config_map" "aws_auth_ignore_changes":
│ 83: resource "kubernetes_config_map" "aws_auth_ignore_changes" {
│
Any ideas/hints?
I have tried to change the kubernetes provider, checked the IAM credentials. Still no clue
I’ve seen this before — you need to disable exporting the k8s settings to your local machine, or in your case, TF Cloud. I’m not sure which variable you need to set to false; the eks module from Anton’s work does the same thing.
Uhm, that sounds good. Any hint where to look for the flag?
2021-06-15
anyone have a good way to limit the number of pods of a given type actively deploying on a node? For example, if 6 api pods are all scheduled on a node at the same time and the limit is 3, we only want 3 api pods actively spinning up on that node
topologySpreadConstraints https://kubernetes.io/blog/2020/05/introducing-podtopologyspread/
Author: Wei Huang (IBM), Aldo Culquicondor (Google) Managing Pods distribution across a cluster is hard. The well-known Kubernetes features for Pod affinity and anti-affinity, allow some control of Pod placement in different topologies. However, these features only resolve part of Pods distribution use cases: either place unlimited Pods to a single topology, or disallow two Pods to co-locate in the same topology. In between these two extreme cases, there is a common need to distribute the Pods evenly across the topologies, so as to achieve better cluster utilization and high availability of applications.
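A minimal sketch of what that looks like in the pod template — the `app: api` label and the hostname topology key here are assumptions, not from your manifests:
# Hypothetical pod spec snippet: spread "api" pods across nodes so no node ends up
# with more than one extra api pod compared to the least-loaded node.
spec:
  topologySpreadConstraints:
    - maxSkew: 1
      topologyKey: kubernetes.io/hostname
      whenUnsatisfiable: DoNotSchedule   # use ScheduleAnyway for a soft constraint
      labelSelector:
        matchLabels:
          app: api
Note this evens out placement rather than enforcing a hard per-node cap.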
from what I’ve read about pod topology spread, this just schedules that type of pod evenly across all nodes? There could be a scenario where we see more of the same type of pod deployed on a node at the same time (i.e. when a new k8s worker node is added to the cluster)
I rarely see talk about AKS. Wondering if anyone is using it and has it prod ready?
2021-06-16
2021-06-17
- EKS Managed Node Groups … I just realized (if I realized correctly?) there’s no way to assign load balancer Target Groups when using managed node groups…
I guess this is the case because the “amazon way” would mainly be to use the AWS Load Balancer controller rather than relying on registering nodes in target groups…
but hang on… surely this is available via the Launch Template though… I must’ve completely forgotten this
ok. this is the roadmap item: https://github.com/aws/containers-roadmap/issues/709
Community Note Please vote on this issue by adding a reaction to the original issue to help the community and maintainers prioritize this request Please do not leave "+1" or "me to…
I love how talking to myself on the CloudPosse Slack always makes me look in that last place where the true clues are
this is the exact reason why we chose to use this module https://github.com/cloudposse/terraform-aws-eks-workers instead of managed node groups by AWS
Terraform module to provision an AWS AutoScaling Group, IAM Role, and Security Group for EKS Workers - cloudposse/terraform-aws-eks-workers
2021-06-21
What are folks’ average HPA CPU settings across your deployments for serious Kubernetes clusters? Have anything interesting to share?
Hi all, we are running a dev Kubernetes cluster, version 1.11.4 (installed using kops in AWS), with a PVC (EBS volume) of size 3 TB, and our app is not using much data, so we want to shrink it to 1 TB. After going through various links I’ve learned that a PVC cannot be shrunk (it can only be extended). Is there any other way to achieve shrinking the PVC?
Thanks
I can’t think of a way except copying the data over to a smaller volume and file system…
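Something like this might work as a one-off, assuming the app can be scaled down while the copy runs — the claim names and Job name are made up:
# Hypothetical one-off Job: copy data from the old 3 TB PVC to a new, smaller PVC,
# then repoint the workload at the new claim and delete the old one.
apiVersion: batch/v1
kind: Job
metadata:
  name: pvc-shrink-copy
spec:
  template:
    spec:
      restartPolicy: Never
      containers:
        - name: copy
          image: alpine:3.14
          command: ["sh", "-c", "apk add --no-cache rsync && rsync -a /old/ /new/"]
          volumeMounts:
            - name: old-data
              mountPath: /old
            - name: new-data
              mountPath: /new
      volumes:
        - name: old-data
          persistentVolumeClaim:
            claimName: app-data-3tb   # existing claim
        - name: new-data
          persistentVolumeClaim:
            claimName: app-data-1tb   # new, smaller claim created beforehand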
Thank you
2021-06-22
nice!
export KUBECTL_EXTERNAL_DIFF="git-diff"
~/bin/git-diff:
#!/bin/bash
exec git diff --no-index --color --color-moved -- "$@"
…
kubectl diff -f foo.yaml
Anyone know how to get Let’s Encrypt certs to work w/ cert-manager so that sites are trusted?
How do you mean? It pretty much works out of the box :-)
Make sure you issue them in production mode and not staging
I think I was missing the step for creating a
apiVersion: cert-manager.io/v1alpha2
kind: Certificate
before I start referencing them in an ingress. I created the issuer.
Docs I was following (Azure) only documented private certs.
@Erik Osterman (Cloud Posse) Actually, honestly, I don’t know. I tried ingress-shim to automatically create the Certificate; I am able to get a certificate, but it is private, not trusted.
I created an issuer like this:
apiVersion: cert-manager.io/v1
kind: Issuer
metadata:
  name: letsencrypt
spec:
  acme:
    server: https://acme-v02.api.letsencrypt.org/directory
    email: $ACME_ISSUER_EMAIL
    privateKeySecretRef:
      name: letsencrypt
    solvers:
      - dns01:
          azureDNS:
            subscriptionID: $AZ_SUBSCRIPTION_ID
            resourceGroupName: $AZ_RESOURCE_GROUP
            hostedZoneName: $AZ_DNS_DOMAIN
            # Azure Cloud Environment, default to AzurePublicCloud
            environment: AzurePublicCloud
and Helmfile snippet with:
repositories:
  ...
  - name: jetstack
    url: https://charts.jetstack.io

releases:
  ...
  - name: cert-manager
    namespace: kube-addons
    chart: jetstack/cert-manager
    version: 1.4.0
    values:
      - installCRDs: true
And I only get private certificates, not ones signed by a publicly trusted CA.
oh, i’ve been bit by this before.
You need to delete the certificates that were issued previously.
It won’t reissue a trusted certificate, even if an untrusted one was already generated
Also, I suggest checking out our helmfile https://github.com/cloudposse/helmfiles/blob/master/releases/cert-manager/helmfile.yaml#L135-L156
Comprehensive Distribution of Helmfiles for Kubernetes - cloudposse/helmfiles
It’s nice to create 2 issuers, one for staging, one for prod.
That way the service can decide which to use.
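Roughly like this — a minimal sketch reusing the azureDNS solver from your issuer above; only the names, the secret ref, and the ACME server URL differ between the two:
# Hypothetical staging counterpart to the "letsencrypt" issuer above; a second,
# otherwise identical resource pointed at the production ACME URL covers prod.
apiVersion: cert-manager.io/v1
kind: Issuer
metadata:
  name: letsencrypt-staging
spec:
  acme:
    server: https://acme-staging-v02.api.letsencrypt.org/directory
    email: $ACME_ISSUER_EMAIL
    privateKeySecretRef:
      name: letsencrypt-staging
    solvers:
      - dns01:
          azureDNS:
            subscriptionID: $AZ_SUBSCRIPTION_ID
            resourceGroupName: $AZ_RESOURCE_GROUP
            hostedZoneName: $AZ_DNS_DOMAIN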
Don’t know your setup, nonetheless I wonder if you are using Issuer (not ClusterIssuer) intentionally
I was using an Issuer, following the docs; I didn’t know the difference between them, but now it looks like I want a ClusterIssuer.
Inspired by the SweetOps helmfile example, I created these two; I’ll test them today. I pray that it works.
• https://gist.github.com/darkn3rd/594e5ddcf27fe577e04e356884cf7e54
• https://gist.github.com/darkn3rd/66f065846da654c70a8e194957fef839
I am a little fuzzy on the strategy. If I specify hello.example.com, ratel.example.com, alpha.example.com in the ingress tls.hosts, will this be 3 certificate requests? Should I specify *.example.com instead? Would I do this through the Certificate (cert-manager.io/v1) instead, and somehow reference this in the ingress?
I published results: https://joachim8675309.medium.com/aks-with-cert-manager-f24786e87b20
Using cert-manager add-on with AKS
2021-06-23
2021-06-25
hello, can you help me customize my ingress to allow a port other than 80? Here I need port 5044. I have looked here and there but did not find a solution to change the source port.
$ kubectl get ing -n elk
NAME CLASS HOSTS ADDRESS PORTS AGE
ingress-logstash <none> logstash.mydomain 172.22.54.200 80 18m
my ingress:
apiVersion: networking.k8s.io/v1beta1
kind: Ingress
metadata:
  name: ingress-logstash
  namespace: elk
  annotations:
    # use the shared ingress-nginx
    kubernetes.io/ingress.class: "nginx"
spec:
  rules:
    - host: logstash.mydomain
      http:
        paths:
          - path: /
            pathType: ImplementationSpecific # default
            backend:
              serviceName: logstash-main
              servicePort: 5044
the ingress describe:
$ kubectl describe ingress ingress-logstash -n elk
Name:             ingress-logstash
Namespace:        elk
Address:          172.22.54.200
Default backend:  default-http-backend:80 (<error: endpoints "default-http-backend" not found>)
Rules:
  Host                   Path  Backends
  ----                   ----  --------
  logstash.s-05.saas-fr
                         /     logstash-main:5044 (10.244.4.118:5044)
Annotations:             kubernetes.io/ingress.class: nginx
Events:
  Type    Reason  Age                From                      Message
  ----    ------  ----               ----                      -------
  Normal  Sync    18m (x3 over 22m)  nginx-ingress-controller  Scheduled for sync
you probably need a second ingress-nginx controller with a different http port config
there is also a tcp service option which may work for you with a single controller, but it’s fairly dumb (just tcp, no knowledge of http)
external service ports in the helm chart: https://github.com/kubernetes/ingress-nginx/blob/4b95eefab026f871a1d1793c5527524ddaa06fcf/charts/ingress-nginx/values.yaml#L428
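For the TCP-service route, the exposure is configured in the chart values rather than in an Ingress resource — a minimal sketch, assuming the Service is logstash-main in the elk namespace as above:
# Hypothetical ingress-nginx Helm values: forward TCP 5044 on the controller
# straight to the logstash Service (plain TCP, no HTTP routing or host matching).
tcp:
  "5044": elk/logstash-main:5044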
2021-06-27
Follow-up newbie cert-manager question. I would like to use a wildcard cert for all of the web traffic, e.g. *.example.com. So my question is: when I specify the tls map in the ingress, do I need to include the full list of domain names for the certificate, or can I just put the wildcard there? The examples out there lead one to believe you put all the domains you want to use in there. Or should I use the Certificate CRD and then somehow reference that in the ingress? Really fuzzy on this.
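In case a sketch helps: with ingress-shim annotations, putting just the wildcard in tls.hosts produces a single wildcard certificate request (which needs a dns01 solver, like the azureDNS one earlier in the channel). The hostnames, issuer name, and secret name below are placeholders:
# Hypothetical ingress using one wildcard certificate via ingress-shim.
apiVersion: networking.k8s.io/v1
kind: Ingress
metadata:
  name: hello
  annotations:
    cert-manager.io/cluster-issuer: letsencrypt   # issuer name is an assumption
spec:
  tls:
    - hosts:
        - "*.example.com"                 # one cert request covering hello/ratel/alpha
      secretName: wildcard-example-com-tls
  rules:
    - host: hello.example.com
      http:
        paths:
          - path: /
            pathType: Prefix
            backend:
              service:
                name: hello
                port:
                  number: 80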
2021-06-28
Anyone know of a way to force ArgoCD to do a sync that updates the helm revision so I can roll deployments?
2021-06-29
Hey all, I had a Kubernetes autoscaling question. We have this old k8s cluster that uses kops; basically we’ve outgrown our max node count, so we adjusted the scaling count from max 35 to max 40. I also adjusted the max size in the cluster-autoscaler config. We decided to fire off a job to test it, but when we describe the pod it still reports a max of 35 nodes. Any ideas on what’s up? Configs as follows:
command:
- ./cluster-autoscaler
- -v=4
- --cloud-provider=aws
- --namespace=kube-system
- --logtostderr=true
- --stderrthreshold=info
- --expander=least-waste
- --balance-similar-node-groups=true
- --skip-nodes-with-local-storage=false
# Repeat the below line for new ASG
- --nodes=5:40:nodes.k8s
The autoscaling group in AWS was also bumped to a max of 40 but this error persists
Type Reason Age From Message
---- ------ ---- ---- -------
Warning FailedScheduling 28s (x4 over 57s) default-scheduler 0/35 nodes are available: 35 Insufficient cpu, 35 Insufficient memory.
Normal NotTriggerScaleUp 6s (x5 over 47s) cluster-autoscaler pod didn't trigger scale-up (it wouldn't fit if a new node is added)
2021-06-30
Hello,
we are using Nginx as our ingress controller.
we have an ingress with host my.domain.com, and it was working fine for years, but recently accessing my.domain.com randomly gives “502 Bad Gateway (nginx/1.15.5)”
• we tried some of the quick debugging steps for 502 issues
• when the issue occurs, we logged into the nodes and did a curl request to POD_IP:8080/index.html and it worked
• we don’t have any other ingress configured with the same host and path that might conflict
• there are no recent pod restarts or events in the ingress-controller pod
also, when the “502 Bad Gateway (nginx/1.15.5)” occurred, the ingress-controller pod shows:
2021/06/30 08:59:50 [error] 1050#1050: *1352377 connect() failed (111: Connection refused) while connecting to upstream, client: CLIENT_IP_HERE, server: my.domain.com, request: "GET /index.html HTTP/2.0", upstream: "http://POD_IP:8080/index.html", host: "my.domain.com", referrer: "https://my.domain/index.html"
so, according to a link: “The 502 HTTP status code returned by Nginx means that Nginx was unable to contact your application at the time of the request.”
According to that statement, is the issue with the pod or with the ingress-controller?
But most of the time my.domain.com is accessible, and THE ISSUE LOOKS INTERMITTENT.
Is there any other place we need to check for logs? Or has anyone experienced the same issue?
Thanks in advance.