#kubernetes (2020-07)
Archive: https://archive.sweetops.com/kubernetes/
2020-07-01

When I set the kubelet-certificate-authority flag in kube-apiserver.yaml, I am running into the following message when trying to start a pod. I am using kubespray to provision the AWS cluster.
Error from server: Get <https://10.250.205.173:10250/containerLogs/default/bash-shell-d8bd1/bash-shell-d8bd1>: x509: cannot validate certificate for 10.250.205.173 because it doesn't contain any IP SANs
I think this is the controller node getting status from a worker node. The information that I found about this issue is:
That message is coming from the master trying to connect to the node (the flow of traffic
is kubectl -> master API -> kubelet -> container). When starting the master, are you
setting --kubelet_certificate_authority? If so, the master expects to be able to validate
the kubelet's serving cert, which means it needs to be valid for the hostnames/IP addresses
the master uses to connect to it.
Any help to resolve this issue would be appreciated.

I question why the master is using an IP address for the worker node. I’ve been trying to find information about kubelet-preferred-address-types. I wonder if I can change that setting.

I set --kubelet-preferred-address-types=InternalDNS (and just this value). Then I tried to start a pod. This error was displayed:
Error from server: no preferred addresses found; known addresses: [{InternalIP 10.250.205.173} {Hostname ip-10-250-205-173.ec2.internal}]
Edit: The InternalIP and Hostname are literally telling me what is acceptable as an address type. When I add Hostname, the error changes:
Error from server: Get <https://ip-10-250-205-173.ec2.internal:10250/containerLogs/kube-system/nodelocaldns-s8mfk/node-cache>: x509: certificate signed by unknown authority
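For reference, a minimal sketch of the two flags being discussed, roughly as they might appear in a kube-apiserver static-pod manifest (the CA path is illustrative and depends on how kubespray lays out certificates):

    - kube-apiserver
    # CA that signed the kubelets’ serving certificates (path is illustrative)
    - --kubelet-certificate-authority=/etc/kubernetes/ssl/ca.crt
    # order matters: the first address type present on the node is used
    - --kubelet-preferred-address-types=Hostname,InternalDNS,InternalIP

With Hostname or InternalDNS first, the kubelet serving certificate only needs to be valid for the node name; if InternalIP is used, it needs IP SANs, which is what the first error was complaining about.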
2020-07-02

Is there a way I can prevent the creation of new config maps every time I deploy through Helm?

That is how Helm keeps a history of each deployment state and works out what has changed between deployments.

You can set a lower number of releases to keep in history with the --history-max flag; the default is 10.

Set it to 1 if you don’t think you’ll ever want to roll back to releases earlier than the last one… Setting it to 0 means “no limit” rather than no history.

Thanks @Tim Birkett, so basically while doing helm init we need to pass this flag?
helm init --history-max 2
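That flag is for Helm 2’s Tiller. If you end up on Helm 3 (which has no helm init), the same limit is applied per release instead - a minimal sketch, with the release name and chart path as placeholders:

    helm upgrade --install my-release ./my-chart --history-max 2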

Hi, I’m trying https://github.com/cloudposse/terraform-aws-eks-cluster and for an unknown reason, even if all my resources seem OK, my cluster has 0 nodes.

Hey @Issif - Did you implement: https://github.com/cloudposse/terraform-aws-eks-node-group to create nodes?

@Tim Birkett nope, I prefer the solution with workers, as I can manage them more precisely and add them to a target group with auto-scaling enabled

I think I’ve found the issue; a new deployment is occurring right now

Ah you used: https://github.com/cloudposse/terraform-aws-eks-workers - Cool, hope the issue is sorted for you

Yes I use this

and that’s a fail …

really strange, my aws-auth seems correct, SG too

I got it: I had enabled the NAT gateway but my cluster was in a public subnet; after that, all was good. Those are really good modules, really love them
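For anyone else hitting the “0 nodes” symptom with self-managed workers: besides subnet/NAT gateway issues, the other usual suspect is the aws-auth ConfigMap not mapping the workers’ instance role. A rough sketch of the mapRoles entry (the role ARN is a placeholder):

    apiVersion: v1
    kind: ConfigMap
    metadata:
      name: aws-auth
      namespace: kube-system
    data:
      mapRoles: |
        # IAM role attached to the worker instances (placeholder ARN)
        - rolearn: arn:aws:iam::111122223333:role/eks-workers
          username: system:node:{{EC2PrivateDNSName}}
          groups:
            - system:bootstrappers
            - system:nodes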
2020-07-03
2020-07-04
2020-07-06

Hey all!
Is anyone deploying to AWS EKS with the readinessGates in the K8s Deployment?
The issue TL;DR
• ALB (ingress) is being managed as a separate deployment with multiple host paths
• When deploying to AWS EKS with readinessGates, the Service does not register the pods; the deployment then times out and fails
• When deploying to AWS EKS without readinessGates, the Service registers the pods; the deployment is successful BUT a PodDisruptionBudget issue arises when nodes are rotated during an upgrade.
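For context, the readinessGates stanza sits in the pod template and points at a condition the ALB ingress controller is expected to set once the pod is healthy in its target group. The conditionType below is only illustrative - the exact value depends on the controller version and on the ingress/service/port names:

    spec:
      template:
        spec:
          readinessGates:
          # illustrative; the controller patches this condition to True once the
          # target is healthy in the ALB target group
          - conditionType: target-health.alb.ingress.k8s.aws/my-ingress_my-service_80

If the controller never patches that condition (for example because the pods aren’t actually registered as targets of that ALB), the pods stay NotReady and the rollout times out, which matches the behaviour described above.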
2020-07-08

Hey everyone. My pod refuses to launch due to an x509: certificate signed by unknown authority error. I use an AWS Certificate for my docker registry and it looks like I should perform a simple update-ca-certificates to solve my issue; I added it in my gitlab-runner as a step before doing the helm install... but it doesn’t help (actually, this command returns a skipping message, so I’m not even sure it works). I’ve been stuck on this error since yesterday and I feel like I’ve tested everything… could somebody help me with this?
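Not sure of the exact setup, but one thing that commonly trips people up here: running update-ca-certificates in the gitlab-runner or helm container doesn’t affect the nodes that actually pull the image - the registry CA has to be trusted by the container runtime on each node. A hedged sketch for Docker-based nodes (registry hostname and file names are placeholders):

    # run on each node (e.g. via user-data or a DaemonSet), not in the CI job
    sudo mkdir -p /etc/docker/certs.d/registry.example.com
    sudo cp my-registry-ca.crt /etc/docker/certs.d/registry.example.com/ca.crt
    # or add it to the node’s system trust store; update-ca-certificates only picks up
    # *.crt files under /usr/local/share/ca-certificates, and a “skipping” message
    # usually means the file wasn’t a single valid PEM certificate
    sudo cp my-registry-ca.crt /usr/local/share/ca-certificates/my-registry-ca.crt
    sudo update-ca-certificates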

Hey, all. Looks like my DD cluster agent is not configured properly for the liveness and readiness probes. First, no ready pods:
$ k get pods
NAME READY STATUS RESTARTS AGE
datadog-cluster-agent-6cf486544f-gvv9s 0/1 Running 0 6d
And from describe, it looks like these probes are misconfigured (no host), but I haven’t found a way of specifying that – which I should not have to do:
Liveness: http-get http://:5555/live delay=15s timeout=5s period=15s #success=1 #failure=6
Readiness: http-get http://:5555/ready delay=15s timeout=5s period=15s #success=1 #failure=6
Events:
Type Reason Age From Message
---- ------ ---- ---- -------
Warning Unhealthy 4m42s (x34539 over 5d23h) kubelet, ip-10-6-32-102.us-east-2.compute.internal Readiness probe failed: HTTP probe failed with statuscode: 500
Any idea what i’m missing here?

Found it. For some reason, though k8s has plenty of host name options, it left it out. I ran across an old post, which requests that the host parameter be added to livenessProbe.httpGet. Adding host: 127.0.0.1 fixed it… enough to expose other issues.
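For reference, a sketch of where that host field sits (port/path taken from the describe output above; whether 127.0.0.1 is the right value depends on what address the cluster-agent binds to):

    livenessProbe:
      httpGet:
        host: 127.0.0.1
        path: /live
        port: 5555
      initialDelaySeconds: 15
      periodSeconds: 15
      timeoutSeconds: 5
      failureThreshold: 6
    readinessProbe:
      httpGet:
        host: 127.0.0.1
        path: /ready
        port: 5555

By default httpGet probes target the pod IP, so needing host here usually points at the agent only listening on localhost.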

Anyone using the IstioOperator CRD? The Gateway spec does not include SDS, so I’m trying to figure out where to specify the SDS container during the migration. The new Istio Helm Gateway chart does not seem to have the option either.

With 1.6 it was rolled into the ingress gateway container and is no longer optional. That is why it does not exist in the spec.

Finally found an issue related to it. Google keyword game is not strong today
2020-07-09

Anyone doing container forensics?
2020-07-10
2020-07-11

Adding the EventRateLimit admission controller to my api-server manifest file results in the api-server not restarting, but I don’t know where to find any error messages. Why would this fail? How can I debug the issue?
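For what it’s worth, EventRateLimit needs more than being added to --enable-admission-plugins: it also requires an admission configuration file, and if that file is missing or not mounted into the static pod, kube-apiserver exits at startup. A hedged sketch, with illustrative paths and limits:

    # kube-apiserver flags in the static-pod manifest
    - --enable-admission-plugins=NodeRestriction,EventRateLimit
    - --admission-control-config-file=/etc/kubernetes/admission-control.yaml

    # /etc/kubernetes/admission-control.yaml (apiserver.config.k8s.io/v1alpha1 on older versions)
    apiVersion: apiserver.config.k8s.io/v1
    kind: AdmissionConfiguration
    plugins:
    - name: EventRateLimit
      path: /etc/kubernetes/eventratelimit.yaml

    # /etc/kubernetes/eventratelimit.yaml
    apiVersion: eventratelimit.admission.k8s.io/v1alpha1
    kind: Configuration
    limits:
    - type: Namespace
      qps: 50
      burst: 100

As for where the error goes: the apiserver runs as a static pod, so the startup failure shows up in the kubelet logs on the control-plane node (journalctl -u kubelet) and in the exited container’s logs (docker ps -a and docker logs, or crictl ps -a and crictl logs).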
2020-07-14

Guys, would you expect any issues with EKS pods/containers if you encrypt node’s root volumes?

I would not. No problems doing so with PKS/TKGI so far. (Obviously not apples to apples, but it’s the only data point I have)

We use encrypted EBS for PVs and it works

No issues here either, and we’ve been using encrypted node drives since they came out

thanks guys
2020-07-15
2020-07-17

Hello, I am looking to use AWS API Gateway with my EKS cluster. I found this -> https://aws.amazon.com/blogs/containers/api-gateway-as-an-ingress-controller-for-eks/. I am looking for some feedback from folks who have tried this out. In particular, I would like to know if AWS API Gateway, when deployed as an EKS ingress, supports Custom Authorizers.

2020-07-21

Kube2IAM or Kiam?

I’m currently on kube2iam, but I see hanging/failures if kube2iam gets slammed with too many requests. I’m debating whether we stick with kube2iam and hack around with daemonset replicas… or cut over to something else

IRSA

I forgot what pains we had with IRSA. Might be about the migration pattern in prod or something with helm/helmfile

I’ll look at it again tho

i’ve also had problems with kube2iam and race conditions where other pods might start before kube2iam has started on a new node and the new pods won’t have credentials. i think IRSA wasn’t fully supported or available until a couple months ago because i was using kube2iam prior but have since moved just about everything to IRSA. i think i have 1 or 2 lingering dependencies that might still be using kube2iam that i need to circle back to.

i’m using kiam but only for things that don’t support IRSA

also using IRSA for kiam

@stobiewankenobi @Ronak FYI

@Chris Fowles @joey (no rush) what scale do you have on your largest cluster using IRSA? Total nodes, total pods?

not large clusters tbh

50 nodes 600 pods

➜ ~ kubectl get pods --all-namespaces | grep 1/1 | wc -l
2330
➜ ~ kubectl get nodes | wc -l
301
Yeah, I’m considering IRSA

btw, discussed in office hours yesterday!

New Zoom Recording from our Office Hours session on 2020-07-22 is now available. Missed it? Remember to register for the next one so you’ll receive a calendar reminder next time!


We quit kube2iam ~2 years ago. Even on small clusters with 5-6 nodes and < 100 pods, we’d easily exceed the rate limits hitting the AWS APIs, which would cause blackouts. Then it got exacerbated as more and more services’ STS tokens expired.

Service accounts are definitely the way to go for EKS (and what we use), but for kops, we’re still using Kiam and haven’t looked into how to support IRSA

you can check the closed PRs/issues
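For anyone following along, the per-workload side of IRSA is just an annotation on the ServiceAccount; the cluster also needs an IAM OIDC provider, and the role’s trust policy has to allow system:serviceaccount:<namespace>:<name>. Names and the ARN below are placeholders:

    apiVersion: v1
    kind: ServiceAccount
    metadata:
      name: my-app
      namespace: default
      annotations:
        # placeholder ARN; the role trust policy must allow this namespace/name
        eks.amazonaws.com/role-arn: arn:aws:iam::111122223333:role/my-app-irsa

Pods using that service account get web-identity credentials injected directly, so there’s no kube2iam/kiam proxy left in the data path to get slammed.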
2020-07-22
2020-07-23

What are people doing to monitor worker health before it’s connected to a control plane? Custom CloudWatch metrics based on kubelet /healthz? Something else?

For context, we’ve had a couple of self-created outages caused by unhappy kubelet config in user-data on EKS nodes… The instance comes up but never joins the cluster. Sure… we could try: not making mistakes - but where is the fun in that? Is there an approach to out-of-Kubernetes instance monitoring that has worked well for people?
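One hedged take on the custom-metric idea: have the instance itself report whether the kubelet is healthy (and therefore likely to join), from user-data or a small cron, and alarm on it staying at 0 after boot. The namespace and metric name below are made up, and it assumes the instance role allows cloudwatch:PutMetricData and a default region is configured:

    #!/bin/bash
    # kubelet’s local healthz endpoint defaults to 127.0.0.1:10248
    if curl -sf http://localhost:10248/healthz >/dev/null; then healthy=1; else healthy=0; fi

    instance_id=$(curl -s http://169.254.169.254/latest/meta-data/instance-id)

    aws cloudwatch put-metric-data \
      --namespace "EKS/NodeBootstrap" \
      --metric-name KubeletHealthy \
      --dimensions InstanceId="$instance_id" \
      --value "$healthy"

A healthy kubelet still doesn’t guarantee the node joins (aws-auth problems, for example), so pairing this with an alarm on node count vs. ASG desired capacity covers more of the failure modes.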

Quick question for those who had to convert deprecated APIs when moving to Kubernetes v1.16:
Do I need to worry about converting ReplicaSet objects? My intuition says no and that I only need to worry about their corresponding Deployment objects.
Am I correct in assuming that?

Updating the Deployment resources will trigger new ReplicaSets to be created. That should do it.

Also related: https://github.com/FairwindsOps/pluto

So after looking into it for a while, I realized that this drop of the older APIs is just about manifest support. So they just mean “kube-apiserver will not support your old YAMLs”

For some reason I was under the impression that the in-cluster definitions needed to be updated. But clearly kube-apiserver understands multiple APIs anyway. For example, kubectl get daemonsets.v1.apps,deployments.v1.apps,replicasets.v1.apps will get you those main resources affected by the v1.16 deprecations, at the new version

whereas kubectl get daemonsets,deployments,replicasets will get you the oldest supported version, so it has nothing to do with redeploying and everything to do with configurations such as YAML manifests or Helm configurations

so it sounds like this is easier than I thought it’d be. I underestimated how smart kube-apiserver is
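For anyone doing the manifest side of this, the change is mostly the apiVersion plus spec.selector, which apps/v1 makes required - roughly (names are placeholders):

    # before: served up to 1.15, removed in 1.16
    apiVersion: extensions/v1beta1
    kind: Deployment

    # after: the only Deployment version served from 1.16 on
    apiVersion: apps/v1
    kind: Deployment
    metadata:
      name: my-app
    spec:
      selector:            # required in apps/v1
        matchLabels:
          app: my-app
      template:
        metadata:
          labels:
            app: my-app
        # ...container spec unchanged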

2020-07-27

Has CloudPosse come up with a turn-key way to do IAM Roles for Service Accounts (IRSA)? I need to start looking at doing that now for an EKS cluster we have

We have a turnkey way we do it in our engagements. We don’t have an open-sourced module for it yet; we’ve been using a local module.
@Jeremy G (Cloud Posse) seems like we should create one now that we’re using the same pattern in a few engagements (the eks-iam/modules/service-account/ module)

what do you think @Jeremy G (Cloud Posse)?

@Erik Osterman (Cloud Posse) Yes, we have a repo for such a module set up, waiting for someone to sponsor the effort to convert the closed source to open source.

Ok, will add it to the backlog for the next engagement.

Should already be in the backlog for one of the existing engagements.

so @roth.andy this doesn’t really help you at this time, but we should get this knocked out mid-August or so.

I just finished getting it working. It actually was pretty straightforward. I’m definitely a fan

I can’t speak for CloudPosse but I’m using it from the terraform-aws-eks module and it was pretty straightforward
2020-07-28
2020-07-30

Quick question - does anyone know if liveness probes continue to execute once a pod enters the Terminating state? If they do, and if they fail, will the pod be forcibly terminated and/or rescheduled? (https://github.com/kubernetes/kubernetes/issues/52817 looks somewhat related to my question)
What happened: liveness/readiness probe fails while pod is terminated. Also it happened only once during the pod termination. The issue started happening after upgrading version to v1.7 from v1.6.X…

Hrm… don’t know the answer, but curious what you find out.