#kubernetes (2020-07)
Archive: https://archive.sweetops.com/kubernetes/
2020-07-01

When I set the kubelet-certificate-authority flag in kube-apiserver.yaml, I am running into the following message when trying to start a pod. I am using kubespray to provision the AWS cluster.
Error from server: Get <https://10.250.205.173:10250/containerLogs/default/bash-shell-d8bd1/bash-shell-d8bd1>: x509: cannot validate certificate for 10.250.205.173 because it doesn't contain any IP SANs
I think this is the controller node getting status from a worker node. The information that I found about this issue is:
That message is coming from the master trying to connect to the node (the flow of traffic
is kubectl -> master API -> kubelet -> container). When starting the master, are you
setting --kubelet_certificate_authority? If so, the master expects to be able to validate
the kubelet's serving cert, which means it needs to be valid for the hostnames/IP addresses
the master uses to connect to it.
Any help to resolve this issue would be appreciated.

I question why the master is using an IP address for the worker node. I’ve been trying to find information about kubelet-preferred-address-types. I wonder if I can change that setting.

I set --kubelet-preferred-address-types=InternalDNS (and just this value). Then I tried to start a pod. This error was displayed:
Error from server: no preferred addresses found; known addresses: [{InternalIP 10.250.205.173} {Hostname ip-10-250-205-173.ec2.internal}]
Edit: The InternalIP and Hostname are literally telling me what is acceptable as an address type. When I add Hostname, the error changes:
Error from server: Get <https://ip-10-250-205-173.ec2.internal:10250/containerLogs/kube-system/nodelocaldns-s8mfk/node-cache>: x509: certificate signed by unknown authority
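For reference, a minimal sketch of the two flags being discussed, roughly as they might appear in a kube-apiserver static-pod manifest (the CA path is illustrative and depends on how kubespray lays out certificates):

    - kube-apiserver
    # CA that signed the kubelets’ serving certificates (path is illustrative)
    - --kubelet-certificate-authority=/etc/kubernetes/ssl/ca.crt
    # order matters: the first address type present on the node is used
    - --kubelet-preferred-address-types=Hostname,InternalDNS,InternalIP

With Hostname or InternalDNS first, the kubelet serving certificate only needs to be valid for the node name; if InternalIP is used, it needs IP SANs, which is what the first error was complaining about.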
2020-07-02

Is there a way I can prevent the creation of new config maps every time I deploy through Helm?

That is how Helm keeps a history of each deployment state and works out what has changed between deployments.

You can set a lower number of releases to keep in history with the --history-max flag; the default is 10.

Set it to 1 if you don’t think you’ll ever want to roll back to releases earlier than the last one… Setting it to 0 means “no limit” rather than no history.

Thanks @Tim Birkett, so basically while doing helm init we need to pass this flag?
helm init --history-max 2
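That flag is for Helm 2’s Tiller. If you end up on Helm 3 (which has no helm init), the same limit is applied per release instead - a minimal sketch, with the release name and chart path as placeholders:

    helm upgrade --install my-release ./my-chart --history-max 2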

Hi, I’m trying https://github.com/cloudposse/terraform-aws-eks-cluster and for an unknown reason, even if all my resources seem OK, my cluster has 0 nodes.

Hey @Issif - Did you implement: https://github.com/cloudposse/terraform-aws-eks-node-group to create nodes?

@Tim Birkett nope, I prefer the solution with workers, as I can manage them more precisely and add them to a target group with auto-scaling enabled

I think I’ve found the issue; a new deployment is occurring right now

Ah you used: https://github.com/cloudposse/terraform-aws-eks-workers - Cool, hope the issue is sorted for you

Yes I use this

and that’s a fail …

really strange, my aws-auth seems correct, SG too

I got it: I had enabled the NAT gateway but my cluster was in a public subnet; after that, all was good. Those are really good modules, really love them
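For anyone else hitting the “0 nodes” symptom with self-managed workers: besides subnet/NAT gateway issues, the other usual suspect is the aws-auth ConfigMap not mapping the workers’ instance role. A rough sketch of the mapRoles entry (the role ARN is a placeholder):

    apiVersion: v1
    kind: ConfigMap
    metadata:
      name: aws-auth
      namespace: kube-system
    data:
      mapRoles: |
        # IAM role attached to the worker instances (placeholder ARN)
        - rolearn: arn:aws:iam::111122223333:role/eks-workers
          username: system:node:{{EC2PrivateDNSName}}
          groups:
            - system:bootstrappers
            - system:nodes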
2020-07-03
2020-07-04
2020-07-06

Hey all!
Is anyone deploying to AWS EKS with the readinessGates in the K8s Deployment?
The issue TL;DR
• ALB (ingress) is being managed as a separate deployment with multiple host paths
• When deploying to AWS EKS with readinessGates, the Service does not register the pods; the deployment then times out and fails
• When deploying to AWS EKS without readinessGates, the Service registers the pods; the deployment is successful BUT a PodDisruptionBudget issue arises when nodes are rotated during an upgrade.
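For context, the readinessGates stanza sits in the pod template and points at a condition the ALB ingress controller is expected to set once the pod is healthy in its target group. The conditionType below is only illustrative - the exact value depends on the controller version and on the ingress/service/port names:

    spec:
      template:
        spec:
          readinessGates:
          # illustrative; the controller patches this condition to True once the
          # target is healthy in the ALB target group
          - conditionType: target-health.alb.ingress.k8s.aws/my-ingress_my-service_80

If the controller never patches that condition (for example because the pods aren’t actually registered as targets of that ALB), the pods stay NotReady and the rollout times out, which matches the behaviour described above.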
2020-07-08

Hey everyone. My pod refuses to launch due to an x509: certificate signed by unknown authority error. I use an AWS Certificate for my docker registry and it looks like I should perform a simple update-ca-certificates to solve my issue; I added it in my gitlab-runner as a step before doing the helm install... but it doesn’t help (actually, this command returns a skipping message, so I’m not even sure it works). I’ve been stuck on this error since yesterday and I feel like I’ve tested everything… could somebody help me with this?
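Not sure of the exact setup, but one thing that commonly trips people up here: running update-ca-certificates in the gitlab-runner or helm container doesn’t affect the nodes that actually pull the image - the registry CA has to be trusted by the container runtime on each node. A hedged sketch for Docker-based nodes (registry hostname and file names are placeholders):

    # run on each node (e.g. via user-data or a DaemonSet), not in the CI job
    sudo mkdir -p /etc/docker/certs.d/registry.example.com
    sudo cp my-registry-ca.crt /etc/docker/certs.d/registry.example.com/ca.crt
    # or add it to the node’s system trust store; update-ca-certificates only picks up
    # *.crt files under /usr/local/share/ca-certificates, and a “skipping” message
    # usually means the file wasn’t a single valid PEM certificate
    sudo cp my-registry-ca.crt /usr/local/share/ca-certificates/my-registry-ca.crt
    sudo update-ca-certificates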

Hey, all. Looks like my DD cluster agent is not configured properly for the liveness and readiness probes. First, no ready pods:
$ k get pods
NAME READY STATUS RESTARTS AGE
datadog-cluster-agent-6cf486544f-gvv9s 0/1 Running 0 6d
And from describe, it looks like these probes are misconfigured (no host), but I haven’t found a way of specifying that – which I should not have to do:
Liveness: http-get http://:5555/live delay=15s timeout=5s period=15s #success=1 #failure=6
Readiness: http-get http://:5555/ready delay=15s timeout=5s period=15s #success=1 #failure=6
Events:
Type Reason Age From Message
---- ------ ---- ---- -------
Warning Unhealthy 4m42s (x34539 over 5d23h) kubelet, ip-10-6-32-102.us-east-2.compute.internal Readiness probe failed: HTTP probe failed with statuscode: 500
Any idea what i’m missing here?

Found it. For some reason, though k8s has plenty of host name options, it left it out. I ran across an old post, which requests that the host parameter be added to livenessProbe.httpGet. Adding host: 127.0.0.1 fixed it… enough to expose other issues.
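For reference, a sketch of where that host field sits (port/path taken from the describe output above; whether 127.0.0.1 is the right value depends on what address the cluster-agent binds to):

    livenessProbe:
      httpGet:
        host: 127.0.0.1
        path: /live
        port: 5555
      initialDelaySeconds: 15
      periodSeconds: 15
      timeoutSeconds: 5
      failureThreshold: 6
    readinessProbe:
      httpGet:
        host: 127.0.0.1
        path: /ready
        port: 5555

By default httpGet probes target the pod IP, so needing host here usually points at the agent only listening on localhost.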

Anyone using the IstioOperator CRD? The Gateway spec does not include SDS, so I’m trying to figure out where to specify the SDS container during the migration. The new Istio Helm Gateway chart does not seem to have the option either.

With 1.6 it was rolled into the ingress gateway container and is no longer optional. That is why it does not exist in the spec.

Finally found an issue related to it. Google keyword game is not strong today
2020-07-09

Anyone doing container forensics?
2020-07-10
2020-07-11

Adding the EventRateLimit admission controller to my api-server manifest file results in the api-server not restarting, but I don’t know where to find any error messages. Why would this fail? How can I debug the issue?
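For what it’s worth, EventRateLimit needs more than being added to --enable-admission-plugins: it also requires an admission configuration file, and if that file is missing or not mounted into the static pod, kube-apiserver exits at startup. A hedged sketch, with illustrative paths and limits:

    # kube-apiserver flags in the static-pod manifest
    - --enable-admission-plugins=NodeRestriction,EventRateLimit
    - --admission-control-config-file=/etc/kubernetes/admission-control.yaml

    # /etc/kubernetes/admission-control.yaml (apiserver.config.k8s.io/v1alpha1 on older versions)
    apiVersion: apiserver.config.k8s.io/v1
    kind: AdmissionConfiguration
    plugins:
    - name: EventRateLimit
      path: /etc/kubernetes/eventratelimit.yaml

    # /etc/kubernetes/eventratelimit.yaml
    apiVersion: eventratelimit.admission.k8s.io/v1alpha1
    kind: Configuration
    limits:
    - type: Namespace
      qps: 50
      burst: 100

As for where the error goes: the apiserver runs as a static pod, so the startup failure shows up in the kubelet logs on the control-plane node (journalctl -u kubelet) and in the exited container’s logs (docker ps -a and docker logs, or crictl ps -a and crictl logs).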
2020-07-14

Guys, would you expect any issues with EKS pods/containers if you encrypt node’s root volumes?

I would not. No problems doing so with PKS/TKGI so far. (Obviously not apples to apples, but it’s the only data point I have)

We use encrypted EBS for PVs and it works

No issues here either, and we’ve been using encrypted node drives since they came out

thanks guys
2020-07-15
2020-07-17

Hello, I am looking to use AWS API Gateway with my EKS cluster. I found this -> https://aws.amazon.com/blogs/containers/api-gateway-as-an-ingress-controller-for-eks/. I am looking for some feedback from folks who have tried this out. In particular, I would like to know if AWS API Gateway, when deployed as an EKS ingress, supports Custom Authorizers.

2020-07-21

Kube2IAM or Kiam?

I’m currently on kube2iam, but I see hanging/failures if kube2iam gets slammed with too many requests. I’m debating whether we stick with kube2iam and hack around with daemonset replicas… or cut over to something else

IRSA

I forgot what pains we had with IRSA. Might be about the migration pattern in prod or something with helm/helmfile

I’ll look at it again tho

i’ve also had problems with kube2iam and race conditions where other pods might start before kube2iam has started on a new node and the new pods won’t have credentials. i think IRSA wasn’t fully supported or available until a couple months ago because i was using kube2iam prior but have since moved just about everything to IRSA. i think i have 1 or 2 lingering dependencies that might still be using kube2iam that i need to circle back to.

i’m using kiam but only for things that don’t support IRSA

also using IRSA for kiam

@stobiewankenobi @Ronak FYI

@Chris Fowles @joey (no rush) what scale do you have on your largest cluster using IRSA? Total nodes, total pods?

not large clusters tbh

50 nodes 600 pods

➜ ~ kubectl get pods --all-namespaces | grep 1/1 | wc -l
2330
➜ ~ kubectl get nodes | wc -l
301
Yeah, I’m considering IRSA

btw, discussed in office hours yesterday!

New Zoom Recording from our Office Hours session on 2020-07-22 is now available. Missed it? Remember to register for the next one so you’ll receive a calendar reminder next time!


We quit kube2iam ~2 years ago. Even on small clusters with 5-6 nodes and < 100 pods, we’d easily exceed the rate limits hitting the AWS APIs, which would cause blackouts. Then it got exacerbated as more and more services’ STS tokens expired.

Service accounts are definitely the way to go for EKS (and what we use), but for kops, we’re still using Kiam and haven’t looked into how to support IRSA

you can check the closed PRs/issues
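For anyone following along, the per-workload side of IRSA is just an annotation on the ServiceAccount; the cluster also needs an IAM OIDC provider, and the role’s trust policy has to allow system:serviceaccount:<namespace>:<name>. Names and the ARN below are placeholders:

    apiVersion: v1
    kind: ServiceAccount
    metadata:
      name: my-app
      namespace: default
      annotations:
        # placeholder ARN; the role trust policy must allow this namespace/name
        eks.amazonaws.com/role-arn: arn:aws:iam::111122223333:role/my-app-irsa

Pods using that service account get web-identity credentials injected directly, so there’s no kube2iam/kiam proxy left in the data path to get slammed.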
2020-07-22
2020-07-23

What are people doing to monitor worker health before it’s connected to a control plane? Custom CloudWatch metrics based on kubelet /healthz? Something else?

For context, we’ve had a couple of self-created outages caused by unhappy kubelet config in user-data on EKS nodes… The instance comes up but never joins the cluster. Sure… we could try: not making mistakes - but where is the fun in that? Is there an approach to out-of-Kubernetes instance monitoring that has worked well for people?
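One hedged take on the custom-metric idea: have the instance itself report whether the kubelet is healthy (and therefore likely to join), from user-data or a small cron, and alarm on it staying at 0 after boot. The namespace and metric name below are made up, and it assumes the instance role allows cloudwatch:PutMetricData and a default region is configured:

    #!/bin/bash
    # kubelet’s local healthz endpoint defaults to 127.0.0.1:10248
    if curl -sf http://localhost:10248/healthz >/dev/null; then healthy=1; else healthy=0; fi

    instance_id=$(curl -s http://169.254.169.254/latest/meta-data/instance-id)

    aws cloudwatch put-metric-data \
      --namespace "EKS/NodeBootstrap" \
      --metric-name KubeletHealthy \
      --dimensions InstanceId="$instance_id" \
      --value "$healthy"

A healthy kubelet still doesn’t guarantee the node joins (aws-auth problems, for example), so pairing this with an alarm on node count vs. ASG desired capacity covers more of the failure modes.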

Quick question for those who had to convert deprecated APIs when moving to Kubernetes v1.16:
Do I need to worry about converting ReplicaSet objects? My intuition says no and that I only need to worry about their corresponding Deployment objects.
Am I correct in assuming that?

Updating the Deployment resources will trigger new ReplicaSets to be created. That should do it.

Also related: https://github.com/FairwindsOps/pluto

So after looking into it for a while, I realized that this drop of the older APIs is just about manifest support. So they just mean “kube-apiserver will not support your old YAMLs”

For some reason I was under the impression that the in-cluster definitions needed to be updated. But clearly kube-apiserver understands multiple APIs anyway. For example, kubectl get daemonsets.v1.apps,deployments.v1.apps,replicasets.v1.apps will get you those main resources affected by the v1.16 deprecations, at the new version

whereas kubectl get daemonsets,deployments,replicasets will get you the oldest supported version, so it has nothing to do with redeploying and everything to do with configurations such as YAML manifests or Helm configurations

so it sounds like this is easier than I thought it’d be. I underestimated how smart kube-apiserver is
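For anyone doing the manifest side of this, the change is mostly the apiVersion plus spec.selector, which apps/v1 makes required - roughly (names are placeholders):

    # before: served up to 1.15, removed in 1.16
    apiVersion: extensions/v1beta1
    kind: Deployment

    # after: the only Deployment version served from 1.16 on
    apiVersion: apps/v1
    kind: Deployment
    metadata:
      name: my-app
    spec:
      selector:            # required in apps/v1
        matchLabels:
          app: my-app
      template:
        metadata:
          labels:
            app: my-app
        # ...container spec unchanged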

2020-07-27

Has CloudPosse come up with a turn-key way to do IAM Roles for Service Accounts (IRSA)? I need to start looking at doing that now for an EKS cluster we have

We have a turnkey way we do it in our engagements. We don’t have an open-sourced module for it yet; we’ve been using a local module.
@Jeremy G (Cloud Posse) seems like we should create one now that we’re using the same pattern in a few engagements (the eks-iam/modules/service-account/ module)

what do you think @Jeremy G (Cloud Posse)?

@Erik Osterman (Cloud Posse) Yes, we have a repo for such a module set up, waiting for someone to sponsor the effort to convert the closed source to open source.

Ok, will add it to the backlog for the next engagement.

Should already be in the backlog for one of the existing engagements.

so @roth.andy this doesn’t really help you at this time, but we should get this knocked out mid-August or so.

I just finished getting it working. It actually was pretty straightforward. I’m definitely a fan

I can’t speak for CloudPosse but I’m using it from the terraform-aws-eks module and it was pretty straightforward
2020-07-28
2020-07-30

Quick question - does anyone know if liveness probes continue to execute once a pod enters the Terminating state? If they do, and if they fail, will the pod be forcibly terminated and/or rescheduled? (https://github.com/kubernetes/kubernetes/issues/52817 looks somewhat related to my question)
What happened: liveness/readiness probe fails while pod is terminated. Also it happened only once during the pod termination. The issue started happening after upgrading version to v1.7 from v1.6.X…

Hrm… don’t know the answer, but curious what you find out.