SweetOps #kubernetes for July, 2020

Archive: https://archive.sweetops.com/kubernetes/

2020-07-01

David Medinets

When I set the kubelet-certificate-authority flag in kube-apiserver.yaml, I am running into the following message when trying to start a pod. I am using kubespray to provision the AWS cluster.

Error from server: Get <https://10.250.205.173:10250/containerLogs/default/bash-shell-d8bd1/bash-shell-d8bd1>: x509: cannot validate certificate for 10.250.205.173 because it doesn't contain any IP SANs

I think this is the controller node getting status from a worker node. The information that I found about this issue is:

That message is coming from the master trying to connect to the node (the flow of traffic 
is kubectl -> master API -> kubelet -> container). When starting the master, are you 
setting --kubelet_certificate_authority? If so, the master expects to be able to validate 
the kubelet's serving cert, which means it needs to be valid for the hostnames/IP addresses 
the master uses to connect to it.

Any help to resolve this issue would be appreciated.
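The error above follows from how TLS name checks treat IP literals: an IP address can only be matched by an IP SAN, never by a DNS SAN. The following is a minimal Python sketch of that rule, as a simplified model (no wildcard handling) rather than the Go code the API server actually runs. The function name and the SAN lists in the example are hypothetical; the thread does not show the real kubelet certificate.

```python
import ipaddress
from typing import List


def cert_covers(address: str, dns_sans: List[str], ip_sans: List[str]) -> bool:
    """Simplified model of TLS server name checks (no wildcard handling)."""
    try:
        ip = ipaddress.ip_address(address)
    except ValueError:
        # Not an IP literal: compare against DNS SANs, case-insensitively.
        return address.lower() in (name.lower() for name in dns_sans)
    # IP literal: only IP SANs count; a DNS SAN never matches an IP.
    return any(ip == ipaddress.ip_address(san) for san in ip_sans)


if __name__ == "__main__":
    # Hypothetical kubelet serving cert that lists only a DNS name.
    dns_sans = ["ip-10-250-205-173.ec2.internal"]
    ip_sans: List[str] = []
    print(cert_covers("10.250.205.173", dns_sans, ip_sans))
    print(cert_covers("ip-10-250-205-173.ec2.internal", dns_sans, ip_sans))
```

Under this model, either the kubelet certificate needs an IP SAN for the node address, or the API server has to connect by a name the certificate does list.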

David Medinets

02:14:14 PM

I question why the master is using an IP address for the worker node. I’ve been trying to find information about kubelet-preferred-address-types. I wonder if I can change that setting.

David Medinets

02:22:41 PM

I set --kubelet-preferred-address-types=InternalDNS (and just this value). Then tried to start a pod. This error was displayed:

Error from server: no preferred addresses found; known addresses: [{InternalIP 10.250.205.173} {Hostname ip-10-250-205-173.ec2.internal}]

Edit: The InternalIP and Hostname are literally telling me what is acceptable as address type. When I add Hostname, the error changes:

Error from server: Get <https://ip-10-250-205-173.ec2.internal:10250/containerLogs/kube-system/nodelocaldns-s8mfk/node-cache>: x509: certificate signed by unknown authority
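Both results above are consistent with the API server walking the --kubelet-preferred-address-types list in order and using the first node address of a matching type. A minimal Python sketch of that selection, as a simplified model and not the kube-apiserver source; the function name is hypothetical and the addresses are the ones from the error message:

```python
from typing import List, Tuple

# Node addresses as reported in the error above: (type, address) pairs.
NODE_ADDRESSES = [
    ("InternalIP", "10.250.205.173"),
    ("Hostname", "ip-10-250-205-173.ec2.internal"),
]


def pick_kubelet_address(preferred_types: List[str],
                         addresses: List[Tuple[str, str]]) -> str:
    """Return the first node address whose type appears in preferred_types.

    The preference order wins over the order in which the node lists
    its addresses.
    """
    for wanted in preferred_types:
        for addr_type, addr in addresses:
            if addr_type == wanted:
                return addr
    raise LookupError(
        "no preferred addresses found; known addresses: %s" % addresses
    )


if __name__ == "__main__":
    # InternalDNS alone fails: the node reports no address of that type.
    try:
        pick_kubelet_address(["InternalDNS"], NODE_ADDRESSES)
    except LookupError as err:
        print(err)
    # Adding Hostname makes the API server dial the hostname, not the IP.
    print(pick_kubelet_address(["InternalDNS", "Hostname"], NODE_ADDRESSES))
```

Once the hostname is selected, the name check can pass and the failure moves to the next step, verifying the certificate's signer, which matches the "certificate signed by unknown authority" error.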

2020-07-02

soumya

10:43:53 AM

Is there a way I can prevent the creation of new config maps every time I deploy through Helm?

Tim Birkett

04:22:27 PM

That is how Helm keeps a history of each deployment state and works out what has changed between deployments.

Tim Birkett

04:25:30 PM

You can set a lower number of releases to keep in history with the --history-max flag; the default is 10.

Tim Birkett

04:26:50 PM

Set to 1 if you don’t think you’ll ever want to roll back to releases earlier than the last one… Setting 0 means “no limit” rather than no history.

soumya

10:35:58 AM

Thanks @Tim Birkett so basically while doing helm init we need to pass this flag?

helm init --history-max 2
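The retention rule Tim describes can be modelled in a few lines. This Python sketch illustrates only the semantics (keep the newest N revisions, with 0 meaning no limit); it is not Helm's implementation, and the function name is hypothetical. Each retained revision corresponds to one stored release record, which is what shows up as the config maps soumya mentions.

```python
from typing import List


def prune_revisions(revisions: List[int], history_max: int) -> List[int]:
    """Return the revisions that would be kept under a history limit.

    history_max == 0 means "no limit", not "no history".
    """
    ordered = sorted(revisions)
    if history_max <= 0:
        return ordered
    return ordered[-history_max:]


if __name__ == "__main__":
    print(prune_revisions([1, 2, 3, 4, 5], 2))
    print(prune_revisions([1, 2, 3, 4, 5], 0))
```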

Issif

06:33:47 PM

Hi, I’m trying https://github.com/cloudposse/terraform-aws-eks-cluster and for an unknown reason, even though all my resources seem OK, my cluster has 0 nodes.

cloudposse/terraform-aws-eks-cluster

Terraform module for provisioning an EKS cluster. Contribute to cloudposse/terraform-aws-eks-cluster development by creating an account on GitHub.

Tim Birkett

08:29:54 PM

Hey @Issif - Did you implement: https://github.com/cloudposse/terraform-aws-eks-node-group to create nodes?

Issif

08:31:40 PM

@Tim Birkett nope, I prefer the solution with workers, as I can manage them more precisely and add them to a target group with auto-scaling enabled

Issif

08:32:22 PM

I hope I found the issue; a new deployment is occurring right now

Tim Birkett

08:33:12 PM

Ah you used: https://github.com/cloudposse/terraform-aws-eks-workers - Cool, hope the issue is sorted for you

Issif

08:33:29 PM

Yes I use this

Issif

08:52:10 PM

and that’s a fail …

Issif

08:56:05 PM

really strange, my aws-auth seems correct, SG too

Issif

08:11:35 AM

I did it. I had enabled the NAT gateway but my cluster was in a public subnet; after fixing that, all was good. Those are really good modules, really love them

2020-07-03

2020-07-04

2020-07-06

dalekurt

01:33:31 AM

Hey all! Is anyone deploying to AWS EKS with the readinessGates in the K8s Deployment?

The issue TL;DR
• ALB (ingress) is being managed as a separate deployment with multiple host paths
• When deploying to AWS EKS with readinessGates, the Service does not register the pods; the deployment then times out and fails
• When deploying to AWS EKS without readinessGates, the Service registers the pods; the deployment is successful, BUT a PodDisruptionBudget issue arises when nodes are rotated during an upgrade.

2020-07-08

frednotet

12:48:10 PM

Hey everyone. My pod refuses to launch due to an x509: certificate signed by unknown authority error. I use an AWS certificate for my Docker registry, and it looks like I should run a simple update-ca-certificates to solve my issue. I added it to my gitlab-runner as a step before the helm install... but it doesn’t help (actually, this command reports a skipping message, so I’m not even sure it works). I’ve been stuck on this error since yesterday and I feel like I’ve tested everything… could somebody help me with this?

Eric Berg

06:07:36 PM

Hey, all. Looks like my DD cluster agent is not configured properly for the liveness and readiness probes. First, no ready pods:

$ k get pods
NAME                                          READY   STATUS    RESTARTS   AGE
datadog-cluster-agent-6cf486544f-gvv9s        0/1     Running   0          6d

And from describe, it looks like these tests are misconfigured (no host), but I haven’t found a way of specifying that – which I should not have to do:

    Liveness:   http-get http://:5555/live delay=15s timeout=5s period=15s #success=1 #failure=6
    Readiness:  http-get http://:5555/ready delay=15s timeout=5s period=15s #success=1 #failure=6

Events:
  Type     Reason     Age                        From                                                Message
  ----     ------     ----                       ----                                                -------
  Warning  Unhealthy  4m42s (x34539 over 5d23h)  kubelet, ip-10-6-32-102.us-east-2.compute.internal  Readiness probe failed: HTTP probe failed with statuscode: 500

Any idea what i’m missing here?

Eric Berg

08:35:35 PM

Found it. For some reason, though k8s has plenty of host name options, it left it out. I ran across an old post, which requests that the host parameter be added to livenessProbe.httpGet. Adding host: 127.0.0.1 fixed it…enough to expose other issues.
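For reference, the fix Eric describes amounts to setting the host field of the httpGet probe. The Python sketch below builds the two probe definitions as plain dictionaries and prints them as JSON. The port, paths, and timing values are taken from the describe output above; the helper name and the idea of emitting a patch fragment are illustrative assumptions, not part of the Datadog chart.

```python
import json
from typing import Any, Dict


def http_probe(path: str, port: int = 5555,
               host: str = "127.0.0.1") -> Dict[str, Any]:
    """Build an httpGet probe mirroring the timings shown by describe."""
    return {
        "httpGet": {"host": host, "path": path, "port": port},
        "initialDelaySeconds": 15,  # delay=15s
        "timeoutSeconds": 5,        # timeout=5s
        "periodSeconds": 15,        # period=15s
        "successThreshold": 1,      # #success=1
        "failureThreshold": 6,      # #failure=6
    }


# Fragment of a container spec carrying both probes.
container_patch = {
    "livenessProbe": http_probe("/live"),
    "readinessProbe": http_probe("/ready"),
}

if __name__ == "__main__":
    print(json.dumps(container_patch, indent=2))
```

When host is omitted, the kubelet probes the pod IP, which is why describe shows an empty host. HTTP probes are issued by the kubelet from the node, so a loopback host would typically only reach the container when the pod shares the node's network.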

timduhenchanter

10:59:25 PM

Anyone using the IstioOperator CRD? The Gateway spec does not include SDS so I’m trying to figure out during migration where to specify the SDS container? The new Istio Helm Gateway chart does not seem to have the option either.

timduhenchanter

11:33:35 PM

With 1.6 it was rolled into the ingress gateway container and is no longer optional. That is why it does not exist in spec

timduhenchanter

11:34:05 PM

Finally found an issue related to it. Google keyword game is not strong today

2020-07-09

dalekurt

03:20:42 PM

Anyone doing container forensics?

2020-07-10

2020-07-11

David Medinets

09:40:41 PM

Adding the EventRateLimit admission control to my api-server manifest file results in the api-server not restarting but I don’t know where to find any error messages. Why would this fail? How can I debug the issue?

2020-07-14

Milosb

01:15:42 PM

Guys, would you expect any issues with EKS pods/containers if you encrypt node’s root volumes?

Geordan

03:11:36 PM

I would not. No problems doing so with PKS/TKGI so far. (Obviously not apples to apples but the only data point I have)

Issif

03:22:34 PM

We use encrypted EBS for PVs; it works

Vlad Ionescu (he/him)

03:42:56 PM

No issues here either, and we’ve been using encrypted node drives since they came out

Milosb

01:46:58 PM

thanks guys

2020-07-15

2020-07-17

curious deviant

01:07:00 PM

Hello, I am looking to use AWS API Gateway with my EKS cluster. I found this -> https://aws.amazon.com/blogs/containers/api-gateway-as-an-ingress-controller-for-eks/. I am looking for some feedback from folks if they have tried this out. In particular I would like to know if the AWS API Gateway when deployed as an EKS ingress, supports Custom Authorizers.

API Gateway as an Ingress Controller for Amazon EKS | Amazon Web Services

When teams deploy microservices on Amazon EKS, they usually expose a REST API for use in front ends and third-party applications. A best practice is to manage these APIs with an API Gateway. This provides a unique entry point for your APIs and also eliminates the need to implement API-specific code for things like security, […]

2020-07-21

10:14:32 PM

Kube2IAM or Kiam

rms1000watt

10:16:34 PM

I’m currently on kube2iam, but I see hanging/failures if kube2iam gets slammed with too many requests. I’m debating if we stick with kube2iam and hack around daemonset replicas.. or if we cutover to something else

joey

10:18:26 PM

IRSA

rms1000watt

11:00:55 PM

I forgot what pains we had with IRSA. Might be about the migration pattern in prod or something with helm/helmfile

rms1000watt

11:01:04 PM

I’ll look at it again tho

joey

11:03:37 PM

i’ve also had problems with kube2iam and race conditions where other pods might start before kube2iam has started on a new node and the new pods won’t have credentials. i think IRSA wasn’t fully supported or available until a couple months ago because i was using kube2iam prior but have since moved just about everything to IRSA. i think i have 1 or 2 lingering dependencies that might still be using kube2iam that i need to circle back to.

Chris Fowles

12:45:05 AM

i’m using kiam but only for things that don’t support IRSA

Chris Fowles

12:45:38 AM

also using IRSA for kiam

rms1000watt

06:11:54 AM

@stobiewankenobi @Ronak FYI

rms1000watt

06:37:40 AM

@Chris Fowles @joey (no rush) what scale do you have on your largest cluster using IRSA? Total nodes, total pods?

Chris Fowles

06:42:58 AM

not large clusters tbh

joey

04:07:57 PM

50 nodes 600 pods

rms1000watt

04:25:26 PM

➜  ~ kubectl get pods --all-namespaces | grep 1/1 | wc -l
    2330
➜  ~ kubectl get nodes | wc -l
     301

Yeah, I’m considering IRSA
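A note on the counts above: kubectl get nodes prints a header row by default, so piping it to wc -l counts one line more than the number of nodes, and grep 1/1 only catches pods whose READY column happens to contain 1/1. The Python sketch below counts data rows and fully ready pods from kubectl's table output. The function names and the sample table are hypothetical.

```python
def count_rows(kubectl_output: str, has_header: bool = True) -> int:
    """Count data rows in kubectl table output, skipping the header."""
    lines = [ln for ln in kubectl_output.splitlines() if ln.strip()]
    return max(len(lines) - 1, 0) if has_header else len(lines)


def count_fully_ready(kubectl_output: str) -> int:
    """Count pods whose READY column reads n/n (all containers ready)."""
    rows = [ln.split() for ln in kubectl_output.splitlines() if ln.strip()]
    if not rows:
        return 0
    ready_col = rows[0].index("READY")  # locate the column from the header
    count = 0
    for fields in rows[1:]:
        ready, _, total = fields[ready_col].partition("/")
        if ready == total:
            count += 1
    return count


# Hypothetical output of `kubectl get pods --all-namespaces`.
SAMPLE = """NAMESPACE   NAME   READY   STATUS    RESTARTS   AGE
default     a      1/1     Running   0          1d
default     b      2/2     Running   0          1d
default     c      0/1     Running   0          1d
"""

if __name__ == "__main__":
    print(count_rows(SAMPLE), count_fully_ready(SAMPLE))
```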

Erik Osterman (Cloud Posse)

09:17:06 PM

btw, discussed in office hours yesterday!

Erik Osterman (Cloud Posse)

09:17:19 PM

https://sweetops.slack.com/archives/CB3579ZM3/p1595450670046300

New Zoom Recording from our Office Hours session on 2020-07-22 is now available. Missed it? Remember to register for the next one so you’ll receive a calendar reminder next time!

Erik Osterman (Cloud Posse)

09:17:36 PM

https://cloudposse.wistia.com/medias/31hpuz34yp

Public "Office Hours" 2020-07-22

Erik Osterman (Cloud Posse)

09:20:53 PM

We quit kube2iam ~2 years ago. Even on small clusters with 5-6 nodes and < 100 pods, we’d easily exceed the rate limits hitting the AWS APIs, which would cause blackouts. Then it would get exacerbated as more and more services’ STS tokens expired.

Erik Osterman (Cloud Posse)

09:21:42 PM

Service accounts are definitely the way to go for EKS (and what we use), but for kops, we’re still using Kiam and haven’t looked into how to support IRSA

jose.amengual

11:10:37 PM

you can check the closed PRs/issues

2020-07-22

2020-07-23

Tim Birkett

03:55:03 PM

What are people doing to monitor worker health before it’s connected to a control plane? Custom Cloudwatch metrics based on kubelet /healthz ? Something else?

Tim Birkett

03:57:35 PM

For context we’ve had a couple of self-created outages by unhappy kubelet config in user-data on EKS nodes… The instance is up but never joins the cluster. Sure… we could try: Not making mistakes - but where is the fun in that? Is there an approach to out of Kubernetes instance monitoring that has worked well for people?
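One way to approach the kubelet /healthz idea is a small check that runs on the instance itself, independent of whether the node ever joins the cluster. The Python sketch below polls the kubelet's local healthz endpoint, which by default listens on 127.0.0.1:10248. How the result gets reported (a custom CloudWatch metric, for example) is left out, and the function name and the injectable opener are assumptions made for illustration and testing.

```python
import urllib.error
import urllib.request

# Default local kubelet health endpoint; adjust if the kubelet is
# started with a different healthz port or bind address.
KUBELET_HEALTHZ = "http://127.0.0.1:10248/healthz"


def kubelet_healthy(url: str = KUBELET_HEALTHZ, timeout: float = 2.0,
                    opener=urllib.request.urlopen) -> bool:
    """Return True if the kubelet answers /healthz with 200 and body 'ok'."""
    try:
        with opener(url, timeout=timeout) as resp:
            return resp.status == 200 and resp.read().strip() == b"ok"
    except (urllib.error.URLError, OSError):
        # Connection refused, timeout, or a non-2xx response: unhealthy.
        return False


if __name__ == "__main__":
    # A scheduler such as a systemd timer or cron could run this and
    # forward the result to whatever monitoring system is in use.
    print("healthy" if kubelet_healthy() else "unhealthy")
```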

Yonatan Koren

07:53:23 PM

Quick question for those who had to convert deprecated APIs when moving to Kubernetes v1.16:

Do I need to worry about converting ReplicaSets objects? My intuition says no and that I only need to worry about their corresponding Deployment objects.

Am I correct in assuming that?

Erik Osterman (Cloud Posse)

09:15:38 PM

Updating the deployment resources will trigger the new replicasets to get created. That should do it.

Erik Osterman (Cloud Posse)

09:15:50 PM

FairwindsOps/pluto

A cli tool to help discover deprecated apiVersions in Kubernetes - FairwindsOps/pluto

Yonatan Koren

09:27:51 PM

So after looking into it for a while, I realized that this drop of the older APIs is just manifest support. So they just mean “kube-apiserver will not support your old YAMLs”

Yonatan Koren

09:28:52 PM

For some reason I was under the impression that the in cluster definitions need to be updated. But clearly kube-apiserver understands multiple APIs anyways. For example, kubectl get daemonsets.v1.apps,deployments.v1.apps,replicasets.v1.apps will get you those main resources affected by the v1.16 deprecations, at the new version

Yonatan Koren

09:29:34 PM

whereas kubectl get daemonsets,deployments,replicasets will get you the oldest supported version, so it has nothing to do with redeploying and everything to do with configurations such as YAML manifests or Helm configurations

Yonatan Koren

09:30:10 PM

so sounds like this is easier than i thought it’d be. I underestimated how smart kube-apiserver is
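Since the work is in the manifests rather than in the cluster, a quick scan of parsed manifests can flag what needs updating. The Python sketch below checks apiVersion/kind pairs against a table of versions that stopped being served in v1.16. The table is not exhaustive, the function name is hypothetical, and this is not how pluto is implemented; it only illustrates the idea on already-parsed documents.

```python
from typing import Any, Dict, List, Tuple

# (apiVersion, kind) pairs no longer served in v1.16, with replacements.
REMOVED_IN_1_16 = {
    ("extensions/v1beta1", "Deployment"): "apps/v1",
    ("apps/v1beta1", "Deployment"): "apps/v1",
    ("apps/v1beta2", "Deployment"): "apps/v1",
    ("extensions/v1beta1", "DaemonSet"): "apps/v1",
    ("apps/v1beta2", "DaemonSet"): "apps/v1",
    ("extensions/v1beta1", "ReplicaSet"): "apps/v1",
    ("apps/v1beta2", "ReplicaSet"): "apps/v1",
    ("apps/v1beta1", "StatefulSet"): "apps/v1",
    ("apps/v1beta2", "StatefulSet"): "apps/v1",
    ("extensions/v1beta1", "NetworkPolicy"): "networking.k8s.io/v1",
    ("extensions/v1beta1", "PodSecurityPolicy"): "policy/v1beta1",
}


def find_removed(
    manifests: List[Dict[str, Any]]
) -> List[Tuple[str, str, str, str]]:
    """Return (kind, name, old apiVersion, replacement) per flagged doc."""
    findings = []
    for doc in manifests:
        key = (doc.get("apiVersion"), doc.get("kind"))
        if key in REMOVED_IN_1_16:
            name = doc.get("metadata", {}).get("name", "<unnamed>")
            findings.append((key[1], name, key[0], REMOVED_IN_1_16[key]))
    return findings


if __name__ == "__main__":
    docs = [
        {"apiVersion": "extensions/v1beta1", "kind": "Deployment",
         "metadata": {"name": "web"}},
        {"apiVersion": "apps/v1", "kind": "Deployment",
         "metadata": {"name": "api"}},
    ]
    for finding in find_removed(docs):
        print(finding)
```

Changing the apiVersion string is not always sufficient: apps/v1 Deployments, for instance, require an explicit spec.selector.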

2020-07-27

roth.andy

05:58:24 PM

Has CloudPosse come up with a turn-key way to do IAM Roles for Service Accounts (IRSA)? I need to start looking at doing that now for an EKS cluster we have

Erik Osterman (Cloud Posse)

06:57:52 PM

We have a turnkey way we do it in our engagements. We don’t have a module for it yet open sourced. We’ve been using a local module.

@Jeremy G (Cloud Posse) seems like we should create one now that we’re using the same pattern in a few engagements (the eks-iam/modules/service-account/ module)

Erik Osterman (Cloud Posse)

06:58:02 PM

what do you think @Jeremy G (Cloud Posse)?

Jeremy G (Cloud Posse)

06:59:05 PM

@Erik Osterman (Cloud Posse) Yes, we have a repo for such a module set up, waiting for someone to sponsor the effort to convert the closed source to open source.

Erik Osterman (Cloud Posse)

06:59:52 PM

Ok, will add it to the backlog for the next engagement.

Jeremy G (Cloud Posse)

07:00:26 PM

Should already be in the backlog for one of the existing engagements.

Erik Osterman (Cloud Posse)

07:00:34 PM

so @roth.andy this doesn’t really help you at this time, but we should get this knocked out mid-august or so.

roth.andy

10:24:09 PM

I just finished getting it working. It actually was pretty straightforward. I’m definitely a fan

wannafly37

06:02:31 PM

I can’t speak for CloudPosse but I’m using it from the terraform-aws-eks module and it was pretty straightforward
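For readers wondering what the IRSA wiring involves, the core piece is an IAM role whose trust policy lets one Kubernetes service account assume it through the cluster's OIDC provider. The Python sketch below generates such a trust policy. It is a generic illustration, not the Cloud Posse module discussed above; the function name, account ID, issuer URL, namespace, and service account name are all hypothetical placeholders.

```python
import json
from typing import Any, Dict


def irsa_trust_policy(account_id: str, oidc_issuer: str,
                      namespace: str, service_account: str) -> Dict[str, Any]:
    """Build an IAM trust policy scoped to a single service account."""
    # The provider ARN and condition key use the issuer without its scheme.
    issuer = oidc_issuer.replace("https://", "", 1)
    provider_arn = f"arn:aws:iam::{account_id}:oidc-provider/{issuer}"
    subject = f"system:serviceaccount:{namespace}:{service_account}"
    return {
        "Version": "2012-10-17",
        "Statement": [{
            "Effect": "Allow",
            "Principal": {"Federated": provider_arn},
            "Action": "sts:AssumeRoleWithWebIdentity",
            "Condition": {"StringEquals": {f"{issuer}:sub": subject}},
        }],
    }


if __name__ == "__main__":
    policy = irsa_trust_policy(
        account_id="111122223333",  # placeholder account
        oidc_issuer="https://oidc.eks.us-east-2.amazonaws.com/id/EXAMPLE",
        namespace="default",
        service_account="my-app",
    )
    print(json.dumps(policy, indent=2))
```

The service account is then annotated with eks.amazonaws.com/role-arn set to the role's ARN, so that pods using it receive credentials for that role.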

2020-07-28

2020-07-30

Craig Dunford

11:59:35 AM

Quick question - does anyone know if liveness probes continue to execute once a pod enters the Terminating state? If they do, and if they fail, will the pod be forcibly terminated and/or rescheduled? (https://github.com/kubernetes/kubernetes/issues/52817 looks somewhat related to my question)

liveness/readiness probe is executed and failed while pod is terminated · Issue #52817 · kubernetes/kubernetes

What happened: liveness/readiness probe fails while pod is terminated. Also it happened only once during the pod termination. The issue started happening after upgrading version to v1.7 from v1.6.X…

Erik Osterman (Cloud Posse)

04:04:51 PM

Hrm… don’t know the answer, but curious what you find out.
