#kubernetes (2020-07)

Archive: https://archive.sweetops.com/kubernetes/

2020-07-30

Craig Dunford

Quick question - does anyone know if liveness probes continue to execute once a pod enters the Terminating state? If they do, and if they fail, will the pod be forcibly terminated and/or rescheduled? (https://github.com/kubernetes/kubernetes/issues/52817 looks somewhat related to my question)

liveness/readiness probe is executed and failed while pod is terminated · Issue #52817 · kubernetes/kubernetes

What happened: liveness/readiness probe fails while pod is terminated. Also it happened only once during the pod termination. The issue started happening after upgrading version to v1.7 from v1.6.X…

2020-07-28

2020-07-27

roth.andy

Has CloudPosse come up with a turn-key way to do IAM Roles for Service Accounts (IRSA)? I need to start looking at doing that now for an EKS cluster we have

:+1: 1
Erik Osterman (Cloud Posse)

We have a turnkey way we do it in our engagements. We don’t have an open-source module for it yet. We’ve been using a local module.

@Jeremy (Cloud Posse) seems like we should create one now that we’re using the same pattern in a few engagements (the eks-iam/modules/service-account/ module)

Erik Osterman (Cloud Posse)

what do you think @Jeremy (Cloud Posse)?

Jeremy (Cloud Posse)

@Erik Osterman (Cloud Posse) Yes, we have a repo set up for such a module, waiting for someone to sponsor the effort to convert the closed source to open source.

Erik Osterman (Cloud Posse)

Ok, will add it to the backlog for the next engagement.

Jeremy (Cloud Posse)

Should already be in the backlog for one of the existing engagements.

Erik Osterman (Cloud Posse)

so @roth.andy this doesn’t really help you at this time, but we should get this knocked out by mid-August or so.

roth.andy

I just finished getting it working. It actually was pretty straightforward. I’m definitely a fan

wannafly37

I can’t speak for CloudPosse, but I’m using it from the terraform-aws-eks module and it was pretty straightforward

2020-07-23

Tim Birkett

What are people doing to monitor worker health before it’s connected to a control plane? Custom CloudWatch metrics based on kubelet /healthz? Something else?

Tim Birkett

For context, we’ve had a couple of self-created outages caused by unhappy kubelet config in user-data on EKS nodes… The instance is up but never joins the cluster. Sure… we could try not making mistakes - but where is the fun in that? Is there an approach to out-of-Kubernetes instance monitoring that has worked well for people?
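
One way the kubelet /healthz idea could be sketched in Python, assuming the kubelet serves its health endpoint on the default 127.0.0.1:10248. The function name `kubelet_healthy` and the injectable `opener` argument are made up for this illustration, and a kubelet that is healthy locally but cannot register with the cluster may still pass this check.

```python
import urllib.error
import urllib.request

# Assumption: the kubelet's default healthz bind address and port are in use.
KUBELET_HEALTHZ = "http://127.0.0.1:10248/healthz"


def kubelet_healthy(url=KUBELET_HEALTHZ, timeout=2.0, opener=urllib.request.urlopen):
    """Return True only if the endpoint answers with HTTP 200."""
    try:
        with opener(url, timeout=timeout) as resp:
            return resp.status == 200
    except (urllib.error.URLError, OSError):
        # Connection refused, timeout, or a non-2xx response all count as unhealthy.
        return False
```

A cron job or agent on the instance could call `kubelet_healthy()` and ship the result (1 or 0) as a custom metric; the shipping part is left out here.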

Yonatan Koren

Quick question for those who had to convert deprecated APIs when moving to Kubernetes v1.16:

Do I need to worry about converting ReplicaSet objects? My intuition says no and that I only need to worry about their corresponding Deployment objects.

Am I correct in assuming that?

Erik Osterman (Cloud Posse)

Updating the deployment resources will trigger the new replicasets to get created. That should do it.

Erik Osterman (Cloud Posse)
FairwindsOps/pluto

A cli tool to help discover deprecated apiVersions in Kubernetes - FairwindsOps/pluto

:+1: 1
Yonatan Koren

So after looking into it for a while, I realized that this drop of the older APIs is just manifest support. So they just mean “kube-apiserver will not support your old YAMLs”

Yonatan Koren

For some reason I was under the impression that the in cluster definitions need to be updated. But clearly kube-apiserver understands multiple APIs anyways. For example, kubectl get daemonsets.v1.apps,deployments.v1.apps,replicasets.v1.apps will get you those main resources affected by the v1.16 deprecations, at the new version

Yonatan Koren

whereas kubectl get daemonsets,deployments,replicasets will get you the oldest supported version, so it has nothing to do with redeploying and everything to do with configurations such as YAML manifests or Helm configurations

Yonatan Koren

so sounds like this is easier than i thought it’d be. I underestimated how smart kube-apiserver is
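
The manifest-side check described in this thread can be sketched in a few lines of Python. This is only an illustration, assuming manifests are already parsed into dicts. The table covers the main apiVersion/kind pairs that stopped being served in v1.16 and is not exhaustive, and the names `REMOVED_IN_1_16` and `removed_in_1_16` are made up for the example. A tool such as pluto does this far more thoroughly.

```python
# Sketch: flag manifests whose apiVersion/kind pair is no longer served in v1.16.
# The names below are illustrative and not part of any real tool.
REMOVED_IN_1_16 = {
    ("extensions/v1beta1", "Deployment"): "apps/v1",
    ("extensions/v1beta1", "DaemonSet"): "apps/v1",
    ("extensions/v1beta1", "ReplicaSet"): "apps/v1",
    ("extensions/v1beta1", "NetworkPolicy"): "networking.k8s.io/v1",
    ("extensions/v1beta1", "PodSecurityPolicy"): "policy/v1beta1",
    ("apps/v1beta1", "Deployment"): "apps/v1",
    ("apps/v1beta1", "StatefulSet"): "apps/v1",
    ("apps/v1beta2", "Deployment"): "apps/v1",
    ("apps/v1beta2", "DaemonSet"): "apps/v1",
    ("apps/v1beta2", "ReplicaSet"): "apps/v1",
    ("apps/v1beta2", "StatefulSet"): "apps/v1",
}


def removed_in_1_16(manifest):
    """Return the replacement apiVersion, or None if the manifest is unaffected."""
    key = (manifest.get("apiVersion"), manifest.get("kind"))
    return REMOVED_IN_1_16.get(key)


old = {"apiVersion": "extensions/v1beta1", "kind": "Deployment"}
new = {"apiVersion": "apps/v1", "kind": "Deployment"}
print(removed_in_1_16(old))  # apps/v1
print(removed_in_1_16(new))  # None
```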

som.ban.mca
08:48:24 PM

@som.ban.mca has joined the channel

2020-07-22

2020-07-21

10:14:32 PM

Kube2IAM or Kiam

rms1000watt

I’m currently on kube2iam, but I see hanging/failures if kube2iam gets slammed with too many requests. I’m debating if we stick with kube2iam and hack around daemonset replicas.. or if we cutover to something else

joey

IRSA

rms1000watt

I forgot what pains we had with IRSA. Might be about the migration pattern in prod or something with helm/helmfile

rms1000watt

I’ll look at it again tho

joey

i’ve also had problems with kube2iam and race conditions where other pods might start before kube2iam has started on a new node and the new pods won’t have credentials. i think IRSA wasn’t fully supported or available until a couple months ago because i was using kube2iam prior but have since moved just about everything to IRSA. i think i have 1 or 2 lingering dependencies that might still be using kube2iam that i need to circle back to.

Chris Fowles

i’m using kiam but only for things that don’t support IRSA

Chris Fowles

also using IRSA for kiam

rms1000watt

@stobiewankenobi @Ronak FYI

rms1000watt

@Chris Fowles @joey (no rush) what scale do you have on your largest cluster using IRSA? Total nodes, total pods?

Chris Fowles

not large clusters tbh

joey

50 nodes 600 pods

rms1000watt
➜  ~ kubectl get pods --all-namespaces | grep 1/1 | wc -l
    2330
➜  ~ kubectl get nodes | wc -l
     301

Yeah, I’m considering IRSA
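
A side note on the counts above: `kubectl get nodes | wc -l` also counts the header row that kubectl prints, so 301 lines would correspond to 300 nodes, and `grep 1/1` only matches pods reporting one of one containers ready, so multi-container pods are left out. A minimal Python sketch of header-aware counting follows; the sample output and node names in it are hypothetical.

```python
def count_rows(table_output):
    """Count data rows in `kubectl get`-style table output, ignoring the header line."""
    lines = [line for line in table_output.splitlines() if line.strip()]
    return max(len(lines) - 1, 0)  # the first non-empty line is the column header


# Hypothetical sample output; real node names and versions will differ.
sample = """\
NAME             STATUS   ROLES    AGE   VERSION
node-a.example   Ready    <none>   12d   v1.16.8
node-b.example   Ready    <none>   12d   v1.16.8
"""

print(count_rows(sample))  # 2, whereas `wc -l` on the same text would report 3
```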

Erik Osterman (Cloud Posse)

btw, discussed in office hours yesterday!

Erik Osterman (Cloud Posse)
New Zoom Recording from our Office Hours session (https://cloudposse.wistia.com/medias/31hpuz34yp) on 2020-07-22 is now available. Missed it? Remember to register for the next one (https://cloudposse.com/office-hours/) so you’ll receive a calendar reminder next time!
Erik Osterman (Cloud Posse)

We quit kube2iam ~2 years ago. Even on small clusters with 5-6 nodes and < 100 pods, we’d easily exceed the AWS API rate limits, which would cause blackouts. Then it would get exacerbated as more and more services’ STS tokens expired.

Erik Osterman (Cloud Posse)

Service accounts are definitely the way to go for EKS (and what we use), but for kops, we’re still using Kiam and haven’t looked into how to support IRSA

PePe

you can check the closed PRs/issues

2020-07-17

curious deviant

Hello, I am looking to use AWS API Gateway with my EKS cluster. I found this -> https://aws.amazon.com/blogs/containers/api-gateway-as-an-ingress-controller-for-eks/. I am looking for feedback from folks who have tried this out. In particular, I would like to know whether AWS API Gateway, when deployed as an EKS ingress, supports Custom Authorizers.

API Gateway as an Ingress Controller for Amazon EKS | Amazon Web Services

When teams deploy microservices on Amazon EKS, they usually expose a REST API for use in front ends and third-party applications. A best practice is to manage these APIs with an API Gateway. This provides a unique entry point for your APIs and also eliminates the need to implement API-specific code for things like security, […]

2020-07-15

2020-07-14

Milosb

Guys, would you expect any issues with EKS pods/containers if you encrypt the nodes’ root volumes?

Geordan

I would not. No problems doing so with PKS/TKGI so far. (Obviously not apples to apples, but it’s the only data point I have)

Issif

We use encrypted EBS for PVs, and it works

Vlad Ionescu

No issues here either, and we’ve been using encrypted node drives since they came out

Milosb

thanks guys

2020-07-11

David Medinets

Adding the EventRateLimit admission control to my api-server manifest file results in the api-server not restarting but I don’t know where to find any error messages. Why would this fail? How can I debug the issue?

2020-07-10

2020-07-09

dalekurt

Anyone doing container forensics?

2020-07-08

frednotet

Hey everyone. My pod refuses to launch due to an x509: certificate signed by unknown authority error. I use an AWS certificate for my Docker registry, and it looks like a simple update-ca-certificates should solve my issue. I added it to my gitlab-runner as a step before the helm install... but it doesn’t help (actually, this command reports that it is skipping, so I’m not even sure it works). I’ve been stuck on this error since yesterday and feel like I’ve tried everything… could somebody help me with this?

Eric D. Berg

Hey, all. Looks like my DD cluster agent is not configured properly for the liveness and readiness probes. First, no ready pods:

$ k get pods
NAME                                          READY   STATUS    RESTARTS   AGE
datadog-cluster-agent-6cf486544f-gvv9s        0/1     Running   0          6d

And from describe, it looks like these tests are misconfigured (no host), but I haven’t found a way of specifying that – which I should not have to do:

    Liveness:   http-get http://:5555/live delay=15s timeout=5s period=15s #success=1 #failure=6
    Readiness:  http-get http://:5555/ready delay=15s timeout=5s period=15s #success=1 #failure=6
Events:
  Type     Reason     Age                        From                                                Message
  ----     ------     ----                       ----                                                -------
  Warning  Unhealthy  4m42s (x34539 over 5d23h)  kubelet, ip-10-6-32-102.us-east-2.compute.internal  Readiness probe failed: HTTP probe failed with statuscode: 500

Any idea what i’m missing here?

Eric D. Berg

Found it. For some reason, though k8s has plenty of host name options, it left it out. I ran across an old post that requests the host parameter be added to livenessProbe.httpGet. Adding host: 127.0.0.1 fixed it…enough to expose other issues.
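
For illustration, the probe settings shown in the describe output, with the host field added as described, could be built like this in Python and dumped as JSON. The helper name `http_probe` is made up for this sketch, and whether 127.0.0.1 is the right host depends on the pod’s networking. When host is omitted, the kubelet probes the pod IP.

```python
import json


def http_probe(path, port, host=None):
    """Build an httpGet probe spec; host is left out unless given (kubelet then uses the pod IP)."""
    http_get = {"path": path, "port": port}
    if host is not None:
        http_get["host"] = host
    return {
        "httpGet": http_get,
        # Timings copied from the describe output quoted above.
        "initialDelaySeconds": 15,
        "timeoutSeconds": 5,
        "periodSeconds": 15,
        "successThreshold": 1,
        "failureThreshold": 6,
    }


container_probes = {
    "livenessProbe": http_probe("/live", 5555, host="127.0.0.1"),
    "readinessProbe": http_probe("/ready", 5555, host="127.0.0.1"),
}
print(json.dumps(container_probes, indent=2))
```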

timduhenchanter

Anyone using the IstioOperator CRD? The Gateway spec does not include SDS so I’m trying to figure out during migration where to specify the SDS container? The new Istio Helm Gateway chart does not seem to have the option either.

timduhenchanter

With 1.6 it was rolled into the ingress gateway container and is no longer optional. That is why it does not exist in spec

timduhenchanter

Finally found an issue related to it. Google keyword game is not strong today

2020-07-06

dalekurt

Hey all! Is anyone deploying to AWS EKS with the readinessGates in the K8s Deployment?

The issue TL;DR

• ALB (ingress) is being managed as a separate deployment with multiple host paths

• When deploying to AWS EKS with readinessGates, the Service does not register the pods; the deployment then times out and fails

• When deploying to AWS EKS without readinessGates, the Service registers the pods; the deployment is successful, BUT a PodDisruptionBudget issue arises when nodes are rotated during an upgrade.

2020-07-04

2020-07-03

2020-07-02

soumya
10:43:53 AM

Is there a way I can prevent creation of new config maps every time I deploy through helm?

Tim Birkett

That is how Helm keeps a history of each deployment state and works out what has changed between deployments.

Tim Birkett

You can set a lower number of releases to keep in history with the --history-max flag, the default is 10.

Tim Birkett

Set to 1 if you don’t think you’ll ever want to roll back to releases earlier than the last one… Setting 0 means “no limit” rather than no history.

soumya

Thanks @Tim Birkett, so basically while doing helm init we need to pass this flag?

helm init --history-max 2
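
The --history-max semantics described in this thread (keep the newest N releases, with 0 meaning no limit) can be pictured with a toy Python model. This is not Helm’s implementation, and `prune_history` is a made-up name.

```python
def prune_history(revisions, history_max):
    """Keep the newest `history_max` revisions; 0 (or less) means keep everything."""
    if history_max <= 0:
        return list(revisions)
    return list(revisions)[-history_max:]


revisions = [1, 2, 3, 4, 5]
print(prune_history(revisions, 2))  # [4, 5]
print(prune_history(revisions, 1))  # [5]
print(prune_history(revisions, 0))  # [1, 2, 3, 4, 5]
```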
Issif

Hi, I’m trying https://github.com/cloudposse/terraform-aws-eks-cluster and for an unknown reason, even though all my resources seem OK, my cluster has 0 nodes.

cloudposse/terraform-aws-eks-cluster

Terraform module for provisioning an EKS cluster. Contribute to cloudposse/terraform-aws-eks-cluster development by creating an account on GitHub.

Tim Birkett

Hey @Issif - Did you implement: https://github.com/cloudposse/terraform-aws-eks-node-group to create nodes?

Issif

@Tim Birkett nope, I prefer the solution with workers, as I can manage them more precisely and add them to a target group with auto-scaling enabled

Issif

I hope I found the issue, a new deployment is occurring right now

Tim Birkett

Ah you used: https://github.com/cloudposse/terraform-aws-eks-workers - Cool, hope the issue is sorted for you

Issif

Yes I use this

Issif

and that’s a fail …

Issif

really strange, my aws-auth seems correct, SG too

Issif

I did it. I had enabled natgw but my cluster was in a public subnet; after fixing that, all was good. Those are really good modules, I really love them

2020-07-01

David Medinets

When I set the kubelet-certificate-authority flag in kube-apiserver.yaml, I am running into the following message when trying to start a pod. I am using kubespray to provision the AWS cluster.

Error from server: Get https://10.250.205.173:10250/containerLogs/default/bash-shell-d8bd1/bash-shell-d8bd1: x509: cannot validate certificate for 10.250.205.173 because it doesn't contain any IP SANs

I think this is the controller node getting status from a worker node. The information that I found about this issue is:

That message is coming from the master trying to connect to the node (the flow of traffic 
is kubectl -> master API -> kubelet -> container). When starting the master, are you 
setting --kubelet_certificate_authority? If so, the master expects to be able to validate 
the kubelet's serving cert, which means it needs to be valid for the hostnames/IP addresses 
the master uses to connect to it.

Any help to resolve this issue would be appreciated.

David Medinets

I question why the master is using an IP address for the worker node. I’ve been trying to find information about kubelet-preferred-address-types. I wonder if I can change that setting.

David Medinets

I set --kubelet-preferred-address-types=InternalDNS (and just this value). Then tried to start a pod. This error was displayed:

Error from server: no preferred addresses found; known addresses: [{InternalIP 10.250.205.173} {Hostname ip-10-250-205-173.ec2.internal}]

Edit: The InternalIP and Hostname are literally telling me what is acceptable as an address type. When I add Hostname, the error changes:

Error from server: Get https://ip-10-250-205-173.ec2.internal:10250/containerLogs/kube-system/nodelocaldns-s8mfk/node-cache: x509: certificate signed by unknown authority
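
The two address-selection outcomes above are consistent with a simple "first match in preference order" rule. Below is a simplified Python model of that rule, not the actual kube-apiserver code; `preferred_node_address` is a made-up name, and `DEFAULT_PREFERENCE` mirrors the documented default of --kubelet-preferred-address-types.

```python
# Documented default order for --kubelet-preferred-address-types.
DEFAULT_PREFERENCE = ["Hostname", "InternalDNS", "InternalIP", "ExternalDNS", "ExternalIP"]


def preferred_node_address(addresses, preferred_types=DEFAULT_PREFERENCE):
    """Return the first address whose type matches, walking the preference list in order."""
    for wanted in preferred_types:
        for addr_type, value in addresses:
            if addr_type == wanted:
                return value
    raise LookupError(f"no preferred addresses found; known addresses: {addresses}")


# Addresses reported for the node in the error message above.
node = [("InternalIP", "10.250.205.173"), ("Hostname", "ip-10-250-205-173.ec2.internal")]

print(preferred_node_address(node, ["InternalDNS", "Hostname"]))
# ip-10-250-205-173.ec2.internal

try:
    preferred_node_address(node, ["InternalDNS"])
except LookupError as err:
    print(err)  # the node has no InternalDNS entry, so nothing matches
```

Either way, the address the apiserver ends up using still has to appear in the kubelet serving certificate, and that certificate has to chain to the configured CA, which is what the x509 errors are about.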