#kubernetes (2019-12)

kubernetes

Archive: https://archive.sweetops.com/kubernetes/

2019-12-27

btai avatar

other than this feature introduced as alpha in k8s 1.16, is there a simple way to evenly distribute a type of pod onto a cluster (i.e. 100 redis pods distributed onto a 10 node cluster = 10 redis pods per node)? currently on 1.13/1.14 clusters https://kubernetes.io/docs/concepts/workloads/pods/pod-topology-spread-constraints/
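
For reference, the 1.16 alpha feature linked above is configured roughly like this (a sketch based on the linked docs; the `app: redis` label and image are illustrative, and the EvenPodsSpread feature gate must be enabled):

```yaml
# Spread pods with label app=redis evenly across nodes:
# maxSkew: 1 means no node may have more than 1 pod over the minimum.
apiVersion: v1
kind: Pod
metadata:
  name: redis
  labels:
    app: redis
spec:
  topologySpreadConstraints:
    - maxSkew: 1
      topologyKey: kubernetes.io/hostname
      whenUnsatisfiable: DoNotSchedule   # or ScheduleAnyway for best-effort
      labelSelector:
        matchLabels:
          app: redis
  containers:
    - name: redis
      image: redis:5
```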

erik-stephens avatar
erik-stephens

@btai We use pod antiAffinity rules to spread across zones, racks, chassis, etc.
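
A preferred (soft) podAntiAffinity rule of the kind described here might look like the following sketch, which works on 1.13/1.14; the `app: redis` label is illustrative, and spreading is best-effort rather than guaranteed:

```yaml
# Prefer not to co-locate pods carrying the same label on the same
# node (weight 100) or in the same zone (weight 50).
affinity:
  podAntiAffinity:
    preferredDuringSchedulingIgnoredDuringExecution:
      - weight: 100
        podAffinityTerm:
          topologyKey: kubernetes.io/hostname
          labelSelector:
            matchLabels:
              app: redis
      - weight: 50
        podAffinityTerm:
          topologyKey: failure-domain.beta.kubernetes.io/zone
          labelSelector:
            matchLabels:
              app: redis
```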

btai avatar

@erik-stephens are you able to guarantee the pods are spread evenly across nodes?

erik-stephens avatar
erik-stephens

No, but it ends up working out well enough - to our surprise. That new spread feature does look interesting.

btai avatar

mine hasn't looked fine even with the anti-affinity rules across zones and hostnames. I have 1800 pods with the same label and 60 nodes, and the distribution could be much better

2019-12-20

Roderik van der Veer avatar
Roderik van der Veer

What is the best way to test helm charts (on travisci)? I was trying to do `helm install --dry-run --generate-name` but it needs a k8s cluster, which is a lot of setup. Does helm lint also generate the yaml files?

Jonathan avatar
Jonathan

try `helm template`, it is very useful to see how the files will be generated

Roderik van der Veer avatar
Roderik van der Veer

cool, i went with `helm lint "$d" && (helm template "$d" | kubeval)`
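
In a CI job looping over chart directories, that check might look like the following sketch (assumes helm and kubeval are on the PATH, and a `charts/` directory layout, which is hypothetical):

```shell
# Lint and schema-validate every chart without needing a k8s cluster.
for d in charts/*/; do
  helm lint "$d" && helm template "$d" | kubeval --strict || exit 1
done
```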

Erik Osterman avatar
Erik Osterman

Nice tip

Erik Osterman avatar
Erik Osterman
Can you expose your services with an API gateway in Kubernetes?

In Kubernetes, an Ingress is a component that routes the traffic from outside the cluster to your services and Pods inside the cluster. You can select an Ingress that is also an API gateway.
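
A minimal Ingress routing outside traffic to a Service looks like this sketch (using the `networking.k8s.io/v1beta1` API current at the time; the host and Service names are hypothetical):

```yaml
apiVersion: networking.k8s.io/v1beta1
kind: Ingress
metadata:
  name: example
spec:
  rules:
    - host: api.example.com
      http:
        paths:
          - path: /
            backend:
              serviceName: my-service   # hypothetical backend Service
              servicePort: 80
```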

2019-12-19

erik-stephens avatar
erik-stephens

Looking for the recommended way to get kubernetes events into our elasticsearch log cluster. eventrouter looks good, but an issue referred to an upstream effort 2.5 years ago (https://github.com/heptiolabs/eventrouter/issues/26#issuecomment-318472723). Anyone happen to know what that upstream effort is?

Why not eventer from the heapster project? · Issue #26 · heptiolabs/eventrouter

I'm wondering if you are all aware of the eventer tool which does the same things as eventrouter, but is part of the heapster project, https://github.com/kubernetes/heapster/blob/master/events/

Erik Osterman avatar
Erik Osterman

Related but not what you are looking for: https://github.com/getsentry/sentry-kubernetes

getsentry/sentry-kubernetes

Kubernetes event reporter for Sentry. Contribute to getsentry/sentry-kubernetes development by creating an account on GitHub.

Erik Osterman avatar
Erik Osterman

We use this… happy with it

Erik Osterman avatar
Erik Osterman

This is for shipping to sentry rather than ES

erik-stephens avatar
erik-stephens

We’ve thought about deploying sentry for application level errors. Maybe this will help tip the scales

Erik Osterman avatar
Erik Osterman

Do you use #helmfile?

erik-stephens avatar
erik-stephens

Yup!

Erik Osterman avatar
Erik Osterman
cloudposse/helmfiles

Comprehensive Distribution of Helmfiles for Kubernetes - cloudposse/helmfiles

Erik Osterman avatar
Erik Osterman

Ours is here

Erik Osterman avatar
Erik Osterman

For sentry

Erik Osterman avatar
Erik Osterman

And the forwarder

erik-stephens avatar
erik-stephens

You know of any issues with your charts working in air-gapped (no internet access) environments?

erik-stephens avatar
erik-stephens

Just curious, cuz we typically have to fork charts to get them working here.

erik-stephens avatar
erik-stephens

Nevermind, those are helmfiles :)

Erik Osterman avatar
Erik Osterman

Unfortunately, no experience working in airgapped environments

Erik Osterman avatar
Erik Osterman

@erik-stephens totally randomly (just right now) I stumbled across this tool: https://github.com/max-rocket-internet/k8s-event-logger

max-rocket-internet/k8s-event-logger

Watches k8s cluster events and logs them to stdout in JSON - max-rocket-internet/k8s-event-logger

Erik Osterman avatar
Erik Osterman

it’s a pretty smart solution.

Erik Osterman avatar
Erik Osterman

basically you just deploy a pod that echoes the events to stdout

Erik Osterman avatar
Erik Osterman

that way whatever logging solution you already have in place will capture them
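
The same idea can be sketched with nothing but kubectl in a Deployment (this is an approximation of the pattern, not the k8s-event-logger manifest; the RBAC binding that allows reading events is omitted, and the image is illustrative):

```yaml
# A single pod that streams cluster events to stdout, so whatever
# log shipper already runs as a daemonset picks them up like any log.
apiVersion: apps/v1
kind: Deployment
metadata:
  name: event-logger
spec:
  replicas: 1
  selector:
    matchLabels:
      app: event-logger
  template:
    metadata:
      labels:
        app: event-logger
    spec:
      containers:
        - name: event-logger
          image: bitnami/kubectl   # any image containing kubectl
          command: ["kubectl", "get", "events", "--all-namespaces", "--watch", "-o", "json"]
```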

erik-stephens avatar
erik-stephens

@Erik Osterman Looks like just the thing we were looking for. Much appreciated!

Ben Read avatar
Ben Read
Kubernetes for Full-Stack Developers, a self-guided course. | DigitalOcean

Whether you’re just curious, getting started with Kubernetes, or have experience with it, this curriculum will help you learn more about Kubernetes and …

2019-12-18

2019-12-17

btai avatar

would i be able to figure out what caused my worker node reboot by looking at /var/log/syslog and /var/log/messages? I took an AMI copy of the worker node after i cordoned+drained it, and it doesn't save the dmesg logs from the actual instance. I do have those other two logs @Erik Osterman

Erik Osterman avatar
Erik Osterman

what is your base OS?

Erik Osterman avatar
Erik Osterman

you can also maybe take a look at the AWS EC2 tty console logs

Erik Osterman avatar
Erik Osterman

kernel errors are spewed there too

Erik Osterman avatar
Erik Osterman

(e.g. OOM)

btai avatar

looks like it's missing from there. the logs i see only show kernel messages timestamped after the reboot.

btai avatar

what is the best way to ship dmesg logs? i lack visibility there and our log shipping agents (run as a daemonset) aren't shipping dmesg logs

Erik Osterman avatar
Erik Osterman

What is your agent?

btai avatar

logdna @Erik Osterman

btai avatar

i see kern.log, syslog, auth.log, daemon.log being shipped but not my dmesg logs

Erik Osterman avatar
Erik Osterman

Hrmm kern.log should have the output too

btai avatar

yeah i see logs like this at about the time the node reboots

btai avatar
Dec 17 10:57:51 ip-172-29-93-9 kernel: [    0.000000] Linux version 4.9.0-11-amd64 ([email protected]) (gcc version 6.3.0 20170516 (Debian 6.3.0-18+deb9u1) ) #1 SMP Debian 4.9.189-3+deb9u2 (2019-11-11)
Dec 17 10:57:51 ip-172-29-93-9 kernel: [    0.000000] Command line: BOOT_IMAGE=/boot/vmlinuz-4.9.0-11-amd64 root=UUID=095c0fd8-a089-4b98-bd6c-7bd0b84c6abd ro init=/bin/systemd net.ifnames=0 biosdevname=0 cgroup_enable=memory oops=panic panic=10 console=ttyS0 nvme_core.io_timeout=255
Dec 17 10:57:51 ip-172-29-93-9 kernel: [    0.000000] x86/fpu: Supporting XSAVE feature 0x001: 'x87 floating point registers'
Dec 17 10:57:51 ip-172-29-93-9 kernel: [    0.000000] x86/fpu: Supporting XSAVE feature 0x002: 'SSE registers'
Dec 17 10:57:51 ip-172-29-93-9 kernel: [    0.000000] x86/fpu: Supporting XSAVE feature 0x004: 'AVX registers'
Dec 17 10:57:51 ip-172-29-93-9 kernel: [    0.000000] x86/fpu: xstate_offset[2]:  576, xstate_sizes[2]:  256
Dec 17 10:57:51 ip-172-29-93-9 kernel: [    0.000000] x86/fpu: Enabled xstate features 0x7, context size is 832 bytes, using 'standard' format.
Dec 17 10:57:51 ip-172-29-93-9 kernel: [    0.000000] e820: BIOS-provided physical RAM map:
Dec 17 10:57:51 ip-172-29-93-9 kernel: [    0.000000] BIOS-e820: [mem 0x0000000000000000-0x000000000009dfff] usable
Dec 17 10:57:51 ip-172-29-93-9 kernel: [    0.000000] BIOS-e820: [mem 0x000000000009e000-0x000000000009ffff] reserved
Dec 17 10:57:51 ip-172-29-93-9 kernel: [    0.000000] BIOS-e820: [mem 0x00000000000e0000-0x00000000000fffff] reserved
Dec 17 10:57:51 ip-172-29-93-9 kernel: [    0.000000] BIOS-e820: [mem 0x0000000000100000-0x000000007fffffff] usable
Dec 17 10:57:51 ip-172-29-93-9 kernel: [    0.000000] BIOS-e820: [mem 0x00000000fc000000-0x00000000ffffffff] reserved
Dec 17 10:57:51 ip-172-29-93-9 kernel: [    0.000000] BIOS-e820: [mem 0x0000000100000000-0x0000000fbfffffff] usable
Dec 17 10:57:51 ip-172-29-93-9 kernel: [    0.000000] NX (Execute Disable) protection: active
Dec 17 10:57:51 ip-172-29-93-9 kernel: [    0.000000] SMBIOS 2.7 present.
Dec 17 10:57:51 ip-172-29-93-9 kernel: [    0.000000] DMI: Xen HVM domU, BIOS 4.2.amazon 08/24/2006
Dec 17 10:57:51 ip-172-29-93-9 kernel: [    0.000000] Hypervisor detected: Xen
Dec 17 10:57:51 ip-172-29-93-9 kernel: [    0.000000] Xen version 4.2.
Dec 17 10:57:51 ip-172-29-93-9 kernel: [    0.000000] Xen Platform PCI: I/O protocol version 1
Dec 17 10:57:51 ip-172-29-93-9 kernel: [    0.000000] Netfront and the Xen platform PCI driver have been compiled for this kernel: unplug emulated NICs.
Dec 17 10:57:51 ip-172-29-93-9 kernel: [    0.000000] Blkfront and the Xen platform PCI driver have been compiled for this kernel: unplug emulated disks.
Dec 17 10:57:51 ip-172-29-93-9 kernel: [    0.000000] You might have to change the root device
Dec 17 10:57:51 ip-172-29-93-9 kernel: [    0.000000] from /dev/hd[a-d] to /dev/xvd[a-d]
Dec 17 10:57:51 ip-172-29-93-9 kernel: [    0.000000] in your root= kernel command line option
Dec 17 10:57:51 ip-172-29-93-9 kernel: [    0.000000] HVMOP_pagetable_dying not supported
Dec 17 10:57:51 ip-172-29-93-9 kernel: [    0.000000] e820: update [mem 0x00000000-0x00000fff] usable ==> reserved
Dec 17 10:57:51 ip-172-29-93-9 kernel: [    0.000000] e820: remove [mem 0x000a0000-0x000fffff] usable
Dec 17 10:57:51 ip-172-29-93-9 kernel: [    0.000000] e820: last_pfn = 0xfc0000 max_arch_pfn = 0x400000000
Dec 17 10:57:51 ip-172-29-93-9 kernel: [    0.000000] MTRR default type: write-back
Dec 17 10:57:51 ip-172-29-93-9 kernel: [    0.000000] MTRR fixed ranges enabled:
Dec 17 10:57:51 ip-172-29-93-9 kernel: [    0.000000]   00000-9FFFF write-back
Dec 17 10:57:51 ip-172-29-93-9 kernel: [    0.000000]   A0000-BFFFF write-combining
Dec 17 10:57:51 ip-172-29-93-9 kernel: [    0.000000]   C0000-FFFFF write-back
Dec 17 10:57:51 ip-172-29-93-9 kernel: [    0.000000] MTRR variable ranges enabled:
Dec 17 10:57:51 ip-172-29-93-9 kernel: [    0.000000]   0 base 000080000000 mask 3FFFC0000000 uncachable
Dec 17 10:57:51 ip-172-29-93-9 kernel: [    0.000000]   1 base 0000C0000000 mask 3FFFE0000000 uncachable
Dec 17 10:57:51 ip-172-29-93-9 kernel: [    0.000000]   2 base 0000E0000000 mask 3FFFF0000000 uncachable
Dec 17 10:57:51 ip-172-29-93-9 kernel: [    0.000000]   3 base 0000F0000000 mask 3FFFF8000000 uncachable
Dec 17 10:57:51 ip-172-29-93-9 kernel: [    0.000000]   4 base 0000F8000000 mask 3FFFFC000000 uncachable
Dec 17 10:57:51 ip-172-29-93-9 kernel: [    0.000000]   5 disabled
Dec 17 10:57:51 ip-172-29-93-9 kernel: [    0.000000]   6 disabled
Dec 17 10:57:51 ip-172-29-93-9 kernel: [    0.000000]   7 disabled
Dec 17 10:57:51 ip-172-29-93-9 kernel: [    0.000000] x86/PAT: Configuration [0-7]: WB  WC  UC- UC  WB  WC  UC- WT
Dec 17 10:57:51 ip-172-29-93-9 kernel: [    0.000000] e820: last_pfn = 0x80000 max_arch_pfn = 0x400000000
Dec 17 10:57:51 ip-172-29-93-9 kernel: [    0.000000] found SMP MP-table at [mem 0x000fbc50-0x000fbc5f] mapped at [ffff945e000fbc50]
Dec 17 10:57:51 ip-172-29-93-9 kernel: [    0.000000] Base memory trampoline at [ffff945e00098000] 98000 size 24576
Dec 17 10:57:51 ip-172-29-93-9 kernel: [    0.000000] Using GB pages for direct mapping
Dec 17 10:57:51 ip-172-29-93-9 kernel: [    0.000000] BRK [0x770d35000, 0x770d35fff] PGTABLE
Dec 17 10:57:51 ip-172-29-93-9 kernel: [    0.000000] BRK [0x770d36000, 0x770d36fff] PGTABLE
Dec 17 10:57:51 ip-172-29-93-9 kernel: [    0.000000] BRK [0x770d37000, 0x770d37fff] PGTABLE
Dec 17 10:57:51 ip-172-29-93-9 kernel: [    0.000000] BRK [0x770d38000, 0x770d38fff] PGTABLE
Dec 17 10:57:51 ip-172-29-93-9 kernel: [    0.000000] RAMDISK: [mem 0x35a95000-0x36d41fff]
Dec 17 10:57:51 ip-172-29-93-9 kernel: [    0.000000] ACPI: Early table checksum verification disabled
btai avatar

but unless i'm missing something, i can't seem to find the log messages that tell me why it needed to reboot

Erik Osterman avatar
Erik Osterman

The number in brackets is seconds of uptime

Erik Osterman avatar
Erik Osterman

What you want is a big number there presumably since this comes after the server has been online

Erik Osterman avatar
Erik Osterman

For a bit

Erik Osterman avatar
Erik Osterman

Also confirm the output in the log matches dmesg

Erik Osterman avatar
Erik Osterman

You can also stream dmesg with `dmesg --follow`

btai avatar

i believe i don't have the dmesg logs anymore

btai avatar

i booted this instance off an ami copy i took.

Erik Osterman avatar
Erik Osterman

Oh, ya, I am talking more about making sure you can capture it in the future

btai avatar

it was ~3am at the time and i wasn't thinking and had shut down the actual instance

btai avatar

ah

Erik Osterman avatar
Erik Osterman

You won’t be able to get anything from an ami copy

Erik Osterman avatar
Erik Osterman

It’s basically useless.

btai avatar

good to know

btai avatar

yeah these logs in the ami copy all seem to be missing the piece i need to debug this

Erik Osterman avatar
Erik Osterman

If you could snapshot a machine's runtime state including memory, maybe it would help, but the AMI copy is useless in this situation since you are shutting down and copying the file system only

btai avatar

yeah

Erik Osterman avatar
Erik Osterman

The kernel writes to a ring buffer in memory

btai avatar

how are you shipping those buffer logs?

Erik Osterman avatar
Erik Osterman

This is why it needs to be streamed somewhere which can be to disk

btai avatar

i see

Erik Osterman avatar
Erik Osterman

I gotta step away, but systemd and syslog are both capable of streaming that to disk
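
One way to do that on a systemd-based distro is to make the journal persistent, so kernel ring-buffer messages survive a reboot (a sketch; paths follow systemd defaults):

```shell
# journald: keep logs on disk across reboots instead of in tmpfs.
sudo mkdir -p /var/log/journal
sudo sed -i 's/^#\?Storage=.*/Storage=persistent/' /etc/systemd/journald.conf
sudo systemctl restart systemd-journald

# After the next unexpected reboot, inspect the previous boot's
# kernel messages:
journalctl --list-boots
journalctl -k -b -1
```

rsyslog achieves the same via its `kern.*` facility, which is where the kern.log mentioned above comes from.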

btai avatar

thanks @Erik Osterman for pointing me in the right direction

btai avatar

@Erik Osterman is it a good approach to create a custom ami, using the kops ami as a base image, that just streams dmesg to disk in the init script?

btai avatar

i can track down the time at which i believe the kubelet starts up again in syslog, but there's approx 1 minute of nothing that gets logged to syslog right before that point

Arjun Iyer avatar
Arjun Iyer

Thanks for the invite @rms1000watt!

rms1000watt avatar
rms1000watt

https://www.signadot.com/ ^^^^ is Arjun’s Company

Arjun Iyer avatar
Arjun Iyer

Still in stealth mode. Here to learn from this active community!

Erik Osterman avatar
Erik Osterman

Welcome @Arjun Iyer !

Arjun Iyer avatar
Arjun Iyer

Thanks @Erik Osterman ! Looking forward to connecting

2019-12-14

Chris Fowles avatar
Chris Fowles

you can re-encrypt traffic to the backend without needing to turn off termination at the elb

Chris Fowles avatar
Chris Fowles

common pattern is to encrypt with private certificates internally and then a publicly trusted wildcard at the elb
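
With the in-tree AWS cloud provider, that pattern is typically expressed through Service annotations on the ingress controller's LoadBalancer Service (a sketch; the certificate ARN and selector are placeholders):

```yaml
# Terminate the publicly trusted wildcard cert at the ELB, then
# re-encrypt to the backend, which serves an internal/private cert.
apiVersion: v1
kind: Service
metadata:
  name: ingress-nginx
  annotations:
    service.beta.kubernetes.io/aws-load-balancer-ssl-cert: arn:aws:acm:us-east-1:123456789012:certificate/EXAMPLE
    service.beta.kubernetes.io/aws-load-balancer-backend-protocol: ssl
    service.beta.kubernetes.io/aws-load-balancer-ssl-ports: "443"
spec:
  type: LoadBalancer
  ports:
    - name: https
      port: 443
      targetPort: 443
  selector:
    app: ingress-nginx
```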

2019-12-13

btai avatar

If we set SSL termination at the cloud provider elb level, the traffic from our node to our ingress controller should be unencrypted, right? hypothetically, if i wanted to have it encrypted through to the ingress controller, is the best way to turn off ssl termination on the cloud provider elb?

2019-12-12

rms1000watt avatar
rms1000watt

I don’t see an #istio channel or anything, so I’ll post here

rms1000watt avatar
rms1000watt

whyyyyyyyyyyyy?!?!?!

Erik Osterman avatar
Erik Osterman

from our experience, the “why” is that helm hasn’t solved the CRD problem

Erik Osterman avatar
Erik Osterman

that said, I wish this would just be solved in Helm

Erik Osterman avatar
Erik Osterman

I think @mumoshu has done some related work on this: https://github.com/helm/helm/issues/6505

helm3 template fails when there are custom resource templates · Issue #6505 · helm/helm

Output of helm version: v3.0.0-beta.3 When I have a custom resource in a helm template, if I try and use helm template it fails with: $ helm template . Error: apiVersion "certmanager.k8s.io/v1…

Erik Osterman avatar
Erik Osterman

Wow, that sucks.

Erik Osterman avatar
Erik Osterman

We also saw they were investing in a new helm chart, which we've been waiting for, in parallel to this istioctl tool

Erik Osterman avatar
Erik Osterman

@Jeremy Grodberg

Erik Osterman avatar
Erik Osterman
istio/installer

A modular, a-la-carte installer for Istio components - istio/installer

Erik Osterman avatar
Erik Osterman

but i guess that’s just the telemetry portion which is built on helm charts

Erik Osterman avatar
Erik Osterman

though this is weird…

Erik Osterman avatar
Erik Osterman

and it’s actively being developed

Erik Osterman avatar
Erik Osterman

@Rhooker @Adam Carlile heads up

Rhooker avatar
Rhooker
11:27:54 PM

@Rhooker has joined the channel

Adam Carlile avatar
Adam Carlile
11:27:54 PM

@Adam Carlile has joined the channel

rms1000watt avatar
rms1000watt

this just means they’re carving out a niche for cloudposse to maintain working helm charts for helmfile deploys

Erik Osterman avatar
Erik Osterman

haha - really hoping not to! my preference is deferring thought leadership as much as possible, especially with something as complicated as istio

Pierre Humberdroz avatar
Pierre Humberdroz

so if I am redeploying clusters on a bi-weekly basis, how does istio want me to manage that?

Pierre Humberdroz avatar
Pierre Humberdroz

in particular this is for dev and integration clusters, which we tear down and stand up on a semi-regular basis, and sometimes even spin up for a certain feature

2019-12-11

sarkis avatar
sarkis

so what’s this Kustomize I am hearing about? Worth looking into? :)

Erik Osterman avatar
Erik Osterman

@mumoshu I think uses it

mumoshu avatar
mumoshu

i’d say it’s a valid next step after you get tired of kubectl+sed

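
A minimal kustomize setup, as the step up from kubectl+sed, looks roughly like this (a sketch; the directory layout and patch file name are conventional but hypothetical):

```yaml
# overlays/prod/kustomization.yaml — patch a shared base per environment.
apiVersion: kustomize.config.k8s.io/v1beta1
kind: Kustomization
namespace: prod
namePrefix: prod-
bases:
  - ../../base
patchesStrategicMerge:
  - replica-count.yaml   # hypothetical patch bumping replicas for prod
```

Built and applied with `kustomize build overlays/prod | kubectl apply -f -` (or `kubectl apply -k` on clusters with kustomize built into kubectl).
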
kskewes avatar
kskewes

Cuelang looks great; recommend watching the TGIK episode with Joe Beda just released on the weekend.

2019-12-10

Roderik van der Veer avatar
Roderik van der Veer

A tip that might save you the same 6 hours of your life that I lost: it appears that a) you can overload the metrics-server pod on GKE by calling it too many times, and b) helm stops working when any api is down (fixes for 2.16 and 3.1 in progress/merged). Tips to work around it here: https://github.com/helm/helm/issues/6361

unable to retrieve the complete list of server APIs · Issue #6361 · helm/helm

Output of helm version: version.BuildInfo{Version:"v3.0+unreleased", GitCommit:"180db556aaf45f34516f8ddb9ddac28d71736a3e", GitTreeState:"clean", GoVersion:"go1.13…

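
One workaround discussed in that issue thread is to delete the broken APIService registration so API discovery succeeds again (a sketch; this removes the metrics API until metrics-server re-registers it, and the exact group name is an example):

```shell
# Find which aggregated API group is unavailable, then remove it.
kubectl get apiservice | grep False
kubectl delete apiservice v1beta1.metrics.k8s.io
```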

2019-12-09

johncblandii avatar
johncblandii

Anyone have good structural patterns to follow for node labeling?

For instance, do you label specific nodes for general applications or are they app specific or both?

Maycon Santos avatar
Maycon Santos

I’ve tried team-specific labels, but as we run our cluster on AWS it proved not to be optimal, since teams have different requirements, and we end up with underutilized nodes.

Maycon Santos avatar
Maycon Santos

What we do now is one main label that everybody uses for stateless and regular workloads, plus a couple of labels for resource-specific workloads, like CPU, RAM, GPU and Storage.
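
In practice that scheme is just a label on each node plus a nodeSelector (or node affinity) on the workload (a sketch; the `workload-class` key and node names are hypothetical):

```shell
# One general-purpose pool plus resource-specific pools.
kubectl label node ip-10-0-1-23 workload-class=general
kubectl label node ip-10-0-2-47 workload-class=gpu

# Workloads then opt in via their pod spec, e.g.:
#   nodeSelector:
#     workload-class: gpu
```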

johncblandii avatar
johncblandii

Underutilized was my main concern

johncblandii avatar
johncblandii

I was considering doing it by type too: default, CI (more resource intensive; hello Jenkins), etc.

roth.andy avatar
roth.andy

This checklist, under “Tagging resources”, has some good practices: https://learnk8s.io/production-best-practices

Kubernetes production best practices

This document highlights and consolidates best practices for building, deploying and scaling apps on Kubernetes in production.

roth.andy avatar
roth.andy

nevermind, I read too quickly, you were specifically asking about node labeling, which this isn’t talking about

johncblandii avatar
johncblandii

all good, @roth.andy. I’m sure i can learn something from there

2019-12-04

Adam Blackwell avatar
Adam Blackwell

Hey folks. I tried a few searches, but I apologize if this question is answered elsewhere. I’m about to move the POC cert-manager deployment, which can’t perform dns01 challenges, to https://github.com/cloudposse/charts/tree/master/incubator/cert-manager but before I do I was curious whether anyone here is using it on EKS with a service account & aws-iam-authenticator (aka not kiam)?

I’m getting this error without using a chart and am struggling to see what I’ve misconfigured:

controller.go:131] cert-manager/controller/challenges "msg"="re-queuing item  due to error processing" "error"="error instantiating route53 challenge solver: unable to assume role: WebIdentityErr: unable to read file at /var/run/secrets/eks.amazonaws.com/serviceaccount/token\ncaused by: open /var/run/secrets/eks.amazonaws.com/serviceaccount/token: permission denied" "key"="edx..."
cloudposse/charts

The “Cloud Posse” Distribution of Kubernetes Applications - cloudposse/charts

Adam Blackwell avatar
Adam Blackwell
Docker images with non-root account fails to read token file · Issue #8 · aws/amazon-eks-pod-identity-webhook

What happened: Tried to install External DNS to my EKS cluster with amazon-eks-pod-identity-webhook What you expected to happen: External DNS working with IAM credentials provided by amazon-eks-pod…
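
Per that pod-identity-webhook issue, the usual fix for the `permission denied` on the projected token with a non-root container is a pod-level fsGroup, so the token file is group-readable (a sketch; the group id 1001 is illustrative and should match the container's runtime user):

```yaml
# Deployment fragment: make the projected service-account token
# (mounted root-owned by default) readable by the non-root process.
spec:
  template:
    spec:
      securityContext:
        fsGroup: 1001   # illustrative; match your container's group
```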

2019-12-03

Pablo Costa avatar
Pablo Costa
Amazon EKS on AWS Fargate Now Generally Available | Amazon Web Services

Starting today, you can use Amazon EKS to run Kubernetes pods on AWS Fargate. EKS and Fargate make it straightforward to run Kubernetes-based applications on AWS by removing the need to provision and manage infrastructure for pods. With Fargate, customers don’t need to be experts in Kubernetes operations to run a cost-optimized and highly-available cluster. Fargate eliminates the need for […]

Erik Osterman avatar
Erik Osterman

#first

Erik Osterman avatar
Erik Osterman

@aknysh

Chris Fowles avatar
Chris Fowles

i really don’t like that pattern of mapping a namespace

stobiewankenobi avatar
stobiewankenobi

I think it supports labels as well, it looks like there are multiple ways to tell it what to do and what should launch in fargate.

stobiewankenobi avatar
stobiewankenobi

Nice beard btw

Chris Fowles avatar
Chris Fowles

surely affinities and taints would be a much better way of doing that


2019-12-02
