#kubernetes (2020-11)

kubernetes

Archive: https://archive.sweetops.com/kubernetes/

2020-11-29

loren avatar
loren

Has me nodding along; kube already feels like legacy to me, and it will be replaced by something sooner rather than later (by the founder of Tailscale, which just came up in #random)… https://blog.dave.tf/post/new-kubernetes/

A better Kubernetes, from the ground up · blog.dave.tf

If we were to start over from first principles, what would I do differently than k8s?

2020-11-28

Padarn avatar
Padarn

Hi all - what are some approaches for application config management in Kubernetes? A few topics I’m interested in:

• dynamic configuration (say, for example, configuring a feature flag in an app)

• deploying applications that share configuration (or secrets)

Just looking for some projects to look into to get a feel for what people are doing

Erik Osterman (Cloud Posse) avatar
Erik Osterman (Cloud Posse)

Good question for office hours

Erik Osterman (Cloud Posse) avatar
Erik Osterman (Cloud Posse)

Some ideas are the various operators for managing secrets like external secrets from SSM, ASM, Vault

Erik Osterman (Cloud Posse) avatar
Erik Osterman (Cloud Posse)

The Reloader project from stakater to automatically roll replica sets when secrets or config maps change
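For reference, Reloader is annotation-driven; a minimal sketch of wiring it up (the names here are illustrative, not from the thread):

```yaml
apiVersion: apps/v1
kind: Deployment
metadata:
  name: my-app   # illustrative name
  annotations:
    # Reloader watches the ConfigMaps/Secrets this Deployment references
    # and triggers a rolling update of its ReplicaSet when they change
    reloader.stakater.com/auto: "true"
spec:
  replicas: 2
  selector:
    matchLabels:
      app: my-app
  template:
    metadata:
      labels:
        app: my-app
    spec:
      containers:
        - name: app
          image: nginx
          envFrom:
            - configMapRef:
                name: my-app-config   # changes here roll the pods
```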

Erik Osterman (Cloud Posse) avatar
Erik Osterman (Cloud Posse)

#helmfile for managing configuration of helm releases

Vlad Ionescu avatar
Vlad Ionescu

Dynamic config: LaunchDarkly, Split, or whatever hosted provider. I saw many re-implementations fail spectacularly

Deploying apps sharing the same config: beware creating a distributed monolith! I’d vote for helmfile / ParameterStore / shared ConfigMap.
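One low-tech version of the shared-ConfigMap option is a single ConfigMap consumed via envFrom by each app; a minimal sketch (names illustrative):

```yaml
apiVersion: v1
kind: ConfigMap
metadata:
  name: shared-app-config   # illustrative name
data:
  FEATURE_X_ENABLED: "false"
  API_BASE_URL: "https://api.internal.example.com"
---
# In each consuming Deployment's pod spec:
# containers:
#   - name: app
#     envFrom:
#       - configMapRef:
#           name: shared-app-config
```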

Erik Osterman (Cloud Posse) avatar
Erik Osterman (Cloud Posse)

Flagger is an open source alternative to LaunchDarkly https://github.com/weaveworks/flagger

weaveworks/flagger

Progressive delivery Kubernetes operator (Canary, A/B Testing and Blue/Green deployments) - weaveworks/flagger

Vlad Ionescu avatar
Vlad Ionescu

Alternative-ish.

Flagger does Blue/Green in all its forms (canary, A/B, mirroring). Let’s take an example with 1% seeing the new feature. That is a random 1% of requests.

With LaunchDarkly (or whatever other feature-flagging tool) you get to choose the 1%: instead of random, you could show the feature to users whose email ends in mycorp.com, or the marketing team, or that specific QA team that will test the thing, or users who are on the free plan, or users who opt in to seeing beta features, or the client this feature is developed for, and so on. You get a lot more control over risk allocation.

Padarn avatar
Padarn

Thanks guys, a lot of good reading to look into

Padarn avatar
Padarn

I’ll move it to #office-hours for any follow up

2020-11-27

2020-11-26

Mikael Fridh avatar
Mikael Fridh

Do you guys block the EC2 metadata API (via iptables or network policy) or redirect it to a metadata proxy, so containers remain “sane” when providing IAM roles via the “native” EKS service role method?

Erik Osterman (Cloud Posse) avatar
Erik Osterman (Cloud Posse)

When we were using kiam on kops, we would do this, but haven’t carried it over to EKS.

Erik Osterman (Cloud Posse) avatar
Erik Osterman (Cloud Posse)

If it would help, I can dig up the rules we used

Mikael Fridh avatar
Mikael Fridh

No need Erik. Thanks.

I think the cleanest way is a metadata proxy.

If you block metadata completely, some apps might complain if they expect or need certain metadata. However, in most cases the needs are simple, and providing an explicit AWS_REGION environment variable does the trick.

I think if I were to provide some form of multi-tenant cluster with EC2-like promises (which I’m not really in the business of doing right now), I might be obliged to provide some emulation of the regular EC2 metadata.
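For anyone curious, the kiam/kube2iam-era approach was a DNAT rule on each node intercepting pod traffic to the metadata endpoint; a rough sketch, assuming a Calico interface prefix and an agent listening on port 8181 (both are assumptions that depend on your CNI and proxy):

```shell
# Redirect pod traffic destined for the EC2 metadata API to a local
# metadata proxy instead (interface prefix and port are illustrative)
iptables --table nat --append PREROUTING \
  --protocol tcp --destination 169.254.169.254 --dport 80 \
  --in-interface cali+ \
  --jump DNAT --to-destination "${HOST_IP}:8181"
```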

2020-11-25

Amit Karpe avatar
Amit Karpe

Any idea how to recover if the aws-auth ConfigMap gets corrupted? Once the aws-auth settings are corrupted, no one can access the EKS cluster. Any workaround?

Issif avatar
Issif

one user can still access your cluster: the user you used for creation

Issif avatar
Issif

for our part, it’s a dedicated Terraform user, since we apply from CI/CD

Amit Karpe avatar
Amit Karpe

Thank you. We were able to revert the bad changes using the EKS creator user.
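For anyone who finds this later: the IAM identity that created the cluster keeps system:masters access independently of aws-auth, so it can always re-apply a known-good ConfigMap. The minimal shape to restore node access looks roughly like this (the role ARN is illustrative):

```yaml
apiVersion: v1
kind: ConfigMap
metadata:
  name: aws-auth
  namespace: kube-system
data:
  mapRoles: |
    # Node instance role mapping; without it, worker nodes cannot join
    - rolearn: arn:aws:iam::111122223333:role/eks-node-role   # illustrative ARN
      username: system:node:{{EC2PrivateDNSName}}
      groups:
        - system:bootstrappers
        - system:nodes
```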

Issif avatar
Issif

perfect

2020-11-24

Vugar avatar
Vugar

Greetings! I was wondering if anyone had any chance to play with crossplane? Would you know if it is somewhat comparable to TF cloud operator?

Crossplane

Manage any infrastructure your applications need directly from Kubernetes

tim.j.birkett avatar
tim.j.birkett

Do you have a link to TF cloud operator?

tim.j.birkett avatar
tim.j.birkett

I’m thinking of looking at crossplane. It looks pretty good.

Vugar avatar
Vugar

Yep here is TF cloud operator link: https://github.com/hashicorp/terraform-k8s

hashicorp/terraform-k8s

Terraform Cloud Operator for Kubernetes. Contribute to hashicorp/terraform-k8s development by creating an account on GitHub.

Vugar avatar
Vugar

I wonder if crossplane has anything similar to TF state… haven’t had a chance to look into documentation yet

Hao Wang avatar
Hao Wang

pretty neat project, thanks for sharing

Aumkar Prajapati avatar
Aumkar Prajapati

Hey all, working on creating a new cluster. Basically running into an issue where the alb-ingress-controller can’t see any subnets for an ingress being created, despite those subnets existing. Any ideas?

aws-load-balancer-controller-5d96f6c4f6-vq86z controller {"level":"error","ts":1606243773.951048,"logger":"controller","msg":"Reconciler error","controller":"ingress","name":"grafana","namespace":"monitoring","error":"couldn't auto-discover subnets: unable to discover at least one subnet"}
Erik Osterman (Cloud Posse) avatar
Erik Osterman (Cloud Posse)

(Note the alb-ingress-controller is now the https://github.com/kubernetes-sigs/aws-load-balancer-controller)

kubernetes-sigs/aws-load-balancer-controller

A Kubernetes controller for Elastic Load Balancers - kubernetes-sigs/aws-load-balancer-controller

Aumkar Prajapati avatar
Aumkar Prajapati

Yup, that’s the one I’m using ^^

Erik Osterman (Cloud Posse) avatar
Erik Osterman (Cloud Posse)

I think this usually happens if the subnets aren’t tagged properly

Aumkar Prajapati avatar
Aumkar Prajapati

What should the subnets be tagged with? I have 3 public and 3 private subnets, attached to a single vpc that the eks cluster is a part of

Aumkar Prajapati avatar
Aumkar Prajapati

Here’s the terraform used to create the vpcs / subnets

module "us-prod-eks-vpc" {
  source = "terraform-aws-modules/vpc/aws"

  providers = { aws = aws.us }
  name = "us-prod-eks-vpc"
  cidr = "10.0.0.0/16"

  azs             = ["us-east-1a", "us-east-1b", "us-east-1c"]
  private_subnets = ["10.0.1.0/24", "10.0.2.0/24", "10.0.3.0/24"]
  public_subnets  = ["10.0.101.0/24", "10.0.102.0/24", "10.0.103.0/24"]

  enable_nat_gateway = true
  enable_vpn_gateway = false

  tags = {
    Name = "us-prod-eks-vpc"
    Environment = "prod-us"
    Region = "us-east-1"
  }
}
Erik Osterman (Cloud Posse) avatar
Erik Osterman (Cloud Posse)
Application load balancing on Amazon EKS - Amazon EKS

You can load balance application traffic across pods using the AWS Application Load Balancer (ALB). To learn more, see What is an Application Load Balancer? in the Application Load Balancers User Guide . You can share an ALB across multiple applications in your Kubernetes cluster using Ingress groups. In the past, you needed to use a separate ALB for each application. The controller automatically provisions AWS ALBs in response to Kubernetes Ingress objects. ALBs can be used with pods deployed to nodes or to AWS Fargate. You can deploy an ALB to public or private subnets.

Aumkar Prajapati avatar
Aumkar Prajapati

I’ll take a look, thanks

Aumkar Prajapati avatar
Aumkar Prajapati

Okay so I did the tags and they indeed were missing from the subnets, added those in, still results in the same error :s

Aumkar Prajapati avatar
Aumkar Prajapati
  public_subnet_tags = {
    "kubernetes.io/role/elb" = "1"
  }

  private_subnet_tags = {
    "kubernetes.io/role/internal-elb" = "1"
  }

basically were added in

btai avatar

try adding this tag to your VPC

locals {
  tags = {
    "kubernetes.io/cluster/${local.name}" = "shared"
  }
}
Emmanuel Gelati avatar
Emmanuel Gelati

I confirm, using the above tag solved the issue for me too

Erik Osterman (Cloud Posse) avatar
Erik Osterman (Cloud Posse)

If anyone wants to open up a PR to document this - that would probably save many others grief :-)
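Summarizing the thread: the fix was a combination of the role tags on the subnets plus the cluster ownership tag. A sketch against the same VPC module (the cluster name is illustrative):

```hcl
module "us-prod-eks-vpc" {
  source = "terraform-aws-modules/vpc/aws"
  # ... existing arguments ...

  public_subnet_tags = {
    "kubernetes.io/role/elb"            = "1"
    "kubernetes.io/cluster/us-prod-eks" = "shared"   # cluster name illustrative
  }

  private_subnet_tags = {
    "kubernetes.io/role/internal-elb"   = "1"
    "kubernetes.io/cluster/us-prod-eks" = "shared"
  }

  tags = {
    "kubernetes.io/cluster/us-prod-eks" = "shared"
  }
}
```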

Padarn avatar
Padarn

Hi guys - can someone help explain to me how port-forward works under the hood? I got a bit confused when I saw that the RBAC permissions required include the “create” verb:

rule {
  apiGroups = [""]
  resources = ["pods/portforward"]
  verbs     = ["get", "list", "create"]
}
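The create verb is there because kubectl port-forward works by POSTing to the pod's portforward subresource and then upgrading the connection to a streaming protocol; in RBAC terms, that POST is a create on pods/portforward. You can watch the request with verbose client logging (pod name illustrative):

```shell
# -v=8 prints the underlying API calls; expect a POST to
# .../api/v1/namespaces/<ns>/pods/<pod>/portforward
kubectl port-forward pod/my-pod 8080:80 -v=8
```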
Dhrumil Patel avatar
Dhrumil Patel

Hi all, does anyone know of any tools that can provide SAML authentication for a Kubernetes EKS cluster? It may be possible using HashiCorp Boundary, but I want to explore other tools…

Padarn avatar
Padarn

does something like https://github.com/heptiolabs/gangway help?

heptiolabs/gangway

An application that can be used to easily enable authentication flows via OIDC for a kubernetes cluster. - heptiolabs/gangway

Padarn avatar
Padarn

it’s OpenID-based, but maybe it can help with your use case

Dhrumil Patel avatar
Dhrumil Patel

Thanks I will check…

loren avatar
loren
How to Set Up Kubernetes SSO with SAML attachment image

Kubernetes doesn’t support native SAML integration. Learn how to configure SAML single sign on (SSO) for Kubernetes clusters with user impersonation.

Padarn avatar
Padarn

oh thats nice @loren

Padarn avatar
Padarn

thanks for sharing

2020-11-23

tim.j.birkett avatar
tim.j.birkett

Did anybody else get hit by the recalled EKS AMI - https://github.com/awslabs/amazon-eks-ami/releases/tag/v20201112 - that was fun

Release AMI Release v20201112 · awslabs/amazon-eks-ami

[RECALLED] AMI Release v20201112 amazon-eks-gpu-node-1.18-v20201112 amazon-eks-gpu-node-1.17-v20201112 amazon-eks-gpu-node-1.16-v20201112 amazon-eks-gpu-node-1.15-v20201112 amazon-eks-arm64-node-1…

pjaudiomv avatar
pjaudiomv

im totally using that ami

tim.j.birkett avatar
tim.j.birkett

So, it doesn’t impact the service of applications if you run multiple instances on different nodes. But what you’ll see is application restarts and nodes randomly becoming NotReady. If you have a fair bit of monitoring of things like replica counts and kube node status, you’ll get a bit of noise. Along with, perhaps, developers claiming that they should move back to Beanstalk, Fargate, or Cloud Provider X, as AWS should not have let a buggy AMI get out…

tim.j.birkett avatar
tim.j.birkett
Release AMI Release v20201117 · awslabs/amazon-eks-ami

AMI Release v20201117 amazon-eks-gpu-node-1.18-v20201117 amazon-eks-gpu-node-1.17-v20201117 amazon-eks-gpu-node-1.16-v20201117 amazon-eks-gpu-node-1.15-v20201117 amazon-eks-arm64-node-1.18-v202011…

btai avatar

yeah i was bit by that bug, but the 20201117 amis are fine

2020-11-22

Amit Karpe avatar
Amit Karpe

While upgrading EKS from 1.15 to 1.18 using the eks module, do we have to upgrade step by step like 1.15 ==> 1.16 ==> 1.17 ==> 1.18, or can I directly modify cluster_version = "1.18" and tf apply will do all the magic?

aaratn avatar
aaratn

I dont think so

aaratn avatar
aaratn

What I usually do is update on console and bump terraform to that version

Amit Karpe avatar
Amit Karpe

What are the pros/cons of using tf to build EKS but the EKS console to upgrade? Any unexpected side effects?

Aleksandr Fofanov avatar
Aleksandr Fofanov

I’m not sure if EKS will allow that, but you should upgrade one minor version at a time, because control plane components are only backward-compatible with the previous minor version, so you can’t really jump two versions ahead. https://kubernetes.io/docs/setup/release/version-skew-policy/ Even if EKS allows it, they will surely upgrade one minor version at a time under the hood.

Aleksandr Fofanov avatar
Aleksandr Fofanov

Also, I would check if all of your cluster resources are compatible with the new version before upgrading.

Amit Karpe avatar
Amit Karpe

What kind of cluster resources? How do you check? Do you create a new or sandbox EKS cluster to test whether the upgrade works, and then do it on the main EKS cluster?

Amit Karpe avatar
Amit Karpe

I had the impression that the EKS module would manage the upgrade for us, i.e. instead of running tf apply 3 times, if we declare cluster_version = "1.18" then the EKS module will step through 1.15 ==> 1.16 ==> 1.17 ==> 1.18 for us. Do you think my assumption is correct?

Aleksandr Fofanov avatar
Aleksandr Fofanov

Nope, I don’t think that it will do that for you.

aaratn avatar
aaratn

You can always check the module source to see if there’s auto-upgrade magic in it (which I’m fairly sure isn’t the case)

Amit Karpe avatar
Amit Karpe

Thank you @aaratn & @Aleksandr Fofanov I think my assumption is wrong!!!

Aleksandr Fofanov avatar
Aleksandr Fofanov

@ Regarding how to find resources which use deprecated APIs before upgrading.

Aleksandr Fofanov avatar
Aleksandr Fofanov

Since you are jumping from 1.15 to 1.16, it’s worth checking if your cluster has any

Aleksandr Fofanov avatar
Aleksandr Fofanov

And there is also the pluto tool

Aleksandr Fofanov avatar
Aleksandr Fofanov
Kubernetes Solved: Easily Find Deprecated API Versions with Pluto attachment image

Pluto is an open source utility to help users easily find deprecated Kubernetes API versions in the Infrastructure-as-Code repositories and Helm releases.
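Typical pluto usage, in case it helps anyone searching the archive (the paths are illustrative):

```shell
# Scan IaC manifests on disk for deprecated/removed API versions
pluto detect-files -d ./manifests

# Scan Helm releases deployed in the current cluster
pluto detect-helm -owide
```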

tim.j.birkett avatar
tim.j.birkett

Pluto helped us in the 1.15 -> 1.16 migration. With Helm, this plugin helped too: https://github.com/hickeyma/helm-mapkubeapis - when you upgrade to 1.16, your running Kubernetes deployments will be transposed in-place and everything keeps running. The problem comes when you try to make a deployment (or rollback) and it fails, halting that application’s pipeline, which is, of course, the end of the world.

hickeyma/helm-mapkubeapis

This is a Helm plugin which map deprecated or removed Kubernetes APIs in a release to supported APIs - hickeyma/helm-mapkubeapis


2020-11-21

2020-11-20

2020-11-19

Tarlan Isaev avatar
Tarlan Isaev

Hi guys, how could I fix this base64 decoding? It spews out this gibberish from Jenkins’ pod :)

printf $(kubectl get secret --namespace default jenkins-160573443 -o jsonpath="{.data.jenkins-admin-password}" | base64 --decode);echo
�K��ly�޷�jg���u�ں"�ϵ�N{߯5��#��
Matt Gowie avatar
Matt Gowie

You can try this:

export SECRET_NAME=$(kubectl get secrets -n namespace | grep pod-name- | cut -d ' ' -f 1)
kubectl describe secrets $SECRET_NAME -n namespace 
Tarlan Isaev avatar
Tarlan Isaev

It drops a bunch of tokens, and jenkins-password: 10 bytes

https://pastebin.com/ZfnGaPkm

export SECRET_NAME=$(kubectl get secrets -n default | grep jenkins-1605xxxxxx-54 - Pastebin.com

Pastebin.com is the number one paste tool since 2002. Pastebin is a website where you can store text online for a set period of time.

Tarlan Isaev avatar
Tarlan Isaev

kubectl get secret shows jenkins-password in plain text; however, Jenkins doesn’t accept the password with the user profile name :) pastebin.com/65bMbuED

Tarlan Isaev avatar
Tarlan Isaev

Got help from the Bitnami folks; turns out base64 decoding is buggy on macOS 11.0.1, so I’ll have to fix that. For testing purposes, I used the suggested website for decoding the password string. Thanks for your help mate :)

“Hi @organicnz, I’m not really sure what is happening there. When the chart is deployed, some notes are shown with information about the commands that you need to execute to obtain the credentials. You can obtain those notes again by executing: helm get notes YOUR_INSTALLATION_NAME You will find a command like the following one: echo Password: $(kubectl get secret --namespace default YOUR_INSTALLATION_NAME -o jsonpath="{.data.jenkins-password}" | base64 --decode) If you want to do what the command above is doing in a more manual way, you can also get the secret and output as yaml and you will get a plain-text string that is base64-encoded. You need to decode it to get the actual password. If the implementation of your base64 utility is buggy, just for testing purposes, you can use an online tool like this one: https://emn178.github.io/online-tools/base64_decode.html Again, this is just for testing purposes, you should never share your password on the internet.”

kubectl get secret spews out a gibberish from Jenkins · Issue #4444 · bitnami/charts

Hey guys, how to fix this base64 decoding? It spews out this gibberish from bitnami/jenkins pod :) printf $(kubectl get secret –namespace default jenkins-1605736819 -o jsonpath="{.data.jenkin…

Base64 Decode Online

Base64 online decode function
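As an aside, a portable way to decode that sidesteps the flag differences between GNU coreutils (base64 --decode) and the BSD/macOS implementation is to pipe through python3. The encoded value below is illustrative; swap the printf for the kubectl pipeline from the chart notes:

```shell
# Decode base64 without relying on the local base64 CLI's flags
printf 'c3VwZXItc2VjcmV0' \
  | python3 -c 'import sys, base64; sys.stdout.write(base64.b64decode(sys.stdin.read()).decode())'
# prints: super-secret
```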

Matt Gowie avatar
Matt Gowie

Hey folks — I want to switch a project off of DataDog log management to ELK due to cost. I’m looking for the best resources to do that — Any recommendations?

I’ve asked this here lightly before, and I know the Cloud Posse approach is FluentD => Firehose => ElasticSearch. I’d like to implement something similar with FluentBit > FluentD (the project is running Fargate, so smaller sidecar containers + aws-for-fluentbit is attractive), but before I dive into implementing all of that, I figured I should ask what the best resources / OSS / possible terraform modules are to accomplish this with the least amount of pain.

Erik Osterman (Cloud Posse) avatar
Erik Osterman (Cloud Posse)

Make sure to evaluate the cost of ES. The amount of storage is limited by the instance size, and it’s easy to get up to 500/mo for a cluster that barely stores a week of data and has HA. Then consider how many clusters you want: you don’t want all logs going to the same cluster. Let’s say you have 4 accounts and 4 small clusters with HA. My point is, it’s very easy to end up paying thousands of dollars a month for ES, and the tight integration with APM and metrics you get with Datadog is lacking. You’re also on the hook for developing and keeping it all up to date. I am curious if the cost/benefit really works out in favor of ES.

Matt Gowie avatar
Matt Gowie

Yeah, valid point. I believe I’m only going to run HA and more than a day of log retention in production environments. The client runs a single-tenant environment per customer (enterprise customers), so really it’ll be expensive any way we run it AFAICT.

The problem I / the client had with DataDog logs was that we have 3 clusters up right now and the log cost alone for the month is $1K / cluster, which is pretty insane. I think any way we do it should be cheaper than that.

Vugar avatar
Vugar

I guess as long as you can make sure that indexes are granular, it should be easy to snapshot indexes to S3 buckets… and remove them afterwards… But then the logs are not available in Kibana… so not sure if they will be of any use at all?

Vugar avatar
Vugar

AWS has UltraWarm storage for ES now… but as far as I know you can’t snapshot your indexes while they’re on UltraWarm. Better to double-check, though; I could be wrong on this one.

Vugar avatar
Vugar

I wonder if sending these directly to S3 and performing searches via Athena would make it a bit more reasonable in terms of pricing…? But I am not sure if QA and Dev teams will like it… Dear @Erik Osterman (Cloud Posse), have you had any experience with Athena being used for log queries instead of indexing in ES and having all the nice features of Kibana?

2020-11-18

 avatar
06:26:19 PM

Best way to manage EKS?


2020-11-17

Amit Karpe avatar
Amit Karpe
AWS Container Day: Kubernetes Edition

Join us for AWS Container Day, a fully live, virtual day of sessions all about Amazon EKS and Kubernetes at AWS, hosted by Containers from the Couch. At this Day Zero KubeCon event, the AWS Kubernetes team will be discussing new launches, demoing products and features, covering best practices, and answering your questions live on Twitch. If you have a question before the event, send us a note at [email protected]!

joey avatar
EKS Node Group Resource Tag Inheritance · Issue #13643 · hashicorp/terraform-provider-aws

Community Note Please vote on this issue by adding a :–1: reaction to the original issue to help the community and maintainers prioritize this request Please do not leave "+1" or other comme…

Erik Osterman (Cloud Posse) avatar
Erik Osterman (Cloud Posse)

@Jeremy (Cloud Posse) what did we end up doing for this?

Jeremy (Cloud Posse) avatar
Jeremy (Cloud Posse)

@Erik Osterman (Cloud Posse) We have not really found a solution, but we do what we can. We manually tag everything we create, and we sort of got around this related issue by specifying a custom launch template that tags everything it can.

[EKS] [request]: Nodegroup should support tagging ASGs · Issue #608 · aws/containers-roadmap

Community Note Please vote on this issue by adding a :–1: reaction to the original issue to help the community and maintainers prioritize this request Please do not leave "+1" or "me to…

joey avatar

doh, i was trying to avoid using a custom launch template but i suppose i could not be lazy and do that

joey avatar

thanks Jeremy and Erik

Jeremy (Cloud Posse) avatar
Jeremy (Cloud Posse)

@joey If you use our terraform-aws-eks-node-group module to create your node group, it will make a launch template which propagates tags if you set the resources_to_tag variable.

cloudposse/terraform-aws-eks-node-group

Terraform module to provision an EKS Node Group. Contribute to cloudposse/terraform-aws-eks-node-group development by creating an account on GitHub.

joey avatar

thank you i got aws_launch_template working

joey avatar

although i’m peeved that you can’t use instance_market_options and spot with aws_launch_template

joey avatar

conveniently i’m primarily using this in a one-off use case where i have to use ondemand instances for now anyways, but i wanted to keep a dev environment that was nearly identical running with spot instances

2020-11-16

rei avatar

Are there any advantages to placing stuff like cert-manager, cluster-autoscaler, external-dns, and aws-load-balancer-controller into the kube-system namespace, or should all of it be isolated into its own namespace? Almost all tutorials and example code use kube-system as the default for these services/controllers; however, there are advantages to splitting everything into separate namespaces.

btai avatar

I think for quality-of-life/debugging purposes, a query via namespace will be a tad faster than a query via label within kube-system. Depending on the cluster size, the initial number of pods running in kube-system can already be pretty high. For example, with our logging provider I like that I can just query via namespace:cert-manager as opposed to namespace:kube-system label:app=?? or service=??, where you possibly have to remember how each 3rd-party helm chart labels their deployments etc.

rei avatar

This is also my thinking. I prefer the idea of grouping everything into its own namespace, and there is no limit on the number of namespaces anyway.

rei avatar

What made me think about this was the following description of a cluster-autoscaler extra argument:

extraArgs:
  [...]
  # If true cluster autoscaler will never delete nodes with pods from kube-system (except for DaemonSet or mirror pods)
  skip-nodes-with-system-pods: "false"

Apparently the kube-system pods are actually handled differently…

2020-11-13

Shreyank Sharma avatar
Shreyank Sharma

I am trying a cluster migration in AWS; both k8s clusters are in the same region. Cluster 1: deployed 2 applications with PV reclaim policies of Delete and Retain respectively, and annotated them so Velero takes a Restic backup. Cluster 2: restored those 2 applications; worked fine.

Again on Cluster 1: deployed the same 2 applications with reclaim policies Delete and Retain, but not annotated, so Velero took a snapshot when I backed up. Cluster 2: the restore did not work, as the PV failed to attach with the following warning: FailedAttachVolume pod/<pod-name> AttachVolume.Attach failed for volume "pvc-<id>" : Error attaching EBS volume "vol-<id>" to instance "i-<instance-id>": "UnauthorizedOperation: You are not authorized to perform this operation.

So, should the snapshot restore feature work within the same AWS region, or is it just me getting this error?

2020-11-12

Mikael Fridh avatar
Mikael Fridh
08:38:38 PM

crossposting as might be relevant to kube too. Hope it’s useful to someone.

Sort of related to the Docker pull rate limit recently, although that wasn’t the primary reason I did it. I gathered some info and ideas from all over and set this sucker up to make my local k3d and argocd workflow bearable… 1 minute 45 seconds instead of 12 minutes now for tearing down and bringing back up my full experimental localhost Kube stack.

https://github.com/frimik/local-docker-registry-proxy

Padarn avatar
Padarn

Is anyone using https://github.com/kubernetes-sigs/external-dns with a private cluster?

If so, I assume this will only work using the AWS CNI? So that the DNS can resolve to the private VPC IP, which (some) other places can then route to.

kubernetes-sigs/external-dns

Configure external DNS servers (AWS Route53, Google CloudDNS and others) for Kubernetes Ingresses and Services - kubernetes-sigs/external-dns

Erik Osterman (Cloud Posse) avatar
Erik Osterman (Cloud Posse)

We have used external-dns with Calico too

Erik Osterman (Cloud Posse) avatar
Erik Osterman (Cloud Posse)
cloudposse/helmfiles

Comprehensive Distribution of Helmfiles for Kubernetes - cloudposse/helmfiles

Padarn avatar
Padarn

thanks - how does this work? what IPs does the DNS point to?

Erik Osterman (Cloud Posse) avatar
Erik Osterman (Cloud Posse)

it’s the IPs of the Service

Erik Osterman (Cloud Posse) avatar
Erik Osterman (Cloud Posse)

and a service can be of type load balancer which is internal or external

Erik Osterman (Cloud Posse) avatar
Erik Osterman (Cloud Posse)

the details really depend a lot on your setup

Padarn avatar
Padarn

ahh got it, thank you

rei avatar

We have this type of setup running. Private eks cluster with internal NLB. NLB connects to ingress-nginx which forwards to the corresponding application ingress object. Cert-manager with dns01 and external-dns for pointing domains to the NLB ALIAS record

rei avatar

Based mostly on cloudposse Terraform modules and helmfiles. Although the helmfiles do need tweaking

rei avatar
cloudposse/terraform-aws-vpc-peering

Terraform module to create a peering connection between two VPCs in the same AWS account. - cloudposse/terraform-aws-vpc-peering

cloudposse/terraform-aws-eks-cluster

Terraform module for provisioning an EKS cluster. Contribute to cloudposse/terraform-aws-eks-cluster development by creating an account on GitHub.

cloudposse/terraform-aws-tfstate-backend

Terraform module that provision an S3 bucket to store the terraform.tfstate file and a DynamoDB table to lock the state file to prevent concurrent modifications and state corruption. - cloudposse…

Padarn avatar
Padarn

Wow thanks a lot. Really useful

2020-11-09

rei avatar

Hi, I am interested in knowing how you organize your IaC; looking for ideas. Currently we are building our new k8s-based infrastructure, which requires Terraform, Helm, helmfiles, and GitLab CI. What is a good pattern to combine all these elements? Monorepo? Repo with submodules? Script/makefile magic? What if the helmfile and chart repos also contain stuff for the infra and the main application?

2020-11-07

2020-11-06

joey avatar

does anyone have any recommendations or examples of best practices for handling patching deployments or otherwise running generic kubectl commands in an IaC pipeline? i use helmfile and terraform github actions for most things. i want to patch a couple things, for example coredns, to increase replicas and create antiAffinities. that’s just one example; i know there are plenty of other ways to handle various dns things, but it’s on my mind. i don’t think it’d be terribly difficult to use a generic kubectl github action and do some parsing of paths and files to make this work, but i potentially lose out on some niceties that i have with helmfile and terraform regarding modules and shared helmfiles.

2020-11-05

2020-11-04

RB avatar

Qovery Engine – open-source multi-cloud deployment - https://github.com/Qovery/engine

Qovery/engine

Deploy your apps on any Cloud providers in just a few seconds - Qovery/engine

RB avatar

So this will deploy kubernetes in your clouds, which deploy worker nodes, which deploy containers on nodes

RB avatar

Now we just need an automated deploy tool to deploy Qovery lol

Shreyank Sharma avatar
Shreyank Sharma

Hello Experts.

we are running Kubernetes in AWS deployed using kops, and for backups we are using Velero with Restic integration. I am new to Velero, and I wanted to know under what conditions Velero will take an EBS snapshot and under what conditions it will back up PVs using the Restic repo. We have multiple PVs, and all of them are annotated with kubectl annotate pod/<pod-name> backup.velero.io/backup-volumes=<pvcname>, but some PVs are backed up using EBS snapshots and some using Restic.

Thank you.
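For what it’s worth, the selection logic as I understand it from the Velero docs: by default Velero snapshots PVs through the cloud provider (EBS snapshots), and it only uses Restic for volumes explicitly listed in the pod annotation, where the value must be the volume name from the pod spec, not the PVC name. A sketch with illustrative names:

```yaml
apiVersion: v1
kind: Pod
metadata:
  name: my-app   # illustrative
  annotations:
    # Velero backs up the listed volumes with Restic instead of an EBS
    # snapshot; the value is the volume name from spec.volumes below
    backup.velero.io/backup-volumes: data
spec:
  containers:
    - name: app
      image: nginx
      volumeMounts:
        - name: data
          mountPath: /data
  volumes:
    - name: data
      persistentVolumeClaim:
        claimName: my-app-pvc
```

If the annotation value is a PVC name rather than the pod-spec volume name, Velero won’t match the volume and will fall back to snapshotting, which could explain the mixed behavior.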

Erik Osterman (Cloud Posse) avatar
Erik Osterman (Cloud Posse)

fwiw, kops is constantly snapshotting etcd and backing it up to S3

Erik Osterman (Cloud Posse) avatar
Erik Osterman (Cloud Posse)

aha, you want to do PV snapshots, so ignore my comment.

Erik Osterman (Cloud Posse) avatar
Erik Osterman (Cloud Posse)
Kubernetes 1.17 Feature: Kubernetes Volume Snapshot Moves to Beta

Authors: Xing Yang, VMware & Xiangqian Yu, Google The Kubernetes Volume Snapshot feature is now beta in Kubernetes v1.17. It was introduced as alpha in Kubernetes v1.12, with a second alpha with breaking changes in Kubernetes v1.13. This post summarizes the changes in the beta release. What is a Volume Snapshot? Many storage systems (like Google Cloud Persistent Disks, Amazon Elastic Block Storage, and many on-premise storage systems) provide the ability to create a “snapshot” of a persistent volume.

Matt Gowie avatar
Matt Gowie

Hey @Andriy Knysh (Cloud Posse) @Erik Osterman (Cloud Posse) — Low priority, so take your time in getting back, but I see you guys use {{event.tags.cluster_name}} a bunch in your DD monitors.yaml. I’m not finding that variable available in the message content for my monitor, but my metrics/events do have that tag. Did you folks have to do something specific to enable more variables in scope of that message content? I’m struggling with that right now.

Erik Osterman (Cloud Posse) avatar
Erik Osterman (Cloud Posse)

See the PR from yesterday

Erik Osterman (Cloud Posse) avatar
Erik Osterman (Cloud Posse)
Update Datadog monitors by aknysh · Pull Request #20 · cloudposse/terraform-datadog-monitor

what: Update Datadog monitors. In the example, refactor the monitors into separate YAML files per category. why: When monitor type = query alert, Datadog does not look into the event tags, so we can…

Matt Gowie avatar
Matt Gowie

Ha got it. Thanks man.

Andriy Knysh (Cloud Posse) avatar
Andriy Knysh (Cloud Posse)

the tags are only available when the monitor type is metrics or event

Andriy Knysh (Cloud Posse) avatar
Andriy Knysh (Cloud Posse)

for query type monitors, not available

Matt Gowie avatar
Matt Gowie

That’s what I gathered as well, but wasn’t sure if y’all were enabling something I was missing. Glad I caught this right after you learned the same thing.

Eric Berg avatar
Eric Berg

Both of our ingresses are set up like this: ELB -> service -> pods. Does the ingress just pass the request to the service and let the service determine which pod (on whichever node) handles it? I’m trying to get requests that come in to a node to be passed to a service pod on the same node.

Mikael Fridh avatar
Mikael Fridh

This is new actually, finally coming to Kubernetes; https://kubernetes.io/docs/concepts/services-networking/service-topology/

Service Topology

FEATURE STATE: Kubernetes v1.17 [alpha] Service Topology enables a service to route traffic based upon the Node topology of the cluster. For example, a service can specify that traffic be preferentially routed to endpoints that are on the same Node as the client, or in the same availability zone. Introduction By default, traffic sent to a ClusterIP or NodePort Service may be routed to any backend address for the Service.

1
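As a concrete sketch of what that alpha feature looks like in a manifest (service name and ports are hypothetical), a Service lists topologyKeys in preference order:

```yaml
apiVersion: v1
kind: Service
metadata:
  name: my-app                        # hypothetical service name
spec:
  selector:
    app: my-app
  ports:
    - port: 80
      targetPort: 8080
  # Prefer endpoints on the same node as the client, then fall back
  # to the same zone, then to any endpoint in the cluster ("*").
  topologyKeys:
    - "kubernetes.io/hostname"
    - "topology.kubernetes.io/zone"
    - "*"
```

Dropping the trailing "*" would mean traffic is simply refused when no same-node/same-zone endpoint exists, so the catch-all matters.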
Eric Berg avatar
Eric Berg

That suggests that it’s not (easily) possible, currently.

Mikael Fridh avatar
Mikael Fridh
Service Topology

FEATURE STATE: Kubernetes v1.17 [alpha] Service Topology enables a service to route traffic based upon the Node topology of the cluster. For example, a service can specify that traffic be preferentially routed to endpoints that are on the same Node as the client, or in the same availability zone. Introduction By default, traffic sent to a ClusterIP or NodePort Service may be routed to any backend address for the Service.

Mikael Fridh avatar
Mikael Fridh


The EndpointSlice Controller: This controller maintains EndpointSlices for Services and the Pods they reference. This is controlled by the EndpointSlice feature gate. It has been enabled by default since Kubernetes 1.18.
So… 1.18? Or maybe 1.19 is required… damn, now I’m unsure.
Kube-Proxy: When kube-proxy is configured to use EndpointSlices, it can support higher numbers of Service endpoints. This is controlled by the EndpointSliceProxying feature gate on Linux and WindowsEndpointSliceProxying on Windows. It has been enabled by default on Linux since Kubernetes 1.19

Mikael Fridh avatar
Mikael Fridh

Have this on TODO, hadn’t yet tried it.

Eric Berg avatar
Eric Berg

Well, this gives me good info to go on. Thanks. We’re at EKS 1.17 and I noticed that there are a few changes to get to 1.18 that I have to take a deeper look at. I’ll add this to the pile.

Mikael Fridh avatar
Mikael Fridh

this seems like 1.18 is good to go

Mikael Fridh avatar
Mikael Fridh


To enable service topology, enable the ServiceTopology and EndpointSlice feature gate for all Kubernetes components:

Mikael Fridh avatar
Mikael Fridh

both of those are enabled by default on 1.18
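On a self-managed cluster (EKS doesn’t expose control-plane flags), flipping the gates explicitly would look roughly like this in a kops cluster spec; this is a sketch, and exact field support may vary by kops version:

```yaml
# Fragment of a kops cluster spec enabling the gates on each component.
spec:
  kubeAPIServer:
    featureGates:
      ServiceTopology: "true"
      EndpointSlice: "true"
  kubelet:
    featureGates:
      ServiceTopology: "true"
      EndpointSlice: "true"
  kubeProxy:
    featureGates:
      ServiceTopology: "true"
      EndpointSlice: "true"
```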

Mikael Fridh avatar
Mikael Fridh

the second one I mentioned for kube-proxy (EndpointSliceProxying) isn’t mentioned as needed as far as I can tell.

Eric Berg avatar
Eric Berg

A little more research led me to understand that, in my current config (load balancer -> service), the path to the service pod that handles the request is already optimized: the ingress-nginx controller populates the LB with the service pod IP:PORT configs, so leaving the LB, requests go straight to the service pods on whatever node the ELB chooses. If my understanding is correct, then I don’t need to implement any additional affinity or routing. Does that sound right to you?

Mikael Fridh avatar
Mikael Fridh

If the pods are immediately in the LB then I guess you’re fine :).

Mikael Fridh avatar
Mikael Fridh

I recall way back a really nifty example Kubernetes service someone built… it was a simple HTTP server responding with the request headers, content, etc., including extra detailed info about the current pod that responded… anyone have a clue which one it was?.. podinfo, of course… my brain suddenly woke up from being teased enough

rei avatar

It is a simple deployment with the whois docker container

Eric Berg avatar
Eric Berg

Is it called echo-server?

rei avatar

It’s the whoami container: https://hub.docker.com/r/containous/whoami

rei avatar

You will see something like this:

Hostname: sample-app-6c66c9c587-ssknb
IP: 127.0.0.1
IP: 10.aaa.aaa.aaa
RemoteAddr: 10.bbb.bbb.bbb:51158
GET / HTTP/1.1
Host: <insert-domain-here.com>
User-Agent: Mozilla/5.0 (X11; Linux x86_64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/86.0.4240.111 Safari/537.36
Accept: text/html,application/xhtml+xml,application/xml;q=0.9,image/avif,image/webp,image/apng,*/*;q=0.8,application/signed-exchange;v=b3;q=0.9
Accept-Encoding: gzip, deflate, br
Accept-Language: en-US,en;q=0.9,de;q=0.8
Sec-Fetch-Dest: document
Sec-Fetch-Mode: navigate
Sec-Fetch-Site: none
Sec-Fetch-User: ?1
Upgrade-Insecure-Requests: 1
X-Amzn-Trace-Id: xxxxxxxxxxxxxxxxxxxxxxxxxxxxxx
X-Forwarded-For: 10.ccc.ccc.ccc, 10.ddd.ddd.ddd
X-Forwarded-Host: <insert-domain-here.com>
X-Forwarded-Port: 443
X-Forwarded-Proto: https
X-Original-Forwarded-For: 10.ccc.ccc.ccc
X-Real-Ip: 10.ccc.ccc.ccc
X-Request-Id: <some-id>
X-Scheme: https
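The output above comes from the whoami container; a minimal sketch of running it in a cluster (names are hypothetical) is just a Deployment plus a Service:

```yaml
apiVersion: apps/v1
kind: Deployment
metadata:
  name: whoami
spec:
  replicas: 2
  selector:
    matchLabels:
      app: whoami
  template:
    metadata:
      labels:
        app: whoami
    spec:
      containers:
        - name: whoami
          image: containous/whoami   # echoes request headers and pod details
          ports:
            - containerPort: 80
---
apiVersion: v1
kind: Service
metadata:
  name: whoami
spec:
  selector:
    app: whoami
  ports:
    - port: 80
```

Curl the service a few times and each reply reports the responding pod’s hostname, which is what makes it handy for checking which pod behind a Service (or ingress/proxy chain) actually answered.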
Mikael Fridh avatar
Mikael Fridh

Yeah that’s a good one, I remember that one too from way back. Thanks. Helps when you want to see how middleware, ingress or proxies behave.

2020-11-03

Roderik van der Veer avatar
Roderik van der Veer

What is the goto PVC -> S3 backup solution (helm chart) nowadays?

Erik Osterman (Cloud Posse) avatar
Erik Osterman (Cloud Posse)

is s3 strictly the requirement? or is the requirement to backup the PV

Erik Osterman (Cloud Posse) avatar
Erik Osterman (Cloud Posse)

The reason I ask is that native snapshots are a (beta) feature of 1.17

https://kubernetes.io/blog/2019/12/09/kubernetes-1-17-feature-cis-volume-snapshot-beta/
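With that beta feature (assuming the snapshot CRDs, snapshot controller, and a CSI driver are installed in the cluster), taking a snapshot is just creating a VolumeSnapshot object against an existing PVC; a sketch with hypothetical class and claim names:

```yaml
apiVersion: snapshot.storage.k8s.io/v1beta1
kind: VolumeSnapshot
metadata:
  name: data-snapshot
spec:
  volumeSnapshotClassName: csi-aws-vsc    # hypothetical VolumeSnapshotClass
  source:
    persistentVolumeClaimName: data-pvc   # existing PVC to snapshot
```

Note this produces a native volume snapshot (e.g. an EBS snapshot), not a copy of the files to S3, so it doesn’t cover the off-cluster file-backup case by itself.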

Roderik van der Veer avatar
Roderik van der Veer

Not a requirement at all, just need a backup of the files off the cluster

Erik Osterman (Cloud Posse) avatar
Erik Osterman (Cloud Posse)

There’s also Velero by VMware. https://github.com/vmware-tanzu/velero

vmware-tanzu/velero

Backup and migrate Kubernetes applications and their persistent volumes - vmware-tanzu/velero

Erik Osterman (Cloud Posse) avatar
Erik Osterman (Cloud Posse)

(I haven’t tried either one)

Roderik van der Veer avatar
Roderik van der Veer

Found stash but it had some weird licensing. We deploy clusters on demand and need some “I messed up” backups

2020-11-02

Matt Gowie avatar
Matt Gowie

Anyone ever have a zombie k8s resource problem? We have a resource that we cannot delete (the delete processes, but the resource doesn’t go anywhere). Any debugging / troubleshooting tips to get at the internals of k8s to address that? More info in thread.

Matt Gowie avatar
Matt Gowie

I have an ingress resource which was deployed via Helm. The helm chart has been uninstalled, but the ingress resource is still around. It was connected to the v1 alb-ingress-controller + external-dns.

A team member has been banging her head against this for a day or so now and she hasn’t been able to delete the resource. We’ve uninstalled our ingress controller and external-dns and neither of them seem to be the culprit that is leaving it around.

When we try to destroy the ingress using kubectl, the command succeeds, but the resource isn’t actually removed.

Matt Gowie avatar
Matt Gowie

Definitely a problem specific to this single cluster as a helm uninstall of a similar chart in another cluster has no problems.

Matt Gowie avatar
Matt Gowie

Really looking for advice on how folks would investigate the internals here — Not looking for a bullet proof solution to this specific issue.

kskewes avatar
kskewes

Sometimes there is a finalizer that gets stuck, can happen for namespaces etc.
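A quick way to check for that: inspect the object’s metadata.finalizers. If the controller that was supposed to clear them is gone, the object hangs around (or sits in Terminating) forever. A sketch, with a hypothetical resource name:

```shell
# Print any finalizers still pending on the ingress.
kubectl get ingress my-ingress -o jsonpath='{.metadata.finalizers}'
```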

:100:1
3
1
Matt Gowie avatar
Matt Gowie

That ingress does have a finalizers entry in the metadata. How do you typically deal with that? I’ll start googling around that.

Matt Gowie avatar
Matt Gowie

Ah I delete the finalizers block and that should do the trick?

Matt Gowie avatar
Matt Gowie

Giving that a shot now.

Matt Gowie avatar
Matt Gowie

And that did the trick — Awesome. Thank you @kskewes!

1
kskewes avatar
kskewes

Oh awesome good job. Sorry was on mobile and juggling stuff.

Andrey Nazarov avatar
Andrey Nazarov

We had some trouble with finalizers as well. We couldn’t remove the resource; it was recreated automatically right after deletion. Cool that you’ve already found a solution.

Andrey Nazarov avatar
Andrey Nazarov

It was a third-party resource which troubled us and the solution was to patch it like:

kubectl patch crd/restoresessions.stash.appscode.com -p '{"metadata":{"finalizers":[]}}' --type=merge
Matt Gowie avatar
Matt Gowie

Yeah, I did a very similar patch. It’s impressive because I had removed the underlying CRD which was responsible for finalizing it… and yet it still was sticking around.

2020-11-01
