#kubernetes (2020-11)
Archive: https://archive.sweetops.com/kubernetes/
2020-11-01
2020-11-02
Anyone ever have a zombie k8s resource problem? We have a resource that we cannot delete (the delete processes, but the resource doesn’t go anywhere). Any debugging / troubleshooting tips to get at the internals of k8s to address that? More info in thread.
I have an ingress resource which was deployed via Helm. The helm chart has been uninstalled, but the ingress resource is still around. It was connected to the v1 alb-ingress-controller + external-dns.
A team member has been banging her head against this for a day or so now and she hasn’t been able to delete the resource. We’ve uninstalled our ingress controller and external-dns and neither of them seem to be the culprit that is leaving it around.
When we try to destroy the ingress using kubectl, the command succeeds, but the resource isn’t actually removed.
Definitely a problem specific to this single cluster as a helm uninstall of a similar chart in another cluster has no problems.
Really looking for advice on how folks would investigate the internals here — Not looking for a bullet proof solution to this specific issue.
That ingress does have a finalizers
entry in the metadata. How do you typically deal with that? I’ll start googling around that.
Ah I delete the finalizers block and that should do the trick?
Giving that a shot now.
Oh awesome good job. Sorry was on mobile and juggling stuff.
We had some trouble with finalizers as well. Couldn’t remove the resource; it was recreated automatically right after deletion. Cool that you’ve already found a solution.
It was a third-party resource which troubled us and the solution was to patch it like:
kubectl patch crd/restoresessions.stash.appscode.com -p '{"metadata":{"finalizers":[]}}' --type=merge
Yeah, I did a very similar patch. It’s impressive because I had removed the underlying CRD which was responsible for finalizing it… and yet it still was sticking around.
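For anyone who finds this later, a minimal sketch of that kind of patch against the stuck ingress itself (names are placeholders):
kubectl patch ingress <ingress-name> -n <namespace> --type=merge \
  -p '{"metadata":{"finalizers":[]}}'
# or remove the finalizers entry interactively
kubectl edit ingress <ingress-name> -n <namespace>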
2020-11-03
What is the goto PVC -> S3 backup solution (helm chart) nowadays?
is s3 strictly the requirement? or is the requirement to backup the PV
The reason I ask is native snapshots is a feature (beta) of 1.17
https://kubernetes.io/blog/2019/12/09/kubernetes-1-17-feature-cis-volume-snapshot-beta/
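For reference, a rough sketch of the CSI snapshot flow (requires the snapshot CRDs/controller and a CSI driver; the class name here is just an example):
kubectl apply -f - <<EOF
apiVersion: snapshot.storage.k8s.io/v1beta1
kind: VolumeSnapshot
metadata:
  name: data-snapshot
spec:
  volumeSnapshotClassName: csi-aws-vsc
  source:
    persistentVolumeClaimName: my-pvc
EOF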
Not a requirement at all, just need a backup of the files off the cluster
There’s also Velero by VMware. https://github.com/vmware-tanzu/velero
Backup and migrate Kubernetes applications and their persistent volumes - vmware-tanzu/velero
(I haven’t tried either one)
Found stash but it had some weird licensing. We deploy clusters on demand and need some “I messed up” backups
2020-11-04
Qovery Engine – open-source multi-cloud deployment - https://github.com/Qovery/engine
Deploy your apps on any Cloud providers in just a few seconds - Qovery/engine
So this will deploy Kubernetes in your clouds, which deploys worker nodes, which in turn run containers on those nodes
Hello Experts.
We are running Kubernetes in AWS, deployed using kops.
For backups we are using Velero with the Restic integration; I am new to Velero.
I wanted to know under what conditions Velero takes an EBS snapshot of a PV and under what conditions it backs the PV up to the Restic repo.
We have multiple PVs, and all of them are annotated with kubectl annotate pod/<pod-name> backup.velero.io/backup-volumes=<pvcname>
, but some PVs are backed up using EBS snapshots and some are backed up using Restic.
Thank you.
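Not a Velero expert either, but for reference, the opt-in Restic flow looks roughly like this (names are placeholders; if I recall correctly the annotation value is the volume name from the pod spec rather than the PVC name):
# opt a pod's volume into Restic
kubectl -n my-namespace annotate pod/my-pod backup.velero.io/backup-volumes=data
# then back up; PVs without the annotation are handled by the EBS snapshot plugin
velero backup create my-backup --include-namespaces my-namespace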
fwiw, kops is constantly snapshotting etcd and backing it up to S3
aha, you want to do PV snapshots, so ignore my comment.
Authors: Xing Yang, VMware & Xiangqian Yu, Google The Kubernetes Volume Snapshot feature is now beta in Kubernetes v1.17. It was introduced as alpha in Kubernetes v1.12, with a second alpha with breaking changes in Kubernetes v1.13. This post summarizes the changes in the beta release. What is a Volume Snapshot? Many storage systems (like Google Cloud Persistent Disks, Amazon Elastic Block Storage, and many on-premise storage systems) provide the ability to create a “snapshot” of a persistent volume.
Hey @Andriy Knysh (Cloud Posse) @Erik Osterman (Cloud Posse) — Low priority, so take your time in getting back, but I see you guys use {{event.tags.cluster_name}}
a bunch in your DD monitors.yaml. I’m not finding that variable available in the message
content for my monitor, but my metrics/events do have that tag. Did you folks have to do something specific to enable more variables in scope of that message
content? I’m struggling with that right now.
See the PR from yesterday
what Update datadog monitors In the example. refactor the monitors into separate YAML files per category why When monitor type = query alert, Datadog does not look into the event tags, so we can…
Ha got it. Thanks man.
the tags are only available when the monitor type is metrics or event
for query type monitors, not available
That’s what I gathered as well, but wasn’t sure if y’all were enabling something I was missing. Glad I caught it right after you learned the same thing.
Both of our ingresses are set up like this: ELB -> service -> pods. Does the ingress just pass the request to the service and let the service decide which pod (on whichever node) handles it? I’m trying to get requests that come in to a node to be passed to a service pod on the same node.
This is new actually, finally coming to Kubernetes; https://kubernetes.io/docs/concepts/services-networking/service-topology/
FEATURE STATE: Kubernetes v1.17 [alpha] Service Topology enables a service to route traffic based upon the Node topology of the cluster. For example, a service can specify that traffic be preferentially routed to endpoints that are on the same Node as the client, or in the same availability zone. Introduction By default, traffic sent to a ClusterIP or NodePort Service may be routed to any backend address for the Service.
That suggests that it’s not (easily) possible, currently.
Nah, it does seem a little cryptic but actually read on… https://kubernetes.io/docs/concepts/services-networking/service-topology/#prefer-node-local-zonal-then-regional-endpoints
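The example from that section boils down to something like this (sketch only; it needs the alpha feature gates on):
kubectl apply -f - <<EOF
apiVersion: v1
kind: Service
metadata:
  name: my-service
spec:
  selector:
    app: my-app
  ports:
    - port: 80
      targetPort: 8080
  # prefer node-local, then zonal, then regional, then any endpoint
  topologyKeys:
    - "kubernetes.io/hostname"
    - "topology.kubernetes.io/zone"
    - "topology.kubernetes.io/region"
    - "*"
EOF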
The EndpointSlice Controller: This controller maintains EndpointSlices for Services and the Pods they reference. This is controlled by the EndpointSlice feature gate. It has been enabled by default since Kubernetes 1.18.
So… 1.18 ? or maybe 1.19 is required … damn, now I’m unsure.
Kube-Proxy: When kube-proxy is configured to use EndpointSlices, it can support higher numbers of Service endpoints. This is controlled by the EndpointSliceProxying feature gate on Linux and the WindowsEndpointSliceProxying feature gate on Windows. It has been enabled by default on Linux since Kubernetes 1.19.
Have this on TODO, hadn’t yet tried it.
Well, this gives me good info to go on. Thanks. We’re at EKS 1.17 and I noticed that there are a few changes to get to 1.18 that I have to take a deeper look at. I’ll add this to the pile.
this seems like 1.18 is good to go
To enable service topology, enable the ServiceTopology and EndpointSlice feature gates for all Kubernetes components:
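i.e. roughly this flag on the components (which, I believe, you can’t set yourself on a managed EKS control plane):
--feature-gates="ServiceTopology=true,EndpointSlice=true"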
both of those are enabled by default on 1.18
the second one I mentioned for kube-proxy (EndpointSliceProxying) isn’t mentioned as needed as far as I can tell.
A little more research led me to understand that, in my current config (load balancer -> service), the path to the service pod that handles the request is already optimized: the ingress-nginx controller populates the LB with the service pods’ IP:PORT targets, so requests leaving the LB go straight to the service pods on whatever node the ELB chooses. If my understanding is correct, then I don’t need to implement any additional affinity or routing. Does that sound right to you?
If the pods are immediately in the LB then I guess you’re fine :).
I recall way back a really nifty example Kubernetes service someone built… a simple HTTP server that responds with the request headers, content, etc., plus extra detail about the pod that served the request… anyone have a clue which one it was?.. podinfo of course… my brain suddenly woke up from being teased enough
It is a simple deployment with the whois docker container
Is it called echo-server?
It’s the whoami container: https://hub.docker.com/r/containous/whoami
You will see something like this:
Hostname: sample-app-6c66c9c587-ssknb
IP: 127.0.0.1
IP: 10.aaa.aaa.aaa
RemoteAddr: 10.bbb.bbb.bbb:51158
GET / HTTP/1.1
Host: <insert-domain-here.com>
User-Agent: Mozilla/5.0 (X11; Linux x86_64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/86.0.4240.111 Safari/537.36
Accept: text/html,application/xhtml+xml,application/xml;q=0.9,image/avif,image/webp,image/apng,*/*;q=0.8,application/signed-exchange;v=b3;q=0.9
Accept-Encoding: gzip, deflate, br
Accept-Language: en-US,en;q=0.9,de;q=0.8
Sec-Fetch-Dest: document
Sec-Fetch-Mode: navigate
Sec-Fetch-Site: none
Sec-Fetch-User: ?1
Upgrade-Insecure-Requests: 1
X-Amzn-Trace-Id: xxxxxxxxxxxxxxxxxxxxxxxxxxxxxx
X-Forwarded-For: 10.ccc.ccc.ccc, 10.ddd.ddd.ddd
X-Forwarded-Host: <insert-domain-here.com>
X-Forwarded-Port: 443
X-Forwarded-Proto: https
X-Original-Forwarded-For: 10.ccc.ccc.ccc
X-Real-Ip: 10.ccc.ccc.ccc
X-Request-Id: <some-id>
X-Scheme: https
Yeah that’s a good one, I remember that one too from way back. Thanks. Helps when you want to see how middleware, ingress or proxies behave.
2020-11-05
2020-11-07
2020-11-09
Hi, I am interested in knowing how you organize your IaC; looking for ideas. We are currently building our new k8s-based infrastructure, which requires Terraform, Helm, helmfiles and GitLab CI. What is a good pattern to combine all these elements? Monorepo? Repo with submodules? Script/Makefile magic? What if the helmfile and chart repos also contain stuff for the infra and the main application?
2020-11-12
crossposting as might be relevant to kube too. Hope it’s useful to someone.
Sort of related to the recent Docker pull rate limit, although that wasn’t the primary reason I did it: I gathered some info and ideas from all over and set this sucker up to make my local k3d and ArgoCD workflow bearable… 1 minute 45 seconds instead of 12 minutes now for tearing down and bringing back up my full experimental localhost Kube stack.
Is anyone using https://github.com/kubernetes-sigs/external-dns with a private cluster?
If so, I assume this will only work using the AWS CNI? So that the DNS can resolve to the private VPC IP, which (some) other places can then route to
Configure external DNS servers (AWS Route53, Google CloudDNS and others) for Kubernetes Ingresses and Services - kubernetes-sigs/external-dns
We have used external-dns
with calico too
Here’s our #helmfile for it https://github.com/cloudposse/helmfiles/tree/master/releases/external-dns
Comprehensive Distribution of Helmfiles for Kubernetes - cloudposse/helmfiles
thanks - how does this work? what IPs does the DNS point to?
it’s the IPs of the Service
and a service can be of type load balancer which is internal or external
the details really depend a lot on your setup
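As a rough sketch, a common pattern is an internal NLB Service for the ingress controller, annotated so external-dns creates the private record (hostnames and selectors here are just examples):
kubectl apply -f - <<EOF
apiVersion: v1
kind: Service
metadata:
  name: ingress-nginx
  namespace: ingress-nginx
  annotations:
    # provision an internal NLB instead of a public load balancer
    service.beta.kubernetes.io/aws-load-balancer-type: "nlb"
    service.beta.kubernetes.io/aws-load-balancer-internal: "true"
    # external-dns picks this up and creates the Route53 record pointing at the NLB
    external-dns.alpha.kubernetes.io/hostname: app.internal.example.com
spec:
  type: LoadBalancer
  selector:
    app.kubernetes.io/name: ingress-nginx
  ports:
    - name: https
      port: 443
      targetPort: https
EOF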
ahh got it, thank you
We have this type of setup running. Private eks cluster with internal NLB. NLB connects to ingress-nginx which forwards to the corresponding application ingress object. Cert-manager with dns01 and external-dns for pointing domains to the NLB ALIAS record
Based mostly on cloudposse Terraform modules and helmfiles. Although the helmfiles do need tweaking
- https://github.com/cloudposse/terraform-aws-vpc-peering.git?ref=tags/0.6.0
- https://github.com/cloudposse/terraform-aws-eks-cluster.git?ref=tags/0.29.0
- https://github.com/cloudposse/terraform-aws-tfstate-backend.git?ref=tags/0.26.1
The last module solves the chicken-egg problem to map serviceAccounts with IAM service roles and the corresponding policies
Terraform module to create a peering connection between two VPCs in the same AWS account. - cloudposse/terraform-aws-vpc-peering
Terraform module for provisioning an EKS cluster. Contribute to cloudposse/terraform-aws-eks-cluster development by creating an account on GitHub.
Terraform module that provision an S3 bucket to store the terraform.tfstate
file and a DynamoDB table to lock the state file to prevent concurrent modifications and state corruption. - cloudposse…
Wow thanks a lot. Really useful
2020-11-13
I’m trying a cluster migration in AWS; both k8s clusters are in the same region. Cluster 1: deployed 2 applications with PV reclaim policy, one as Delete and the other as Retain, and annotated so Velero takes a Restic backup. Cluster 2: restored those 2 applications, worked fine.
Then again:
Cluster 1: deployed the same 2 applications with reclaim policy Delete and Retain, but not annotated, so Velero took EBS snapshots when I backed up.
Cluster 2: the restore did not work, as the PV failed to attach with the following: Warning FailedAttachVolume pod/<pod-name> AttachVolume.Attach failed for volume "pvc-<id>" : Error attaching EBS volume "vol-<id>" to instance "i-<instance-id>": "UnauthorizedOperation: You are not authorized to perform this operation.
So, is the snapshot restore feature supposed to work within the same AWS region, or is something else causing this error?
2020-11-16
Are there any advantages to placing stuff like cert-manager, cluster-autoscaler, external-dns and aws-load-balancer-controller into the kube-system namespace, or should all of this be isolated into its own namespaces?
Almost all tutorials and example code default these services/controllers to the kube-system namespace; however, there seem to be advantages to splitting everything into separate namespaces.
I think for quality-of-life/debugging purposes, a query by namespace will be a tad faster than a query by label within kube-system. Depending on the cluster size, the initial number of pods running in kube-system can already be pretty high. For example, with our logging provider I like that I can just query namespace:cert-manager
as opposed to namespace:kube-system label:app=?? or service=??
where you possibly have to remember how each third-party Helm chart labels its deployments, etc.
This is also my thinking. I prefer the idea of grouping everything into its own namespace, and there’s effectively no limit on the number of namespaces anyway.
What made me think about this was the following description of a cluster-autoscaler extra argument:
extraArgs:
[...]
# If true cluster autoscaler will never delete nodes with pods from kube-system (except for DaemonSet or mirror pods)
skip-nodes-with-system-pods: "false"
Apparently the kube-system
pods are actually handled differently…
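For what it’s worth, most charts make the dedicated-namespace approach trivial; a sketch with cert-manager (chart and flags per its docs, just as an example):
helm repo add jetstack https://charts.jetstack.io
helm upgrade --install cert-manager jetstack/cert-manager \
  --namespace cert-manager --create-namespace \
  --set installCRDs=true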
2020-11-17
Join us for AWS Container Day, a fully live, virtual day of sessions all about Amazon EKS and Kubernetes at AWS, hosted by Containers from the Couch. At this Day Zero KubeCon event, the AWS Kubernetes team will be discussing new launches, demoing products and features, covering best practices, and answering your questions live on Twitch. If you have a question before the event, send us a note at [email protected]!
https://github.com/hashicorp/terraform-provider-aws/issues/13643 anyone have any workarounds for this?
Community Note Please vote on this issue by adding a reaction to the original issue to help the community and maintainers prioritize this request Please do not leave "+1" or other comme…
@Jeremy G (Cloud Posse) what did we end up doing for this?
@Erik Osterman (Cloud Posse) We have not really found a solution, but we do what we can. We manually tag everything we create, and we sort of got around this related issue by specifying a custom launch template that tags everything it can.
Community Note Please vote on this issue by adding a reaction to the original issue to help the community and maintainers prioritize this request Please do not leave "+1" or "me to…
doh, i was trying to avoid using a custom launch template but i suppose i could not be lazy and do that
thanks Jeremy and Erik
@joey If you use our terraform-aws-eks-node-group module to create your node group, it will make a launch template which propagates tags if you set the resources_to_tag
variable.
Terraform module to provision an EKS Node Group. Contribute to cloudposse/terraform-aws-eks-node-group development by creating an account on GitHub.
thank you i got aws_launch_template working
although i’m peeved that you can’t use instance_market_options
and spot
with aws_launch_template
conveniently i’m primarily using this in a one-off use case where i have to use ondemand instances for now anyways, but i wanted to keep a dev environment that was nearly identical running with spot instances
2020-11-18
Best way to manage EKS?
2020-11-19
Hi guys, how could I fix this base64 decoding? It spews out this gibberish from Jenkins’ pod :)
printf $(kubectl get secret --namespace default jenkins-160573443 -o jsonpath="{.data.jenkins-admin-password}" | base64 --decode);echo
�K��ly��jg���u�ں"�ϵ�N{߯5��#��
You can try this:
export SECRET_NAME=$(kubectl get secrets -n namespace | grep pod-name- | cut -d ' ' -f 1)
kubectl describe secrets $SECRET_NAME -n namespace
Drops bunch of tokens
and jenkins-password: 10 bytes
in bytes
Pastebin.com is the number one paste tool since 2002. Pastebin is a website where you can store text online for a set period of time.
kubectl get secret
shows up jenkins-password
in plain text, however, Jenkins doesn’t digest password with the user
profile name :) pastebin.com/65bMbuED
Got help from Bitnami folks, turns out base64 decoding is buggy on macOS 11.0.1 - have to fix it then. For the testing purposes, I used that suggested website for decoding a password string. Thanks for your help mate :)
“Hi @organicnz, I’m not really sure what is happening there. When the chart is deployed, some notes are shown with information about the commands that you need to execute to obtain the credentials. You can obtain those notes again by executing: helm get notes YOUR_INSTALLATION_NAME You will find a command like the following one: echo Password: $(kubectl get secret --namespace default YOUR_INSTALLATION_NAME -o jsonpath="{.data.jenkins-password}" | base64 --decode) If you want to do what the command above is doing in a more manual way, you can also get the secret and output as yaml and you will get a plain-text string that is base64-encoded. You need to decode it to get the actual password. If the implementation of your base64 utility is buggy, just for testing purposes, you can use an online tool like this one: https://emn178.github.io/online-tools/base64_decode.html Again, this is just for testing purposes, you should never share your password on the internet.”
Hey guys, how to fix this base64 decoding? It spews out this gibberish from bitnami/jenkins pod :) printf $(kubectl get secret –namespace default jenkins-1605736819 -o jsonpath="{.data.jenkin…
Base64 online decode function
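If the local base64 binary is suspect, one way to sidestep it entirely is to let kubectl do the decoding (release/secret name is a placeholder; the key name follows the chart’s notes):
kubectl get secret --namespace default <release-name> \
  -o go-template='{{ index .data "jenkins-password" | base64decode }}'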
Hey folks — I want to switch a project off of DataDog log management to ELK due to cost. I’m looking for the best resources to do that — Any recommendations?
I’ve asked this here lightly before and I know the Cloud Posse approach is FluentD => Firehose => ElasticSearch
. I’d like to implement something similar with FluentBit > FluentD (project is running Fargate so smaller sidecar containers + aws-for-fluentbit is attractive), but before I dive into implementing all of that I figured I should ask what’re the best resources / OSS / possible terraform modules I should pick up to accomplish this with the least amount of pain.
Make sure to evaluate the cost of ES. The amount of storage is limited by the instance size, and it’s easy to get up to $500/mo for a cluster that barely stores a week of data and has HA. Then consider how many clusters you want: you don’t want all logs going to the same cluster. Let’s say you have 4 accounts and 4 small clusters with HA. My point is it’s very easy to end up paying thousands of dollars a month for ES, and the tight integration with APM and metrics you get with Datadog is lacking. You’re also on the hook for developing it and keeping it all up to date. I am curious whether the cost/benefit really works out in favor of ES
Yeah, valid point. I believe I’m only going to run HA and more than a day of log retention in production environments. The client is single tenant environment per customer (enterprise customers), so really it’ll be expensive anyway we run it AFAICT.
The problem I / the client had with DataDog logs was that we have 3 clusters up right now and the logs cost alone for the month is $1K / cluster, which is pretty insane. I think anyway we do it should be cheaper than that.
I guess as long as you can make sure that indexes are granular it should make it easy to snapshot indexes to S3 buckets… and remove these afterwards… But then logs are not available for Kibana… so not sure if these will be of any use at all?
AWS has UltraWarm storage for ES now… but as far as I know you can’t snapshot your indexes while they’re on UltraWarm; you’d better double-check, I could be wrong on this one
I wonder if sending these directly to S3 and performing searches via Athena would make it a bit more reasonable in terms of pricing…? But I am not sure if the QA and dev teams will like it… Dear @Erik Osterman (Cloud Posse), have you had any experience with Athena being used for log queries instead of indexing in ES and having all the nice features of Kibana?
2020-11-20
2020-11-21
2020-11-22
While upgrading EKS from 1.15 to 1.18 using the eks module, do we have to upgrade step by step, like 1.15 ==> 1.16 ==> 1.17 ==> 1.18, or can I directly set cluster_version = "1.18"
and tf apply will do all the magic?
I don’t think so
What I usually do is update on console and bump terraform to that version
What are the pros/cons of using tf to build EKS but the EKS console to upgrade? Any unexpected side effects?
I’m not sure if EKS will allow you to do that, but you should upgrade one minor version at a time, because control plane components are only backward-compatible with the previous minor version, so you can’t really jump two versions ahead. https://kubernetes.io/docs/setup/release/version-skew-policy/ Even if EKS allowed it, they would for sure upgrade one minor version at a time under the hood.
Also, I would check if all of your cluster resources are compatible with the new version before upgrading.
What kind of cluster resources? How do you check? Do you create a new or sandbox EKS cluster to test that the upgrade works, and then do it on the main EKS cluster?
Update: EKS won’t allow that.
Because Amazon EKS runs a highly available control plane, you can update only one minor version at a time. See Kubernetes Version and Version Skew Support Policy for the rationale behind this requirement. Therefore, if your current version is 1.16 and you want to upgrade to 1.18, then you must first upgrade your cluster to 1.17 and then upgrade it from 1.17 to 1.18. If you try to update directly from 1.16 to 1.18, then the update version command throws an error.
from here: https://docs.aws.amazon.com/eks/latest/userguide/update-cluster.html
When a new Kubernetes version is available in Amazon EKS, you can update your cluster to the latest version.
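So the step-by-step path with the AWS CLI looks roughly like this (cluster name is a placeholder; wait for each update to finish, and bump node groups/add-ons alongside):
aws eks update-cluster-version --name my-cluster --kubernetes-version 1.16
aws eks update-cluster-version --name my-cluster --kubernetes-version 1.17
aws eks update-cluster-version --name my-cluster --kubernetes-version 1.18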
I had the impression that the EKS module would manage the upgrade for us, i.e. instead of running tf apply
3 times, if we declare cluster_version = "1.18"
the EKS module would step through 1.15 ==> 1.16 ==> 1.17 ==> 1.18 itself.
Do you think my assumption is correct?
Nope, I don’t think that it will do that for you.
You can always check the source of the module to see if there’s any auto-upgrade magic in it (which I’m fairly sure isn’t the case)
Thank you @aaratn & @Aleksandr Fofanov I think my assumption is wrong!!!
@Amit Karpe Regarding how to find resources which use deprecated APIs before upgrading:
Instead of doing this manually, we built Kube-No-Trouble to do it for you.
Since you are jumping from 1.15 to 1.16, it’s worth checking if your cluster has any resources on API versions removed in 1.16.
And also there is pluto
tool too
Pluto is an open source utility to help users easily find deprecated Kubernetes API versions in the Infrastructure-as-Code repositories and Helm releases.
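Roughly how they’re run (flags from memory, double-check against each tool’s help):
# Kube-No-Trouble scans the live cluster via your current kubeconfig context
kubent
# Pluto can scan manifests in a repo or releases deployed with Helm 3
pluto detect-files -d ./manifests
pluto detect-helm -o wide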
Pluto helped us in the 1.15 -> 1.16 migration. With Helm, this plugin helped too: https://github.com/hickeyma/helm-mapkubeapis - When you upgrade to 1.16 your running Kubernetes deployments will be transposed in place and everything keeps running. The problem comes when you try to make a deployment (or rollback) and it fails, halting that application’s pipeline, which is, of course, the end of the world.
This is a Helm plugin which map deprecated or removed Kubernetes APIs in a release to supported APIs - hickeyma/helm-mapkubeapis
2020-11-23
Did anybody else get hit by the EKS AMI - https://github.com/awslabs/amazon-eks-ami/releases/tag/v20201112 - that was fun
[RECALLED] AMI Release v20201112 amazon-eks-gpu-node-1.18-v20201112 amazon-eks-gpu-node-1.17-v20201112 amazon-eks-gpu-node-1.16-v20201112 amazon-eks-gpu-node-1.15-v20201112 amazon-eks-arm64-node-1…
im totally using that ami
So, it doesn’t impact the applications’ service if you run multiple instances on different nodes… but what you’ll see is application restarts and nodes randomly becoming NotReady
. If you have a fair bit of monitoring of things like replica counts and kube node status you’ll get a bit of noise, along with, perhaps, developers claiming that they should move back to Beanstalk, Fargate, or Cloud Provider X because AWS should not have let a buggy AMI get out…
The latest AMI works fine: https://github.com/awslabs/amazon-eks-ami/releases/tag/v20201117
AMI Release v20201117 amazon-eks-gpu-node-1.18-v20201117 amazon-eks-gpu-node-1.17-v20201117 amazon-eks-gpu-node-1.16-v20201117 amazon-eks-gpu-node-1.15-v20201117 amazon-eks-arm64-node-1.18-v202011…
yeah i was bit by that bug but the 20201117 amis are fine though
2020-11-24
Greetings! I was wondering if anyone has had a chance to play with Crossplane? Would you know if it is somewhat comparable to the TF cloud operator?
Manage any infrastructure your applications need directly from Kubernetes
Do you have a link to TF cloud operator?
I’m thinking of looking at crossplane. It looks pretty good.
Yep here is TF cloud operator link: https://github.com/hashicorp/terraform-k8s
Terraform Cloud Operator for Kubernetes. Contribute to hashicorp/terraform-k8s development by creating an account on GitHub.
I wonder if crossplane has anything similar to TF state… haven’t had a chance to look into documentation yet
Hey all, working on creating a new cluster; basically running into an issue where the alb-ingress-controller can’t see any subnets for an ingress being created, despite those subnets existing. Any ideas?
aws-load-balancer-controller-5d96f6c4f6-vq86z controller {"level":"error","ts":1606243773.951048,"logger":"controller","msg":"Reconciler error","controller":"ingress","name":"grafana","namespace":"monitoring","error":"couldn't auto-discover subnets: unable to discover at least one subnet"}
(Note the alb-ingress-controller
is now the https://github.com/kubernetes-sigs/aws-load-balancer-controller)
A Kubernetes controller for Elastic Load Balancers - kubernetes-sigs/aws-load-balancer-controller
Yup, that’s the one I’m using ^^
I think this usually happens if the subnets aren’t tagged properly
I used that combined with this guide to set it up https://kubernetes-sigs.github.io/aws-load-balancer-controller/latest/guide/controller/installation/
What should the subnets be tagged with? I have 3 public and 3 private subnets, attached to a single vpc that the eks cluster is a part of
Here’s the terraform used to create the vpcs / subnets
module "us-prod-eks-vpc" {
source = "terraform-aws-modules/vpc/aws"
providers = { aws = aws.us }
name = "us-prod-eks-vpc"
cidr = "10.0.0.0/16"
azs = ["us-east-1a", "us-east-1b", "us-east-1c"]
private_subnets = ["10.0.1.0/24", "10.0.2.0/24", "10.0.3.0/24"]
public_subnets = ["10.0.101.0/24", "10.0.102.0/24", "10.0.103.0/24"]
enable_nat_gateway = true
enable_vpn_gateway = false
tags = {
Name = "us-prod-eks-vpc"
Environment = "prod-us"
Region = "us-east-1"
}
}
You can load balance application traffic across pods using the AWS Application Load Balancer (ALB). To learn more, see What is an Application Load Balancer? in the Application Load Balancers User Guide . You can share an ALB across multiple applications in your Kubernetes cluster using Ingress groups. In the past, you needed to use a separate ALB for each application. The controller automatically provisions AWS ALBs in response to Kubernetes Ingress objects. ALBs can be used with pods deployed to nodes or to AWS Fargate. You can deploy an ALB to public or private subnets.
I’ll take a look, thanks
Okay so I did the tags and they indeed were missing from the subnets, added those in, still results in the same error :s
public_subnet_tags = {
"kubernetes.io/role/elb" = "1"
}
private_subnet_tags = {
"kubernetes.io/role/internal-elb" = "1"
}
basically were added in
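One way to sanity-check what the controller’s subnet auto-discovery will actually see (VPC ID is a placeholder):
aws ec2 describe-subnets \
  --filters "Name=vpc-id,Values=vpc-0123456789abcdef0" \
            "Name=tag:kubernetes.io/role/elb,Values=1" \
  --query 'Subnets[].{Id:SubnetId,Az:AvailabilityZone,Tags:Tags}'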
try adding this tag to your VPC
locals {
  tags = {
    "kubernetes.io/cluster/${local.name}" = "shared"
  }
}
If anyone wants to open up a PR to document this - that would probably save many others grief :-)
Hi guys - can someone help explain to me how port-forward works under the hood? I got a bit confused when I saw that the RBAC permissions required need “create” permissions:
rule {
apiGroups = [""]
resources = ["pods/portforward"]
verbs = ["get", "list", "create"]
}
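For the create part: as far as I understand it, kubectl opens the forwarding stream by POSTing to the pods/portforward subresource, which RBAC counts as a create. You can see it with verbose logging (pod name is a placeholder):
kubectl port-forward pod/my-pod 8080:80 -v=8
# the request log shows something like:
# POST https://<api-server>/api/v1/namespaces/default/pods/my-pod/portforward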
Hi all, does anyone know of any tools that can provide SAML authentication for a Kubernetes EKS cluster? It may be possible using HashiCorp Boundary, but I want to explore other tools…
does something like https://github.com/heptiolabs/gangway help?
An application that can be used to easily enable authentication flows via OIDC for a kubernetes cluster. - heptiolabs/gangway
it’s OpenID-based but maybe it can help with your use case
Thanks I will check…
Kubernetes doesn’t support native SAML integration. Learn how to configure SAML single sign on (SSO) for Kubernetes clusters with user impersonation.
oh thats nice @loren
thanks for sharing
2020-11-25
Any idea how to recover the aws-auth configmap if we corrupt it? Once the aws-auth configmap settings get corrupted, no one can access the EKS cluster. Any workaround?
one user can still access your cluster: the user you used for creation
Thank you. We were able to revert the bad changes using the EKS creator user
perfect
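For reference, recovery is basically re-applying a known-good aws-auth as the cluster creator; a minimal sketch (role ARN is a placeholder, and it’s worth keeping a copy of the real file in git):
kubectl apply -f - <<EOF
apiVersion: v1
kind: ConfigMap
metadata:
  name: aws-auth
  namespace: kube-system
data:
  mapRoles: |
    - rolearn: arn:aws:iam::111111111111:role/my-node-role
      username: system:node:{{EC2PrivateDNSName}}
      groups:
        - system:bootstrappers
        - system:nodes
EOF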
2020-11-26
Do you guys (iptables or network policy) block the EC2 metadata api or redirect to a metadata proxy for containers to remain “sane” when providing iam roles via the “native” eks service role method?
When we were using kiam
on kops
, we would do this, but haven’t carried it over to EKS.
If it would help, I can dig up the rules we used
No need Erik. Thanks.
I think the cleanest way is a metadata proxy.
If you block metadata completely, some apps might complain if they expect or need certain metadata. However, in most cases the needs are simple and providing an explicit AWS_REGION environment variable does the trick.
I think if I was to provide some form of multi tenant cluster with EC2 like promises (which I’m not really in the business of doing right now) I might be obliged to provide some emulation of the regular EC2 metadata.
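A minimal sketch of the NetworkPolicy variant (assumes a CNI that enforces egress policy, e.g. Calico; namespace and selector are placeholders):
kubectl apply -f - <<EOF
apiVersion: networking.k8s.io/v1
kind: NetworkPolicy
metadata:
  name: block-instance-metadata
  namespace: my-app
spec:
  podSelector: {}
  policyTypes:
    - Egress
  egress:
    - to:
        - ipBlock:
            cidr: 0.0.0.0/0
            except:
              - 169.254.169.254/32
EOF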
2020-11-27
2020-11-28
Hi all - what are some approaches for application config management in Kubernetes? A few topics I’m interested in:
• dynamic configuration (say, for example, configuring a feature flag in an app)
• deploying applications that share configuration (or secrets)
Just looking for some projects to look into to get a feel for what people are doing
Good question for office hours
Some ideas are the various operators for managing secrets like external secrets from SSM, ASM, Vault
The Reloader project from stakater to automatically roll replica sets when secrets or config maps change
#helmfile for managing configuration of helm releases
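As a rough illustration of the Reloader pattern (deployment name and namespace are placeholders):
# opt a workload in; Reloader then rolls it whenever a referenced ConfigMap or Secret changes
kubectl -n my-app annotate deployment/my-app reloader.stakater.com/auto="true"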
Dynamic config: LaunchDarkly, Split, or whatever hosted provider. I saw many re-implementations fail spectacularly
Deploying apps sharing the same config: beware creating a distributed monolith! I’d vote for helmfile / ParameterStore / shared ConfigMap.
Flagger is an open source alternative for launchdarkly https://github.com/weaveworks/flagger
Progressive delivery Kubernetes operator (Canary, A/B Testing and Blue/Green deployments) - weaveworks/flagger
Alternative-ish.
Flagger does Blue/Green in all its forms (canary, A/B, mirroring). Let’s take an example with 1% seeing the new feature. That is a random 1% of requests.
With LaunchDarkly (or whatever other feature flagging tool) you get to choose the 1%: instead of random, you could show the feature to users whose email ends in mycorp.com
, or the marketing team, or that specific QA team that will test the thing, or users who are on the free plan, or users who opt in to seeing beta features, or the client this feature is developed for, and so on.
You get a lot more control on risk allocation.
Thanks guys, a lot of good reading to look into
I’ll move it to #office-hours for any follow up
https://github.com/Unleash/unleash - has caught the attention of some developers I work with for feature flagging. I’d be keen to hear from anyone that might have used it? Or, share experience when I have gained it…
I used to work for the transport side of booking.com (rentalcars.com at the time) and they had an in-house A/B experiments framework in use on the frontend (horribly conditional JSPs). I think Optimizely (SaaS) is fairly popular in the front-end “which colour blue” arena.
Unleash is the open source feature toggle service. - Unleash/unleash
nice!
2020-11-29
Has me nodding along; kube already feels like legacy to me, and it will be replaced by something sooner rather than later. (By the founder of Tailscale, which just came up in #random)… https://blog.dave.tf/post/new-kubernetes/
If we were to start over from first principles, what would I do differently than k8s?
This is what led me to checking out tailscale
Some interesting ideas in there for sure. But don’t think that they’ll be implemented too quickly. Maybe we’ll see something competitive in a couple years.
i consider that reasonably soon
Yeah solid point