#kubernetes (2020-05)
Archive: https://archive.sweetops.com/kubernetes/
2020-05-01
Passed the CKA, whoop
thanks guys
that’s awesome! let’s hear more about it on #office-hours next week if you’re around
sure, I missed the last office hours as I was taking the exam then
2020-05-04
configuring ingress-nginx on kubernetes, but the page still can’t reach the endpoints - the backend is working and shows the default “this is nginx” page
404 Not Found
you always get a 404 when there is a domain name mismatch
though nginx ingress does have a default backend
can you check the pod logs, or use stern?
the thing is that it was working before, I just wanted to reinstall ingress. I can ping and reach services inside the cluster
you always get a 404 when there is a domain name mismatch
I only see the 404 error and can’t reach other pages. maybe the nginx ingress isn’t picking up my Ingress resource or something? the logs show nothing
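For reference, a minimal Ingress of the shape ingress-nginx expects looks roughly like this - the host must match the domain the request actually uses, otherwise nginx falls back to the default backend and returns 404 (names here are placeholders):
apiVersion: networking.k8s.io/v1beta1
kind: Ingress
metadata:
  name: my-app
  annotations:
    kubernetes.io/ingress.class: nginx
spec:
  rules:
    - host: app.example.com          # must match the Host header of incoming requests
      http:
        paths:
          - path: /
            backend:
              serviceName: my-app    # existing Service in the same namespace
              servicePort: 80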
Hello,
Does anyone have experience using Rancher to manage their EKS clusters? What I am specifically looking for is whether there’s a way for me to specify namespaces and resource quotas in YAML etc. for my clusters and then feed that into Rancher? I am just getting started with this tool and it looks like everything happens through the UI thus far.
2020-05-05
Hello, may I ask what you use to authenticate against AWS EKS so you can use kubectl etc.? Is there an alternative to https://docs.aws.amazon.com/eks/latest/userguide/add-user-role.html ?
The aws-auth ConfigMap is applied as part of the guide which provides a complete end-to-end walkthrough from creating an Amazon EKS cluster to deploying a sample Kubernetes application. It is initially created to allow your worker nodes to join your cluster, but you also use this ConfigMap to add RBAC access to IAM users and roles. If you have not launched worker nodes and applied the
for example I want to use AD or SSO; what do you use generally?
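For reference, the aws-auth ConfigMap mentioned above is also the usual place to map SSO/AD-federated IAM roles into cluster RBAC; a rough sketch (account IDs, role and user names are placeholders):
apiVersion: v1
kind: ConfigMap
metadata:
  name: aws-auth
  namespace: kube-system
data:
  mapRoles: |
    # lets worker nodes join the cluster
    - rolearn: arn:aws:iam::111122223333:role/eks-worker-node-role
      username: system:node:{{EC2PrivateDNSName}}
      groups:
        - system:bootstrappers
        - system:nodes
    # an IAM role assumed via SSO/AD federation, mapped to cluster-admin
    - rolearn: arn:aws:iam::111122223333:role/eks-admins
      username: eks-admins
      groups:
        - system:masters
  mapUsers: |
    - userarn: arn:aws:iam::111122223333:user/alice
      username: alice
      groups:
        - system:masters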
anyone here ever futz around with the Istio ingress deployment before?
Yep, check out https://github.com/RothAndrew/istio-practice/tree/master/eks
uses Gateway, VirtualService, Cert-Manager with LetsEncrypt. Would love any feedback you might have
Repo to collect the things I do to practice with Istio - RothAndrew/istio-practice
quick question then: if you deploy the operator via istioctl and use the demo profile, is the ingress gateway it creates usable?
or should you create a gateway for your published apps like you have done and have multiple gateways?
yep, for sure
it is usable
Also, each profile is just a set of defaults that you are free to override. In my deployments I have deployed the demo profile with half a dozen or so overrides, like enabling HTTPS and SDS
so, should there be a gateway per namespace?
It’s really up to you. You can totally get away with just one gateway for the whole cluster. That way you have a standard set of rules, like always redirecting HTTP to HTTPS for example.
The limitation that I have discovered is that if you are using SDS with cert-manager, then each Gateway gets one-and-only-one Certificate, and the Gateway and Certificate must be in the same namespace as the ingress deployment, which is almost always istio-system
You can assign as many dnsNames as you want in the Certificate resource
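For illustration, a single cluster-wide Gateway along those lines might look roughly like this (hosts and the cert secret name are placeholders; credentialName assumes SDS with a cert-manager-created secret in istio-system):
apiVersion: networking.istio.io/v1alpha3
kind: Gateway
metadata:
  name: main-gateway
  namespace: istio-system
spec:
  selector:
    istio: ingressgateway            # binds to the default ingress gateway deployment
  servers:
    - port:
        number: 80
        name: http
        protocol: HTTP
      hosts:
        - "*.example.com"
      tls:
        httpsRedirect: true          # always redirect HTTP to HTTPS
    - port:
        number: 443
        name: https
        protocol: HTTPS
      hosts:
        - "*.example.com"
      tls:
        mode: SIMPLE
        credentialName: example-com-tls   # Secret created by the cert-manager Certificate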
And thanks for this istio guide. I dig it for certain. With minor modifications it works brilliantly with a local kind cluster running metallb and defreitas/dns-proxy-server
Nice. Did you document any of those mods? I’d love to include that as another option. MetalLB looks awesome for LoadBalancers in other clouds such as Hetzner
https://github.com/zloeber/CICDHelper -> I just finished up my first round of making this hacky thing work
Fairly large set of scripts for crafting and working with devops tools - zloeber/CICDHelper
I do perform an istioctl operator deployment with a custom manifest generated from the demo profile to enable the ingress for prometheus, grafana, and kiali but that never seems to work properly from what I can tell.
so I gave up on it and just used your example to craft a quick helmfile to do the bookinfo deployment as an example instead
as always, thanks for the inspiration good sir.
(the istio profile example used was only tested for a kind cluster thus far, need to give k3d a whirl at some point too…)
And MetalLB is pretty nifty for testing; not entirely certain how performant it is, but generally it does work
can you increase the memory limit on pods w/o triggering a restart?
or on a daemonset?
Not that I’m aware of @btai. That would be considered vertical pod autoscaling
start your search with that term and let me know if I’m wrong (I’d rather be corrected than make wrong assumptions :))
what does everyone else do when their ingress controller is nearing a mem limit w/o incurring downtime
and no HPA?
personally it’s easy for us to provision a new cluster w/ new limits and move traffic to that
but i was wondering for those that don’t do ephemeral clusters
is every nginx-ingress pod reaching memory limit?
what’s considered “downtime” in this situation? no requests dropped?
yeah all my pods are close
isn’t it enough to ensure you have a rolling-update strategy in place and then change the limits?
that should be ~zero downtime (possibly some dropped connections)
I would almost certainly deploy ingress with autoscaling moving forward
@Erik Osterman (Cloud Posse) you would think, but we have had a lot of connection errors doing that in the past. will need to add some telemetry around it; very possibly not the fault of the ingress controller
i don’t doubt there would be some connection errors, but I don’t consider that downtime
that’s failover working the way it should
you can also throttle the rate of the roll out
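For illustration, the kind of strategy being described is just the relevant part of the ingress controller Deployment spec, along these lines (a sketch; the numbers and the new limit are placeholders):
spec:
  strategy:
    type: RollingUpdate
    rollingUpdate:
      maxSurge: 1          # bring up one pod with the new limits before touching the old ones
      maxUnavailable: 0    # never drop below the desired replica count during the roll
  template:
    spec:
      containers:
        - name: nginx-ingress-controller
          resources:
            limits:
              memory: 1Gi  # the new, higher limit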
2020-05-06
https://cloudposse.com/devops/pod-disruption-budget-gotchas/ by @Jeremy G (Cloud Posse)
EKS users:
I’m curious how others handle deploying things like an nginx config for a reverse proxy to a third-party endpoint for something like email click tracking or subdomain redirects. Do you just add it to the ingress-nginx config? Deploy a separate nginx pod with its own config file? Something else? Any input?
What are your concerns? … e.g. why would email click tracking need to be handled differently.
e.g. architecturally, i’ve seen this implemented with lambdas + kinesis, but that might be an optimization most don’t need
These are configs for things that aren’t tied to a specific app of ours, so it’s just a little unclear where to put them.
With that in mind I’m just thinking about what would be easiest to maintain and add new stuff to as needed.
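One way to keep those one-offs next to everything else in ingress-nginx is an ExternalName Service plus a dedicated Ingress per third-party endpoint; a rough sketch, assuming ingress-nginx, with placeholder hostnames:
apiVersion: v1
kind: Service
metadata:
  name: click-tracking
spec:
  type: ExternalName
  externalName: tracking.vendor.example.com        # the third-party endpoint
  ports:
    - port: 443
---
apiVersion: networking.k8s.io/v1beta1
kind: Ingress
metadata:
  name: click-tracking
  annotations:
    kubernetes.io/ingress.class: nginx
    nginx.ingress.kubernetes.io/backend-protocol: "HTTPS"
    nginx.ingress.kubernetes.io/upstream-vhost: tracking.vendor.example.com
spec:
  rules:
    - host: click.mysite.example.com
      http:
        paths:
          - path: /
            backend:
              serviceName: click-tracking
              servicePort: 443
A plain subdomain redirect can likely be handled the same way with the nginx.ingress.kubernetes.io/permanent-redirect annotation instead of a proxied backend.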
2020-05-07
Does anyone have input or a link to what could be a checklist of tasks for setting up a basic EKS/k8s cluster? I’m generally bad at abstracting ideas away until I’m knee deep in them, I can only think of general tasks like 1) build nodes 2) user auth 3) logging 4) metrics collection
^^ This is with the presumption the vpc/infra is all setup
^^ Pretty much hah
assuming that you provisioned not only EKS cluster, but all the IAM roles and SSO stuff for humans (to be able to access the cluster) and for apps (EKS service account IAM roles), then we usually deploy these k8s releases:
- `external-dns`
- `nginx-ingress`
- `cert-manager`
- `reloader`
- `metrics-server`
- `kubernetes-dashboard`
- `efs-provisioner`
- `aws-secret-operator`
- `prometheus` stuff
Anyone have experience exposing status.hostIP to a third-party application to hit a DaemonSet that does not natively support referencing environment variables with the Downward API (use-case: Datadog Agent and Kong)? https://github.com/kubernetes/kubernetes/issues/74265 asked for this to be revisited but in the meantime…
What would you like to be added Currently, HostAliases provide a way to inject entries into /etc/hosts files inside Pods. While this is useful for previously-known static IPs, there are times when …
like
env:
  - name: DD_AGENT_HOST
    valueFrom:
      fieldRef:
        fieldPath: status.hostIP
in your container env?
Right, but Kong does not support referencing the Downward API (that declaration), so I’m looking for an alternative solution outside of forking the supported Helm chart for the DD agent or the codebase for Kong
ah dang, sorry, i am of no immediate use here
no worries, thanks for helping!
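One workaround, if the chart lets you override the container’s command/args at all: populate an env var from the Downward API and interpolate it yourself in a shell wrapper. A sketch - the binary and flags below are placeholders, not real Kong or Datadog options:
env:
  - name: HOST_IP
    valueFrom:
      fieldRef:
        fieldPath: status.hostIP       # the node IP the DaemonSet agent listens on
command: ["/bin/sh", "-c"]
args:
  - exec my-app --statsd-host "$HOST_IP" --statsd-port 8125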
2020-05-08
A new blog I dropped recently about screwing around with Istio on k3d and kind: https://zacharyloeber.com/2020/05/the-istio-rabbithole/
The Istio Rabbithole - Zachary Loeber’s Personal Site
@roth.andy
2020-05-09
Hey guys, devops noob here…
I followed this tutorial to dockerize a react app https://mherman.org/blog/dockerizing-a-react-app/
and it turns out that -it (interactive mode) is required for docker run commands
Now I’m trying to follow this tutorial to deploy via Kubernetes dashboard https://www.youtube.com/watch?time_continue=279&v=je5WRKxOkWQ
My app keeps crashing with the same error that occurs if I don’t add -it to docker run
So does anyone here know how I could add -it when deploying via the Kubernetes dashboard?
Let’s look at how to Dockerize a React app.
is the app a server app or browser app?
if a Docker container starts and the process exits right away, you would see the same behavior you described
for example, for a server app, you start an HTTP listener which prevents the app from exiting
-it mode starts an interactive session with the container, so even if the app exits, the container is being kept alive
I believe it’s a server app @Andriy Knysh (Cloud Posse)
I’ll try to find the error hold on
> [email protected] start /app
> react-scripts start
ℹ 「wds」: Project is running at <http://172.17.0.2/>
ℹ 「wds」: webpack output is served from
ℹ 「wds」: Content not from webpack is served from /app/public
ℹ 「wds」: 404s will fallback to /
Starting the development server...
Then the app never launches, vs if I do -it in the docker run command it launches the app accordingly.
but I can’t find a way to specify -it when running the pod through the kubernetes dashboard @Andriy Knysh (Cloud Posse)
you don’t need to specify that
something is wrong with the app when it runs in a container
a Node app should work in a container, and you should be able to access it from your local computer via the port binding (should be able to open a browser on the host port and see the app)
did you test that?
Yeah
It’s literally just the default create-react-app app
npm start opens it in localhost:3000
did you test it in a docker container on your local computer?
for example, using Docker compose like this:
version: '3.1'
services:
  "app":
    image: app
    build: .
    expose:
      - "3000"
    ports:
      - "3000:3000"
    volumes:
      - "./:/usr/src/app"
yeah I have a docker compose file taken from the tutorial in my first link
and when you run the composition, you can see your app on localhost:3000 (or 3001 in your case)?
docker-compose up -d --build
works fine for me
I’ll see what port hold on
or come to think of it, it just builds another image
which has the same effect of only working when docker run has the -it flag after it
my docker compose file:
version: '3.7'
services:
  sample:
    container_name: sample
    build:
      context: .
      dockerfile: Dockerfile
    volumes:
      - '.:/app'
      - '/app/node_modules'
    ports:
      - 3001:3000
    environment:
      - CHOKIDAR_USEPOLLING=true
    stdin_open: true
    tty: true
@Andriy Knysh (Cloud Posse)
Josephs-MacBook-Air:web josephbennett$ docker-compose up -d --build
Building sample
Step 1/9 : FROM node:13.12.0-alpine
---> 483343d6c5f5
Step 2/9 : WORKDIR /app
---> Using cache
---> 961768ca865e
Step 3/9 : ENV PATH /app/node_modules/.bin:$PATH
---> Using cache
---> 8c67f044ee11
Step 4/9 : COPY package.json ./
---> Using cache
---> 38c4fc32e5b5
Step 5/9 : COPY package-lock.json ./
---> Using cache
---> 1e03505a795d
Step 6/9 : RUN npm install --silent
---> Using cache
---> 4c69c439a90f
Step 7/9 : RUN npm install [email protected] -g --silent
---> Using cache
---> 8de2388a9ba5
Step 8/9 : COPY . ./
---> Using cache
---> 2cc9fce978f4
Step 9/9 : CMD ["npm", "start"]
---> Using cache
---> 1b726fc6c6ce
Successfully built 1b726fc6c6ce
Successfully tagged web_sample:latest
my-react-app is up-to-date
Josephs-MacBook-Air:web josephbennett$ docker run web_sample:latest
> [email protected] start /app
> react-scripts start
ℹ 「wds」: Project is running at <http://172.17.0.2/>
ℹ 「wds」: webpack output is served from
ℹ 「wds」: Content not from webpack is served from /app/public
ℹ 「wds」: 404s will fallback to /
Starting the development server...
Josephs-MacBook-Air:web josephbennett$ docker run -it web_sample:latest
^ it works only after that last line
& that’s what I can’t seem to replicate in the Kubernetes console, just figuring out how to add the -it
thanks for the help btw
docker-compose up
should build the image and start the container https://docs.docker.com/compose/gettingstarted/ https://stackoverflow.com/questions/36249744/interactive-shell-using-docker-compose
On this page you build a simple Python web application running on Docker Compose. The application uses the Flask framework and maintains a hit counter in Redis. While the sample…
in your example, it just builds the image and exits
(that’s why you have to run docker run after)
so something is wrong with your app or docker compose
try to remove
stdin_open: true
tty: true
and run again
you should be able to run docker-compose up (it should build the image and start the container; with the -d option it should start the container and exit, but you should be able to see the container running) and then open a browser and see your app running on localhost:3001
only after that you need to think about deploying to Kubernetes (it has nothing to do with docker -it arguments)
oohhh gotcha, thanks
so I guess that there’s something up with create-react-app’s default application that doesn’t allow it to be dockerized
@Joey The documentation you posted https://mherman.org/blog/dockerizing-a-react-app/ explains at “What’s happening here?” #2
-it starts the container in interactive mode. Why is this necessary? As of version 3.4.1, react-scripts exits after start-up (unless CI mode is specified) which will cause the container to exit. Thus the need for interactive mode.
You will need to enable CI mode and then it will work without -it; you can do that by setting the env variable CI to true.
Thanks! I’ll try it when I get the chance. Sounds promising
thanks @maarten for finding the reason why the app exits after start
@Joey you should put the app in CI mode, then run docker-compose up -d and then see the app running in the browser at localhost:3001
simply by adding CI=true before npm start
nice catch @maarten!
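i.e. the fix boils down to something like this in the compose file above (or an equivalent ENV line in the Dockerfile):
environment:
  - CHOKIDAR_USEPOLLING=true
  - CI=true        # keeps react-scripts from exiting at startup, so -it / tty is no longer needed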
2020-05-10
2020-05-11
Hey guys, do you know if it’s possible to specify the DNS whenever you create a new service in Kubernetes? If not, then what’s a stable way to stay routed if your service fails?
e.g. let’s say we create a service with the DNS http://abc123-456us-east-2.elb.amazonaws.com:3000, and we have our URL www.mysite.com pointing to it… Then we create a backup service with the DNS http://def456-789us-east-2.elb.amazonaws.com:3000. Now we have to take the time to go on godaddy or wherever, to direct www.mysite.com to point to our backup DNS…
Sorry if this is a super noob question, but is there something that I can do in case our service fails? Or should I just assume that the service will live on?
use the external-dns controller with route53
move your dns to something like route53 for a start
then you can use something like https://github.com/kubernetes-sigs/external-dns
Configure external DNS servers (AWS Route53, Google CloudDNS and others) for Kubernetes Ingresses and Services - kubernetes-sigs/external-dns
and then you can do even more fancy stuff like using weighted records and associating healthchecks with CNAMEs, because you probably want to have www.mysite.com CNAME to something like primary.cluster.mysite.com with weight 100 and backup.cluster.mysite.com with some other weight, each with their own independent healthchecks, and maybe you want to delegate only cluster.mysite.com to route 53 instead of your whole domain
i probably wouldn’t want to point my mysite.com CNAMEs directly to my clb/alb/nlb’s because they’re probably going to get clobbered at some point, and the beauty of alb-ingress-controller and external-dns is that it’ll all just magically work
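For illustration, once external-dns is running, the Service side is just an annotation (hostname is a placeholder) and the controller keeps the Route53 record pointed at whatever load balancer the cluster currently has:
apiVersion: v1
kind: Service
metadata:
  name: my-app
  annotations:
    external-dns.alpha.kubernetes.io/hostname: www.mysite.com   # record external-dns manages in Route53
spec:
  type: LoadBalancer
  selector:
    app: my-app
  ports:
    - port: 80
      targetPort: 3000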
thanks for all the suggestions guys, I’ll do the fancy stuff later. I’m googling what a route 53 is and how to move my DNS to it thanks!
i’m having trouble keeping my joey’s straight here
2020-05-12
Interesting Project: https://github.com/spotahome/service-level-operator
Manage application’s SLI and SLO’s easily with the application lifecycle inside a Kubernetes cluster - spotahome/service-level-operator
This is a cool project. We’ve deployed it and it works well.
We didn’t end up doing much with it yet, but have a PR open for it here: https://github.com/cloudposse/helmfiles/pull/186
what: [service-level-operator] Helmfile added; why: Helmfile for service-level-operator chart
Has anyone done feature branch deployments within AWS EKS? I’m planning on doing this, but I see a potential issue with AWS ALB route limits.
yes, we do this on EKS, but we still use nginx-ingress
do you need to use the ALB with other AWS services?
If not, why not just use nginx-ingress with NLBs
Use NLB with random ports?
to use NLB with nginx-ingress, you just need to set 1 annotation. It’s a very quick win.
No need to worry about random ports. Or are you talking about some application requirement where you have to use random ports?
I will have a deep look into that option.
you could use nginx-ingress as a reverse proxy to your application ports.
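For reference, the quick win being described is presumably the standard load balancer type annotation on the nginx-ingress controller Service, roughly (a sketch; names are placeholders, and it can be set via the chart’s controller.service.annotations value if you install with Helm):
apiVersion: v1
kind: Service
metadata:
  name: nginx-ingress-controller
  annotations:
    service.beta.kubernetes.io/aws-load-balancer-type: "nlb"   # provision an NLB instead of a classic ELB
spec:
  type: LoadBalancer
  selector:
    app: nginx-ingress
  ports:
    - name: http
      port: 80
      targetPort: 80
    - name: https
      port: 443
      targetPort: 443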
Want to create a nodegroup via the eksctl create nodegroup command, got this error:
Error: timed out (after 25m0s) waiting for at least 1 nodes to join the cluster and become ready in "standard-workers"
any ideas? It didn’t appear in the AWS EKS panel
Amazon EKS node drainer with AWS Lambda. Contribute to awslabs/amazon-eks-serverless-drainer development by creating an account on GitHub.
Does anyone here know of an easy way for me to pull from a private docker repo, via dashboard?
I keep getting permission denied and idk how to log in to docker on my kubernetes dashboard
2020-05-13
nvm figured it out! kubectl create secret docker-registry regcred --docker-server=docker.io/<user>/<image> --docker-username=<your-username> --docker-password=<your-pword> --docker-email=<your-email>
then in the deployment’s .yaml:
imagePullSecrets:
- name: regcred
Define Kubernetes apps and components using familiar languages
the vicious cycle of switching from writing code to configuration language back to writing code
ya, seriously
jenkins feels like an interesting case study related to this:
- first there were freestyle jobs that everyone created with clickops
- then came the Jenkinsfile littered with imperative groovy script
- people loved it. wrote “pipelines as applications” so complicated that no one knew how they worked and they couldn’t be tested. people revolted.
- then jenkins came out with declarative pipelines. people loved it. they converted all their groovy pipelines to the declarative style.
- then came gocd, argocd, flux, spinnaker, etc. everyone started to hate the declarative jenkins pipelines.
- yadda yadda yadda
in infrastructure as code: first we had things like boto (python) and wrote infrastructure as code in a pure language. that became untenable. so we all gathered around terraform and cloudformation based on the lessons learned. then that became limiting. so pulumi came out to show that what we really need is to return to pure programming of infrastructure as code. aws followed suit with CDK.
everything comes full circle.
Hey guys, I’m trying to make my web page only viewable by me. So I’m modifying the ELB’s security group….
This works:
Type - All Traffic
Protocol - All
Port range - All
Destination - 0.0.0.0/0
This doesn’t work:
Type - All Traffic
Protocol - All
Port range - All
Destination - <my ip address>/32
& I don’t know why. This ELB was generated from Kubernetes, so is there anything that I can do in that application?
i suspect you’re probably more interested in the ‘source’ of the traffic rather than the destination
@joey the inbound rules don’t seem to be making a difference
I can still access the page after I delete them
you can access the page from something other than your ip?
actually… it’s buffering now I think there was just a delay
oh man @joeys for the win!!!
yeah it definitely makes a difference changing the inbound
working on my computer but not my phone now (as expected)
Do you know if there’s a way to add other people’s IP addresses?
Or would they just have to log into the console themselves?
destination is for your server connecting to other stuff, e.g. twitter api or onlyfans.com api
source is for anything coming IN to your server
gotcha
once a source-based tcp connection is established, your server can return data to it
makes sense
to add other people.. well.. depends on how creative you want to get
on second thought I think it’s totally doable just to type it in manually
then add /32
about to try with my phone
yes
easiest way is just get their ip and add the /32 (cidr for ipv4 single ip address) for their ip’s yourself
boom! It works!!!
beyond that, google has your answers
Thanks man! I really appreciate it
alright
you could get all sorts of creative and create an api with lambda that has access to update the security group when some criteria is met on your page that someone updates or something crazy stupid like that
good stuff to think about for the future
my project is incredibly rushed though lol
it really depends on what you’re trying to do, and thus no one here can necessarily tell you what you need to do in those cases
2020-05-14
I’m starting to think of moving my existing Helm charts to Terraform. Reasons:
- Helm is badly designed, e.g. if an initial deployment fails, none after will succeed, while Terraform, for starters, knows its job is to apply locally described desired state to a remote stateful API.
- It’s easier to onboard people if they only need one tool.
- Terraform displays diffs of what it wants to do before doing that.
- Terraform allows you to inspect and manage its state.
- Terraform does not force us to template with the worst templating language I’ve ever seen, gotmpl. Resources can be defined in its language.
- Terraform has escape hatches, like provisioners.
- Terraform has modules, which work a bit like very cumbersome macros, while gotmpl has subtemplates, which work like macros with broken parameters/global environment (pick 1).
- Terraform can create Helm releases better than Helm can do (Helm cannot pass a templated value to a sub-chart that doesn’t expect that value to be templated)
- Terraform can use remote state to source values from different state.
- I already render some values files with Terraform, then copy them to the helm charts repo. Those files contain multiple copies of the same value in many places because Helm Cannot Template Values.
My idea for deploying temporary releases from the CI is to use independent S3 objects per state.
Have you looked at https://github.com/roboll/helmfile ? (and here is our collection of them https://github.com/cloudposse/helmfiles/tree/master/releases)
Deploy Kubernetes Helm Charts. Contribute to roboll/helmfile development by creating an account on GitHub.
it solves a lot of issues from your list (but not all, and not “with the worst templating language I’ve ever seen” )
but I agree with your points that having just one tool (Terraform, and it’s a very good tool) offers many advantages
i kind of expect in the best case it will go from “i had 8 problems with 1 tool” to “i have 5 problems with 2 tools”
I’ve actually deployed full team projects using pure terraform before
Honestly, there was a certain elegance to using metadata for implicit dependency chaining that was hard to deny.
But the devs hated HCL and almost immediately demanded some additional simplistic yaml formatting/pipelines for their dev environment to deploy configmap changes from
My use case was to deploy the cluster with workload in a single pipeline to facilitate an ephemeral cluster deployment
almost everything I read about doing such a thing told me that the cluster deployment and workload deployment to the cluster should be separate states (I ignored all the warnings and it worked fine for me though…)
Where I found that helm excelled was the post-deployment tests I was able to run. Both tf and helm can deploy a workload that never actually works and be considered successful. I suppose that post-deployment e2e testing functionality could easily be done any number of other ways; I just know that it is not baked into terraform the way it is into a purpose-built Kubernetes tool.
I’m having a hard time understanding your first statement though. ‘…if an initial deployment fails, none after will succeed..’
usually that is what you would want in your pipeline right?
perhaps you are overcomplicating your charts?
for instance, I almost never use subcharts to reduce overall complexity and downstream issues.
helm also can do diffs
and it stores state in secrets within kubernetes, then does three-way merging. I don’t think that terraform remote state and helm three-way merging is a like-for-like comparison. Remote state is offset by the fact that it is an additional outside dependency
I’ve already shared this in another workspace but: My biggest pain-point with the TF K8s provider is dealing with any project with CRDs (Istio/Spinnaker et al). To work around that I had to use the Helm TF provider which is excellent! I am still struggling with tools that come packed with their own CLIs and I hope the community shifts to embrace operators ASAP
Excellent point, any CRDs inherently are not supported in the terraform kubernetes provider. You only get common kubernetes resources to work with.
I’m having a hard time understanding your first statement though. ‘…if an initial deployment fails, none after will succeed..’
The reality looks like:
- A developer presses the deploy button in the CI to deploy their branch
- Release fails but the app runs
- They don’t look and keep deploying when needed
- Nothing happens
But the devs hated HCL and almost immediately demanded some additional simplistic yaml formatting/pipelines for their dev environment to deploy configmap changes from
I also hate HCL but to prefer YAML over HCL is some serious stockholm syndrome.
If you add a helm test to your chart and subsequent deployment pipeline code you should be able to capture some of the false positives (if I read things right)
I ran into that issue at least. Deployments worked just fine but there were actual underlying container or service issues so no one was the wiser when some component wouldn’t be working. The helm test can be a simple pod that wget’s the service endpoint (or similar)
helm tests are an option but i may as well turn them into monitors once i have them
I’d say both are important
my idea is that if someone goes so far as to write a helm test (i had one.. once), it might as well be a monitor cronjob running every minute, sending metrics
I’d put the test directly in a pipeline for validation of deployment success and if a cronjob were to be used as the monitoring solution (not what I’d personally recommend) I’d make that part of the chart itself.
otherwise, including a prometheus rule as part of the chart to trigger alerts would be more holistic I’d think.
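For context, a helm test along those lines is just a pod with the test hook annotation; a minimal sketch (release and service names are templated placeholders):
apiVersion: v1
kind: Pod
metadata:
  name: "{{ .Release.Name }}-smoke-test"
  annotations:
    "helm.sh/hook": test              # only runs on `helm test <release>`
spec:
  restartPolicy: Never
  containers:
    - name: wget
      image: busybox
      command: ["wget"]
      args: ["-qO-", "http://{{ .Release.Name }}-svc:80/healthz"]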
@Karoline Pauls, You always spark super interesting conversations btw
Do I? I thought I was just one person pissed at Jenkins.
what better reason to innovate better solutions than being riled up at the current ones?
regarding the cronjob, in my Salt days I would sometimes write custom Datadog healthchecks (in Python) that would run on instances, hence the idea to have an analogue on kubernetes.
The issue is, as I’m trying to get Jenkins to do what I need (currently a manual deploy step with a toggle for online/offline migrations, dev/prod, etc), it is burning me out. My external ALB ingress controller experiment saw no commits in a month.
I never got to actually deploy custom healthchecks in datadog (it seems to require a local agent, which I didn’t have available to me in the last environment I worked in). I used terraform to set up metrics-based alerts instead though. There are also some datadog integration projects out there for auto-creating alerts for kube-based deployments, I thought (https://github.com/fairwindsops/astro)
Emit Datadog monitors based on Kubernetes state. Contribute to FairwindsOps/astro development by creating an account on GitHub.
not what you are likely looking for though, sorry
Does anyone else find immutable pod templates in a statefulset a pain? (i know now of --cascade=false)
2020-05-15
Anybody have experience with pods that are crashing due to heap out of memory, and had to try to capture the core dump?
I was thinking to change the location of the core dump to write to a persistent volume, but running into an issue because /proc/sys/kernel/core_pattern is read-only and I don’t know why. I’m running on EKS.
Locally I can edit that file only if i run docker with the “–privileged” flag, so my question is how do I do the equivalent with deployments/pods in EKS
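One approach (a sketch, nothing EKS-specific): kernel.core_pattern is not namespaced, so a privileged container can set it for the whole node, e.g. from a small DaemonSet like the one below; names and paths are placeholders:
apiVersion: apps/v1
kind: DaemonSet
metadata:
  name: core-pattern-setter
spec:
  selector:
    matchLabels:
      app: core-pattern-setter
  template:
    metadata:
      labels:
        app: core-pattern-setter
    spec:
      initContainers:
        - name: set-core-pattern
          image: busybox
          securityContext:
            privileged: true           # needed to write the node-wide sysctl
          command: ["sh", "-c", "echo '/cores/core.%e.%p' > /proc/sys/kernel/core_pattern"]
      containers:
        - name: pause
          image: k8s.gcr.io/pause:3.1  # keeps the DaemonSet pod running after the init step
The crashing pod would then also need /cores to exist and be writable where the kernel drops the file (a persistent volume or hostPath mount is one option).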
It seems that we are in the middle of a mini acquisition spree for Kubernetes startups, specifically those that can help with Kubernetes security. In the latest development, Venafi, a vendor of certificate and key management for machine-to-machine connections, is acquiring Jetstack, a U.K. startup …
2020-05-17
Hi all! What do you guys advise for rotating logs that are stored in a hostPath directory? After a pod is terminated, its logs remain on the node and will only be removed when the node is terminated. Currently we are not able to change the log streams to stdout, but we are working on it.
Kubernetes handles the log rotation automatically
There are a few tunable parameters for this
Note, that containers started outside of kubernetes (e.g. docker in docker with jenkins) are not automatically log rotated
Thanks. Yes, that is correct when we talk about containers that are able to redirect logs to stdout/stderr. But that application is not able to send logs to stdout/stderr; it stores its logs on the container filesystem. It just writes into the files /var/log/php/application_errors.log and /var/log/php/application_events.log
So, I’ve mounted a hostPath volume from the node into each container at the directory /var/log/php
Now I can scrape the logs using FluentD, which is deployed to my k8s cluster (DaemonSet). Everything is OK, but the files are growing each day and should be rotated. That is what I asked about. Legacy applications are a pain
Sounds like you need to setup logrotate jobs on the nodes.
Logrotate is a system utility that manages the automatic rotation and compression of log files. If log files were not rotated, compressed, and periodically pruned, they would eventually consume all available disk space on a system. In this article, we
But you told the application to log to /var/log/php/application_errors.log, right? In that case you can tell it to log to /dev/stdout or /dev/stderr
that way it will play nicely with the logging platform
you can also ln -sf /dev/stderr /var/log/php/application_errors.log inside the Dockerfile (i.e. make the log file a symlink to stderr)
I’ve done that exact thing before and it works well. Completely forgot about it earlier.
man 90% of keeping up with modern tech is remembering what to forget and trying not to forget what to remember
Great! Thank you, guys! Will try it today and get back with additional questions
We had the chance to see quite a bit of clusters in our years of experience with kubernetes (both managed and unmanaged - on GCP, AWS and Azure), and we see some mistakes being repeated. No shame in that, we’ve done most of these too! I’ll try to show the ones we see very often and talk a bit about how to fix them.
Excellent list, highly recommended to run through this if you are just getting into kubernetes for your workloads.
2020-05-18
Hi all, does anyone know what could lead to a pod being stuck in PodInitializing state even after increasing storage, CPU and memory?
Have a look at the events when you describe the pod; in a case I’ve dealt with in the past it was that it couldn’t allocate an IP
2020-05-19
Hey guys, need help regarding traffic load. I have websites running on K8s which are SQL-oriented, and I need to test how they behave under heavy user load and how my cluster handles that (whether I need to upgrade nodes or not). How should I implement this and what tools should I use?
you should write a script to send your cluster a bunch of traffic or use something like ab or gatling or vegeta or any number of load testing tools
this sucks..
rpc error: code = Unknown desc = Error response from daemon: Get <https://quay.io/v2/pusher/oauth2_proxy/manifests/v4.0.0>: received unexpected HTTP status: 500 Internal Server Error
How are you all managing external docker images ? Are you putting them in your own registry?
They’re having an outage: https://status.quay.io/
Welcome to Quay.io’s home for real-time and historical data on system performance.
Consider implementing this https://docs.docker.com/registry/recipes/mirror/
Use-case If you have multiple instances of Docker running in your environment, such as multiple physical or virtual machines all running Docker, each daemon goes out to the internet and…
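For what it’s worth, the pull-through cache in that recipe is just the stock registry:2 image with a proxy section in its config, something like the sketch below; note it only mirrors a single upstream (Docker Hub here), so quay.io images would need their own mirror or to be copied into a registry you control:
# config.yml for a registry:2 container acting as a pull-through cache
version: 0.1
storage:
  filesystem:
    rootdirectory: /var/lib/registry
http:
  addr: :5000
proxy:
  remoteurl: https://registry-1.docker.io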
yea I know that they have an outage, I am just wondering how people mitigate it in general
so for me: I am removing a node every hour from our dev cluster and adding a new one, just to see how it would feel to have short-lived nodes, and now quay was down (500’s) and the new node couldn’t get the workload up and running
so it is my fault, but I was just curious to hear how people handle docker images - are they all hosting them themselves, or would they be affected by this as well?
The world’s only repository manager with FREE support for popular formats.
this outage has been going on for awhile now https://status.quay.io/incidents/kw2627bsdwd9
yeah… ouch.
2020-05-21
I converted localstack to run in kubernetes for locally testing out AWS scripts on kind clusters. Example includes the use of kompose, helmfile, the raw helm chart, and my own little framework for stitching it all together. https://zacharyloeber.com/2020/05/aws-testing-with-localstack-on-kubernetes/
Aws Testing With Localstack on Kubernetes - Zachary Loeber’s Personal Site
wow, rad idea
Super cool stuff!
2020-05-24
well that’s useful. kind of like argocd or flux?
ya, it seems like it
sometimes I swear that the matrix is glitching on me, I’m deep diving into argocd right now
If I go outside for a walk and see 3 instances where someone is walking a pet cat or iguana or something I’ll know I’m in the matrix…
you happen to find direct links to the CRDs for this thing?
nm, just used my gcloud account to grab the files
Haven’t looked past the news article yet
2020-05-26
Hey k8s folks — Is there a defined best place to get spun up on k8s? I’ve put off ramping up on the topic for a while, but I’d like to dive in while I have some downtime from clients. I’m sure some folks here have good resources or strong opinions on where to head for info.
Hrm… so I think what you’ll find is there is a wide gap between how it’s done on AWS vs Azure vs GKE vs bare metal, etc
Did you have one in mind?
Huh interesting. Yeah, I’d say I’d probably focus on AWS.
Your suggestion would be to tailor learning towards k8s on EKS?
I think there are two sides of it. On the one side, you need to learn k8s the platform (how to run stuff on k8s). that will be the most similar across cloud providers (but not identical).
@joey — Good stuff, I’ll definitely check those out. Thanks for sharing.
Then there’s operating k8s as a platform. that’s where you need to pick one cloud provider and kick the tires.
this is where you get the operational experience.
Makes sense.
and if you’re going to choose a specific cloud provider and do it like that, you might as well be using terraform and/or terragrunt
then I think the “Best Practice” will be determined by whether you want to go the AWS-native approach with CloudFormation, use eksctl, or use #terraform.
to use k8s effectively, you’ll inevitably need to provision a bunch of stuff that isn’t handled by EKS (e.g. IAM roles). so using terraform is advisable.
Yeah, I’m already very bought in on Terraform so that would be my approach for sure.
@Erik Osterman (Cloud Posse) Do you have a resource you’d suggest for the “you need to learn k8s the platform (how to run stuff on k8s)” side of things? I think that’s really what I’m looking for.
If you are on a linux platform you can look at using libvirt+terraform to get running pretty quickly with your own local cluster. https://github.com/zloeber/k8s-lab-terraform-libvirt
A Kubernetes lab environment using terraform and libvirt - zloeber/k8s-lab-terraform-libvirt
for getting comfy with kube deployments and getting around, I’d just start up a kind, k3d, minikube, or microk8s local cluster and start looking to deploy things to it that you might find yourself deploying for work.
a sufficiently complex app that you could start with and feasibly could be in several types of environments might be airflow
You don’t need cloud resources to dive into the deep end pretty quickly with k8s
@Zachary Loeber Good stuff, I’ll check those out and keep that in mind. Thanks for sharing!
let us know how it goes, glad you are diving deeper into kube, take a deep swig of the koolaide….whatever they laced the kube-koolaide with is addicting
If I get this demo working I’ll be using the new Kubernetes provider for Terraform during my keynote at the Crossplane Community Day virtual event. https://www.eventbrite.com/e/crossplane-community-day-tickets-104465284478 https://twitter.com/mitchellh/status/1265414263281029120
2020-05-27
Can I get a dummy check on my plan for deploying kiam to my K8s cluster?
- Add an additional node pool of small instances to each cluster. 1-2 instances is really all that is needed
- Apply the Instance Profile to the new node pool
- Apply a Taint to the new nodes that tells the cluster not to schedule any pods to them
- Deploy the kiam server to the new nodes using a Toleration
- Deploy the kiam agent to all nodes
- Annotate namespaces that are allowed to use IAM with the iam.amazonaws.com/permitted: <regex that matches allowed roles> annotation
- Annotate pods inside the permitted namespaces with iam.amazonaws.com/role: <role name>
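Rough sketch of those last two pieces (role names, regex, and image are placeholders):
# namespace allowed to assume roles matching the regex
apiVersion: v1
kind: Namespace
metadata:
  name: my-app
  annotations:
    iam.amazonaws.com/permitted: "my-app-.*"
---
# workload requesting a role via the kiam agent (annotation goes on the pod template)
apiVersion: apps/v1
kind: Deployment
metadata:
  name: my-app
  namespace: my-app
spec:
  replicas: 1
  selector:
    matchLabels:
      app: my-app
  template:
    metadata:
      labels:
        app: my-app
      annotations:
        iam.amazonaws.com/role: my-app-s3-reader
    spec:
      containers:
        - name: my-app
          image: my-app:latest
The kiam-server Deployment would then carry a toleration for the taint on the dedicated pool (and a matching nodeSelector) so only it lands there.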
Yup that looks good
As an extra safety precaution you can use iptables to firewall direct access to the real metadata api
Also implicitly, you’ll need to provision the IAM roles that you will need
those are provided. In the account this will be in I don’t have permission to manage IAM
as of 3.5 on kiam you can use service account iam roles instead of instance profiles: https://github.com/uswitch/kiam/blob/master/CHANGELOG.md#v35
Integrate AWS IAM with Kubernetes. Contribute to uswitch/kiam development by creating an account on GitHub.
Not using EKS
allegedly you can do it without eks: https://github.com/aws/amazon-eks-pod-identity-webhook/
i’ve not seen or deployed this tho
Amazon EKS Pod Identity Webhook. Contribute to aws/amazon-eks-pod-identity-webhook development by creating an account on GitHub.
Interesting
i’ve got kiam running on service accounts and it works pretty well - i only did it that way because the .net core credential provider in the aws sdk didn’t support web identities for a while, so we couldn’t go full service account roles.
i’m probably going to rip it out soon, as the .net sdk supports it now.
but it’s been solid enough
2020-05-28
Are there any projects out there that make it easy to report on all of the cached images in a cluster, pull them into some other registry, then apply a mutating admission webhook to rewrite the container registry source when the deployments get applied to point to the new image source?
Something far easier that I’ve seen done is to just add a Validating Admission Controller to make sure containers come from the registry you want.
But your description is obviously much more elegant than that
That is part of the preventative portion of a holistic solution for certain
I suppose the tool I’m thinking of would be for migration of outside dependencies to local registries only
this would be pretty cool, would love to hear if you find something before i set off on a path to try to do something related
Just make sure the juice is worth the squeeze. Unless you have compliance/regulatory requirements, something like that will add a huge operational headache
I guess that is not ‘easy’ at all, but there should be such a project, if only to help avoid things like this
I’d specifically target things like cert-manager, kafka, or any other vendor(ish) images that would typically go through a review and testing process before simply upgrading them (so core services in a deployment, not developer workloads that might get updated multiple times a day)
I have peeked over kube-fledged and it seems to be on the right path towards something like this https://github.com/senthilrch/kube-fledged
A kubernetes add-on for creating and managing a cache of container images directly on the cluster worker nodes, so application pods start almost instantly - senthilrch/kube-fledged
@btai @Pierre Humberdroz
Oh awesome !
It may be useful for figuring out how to eliminate core service outside dependencies
outside deps == evil
this looks awesome
reminds me a little bit of that uber project too
Three devops holy war creeds: 1. latest tag is evil; 2. outside dependencies are our enemy; 3. incremental improvement of all things….
kube-fledged use case #4: If a cluster administrator or operator needs to roll-out upgrades to an application and wants to verify before-hand if the new images can be pulled successfully.
cool beans
I was wrong about the chart not being available for this: https://github.com/senthilrch/kube-fledged/tree/master/deploy/kubefledged-operator/helm-charts/kubefledged
the name for “that uber project” is kraken https://github.com/uber/kraken
P2P Docker registry capable of distributing TBs of data in seconds - uber/kraken
I ran across that one as well, the name certainly is apropos…
(too bad there isn’t a simple helm chart deployment for the thing, it is either straight yaml or the flux helm-operator it seems….)
In fact, that is the only project I’ve seen that allows for any direct image cache manipulation within a cluster
wonder what they’re doing at quay to be causing this many major outages
quay acquired by coreos
coreos acquired by redhat
coreos (the os) now EOL
quay the registry? not sure, but seeing how acquisitions go - the project leads are probably all gone. the project itself open sourced. who knows? maybe it’s on life support?
(this is all rampant, unqualified speculation. I have no inside knowledge of what’s going on.)
#codefresh decided to deprecate their registry
I imagine they would eat bandwidth and cloud storage space like no tomorrow
I’ve seen builds based on upstream public images as the base image before and cried on the inside (then promptly eliminated those outside deps…)
i feel like many of the major kubernetes tools have been hosting their images on quay.