hi all, what's your opinion on the different networking options for Kubernetes on AWS? Which is preferred and feels robust? We tried aws-vpc-cni but felt it's not stable enough, even with 1.1.0 on Kubernetes 1.10.6. It becomes even more unstable when your worker nodes churn, and starts throwing exceptions like "sandbox IP changed", etc.
we then switched to Calico, but we've observed it impacting the way pods terminate. If we delete a deployment, pods remain in a Terminating state for 5+ minutes.
Pods stuck in a Terminating state is a very frequently observed problem. Could it be related to the network layer? Maybe, but I would explore other possibilities. To me the network culprit seems like a red herring.
Lots of posts/issues on it. Usually related to zombies.
@rohit.verma we saw something like that with some of the k8s pods, in particular kiam: when deleted, the pods take many minutes to terminate
so maybe it’s an issue with some deployments, not the network itself?
But the pods I am referring to here are generic, like nginx or a Spring Boot app
Anyway, I'm more interested in a general opinion on the different Kubernetes networking options
We haven’t had the opportunity to explore/optimize the network layer in k8s
Also are you familiar with the dumb-init “fix” ?
This is to address the same symptoms
Yelp/dumb-init: a minimal init system for Linux containers.
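The "fix" usually amounts to making dumb-init PID 1 inside the container, so it forwards signals to the app and reaps orphaned zombie children. A minimal sketch (base image and app path are placeholders):

```dockerfile
FROM alpine:3.8

# dumb-init runs as PID 1, forwards signals to its child,
# and reaps zombie processes that would otherwise accumulate
RUN apk add --no-cache dumb-init

COPY app /usr/local/bin/app

ENTRYPOINT ["/usr/bin/dumb-init", "--"]
CMD ["/usr/local/bin/app"]
```

Without an init as PID 1, a shell script entrypoint that spawns children can leave zombies behind, which is one known cause of pods hanging in Terminating.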
@Andriy Knysh (Cloud Posse) hello again, do you have any docs or best practices for promoting Kubernetes workloads within non-prod? i.e. dev to staging?
for clarification, are you talking about promoting images and helm charts? or promoting usage of kubernetes within a company
right now we are using different namespaces in k8s
currently it is within the company
So the same cluster for staging and production?
dev and staging
sorry, i let this fall through the cracks.
we don’t have a well documented process for what you want. we’ve implemented and documented it internally for customers, but still need to document it on our site.
we have something rough here: https://docs.cloudposse.com/release-engineering/cicd-process/
also, looks like the video was taken down =/
this is the same thing that I had in mind …
so what is your view on databases with persistent volumes?
Use fully managed databases for anything you care about
Use database containers for disposable environments
So for example, when we deploy environments for every PR we use containers
haven't taken a look at it
ya we run two accounts … where one is prod and one is non-prod
and all our non-prod stuff happens in nonp
We don’t have the promotion process documented but I can share how it looks (we use Codefresh)
I am currently on my phone so will share a little later
np sounds good
using kops or terraform for creating a production Kubernetes cluster: which is better, and what are the cons?
@Tee we use terraform to create kops resources, e.g.
and then use kops to provision k8s clusters
we also have TF modules for EKS
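A minimal sketch of that pattern (resource and bucket names are placeholders): kops keeps its cluster spec in an S3 state store, which is a natural resource for Terraform to own:

```hcl
# Sketch only; bucket name is a placeholder.
# kops stores all cluster state in S3, so Terraform can manage that bucket
# (versioning enabled so cluster spec changes can be recovered).
resource "aws_s3_bucket" "kops_state" {
  bucket = "example-com-kops-state"
  acl    = "private"

  versioning {
    enabled = true
  }
}

# kops is then pointed at the bucket, e.g.:
#   export KOPS_STATE_STORE=s3://example-com-kops-state
#   kops create cluster ...
```

Terraform can also create the VPC, subnets, and DNS zone that kops consumes, keeping a single source of truth for the underlying AWS resources.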
@Tee are you thinking GCP or AWS?
so on AWS, my opinion is that it's more work than necessary to manage EKS with terraform. the challenge comes down to upgrading. there's some discussion on strategies for that.
@Andriy Knysh (Cloud Posse) are ya’ll using it with kops? Looks like it. How does TF generation fit in if at all?
with kops, the ability to do rolling updates is built in; a purpose-built tool like kops will do a better job at managing lifecycles.
if fargate announces EKS support at the end of the month, I might change my stance
But EKS and Fargate get pretty expensive
as far as I can tell
humans aren’t cheap either
So what do you suggest for the long term, not considering cost? With fewer bottlenecks and nightmares
not an easy question
I mean in terms of stability
kops is well established and works well, and does lifecycle management
EKS is new and lacks a lot of features, but it will stay and they will improve it
Fargate will improve and cost will be reduced
(we’re not using EKS in production yet, so our story will be biased towards kops)
Oh ok. Thanks @Erik Osterman (Cloud Posse) & @Andriy Knysh (Cloud Posse) for your suggestions.
yea, the point is that with the current state of EKS, you need to provision even more resources than using
and it does not support many features
Fargate could improve it, but as many mentioned it's costly (and it does not exist yet)
That makes sense
I am currently moving all of our infrastructure off Mesosphere DC/OS onto EKS and EKS has been phenomenal in my opinion - just lots of support from many different aspects such as AWS and the Kubernetes community
as well as great folks like Cloud Posse
yea thanks @Matthew
the point is that with EKS, if for example you need to perform a rolling update, it’s not supported out of the gate
so a lot of friction with many things
with kops it just works
but sure, for the long term EKS/Fargate would be better
Yeah, I've talked with an EKS specialist from AWS and they currently suggest a blue/green strategy for upgrading, which can be tedious and at times breaks backwards compatibility
how do you export a single context of your kubeconfig?
say my local kubeconfig has a
@btai we don't have multiple contexts. We use the containers + ENV vars pattern (implemented in geodesic + repo per env + Dockerfile(s)). So in each container (dev, etc.), when we run it, we have all ENV vars defined for that particular env (ENV vars come from Dockerfiles, or from SSM if they are secrets). That includes everything for Terraform, kops, k8s, etc.
So when we run, for example, kops export kubecfg, the environment knows what context we want
and we can run those geodesic containers locally and also in CI/CD pipelines (for which we use Codefresh, since it can run each pipeline step as a Docker container)
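Circling back to the original question: if you do keep multiple contexts in one kubeconfig, kubectl itself can export a single context to a standalone file (the context name "dev" is a placeholder):

```shell
# Export only the "dev" context from the active kubeconfig.
# --minify keeps just that one context; --flatten inlines certificate data
# so the resulting file is fully self-contained.
kubectl config view --minify --flatten --context=dev > dev-kubeconfig.yaml

# use the exported file on its own:
KUBECONFIG=./dev-kubeconfig.yaml kubectl config current-context
```

Because `--flatten` embeds the certs, the exported file can be copied to another machine or CI system as-is.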
Have you guys used Codefresh enterprise? I know you’re all big into codefresh here. Just curious of any pitfalls or bits of advice you guys have
(enterprise to run on-prem)
So Codefresh enterprise has 3 variations: full SaaS, hybrid and on-prem
we’ve been working exclusively with the enterprise SaaS
what’s the primary driver for going on-prem?
compliance requiring no dependence on external SaaS providers
which compliance certification?
oh wow, you haven't even taken me out on a date yet to be asking such risqué questions.
lol jk, I think fedramp
ok - that’s a whole ’nother cup of tea
not familiar with
but sounds like you’d need full on-prem.
so i probably wouldn’t enlighten you more than you already know
No worries. We're new to codefresh, so just probing for any gotchas really
@dustinvb can definitely elaborate
are you using the helm based install?
I think at the moment, yes
debating about the release of terraform 0.12 and using all the templating stuff
rather than 2 templating engines.. tiller.. and all that jazz
so i get where you’re coming from - but from what i’ve gleaned the current helm provider is too basic to handle all kinds of helm charts. maybe with 0.12 it’s better off
you’ve seen our helmfiles repo? basically you can’t do half of what we do with helmfile using that provider
Just to cover my ass that I'm not providing any confidential information.. it's publicly available that we're on fedramp ^^^ lol
Oh, my bad. I didn’t mean the helm provider.. I meant generating the k8s.yml files on the fly based on the infra-state.. no helm installation anywhere
just a thought at the moment–not necessarily going that direction for sure
but yea, you could basically create terraform modules in place of helm charts
… if terraform templating is sufficient
hehe, yeah, big “if”
it's been my experience that the "simple" case always works well regardless of the technology
how have you guys been liking helm? any complaints with the tiller stuff, or you guys are experienced enough with it all–nothing really bugs you?
i mean, it sucks about the tiller and all
but i look at helm more like an interface
and the interface won’t change dramatically, but the underlying implementation is getting a big overhaul as you’re probably aware
as part of that tiller is going away and the template engine going pluggable
“tillerless helm” is the buzz
as a way to manage a complex apps it’s great
and app dependencies
i say (and with some humility) that those before us have invested a lot of time in what it takes to manage software releases
deb, rpm, apk, etc.
we tried to avoid that with just a Makefile; it worked well until it didn't. in the end, we needed everything a package manager provides and conceded to packaging .apk alpine packages
my point is that just templatizing raw kubernetes resources and applying them seems easy enough and i’m sure you can get away with it for a long time
but then you realize you want to have dependencies, triggers on deployment or uninstall, and rollbacks, etc. then you’re on your own.
the more homegrown/spun, the more the solution diverges from the trajectory the community is taking
because the community is solving problems around a standardized toolset
so i’m curious.. you bring up rollbacks
codefresh/spinnaker’s solutions didn’t offer enough in that aspect?
codefresh relies on the fact that helm does rollbacks automatically
and even bakes that into the UI with one-click rollbacks
they also have some even more cool stuff in the works - but you’ll have to ask them to see it
we have meetings setup with them
very cool! hit me afterwards and let me know how it goes
does all this reveal a well needed niche (product offering) in the CI/CD process for k8s?
since there always ends up being handrolled stuff?
haha, not sure - there are more CI/CD platforms today than ever
i can’t keep them straight anymore
spinnaker is now coming out with an enterprise offering too
haha nice. Well, after the bloodbath, hopefully the best solution reigns supreme
halyard was surprising when I first played with it
and then github actions
then I looked at the helm chart for spinnaker.. and it was just a bunch of
but i agree that there’s still big room for improvement
the fact there is so much handrolling and independent tooling
I think codefresh is well poised to do that as it relates to cicd+kubernetes+helm
does your gut think helm isn’t going anywhere?
until I see an alternative that has anywhere near the critical mass of helm, yes - i think it’s here for the foreseeable future
for example, there's ksonnet (based on jsonnet), which looks interesting
but i think some variation of that could be used as a pluggable engine for helm
also, i don’t want to see proliferation of more packaging systems right now - it’s too early
have you seen this plugin?
this is pretty smart.
I thiiiink I’ve seen this one.. if not it was something similar
basically, it’s a drop in replacement. it still stores all configs in the cluster (per namespace if you want)
you run a temporary tiller locally
this can be run as part of CI
interesting.. hmm.. nice actually!
(though would break the codefresh helm UI, since it would need to talk to the tiller and there would be none running)
technosophos/helm-ksonnet: an experimental ksonnet plugin for Helm.
In order to provide jsonnet rendering for helm charts a new ReleaseModule similar to the Rudder ReleaseModule should be developed. This module would take charts and render them as Jsonnet templates…
Guess my hopes of seeing ksonnet as a template engine in helm were misguided
I know Lua is coming. I’d heard such great things about jsonnet, that I assumed it would be well suited. But Lua I guess is a better understood embeddable language
Last I had to write Lua was 14 years ago when dealing with Nginx
For the codefresh peeps out there.. does it matter what/how the ingress controller looks when using codefresh for deployments?
Was reading through: https://docs.traefik.io/user-guide/kubernetes/#traffic-splitting and it came to mind
Nope, we use for example the Cloudflare Access/Argo ingress and the nginx-ingress controller in the same cluster
have you seen this https://www.youtube.com/watch?v=kOa_llowQ1c
and i think he’s presenting the simple side that should be presented
and here comes the but…..
but in the real world of deploying complex applications with interdependencies, secrets, configurations, etc… it devolves into something much more complicated
his presentations are always awesome
and the gap to cross from the hello world examples to customer apps is huge
he makes it look so “easy button”
PLEASE SOMEONE SHOW ME HOW TO MAKE THIS EASIER
i want to
i hate this
and here’s the rest of all the other apps
so with one of my customers we are working on two distinct steps… one to build the app, then a separate one to update (deploy) the app in an ongoing fashion
so we're using helm, and some hate on helm for one reason or another. but one thing's for sure: it is hiding an even more enormous pile of YAML/go templating on the backend.
i hate kelsey hightower in the best way possible
this is interesting https://aws.amazon.com/blogs/opensource/continuous-delivery-eks-jenkins-x/
Amazon Elastic Container Service for Kubernetes (Amazon EKS) provides a container orchestration platform for building and deploying modern cloud applications using Kubernetes. Jenkins X is built on Kubernetes to provide automated CI/CD for such applications. Together, Amazon EKS and Jenkins X provide a continuous delivery platform that allows developers to focus on their applications.
i did not think it could do so much: it creates pipelines for the infrastructure itself (prod and staging), and pipelines for the app, and even spawns a separate testing/staging env in k8s for each PR, and comments on PRs on GitHub (like atlantis), and creates GitHub repos with Helm charts for the infrastructure (prod and staging)
one thing it can’t do is to upgrade the k8s cluster b/c it itself sits in the same cluster
@here Any recommendations for learning distributed systems from basics to advanced?
I've noticed a lot of people know the tools but not the concepts…
@ramesh.mimit I found this site very interesting and with lots of resources about distributed systems, and real-life examples from many companies
@Andriy Knysh (Cloud Posse) thanks..
“Google Kubernetes Engine’s third consecutive day of service disruption”
anyone use the official python kube library?
can you load the config from a dict?
~why not use config profiles instead?~
~the underlying aws SDK should then handle everything automatically~
the kube config?
heh, my bad @btai
how would i run kubectl within a container running from a job?
here’s an example doing it from a deployment: https://github.com/onfido/k8s-rabbit-pod-autoscaler
doing it from a job wouldn’t be any different
just need the proper role bindings
in this case, kubectl is getting called from in the
so if i have the wrong role bindings
would i be getting this error:
The connection to the server localhost:8080 was refused - did you specify the right host or port?
all i know is when we implemented it for redis using the strategy above (for rabbit), we didn’t need to specify any hosts
it just autodiscovers it
the pod autodiscovers
ok thats what i was hoping for
it also provides a kube context
so the pod itself didnt have any kubeconfig or kube api secrets
yea, it didn't have anything like that
yeah i have a job basically doing the same thing
executing a shell script that makes a kubectl call
but i get the above error
we tested it on gke and kops
yeah it works in kops
you have rbac enabled in kops?
although the kops cluster doesn't have rbac enabled
so do i create a clusterrolebinding for the job?
sorry, still new to k8s
hypothetically if i create a cluster role binding with the namespace and name that matches the job, that should work?
More or less
I don’t know the specific matching selectors that are available
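More concretely: the binding matches the Job's ServiceAccount (namespace + account name), not the Job's own name. Also worth noting: the localhost:8080 error usually means kubectl found no credentials at all (no kubeconfig and no usable in-cluster service account token), whereas an RBAC denial would come back as "Forbidden". A hedged sketch with placeholder names and permissions:

```yaml
# Sketch only; all names, verbs, and the image are placeholders to adapt.
apiVersion: v1
kind: ServiceAccount
metadata:
  name: kubectl-job
  namespace: default
---
apiVersion: rbac.authorization.k8s.io/v1
kind: ClusterRole
metadata:
  name: kubectl-job
rules:
- apiGroups: [""]
  resources: ["pods"]
  verbs: ["get", "list"]
---
apiVersion: rbac.authorization.k8s.io/v1
kind: ClusterRoleBinding
metadata:
  name: kubectl-job
roleRef:
  apiGroup: rbac.authorization.k8s.io
  kind: ClusterRole
  name: kubectl-job
subjects:
- kind: ServiceAccount
  name: kubectl-job      # matched by SA name + namespace
  namespace: default
---
apiVersion: batch/v1
kind: Job
metadata:
  name: kubectl-job
  namespace: default
spec:
  template:
    spec:
      # without this, the pod runs as the namespace's default SA
      serviceAccountName: kubectl-job
      restartPolicy: Never
      containers:
      - name: main
        image: example/kubectl-image   # placeholder image that ships kubectl
        command: ["kubectl", "get", "pods"]
```

With the service account token mounted, kubectl inside the pod autodiscovers the API server from the in-cluster environment, so no kubeconfig is needed.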
This is neat. Copy secrets from a centralized system of record. https://github.com/mittwald/kubernetes-replicator/
Kubernetes controller for synchronizing secrets & config maps across namespaces - mittwald/kubernetes-replicator
How are you guys handling busy helm deployments where the tiller is busy attending to other deployments…
Error: could not find a ready tiller pod
@Max Moon @dustinvb
Introduce --replicas option to configure the amount of Tiller instances on the cluster. Fixes #2334. The next PR will be about distributed lock, this one is just exterior.
I haven't run into this scaling issue yet.
--replicas option looks nice
@michal.matyjek @Daren have you run into this?
I have not run into this yet either
I have not
how many deployments are we talking about?
so we're running helm on every PR synchronization, for unlimited staging environments
so we’re getting it
e.g. 2 developers push at around the same time
hey all - curious what the verdict is on kiam vs kube2iam… it seems like kiam was created to address some issues with kube2iam - is kiam the way to go these days?
@sarkis yea, kube2iam is dead and should not be used. It’s a massive liability to even deploy in an AWS account. If you run more than N hosts (N ~10), you’ll DoS AWS APIs and they rate limit you.
kiam addresses this by having a client/server model. clients run on all nodes (agents), and talk to the server. the server is responsible for fetching the credentials which reduces rate of requests
it also caches
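For context, kiam keeps the kube2iam-style pod annotation for role selection, and additionally requires a namespace-level whitelist of which roles pods there may assume. A hedged sketch with placeholder names:

```yaml
# Sketch only; namespace, role name, and regex are placeholders.
apiVersion: v1
kind: Namespace
metadata:
  name: example
  annotations:
    # kiam only assumes roles matching this regex for pods in the namespace
    iam.amazonaws.com/permitted: "^my-app-.*"
---
apiVersion: v1
kind: Pod
metadata:
  name: example-pod
  namespace: example
  annotations:
    # the IAM role this pod's credentials are issued for
    iam.amazonaws.com/role: my-app-role
spec:
  containers:
  - name: app
    image: nginx   # placeholder
```

The kiam agents intercept the pod's calls to the EC2 metadata endpoint and serve credentials fetched (and cached) by the kiam server.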
I think there's been some frustration related to the rate of development on Kiam, but the worst bugs are fixed.
Also, I don't know of any other alternatives to kube2iam for AWS
thanks @Erik Osterman (Cloud Posse)!
You can use a PodPreset object to inject information like secrets, volume mounts, and environment variables into pods at creation time.
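For example, a PodPreset that injects an env var and a volume into any pod matching a label selector (values are placeholders; note PodPreset is an alpha API, so settings.k8s.io/v1alpha1 and the PodPreset admission controller must be enabled in the cluster):

```yaml
# Sketch: injected into every pod labeled role=frontend at creation time.
apiVersion: settings.k8s.io/v1alpha1
kind: PodPreset
metadata:
  name: inject-cache-env
spec:
  selector:
    matchLabels:
      role: frontend       # placeholder label
  env:
  - name: CACHE_PORT       # placeholder env var
    value: "6379"
  volumeMounts:
  - mountPath: /cache
    name: cache-volume
  volumes:
  - name: cache-volume
    emptyDir: {}
```

Matching pods get the env var and volume without any change to their own specs.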