Kubernetes resource and cost management
@webb we are going to take a stab at the
helmfile today for
@Maxim Mironenko (Cloud Posse) is going to work on it
we’re going to integrate it with our version of grafana/prometheus
@Maxim Mironenko (Cloud Posse) might be reaching out if he gets stuck
sweet! pleased to meet you @Maxim Mironenko (Cloud Posse). @Ajay Tripathy and I are here if we can help in any way!
@Ajay Tripathy @webb will he need an IAM role for the chart? … to be able to ingest cost data and/or AWS account data?
I’ll let @Ajay Tripathy confirm but you should just need the ability to allow Tiller to install charts, at least in the namespace kubecost will run in
No IAM account needed for out of the box billing data!
and for the ability to pull in cost data for stuff outside of k8s? (e.g. rds)
…or is that an enterprise feature
that will require a key to access your account’s billing data but it’s not required at installation…
out of the box we just use this AWS/GCP public billing api
does it support pod annotations? (we use
it does look at pod annotations/labels for cost allocation…
how are you using kiam in this context?
@Maxim Mironenko (Cloud Posse) we’ll need to submit a PR to support annotations here for
by placing the iam.amazonaws.com/role annotation on a pod, we’re able to grant specific permissions to a pod (e.g. readonly AWS access)
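For readers unfamiliar with the annotation approach described here, a minimal sketch of a pod manifest carrying the iam.amazonaws.com/role annotation. The pod name, image, and role name are hypothetical placeholders, not values from this thread or from the kubecost chart.

```python
import json


def pod_with_iam_role(name, image, role):
    """Build a minimal Pod manifest annotated with an IAM role for kiam."""
    return {
        "apiVersion": "v1",
        "kind": "Pod",
        "metadata": {
            "name": name,
            # kiam reads this annotation to decide which role the pod may assume
            "annotations": {"iam.amazonaws.com/role": role},
        },
        "spec": {"containers": [{"name": name, "image": image}]},
    }


# All three arguments are illustrative assumptions.
manifest = pod_with_iam_role(
    "cost-analyzer", "example/cost-analyzer:latest", "kubecost-readonly"
)
print(json.dumps(manifest, indent=2))
```

The printed JSON can be fed to kubectl apply -f - since Kubernetes accepts JSON manifests as well as YAML.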
@Ajay Tripathy do you have a minimal IAM policy for kubecost? we don’t want to grant all readonly b/c we have a lot of secrets in SSM
Ajay is taking a look now. I’m pretty sure we don’t need secrets read permission. Are there others that might be problematic?
Hi @Erik Osterman (Cloud Posse), we don’t need to read kubernetes secrets. I believe we currently use all the others detailed here https://github.com/kubecost/cost-analyzer-helm-chart/blob/master/cost-analyzer/templates/cost-analyzer-cluser-role-template.yaml for insights. Are there specific concerns?
this doesn’t have to do with kubernetes secrets
this has to do with how to access AWS resources securely from kubernetes pods
…if we are to use kubecost to ingest data from AWS APIs, we need credentials
hardcoding credentials is an anti-pattern
(e.g. do not ever set
as I recall, a recent release of kubecost added the ability to ingest resources running in an account outside of what’s running inside of the k8s cluster (e.g. an RDS database)
in order to be able to do that, we’ll need to set up an IAM role with sufficient permissions
anyways, it’s a very easy thing for @Maxim Mironenko (Cloud Posse) to open a PR for. . .
more importantly, I was hoping to find out what IAM permissions were needed (or basically, which resources it currently supports indexing)
So, the current integration with billing data does set the AWS_ACCESS_KEY_ID/AWS_SECRET_ACCESS_KEY. We’d accept a PR to handle STS tokens. Agreed, it should not be hard; it just hasn’t come up before. The required IAM permissions are AmazonEC2ReadOnlyAccess and AmazonAthenaFullAccess.
Do you use the official AWS SDK?
(if so, then it works automatically; however if kubecost adds extra validation that AWS_SECRET_ACCESS_KEY are set, then that may break it since they won’t be set)
yes, for golang.
ok, go sdk supports it.
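To illustrate the validation concern above, here is a language-neutral sketch in Python (not kubecost's actual code, which is Go). It shows a tolerant check that uses static keys when both are present and otherwise defers to the SDK's default credential chain, which is how role-based credentials reach a pod. The function name and return labels are hypothetical.

```python
def credential_source(env):
    """Decide where credentials should come from without hard-failing
    when the static key variables are absent."""
    key_id = env.get("AWS_ACCESS_KEY_ID")
    secret = env.get("AWS_SECRET_ACCESS_KEY")
    if key_id and secret:
        return "static-env"
    if key_id or secret:
        # A half-configured pair is a genuine misconfiguration.
        raise ValueError("only one of the two static key variables is set")
    # Neither is set: let the SDK resolve role credentials on its own.
    return "sdk-default-chain"


print(credential_source({}))
print(credential_source({"AWS_ACCESS_KEY_ID": "a", "AWS_SECRET_ACCESS_KEY": "b"}))
```

A strict check that raised whenever the variables were missing would reject exactly the role-based setup discussed in this thread.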
@Erik Osterman (Cloud Posse) this access isn’t required for the initial kubecost installation. This wouldn’t be blocking you at this point would it?
No, not blocking per se
Just was hoping to knock it all out at once
@Ajay Tripathy and I will discuss today. Might be something we can support quickly. Did you guys want to submit a PR?
@webb a quick call with @Ajay Tripathy and we can probably sort it all out
@Erik Osterman (Cloud Posse) put some time on your calendar for 4:45– happy to help.
err, 4:45 pm PST today, to be clear.
Hey @Ajay Tripathy! May I ask to check for PR:
Hey @Maxim Mironenko (Cloud Posse) – taking a look
seems to still not run after the spacing fix– I can take a look
@Ajay Tripathy you can hold off
@Maxim Mironenko (Cloud Posse) is going to pair with @Igor Rodionov on the helm stuff (he’s just getting up to speed on helm)
…they are working on it today (OMST)
@Ajay Tripathy fix applied to PR, should work now
@Ajay Tripathy @webb we’ve had some challenges getting it up and running
@Igor Rodionov can share more details, but in short the web UI is not working correctly & no log events
also, the chart lacks an ingress
we can submit PRs as necessary, but I think what would really help @Maxim Mironenko (Cloud Posse) and @Igor Rodionov is to see what it should look like when working
and we can work backwards from there
@Igor Rodionov how can we be most helpful? Would you want to jump on a phone/video call?
@Erik Osterman (Cloud Posse) it’s true that we don’t ship with an ingress out of the box today
we did get this far
So that would say that KSM+Prometheus was installed correctly… that’s good
Are you able to successfully port-forward?
Is this on AWS?
we are not doing port forwarding
our objective is to expose it behind IAP (as part of our portal)
but right now it’s public on our test account
Is there an endpoint you can share?
i’ll DM you
We typically have teams get port forwarding working and then stand up an end point soon after.
@Maxim Mironenko (Cloud Posse) and @Igor Rodionov we’re able to successfully load your UI but it looks like one query (idleness) is returning null. We’re investigating why now.
@Erik Osterman (Cloud Posse) @Maxim Mironenko (Cloud Posse) do you know why this prometheus query node_cpu_seconds_total would not be returning data on your cluster? Maybe node_exporter doesn’t have the permissions needed?
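One way to check whether a metric such as node_cpu_seconds_total returns any series is to call the Prometheus HTTP API instant-query endpoint directly. A minimal sketch, assuming Prometheus is reachable at a base URL such as http://localhost:9090 (a placeholder, e.g. via port forwarding); the helper names are hypothetical.

```python
import json
from urllib.parse import urlencode
from urllib.request import urlopen


def query_url(base_url, promql):
    """Build the instant-query URL for the Prometheus HTTP API."""
    return base_url.rstrip("/") + "/api/v1/query?" + urlencode({"query": promql})


def series_count(response_body):
    """Count the series in an instant-query JSON response."""
    payload = json.loads(response_body)
    if payload.get("status") != "success":
        raise RuntimeError("query failed: %s" % payload.get("error"))
    return len(payload["data"]["result"])


def metric_has_data(base_url, promql, timeout=10):
    """Return True if the query yields at least one series."""
    with urlopen(query_url(base_url, promql), timeout=timeout) as resp:
        return series_count(resp.read()) > 0


# Example call (needs a reachable Prometheus, so it is left commented out):
# metric_has_data("http://localhost:9090", "node_cpu_seconds_total")
```

An empty result with status "success" means the query is valid but nothing is being scraped under that metric name.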
hm… we need to check that
we really expected that helm install would grant all required permissions
we expected that as well. we haven’t seen this before. we’ll continue investigating on our end. it does seem to be related to node exporter from what we’ve seen so far.
but just to be clear… the app loads fine for us it’s just this one issue that we’re seeing..
how about scheduling a meeting to debug this together?
the problem is that there is poor logging in the cost-analyzer server
so we do not know where to look
also I do not know how you configured the scrapers for prometheus
if you can get us up to speed on that, it would be perfect
yes — happy to meet, are you free in 20 mins? we’ll investigate further before then.
can we schedule it for your evening?
in my zone it is 23:57
and I have a few calls before sleep (
how about your 20:00 ?
Yes, we can speak this evening. @Ajay Tripathy has to go to the airport around that time though. Could we speak at 19:30 Pacific?
I will wake up at that time
Sg, we’re also looking into this problem now. It looks like you may have had an existing node exporter deployment on this cluster? Does that sound right?
We have a full kube-prometheus deployment which includes node exporter
Ok, that appears to be causing the issue. Still investigating.
@Erik Osterman (Cloud Posse) @Igor Rodionov we’ve been able to reproduce. We don’t reinstall node_exporter if there’s an existing installation in your cluster. That works fine with the default install. But for some reason the configuration on your node exporter isn’t allowing metrics to land in prometheus. Regardless, we’ve pushed a change so that the app still functions without any issues if you restart the kubecost-cost-analyzer pod. We’ll discuss this underlying problem further with Igor tonight. Let me know if you have any questions!
maybe it’s because our node exporter is wired up with kube-prometheus
yet we don’t have kubecost pointed to that prometheus (which is our ultimate goal, but we thought we’d try to first get it up with the built-in prometheus and grafana)
Yeah, that sounds like it could be the cause… we’ll look into it some more before our call with Igor. The positive is that not having this data just slightly limits functionality… it shouldn’t break anything
hmm, we’re on zoom
you on another meeting id?
@Igor Rodionov @Maxim Mironenko (Cloud Posse) @Erik Osterman (Cloud Posse) quick update… we were able to confirm why you were missing a couple of metrics on the Kubecost frontend. The node-exporter metrics in question were introduced in v0.16.0 on 2018-05-15. It appears this test cluster is running node-exporter:v0.15.2. As mentioned last week, our app falls back gracefully but you would get a number of new metrics/fixes with a node-exporter upgrade. Anyways, just wanted to share this to close the case on root cause. No action required.
We’re going to upgrade node exporter on our side
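The root cause above comes down to a version comparison: the running node-exporter (v0.15.2) predates the release that introduced the metrics (v0.16.0). A minimal sketch of that check, assuming simple vMAJOR.MINOR.PATCH tags; the helper names are hypothetical and this is not how kubecost itself detects the version.

```python
def parse_version(tag):
    """Turn a tag like 'v0.15.2' into a comparable tuple (0, 15, 2)."""
    return tuple(int(part) for part in tag.lstrip("v").split("."))


def needs_upgrade(running, required="v0.16.0"):
    """Return True if the running version is older than the required one."""
    return parse_version(running) < parse_version(required)


print(needs_upgrade("v0.15.2"))
print(needs_upgrade("v0.16.0"))
```

Comparing integer tuples avoids the string-comparison pitfall where "0.9.0" would sort after "0.16.0".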