SweetOps #kubecost for March, 2019

I’ll let @Ajay Tripathy confirm but you should just need the ability to allow Tiller to install charts, at least in the namespace kubecost will run in

webb

04:56:40 AM

No IAM account needed for out of the box billing data!

Erik Osterman (Cloud Posse)

04:56:54 AM

oh nice!

Erik Osterman (Cloud Posse)

04:57:18 AM

and for the ability to pull in cost data for stuff outside of k8s? (e.g. rds)

Erik Osterman (Cloud Posse)

04:57:31 AM

…or is that an enterprise feature

webb

05:17:31 AM

that will require a key to access your account’s billing data but it’s not required at installation…

webb

05:17:50 AM

out of the box we just use the AWS/GCP public billing API

Erik Osterman (Cloud Posse)

05:17:50 AM

does it support pod annotations? (we use kiam)

webb

05:18:31 AM

it does look at pod annotations/labels for cost allocation…

webb

05:18:45 AM

how are you using kiam in this context?

Erik Osterman (Cloud Posse)

05:20:16 AM

https://github.com/kubecost/cost-analyzer-helm-chart/blob/master/cost-analyzer/templates/cost-analyzer-deployment-template.yaml#L3-L7

Erik Osterman (Cloud Posse)

05:20:30 AM

@Maxim Mironenko (Cloud Posse) we’ll need to submit a PR to support annotations here for kiam

Erik Osterman (Cloud Posse)

05:20:56 AM

@webb for context, https://github.com/uswitch/kiam#overview

Erik Osterman (Cloud Posse)

05:21:58 AM

by placing the iam.amazonaws.com/role annotation on a pod, we’re able to grant specific permissions to a pod (e.g. readonly AWS access)
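
For illustration, a minimal sketch of where that annotation ends up: on the pod template metadata of a Deployment, which is the piece a chart PR would need to make configurable. Only the annotation key iam.amazonaws.com/role comes from the message above; the helper `with_kiam_role`, the role name `kubecost-readonly`, and the stripped-down manifest are hypothetical.

```python
import copy

KIAM_ROLE_ANNOTATION = "iam.amazonaws.com/role"  # annotation key named in the chat


def with_kiam_role(deployment: dict, role: str) -> dict:
    """Return a copy of a Deployment manifest whose pod template carries the role annotation."""
    patched = copy.deepcopy(deployment)  # leave the caller's manifest untouched
    template = patched.setdefault("spec", {}).setdefault("template", {})
    metadata = template.setdefault("metadata", {})
    metadata.setdefault("annotations", {})[KIAM_ROLE_ANNOTATION] = role
    return patched


# Hypothetical, stripped-down Deployment; a real one also needs a selector, containers, etc.
deployment = {
    "apiVersion": "apps/v1",
    "kind": "Deployment",
    "metadata": {"name": "cost-analyzer"},
    "spec": {"template": {"metadata": {"labels": {"app": "cost-analyzer"}}}},
}

patched = with_kiam_role(deployment, "kubecost-readonly")  # role name is an assumption
print(patched["spec"]["template"]["metadata"]["annotations"])
```

The annotation has to sit on the pod template, not on the Deployment's own metadata, because it is the pod that requests credentials.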

Erik Osterman (Cloud Posse)

05:22:35 AM

@Ajay Tripathy do you have a minimal IAM policy for kubecost? we don’t want to grant all readonly b/c we have a lot of secrets in SSM

webb

05:42:45 AM

Ajay is taking a look now. I’m pretty sure we don’t need secrets read permission. Are there others that might be problematic?

Ajay Tripathy

06:12:38 AM

Hi @Erik Osterman (Cloud Posse), we don’t need to read kubernetes secrets. I believe we currently use all the others detailed here https://github.com/kubecost/cost-analyzer-helm-chart/blob/master/cost-analyzer/templates/cost-analyzer-cluser-role-template.yaml for insights. Are there specific concerns?

Erik Osterman (Cloud Posse)

06:13:04 AM

this doesn’t have to do with kubernetes secrets

Erik Osterman (Cloud Posse)

06:13:19 AM

this has to do with how to access AWS resources securely from kubernetes pods

Erik Osterman (Cloud Posse)

06:13:45 AM

…if we are to use kubecost to ingest data from AWS APIs, we need credentials

Erik Osterman (Cloud Posse)

06:13:55 AM

hardcoding credentials is an anti-pattern

Erik Osterman (Cloud Posse)

06:14:27 AM

(e.g. do not ever set AWS_ACCESS_KEY_ID or AWS_SECRET_ACCESS_KEY)

Erik Osterman (Cloud Posse)

06:14:57 AM

instead, we rely on the fact that the AWS SDK automatically handles STS tokens (short lived, automatically rotated tokens)

Erik Osterman (Cloud Posse)

06:17:15 AM

kiam is the “glue” that makes all of this possible in k8s on AWS

Erik Osterman (Cloud Posse)

06:16:03 AM

as I recall, a recent release of kubecost added the ability to ingest resources running in an account outside of what’s running inside of the k8s cluster (e.g. an RDS database)

Erik Osterman (Cloud Posse)

06:16:39 AM

in order to be able to do that, we’ll need to set up an IAM role with sufficient permissions

Erik Osterman (Cloud Posse)

06:17:48 AM

anyways, it’s a very easy thing for @Maxim Mironenko (Cloud Posse) to open a PR for. . .

Erik Osterman (Cloud Posse)

06:17:49 AM

more importantly, I was hoping to find out what IAM permissions were needed (or basically, which resources it currently supports indexing)

Ajay Tripathy

06:32:44 AM

So, the current integration with billing data does set AWS_ACCESS_KEY_ID/AWS_SECRET_ACCESS_KEY. We’d accept a PR to handle STS tokens. Agreed it should not be hard; it just hasn’t come up before. The required IAM permissions are AmazonEC2ReadOnlyAccess and AmazonAthenaFullAccess.

Erik Osterman (Cloud Posse)

06:33:05 AM

Do you use the official AWS SDK?

Erik Osterman (Cloud Posse)

06:33:10 AM

(if so, then it works automatically; however, if kubecost adds extra validation that AWS_ACCESS_KEY_ID and AWS_SECRET_ACCESS_KEY are set, then that may break it since they won’t be set)

Ajay Tripathy

06:33:49 AM

yes, for golang.

Erik Osterman (Cloud Posse)

06:34:26 AM

ok, go sdk supports it.
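
To make the concern about extra validation concrete, here is a simplified model of a credential provider chain. It is not the AWS SDK's actual API: `env_provider`, `resolve`, `fake_role_provider` and the credential values are all hypothetical stand-ins. The point it illustrates is that when the static keys are unset, resolution should fall through to the next provider (the temporary role credentials kiam makes available) instead of failing.

```python
from typing import Callable, Mapping, Optional, Sequence, Tuple

Credentials = Tuple[str, str]  # (access key id, secret access key)
Provider = Callable[[], Optional[Credentials]]


def env_provider(env: Mapping[str, str]) -> Provider:
    """Static keys from the environment; yields nothing when they are unset."""
    def provide() -> Optional[Credentials]:
        key = env.get("AWS_ACCESS_KEY_ID")
        secret = env.get("AWS_SECRET_ACCESS_KEY")
        return (key, secret) if key and secret else None
    return provide


def resolve(providers: Sequence[Provider]) -> Credentials:
    """Try each provider in order and return the first credentials found."""
    for provide in providers:
        creds = provide()
        if creds is not None:
            return creds
    raise RuntimeError("no credentials available from any provider")


def fake_role_provider() -> Optional[Credentials]:
    # Stand-in for short-lived role credentials; the values are made up.
    return ("ASIA-EXAMPLE", "temporary-secret")


# No static keys set: the chain falls through to the role provider.
creds = resolve([env_provider({}), fake_role_provider])
print(creds[0])
```

An application that checks for the two environment variables up front and exits when they are missing never reaches the second provider, which is the failure mode described above.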

2019-03-19

webb

03:28:06 PM

@Erik Osterman (Cloud Posse) this access isn’t required for the initial kubecost installation. This wouldn’t be blocking you at this point would it?

Erik Osterman (Cloud Posse)

03:42:34 PM

No, not blocking per se

Erik Osterman (Cloud Posse)

03:42:44 PM

Just was hoping to knock it all out at once

webb

05:15:45 PM

@Ajay Tripathy and I will discuss today. Might be something we can support quickly. Did you guys want to submit a PR?

Erik Osterman (Cloud Posse)

06:17:54 PM

@webb a quick call with @Ajay Tripathy and we can probably sort it all out

Erik Osterman (Cloud Posse)

06:18:09 PM

https://calendly.com/cloudposse

Ajay Tripathy

06:37:55 PM

@Erik Osterman (Cloud Posse) put some time on your calendar for 4:45– happy to help.

Ajay Tripathy

06:38:09 PM

err, 4:45 pm PST today, to be clear.

Erik Osterman (Cloud Posse)

07:34:51 PM

thanks!

2019-03-20

Maxim Mironenko (Cloud Posse)

04:08:56 AM

Hey @Ajay Tripathy! Could you take a look at this PR: https://github.com/kubecost/cost-analyzer-helm-chart/pull/3

Ajay Tripathy

04:19:00 AM

Hey @Maxim Mironenko (Cloud Posse) – taking a look

Ajay Tripathy

04:32:56 AM

seems to still not run after the spacing fix– I can take a look

Erik Osterman (Cloud Posse)

04:35:58 AM

@Ajay Tripathy you can hold off

Erik Osterman (Cloud Posse)

04:36:06 AM

@Maxim Mironenko (Cloud Posse) is going to pair with @Igor Rodionov on the helm stuff (he’s just getting up to speed on helm)

Erik Osterman (Cloud Posse)

04:36:37 AM

…they are working on it today (OMST)

Ajay Tripathy

06:13:47 AM

Ack, thanks.

2019-03-21

Maxim Mironenko (Cloud Posse)

10:39:30 AM

@Ajay Tripathy fix applied to PR, should work now

Erik Osterman (Cloud Posse)

05:12:14 PM

@Ajay Tripathy @webb we’ve had some challenges getting it up and running

Erik Osterman (Cloud Posse)

05:12:36 PM

@Igor Rodionov can share more details, but in short the web UI is not working correctly & no log events

Erik Osterman (Cloud Posse)

05:12:56 PM

also, the chart lacks an ingress

Erik Osterman (Cloud Posse)

05:13:25 PM

we can submit PRs as necessary, but I think what would really help @Maxim Mironenko (Cloud Posse) and @Igor Rodionov is to see what it should look like when working

Erik Osterman (Cloud Posse)

05:13:32 PM

and we can work backwards from there

webb

05:13:57 PM

@Igor Rodionov how can we be most helpful? Would you want to jump on a phone/video call?

webb

05:14:18 PM

@Erik Osterman (Cloud Posse) it’s true that we don’t ship with an ingress out of the box today

Erik Osterman (Cloud Posse)

05:14:22 PM

we did get this far

webb

05:14:33 PM

hehe

webb

05:14:50 PM

So that would say that KSM+Prometheus was installed correctly… that’s good

webb

05:15:04 PM

Are you able to successfully port-forward?

webb

05:15:45 PM

Is this on AWS?

Erik Osterman (Cloud Posse)

05:15:51 PM

we are not doing port forwarding

Erik Osterman (Cloud Posse)

05:15:58 PM

our objective is to expose it behind IAP (as part of our portal)

Erik Osterman (Cloud Posse)

05:16:15 PM

but right now it’s public on our test account

webb

05:16:45 PM

Is there an endpoint you can share?

Erik Osterman (Cloud Posse)

05:17:10 PM

i’ll DM you

webb

05:17:18 PM

We typically have teams get port forwarding working and then stand up an endpoint soon after.

webb

05:26:52 PM

@Maxim Mironenko (Cloud Posse) and @Igor Rodionov we’re able to successfully load your UI but it looks like one query (idleness) is returning null. We’re investigating why now.

webb

05:31:39 PM

@Erik Osterman (Cloud Posse) @Maxim Mironenko (Cloud Posse) do you know why this prometheus query node_cpu_seconds_total would not be returning data on your cluster? Maybe node_exporter doesn’t have the permissions needed?
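
One way to check this by hand is to run the metric as an instant query against the Prometheus HTTP API and count the series that come back. The sketch below only builds the request URL and parses a response body, so it runs without a cluster; the base URL `http://localhost:9090` (for example via a port-forward) and the helper names are assumptions, and the sample body is hand-written in the API's documented response shape.

```python
import json
from urllib.parse import urlencode


def instant_query_url(base_url: str, query: str) -> str:
    """Build the URL for a Prometheus instant query (GET /api/v1/query)."""
    return f"{base_url.rstrip('/')}/api/v1/query?{urlencode({'query': query})}"


def series_count(response_body: str) -> int:
    """Return how many series an instant-query response contains."""
    payload = json.loads(response_body)
    if payload.get("status") != "success":
        raise ValueError(f"query failed: {payload.get('error', 'unknown error')}")
    return len(payload["data"]["result"])


# Assumed address; adjust to wherever Prometheus is reachable.
url = instant_query_url("http://localhost:9090", "node_cpu_seconds_total")

# Hand-written example of a successful response with no matching series.
empty = '{"status":"success","data":{"resultType":"vector","result":[]}}'

print(url)
print(series_count(empty))
```

A count of zero with a "success" status means Prometheus answered but holds no samples for that metric name, which points at the exporter or the scrape configuration, not at the query itself.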

Igor Rodionov

05:39:29 PM

hm… we need to check that

Igor Rodionov

05:39:58 PM

we really expected that helm install would grant all the required permissions

webb

05:41:20 PM

we expected that as well. we haven’t seen this before. we’ll continue investigating on our end. it does seem to be related to node exporter from what we’ve seen so far.

webb

05:41:48 PM

but just to be clear… the app loads fine for us it’s just this one issue that we’re seeing..

Igor Rodionov

05:41:55 PM

how about scheduling a meeting to debug this together?

Igor Rodionov

05:42:21 PM

the problem is that there is poor logging in the cost-analyzer server

Igor Rodionov

05:42:34 PM

so we do not know where to look

Igor Rodionov

05:43:01 PM

also I do not know how you configured the scrapers for prometheus

Igor Rodionov

05:43:21 PM

if you can get us up to speed on that, it would be perfect

webb

05:45:56 PM

yes — happy to meet, are you free in 20 mins? we’ll investigate further before then.

Igor Rodionov

05:57:35 PM

can we schedule it for your evening?

Igor Rodionov

05:57:51 PM

in my zone it is 23:57

Igor Rodionov

05:58:04 PM

and I have a few calls before sleep (

Igor Rodionov

05:58:50 PM

how about your 20:00 ?

webb

06:01:11 PM

Yes, we can speak this evening. @Ajay Tripathy has to go to the airport around that time though. Could we speak at 19:30 Pacific?

Igor Rodionov

06:01:38 PM

sec

Igor Rodionov

06:02:06 PM

I will wake up at that time

webb

06:03:35 PM

Sg, we’re also looking at this problem now. It looks like you may have had an existing node exporter deployment on this cluster? Does that sound right?

Erik Osterman (Cloud Posse)

06:05:05 PM

Yep!

Erik Osterman (Cloud Posse)

06:05:49 PM

We have a full kube-prometheus deployment which includes node exporter

webb

06:06:59 PM

Ok, that appears to be causing the issue. Still investigating.

webb

06:56:21 PM

@Erik Osterman (Cloud Posse) @Igor Rodionov we’ve been able to reproduce. We don’t reinstall node_exporter if there’s an existing installation in your cluster. That works fine with the default install. But for some reason the configuration on your node exporter isn’t allowing metrics to land in prometheus. Regardless, we’ve pushed a change so that the app still functions without any issues if you restart the kubecost-cost-analyzer pod. We’ll discuss this underlying problem further with Igor tonight. Let me know if you have any questions!

Erik Osterman (Cloud Posse)

07:04:33 PM

Thanks @webb!

Erik Osterman (Cloud Posse)

07:04:49 PM

maybe it’s cause our node exporter is wired up with kube-prometheus

Erik Osterman (Cloud Posse)

07:05:36 PM

yet we don’t have kubecost pointed to that prometheus (which is our ultimate goal, but we thought we’d try to first get it up with the built-in prometheus and grafana)

webb

07:07:57 PM

Yeah, that sounds like it could be the cause… we’ll look into it some more before our call with Igor. The positive is that not having this data just slightly limits functionality… it shouldn’t break anything

Erik Osterman (Cloud Posse)

07:08:38 PM

“graceful degradation”

Igor Rodionov

02:36:32 AM

Here

Erik Osterman (Cloud Posse)

02:37:08 AM

@webb

webb

02:37:30 AM

hmm, we’re on zoom

webb

02:37:40 AM

you on another meeting id?

2019-03-25

webb

06:28:40 PM

@Igor Rodionov @Maxim Mironenko (Cloud Posse) @Erik Osterman (Cloud Posse) quick update… we were able to confirm why you were missing a couple of metrics on the Kubecost frontend. The node-exporter metrics in question were introduced in v0.16.0 on 2018-05-15. It appears this test cluster is running node-exporter:v0.15.2. As mentioned last week, our app falls back gracefully, but you would get a number of new metrics/fixes with a node-exporter upgrade. Anyway, just wanted to share this to close the case on root cause; no action required.

Erik Osterman (Cloud Posse)

06:40:58 PM

We’re going to upgrade node exporter on our side
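
As a small sketch of the root cause, the check below compares a node-exporter image tag against v0.16.0, the release in which the metrics in question were introduced according to the update above. The helper names are hypothetical, and the parser assumes plain `vMAJOR.MINOR.PATCH` tags with no pre-release suffixes.

```python
from typing import Tuple

# First release that exposes the metrics in question, per the update above.
FIRST_VERSION_WITH_METRICS = (0, 16, 0)


def parse_version(tag: str) -> Tuple[int, ...]:
    """Turn 'v0.15.2' or 'node-exporter:v0.15.2' into (0, 15, 2)."""
    version = tag.rsplit(":", 1)[-1].lstrip("v")  # drop image name and leading 'v'
    return tuple(int(part) for part in version.split("."))


def has_new_metrics(tag: str) -> bool:
    """True when the given node-exporter version is at least v0.16.0."""
    return parse_version(tag) >= FIRST_VERSION_WITH_METRICS


print(has_new_metrics("node-exporter:v0.15.2"))
print(has_new_metrics("v0.16.0"))
```

Comparing tuples of integers avoids the trap of comparing version strings lexically, where "0.9.0" would sort after "0.16.0".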
