#monitoring (2019-09)

Prometheus, Prometheus Operator, Grafana, Kubernetes

Archive: https://archive.sweetops.com/monitoring/

2019-09-16

Daren

Oh, that's interesting, thanks for sharing!

2019-09-15

Erik Osterman

@Daren

2019-09-14

Erik Osterman
spotahome/service-level-operator

Manage application’s SLI and SLO’s easily with the application lifecycle inside a Kubernetes cluster - spotahome/service-level-operator

kskewes

Great share! Looks very interesting. Neat to have multi burn rate defined too. There’s a semi-recent SoundCloud blog post about how they do it with vanilla Prometheus using recording rules, etc.

Erik Osterman

I’m eager to try this one out. Love how apps can easily define their own SLI/SLO by defining a CRD.
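
Editor's note: the multi burn rate idea mentioned in this thread can be sketched in a few lines. This is only an illustration, not the operator's or SoundCloud's implementation. The window pairs and the 14.4 and 6 thresholds are assumptions taken from commonly cited values for a 30-day SLO, and the function names are hypothetical.

```python
# Minimal sketch of multi-window, multi-burn-rate alerting.
# Window names and thresholds are assumptions, not values from the thread.

def burn_rate(error_ratio: float, slo_target: float) -> float:
    """How fast the error budget is consumed; 1.0 means exactly on budget."""
    budget = 1.0 - slo_target
    return error_ratio / budget

# (long window, short window, burn-rate threshold)
WINDOWS = [
    ("1h", "5m", 14.4),
    ("6h", "30m", 6.0),
]

def firing_alerts(error_ratios: dict, slo_target: float) -> list:
    """Return window pairs where both the long and the short window exceed the threshold."""
    fired = []
    for long_w, short_w, threshold in WINDOWS:
        long_burn = burn_rate(error_ratios[long_w], slo_target)
        short_burn = burn_rate(error_ratios[short_w], slo_target)
        if long_burn > threshold and short_burn > threshold:
            fired.append((long_w, short_w))
    return fired
```

In a Prometheus setup the per-window error ratios would typically come from recording rules; here they are passed in as a plain dict keyed by window name.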

2019-09-13

asmito

Hey guys, has anyone tried https://thanos.io/ before?

Thanos

Thanos - Highly available Prometheus setup with long term storage capabilities

kskewes

One of our team has at a previous job, and we plan to roll it out to aggregate across regions. Sounds solid.

Erik Osterman
banzaicloud/banzai-charts

Curated list of Banzai Cloud Helm charts used by the Pipeline Platform - banzaicloud/banzai-charts

Erik Osterman

Chart looks pretty straightforward to deploy

kskewes

Cheers. We’re using kube-prometheus (jsonnet), and that project has it as a first-class extension, so it should be fine. Just waiting for S3. Then if we can move our logs from Elastic to Loki, we’re laughing: we’d use object storage instead of managing redundancy at the block layer.

Jeremy Grodberg

I notice that the CoreOS Prometheus lists Thanos as a write-only backend.

Erik Osterman

@Jeremy Grodberg

Jeremy Grodberg

@Erik Osterman I have to wonder how good the performance is and how expensive (real money) it is to use against an S3 back end, but otherwise it looks good on paper. Maybe get @webb to try it to solve the Kubecost history storage problem.

webb

@Jeremy Grodberg @asmito we did a deep dive ~2 months ago. Our view was… very promising project, but we felt that some of the scaling issues were going to be hard for us to overcome. We’re ingesting 100k+ metrics per minute. We plan to revisit it soon. Happy to share more detail if it would be helpful.

Jeremy Grodberg

Yes, please do share some details. Is the bottleneck the performance of S3 or something else? Did you find a threshold rate of metrics that went from acceptable performance to not?

webb

I’m sorry – I just tried to reference our notes from this experiment and I may have been mistaken actually… while we don’t have exact results on hand today, it looks like our notes show that we needed a more expressive query language for the range/scale of data we were querying. We had a general question mark around scale given that Thanos is a sandbox project, but it looks like there are no specific notes around hitting bottlenecks. My apologies. I expect we’ll revisit this soon, but for now we’re using the Postgres adapter.

2019-09-12

Daniel Minella

Is anyone collecting golden signals metrics from AWS ALB/ELB monitoring?

Erik Osterman

@Daniel Minella haven’t heard about that before. What are “Golden Signal Metrics”?

kskewes

Probably these ones: https://landing.google.com/sre/sre-book/chapters/monitoring-distributed-systems/ Requests, latency, errors, saturation (or different words for the same things).

Daniel Minella

Exactly @kskewes
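
Editor's note: as a rough illustration of the signals listed above, the sketch below derives traffic, error ratio, and a 95th percentile latency from a batch of load balancer request records. The `Request` fields and the `golden_signals` helper are hypothetical and do not follow the actual ALB/ELB log or CloudWatch schema. Saturation is left out because it needs resource metrics rather than per-request data.

```python
from dataclasses import dataclass

@dataclass
class Request:
    """One load balancer request; field names are hypothetical, not the ALB log schema."""
    status: int
    latency_s: float

def golden_signals(requests: list, window_s: float) -> dict:
    """Summarise traffic, errors, and latency for the requests seen in one window."""
    total = len(requests)
    if total == 0:
        return {"traffic_rps": 0.0, "error_ratio": 0.0, "latency_p95_s": 0.0}
    # Count only server-side failures (5xx) as errors; this is an assumption.
    errors = sum(1 for r in requests if r.status >= 500)
    latencies = sorted(r.latency_s for r in requests)
    # Nearest-rank 95th percentile, using integer arithmetic for the ceiling.
    rank = (95 * total + 99) // 100
    return {
        "traffic_rps": total / window_s,
        "error_ratio": errors / total,
        "latency_p95_s": latencies[rank - 1],
    }
```

Whether 4xx responses should also count as errors depends on the service, so treat the 5xx-only rule as a starting point.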
