#sre (2019-09)

Prometheus, Prometheus Operator, Grafana, Kubernetes

Archive: https://archive.sweetops.com/monitoring/

2019-09-12

Daniel Minella

Is anyone collecting golden signals metrics from AWS ALB/ELB monitoring?

Erik Osterman (Cloud Posse)

@Daniel Minella haven’t heard about that before. What are “Golden Signal Metrics”?

kskewes

Probably these ones: https://landing.google.com/sre/sre-book/chapters/monitoring-distributed-systems/ Requests (traffic), latency, errors, saturation (or different words for the same things).

Daniel Minella

Exactly @kskewes
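A minimal sketch of how the four golden signals could be derived for an ALB, assuming one period's CloudWatch `Sum` statistics have already been fetched into a dict and that per-request latency samples are available (hypothetical input; CloudWatch can also return a p99 statistic for `TargetResponseTime` directly). The metric names are ALB CloudWatch metrics; treating `RejectedConnectionCount` as a saturation proxy, plus the helper names, are assumptions for illustration. Classic ELBs expose differently named metrics.

```python
import math
from dataclasses import dataclass

# Assumed mapping of the SRE book's four golden signals onto ALB CloudWatch metrics.
ALB_SIGNAL_METRICS = {
    "traffic": ["RequestCount"],
    "latency": ["TargetResponseTime"],
    "errors": ["HTTPCode_Target_5XX_Count", "HTTPCode_ELB_5XX_Count"],
    "saturation": ["RejectedConnectionCount"],  # proxy only: an ALB has no utilization metric
}


@dataclass
class GoldenSignals:
    requests_per_second: float
    p99_latency_seconds: float
    error_ratio: float
    rejected_connections: float


def percentile(samples, pct):
    """Nearest-rank percentile of a non-empty list of samples."""
    ordered = sorted(samples)
    rank = max(1, math.ceil(pct * len(ordered) / 100))
    return ordered[rank - 1]


def golden_signals(sums, latency_samples, period_seconds):
    """Derive the four signals from one period's Sum statistics and latency samples."""
    requests = sums.get("RequestCount", 0.0)
    errors = sum(sums.get(name, 0.0) for name in ALB_SIGNAL_METRICS["errors"])
    return GoldenSignals(
        requests_per_second=requests / period_seconds,
        p99_latency_seconds=percentile(latency_samples, 99) if latency_samples else 0.0,
        error_ratio=errors / requests if requests else 0.0,
        rejected_connections=sum(sums.get(n, 0.0) for n in ALB_SIGNAL_METRICS["saturation"]),
    )


# Example: one 60-second period of made-up numbers.
signals = golden_signals(
    {"RequestCount": 6000, "HTTPCode_Target_5XX_Count": 30,
     "HTTPCode_ELB_5XX_Count": 6, "RejectedConnectionCount": 0},
    [0.05] * 98 + [0.4, 1.2],
    period_seconds=60,
)
print(signals)
```

Both target and ELB 5XX counts are summed as errors here, since either one is a failed request from the client's point of view.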

2019-09-13

asmito

Hey all, has anyone tried https://thanos.io/ before?

Thanos

Thanos - Highly available Prometheus setup with long term storage capabilities

kskewes

One of our team has used it at a previous job, and we plan to roll it out to aggregate metrics across regions. Sounds solid.
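For the cross-region aggregation idea: a common Thanos setup gives each regional Prometheus `external_labels` such as a region and a replica label, and the Thanos querier is told to ignore the replica label (`--query.replica-label`) so HA pairs collapse into one series. The sketch below mimics that merge on hypothetical in-memory series; the "keep the replica with the most samples" rule is a deliberate simplification, not Thanos's actual deduplication algorithm, and the function names are illustrative only.

```python
from collections import defaultdict


def dedup_replicas(series, replica_label="replica"):
    """series: list of (labels dict, samples list of (timestamp, value)).
    Groups series that differ only by the replica label."""
    groups = defaultdict(list)
    for labels, samples in series:
        key = frozenset((k, v) for k, v in labels.items() if k != replica_label)
        groups[key].append(samples)
    # Simplified rule: prefer the replica with the most samples (fewest gaps).
    return {key: max(candidates, key=len) for key, candidates in groups.items()}


def latest_sum_by(deduped, label):
    """Sum each series' most recent value, grouped by one label (like `sum by (label)`)."""
    totals = defaultdict(float)
    for key, samples in deduped.items():
        if samples:
            totals[dict(key).get(label, "")] += samples[-1][1]
    return dict(totals)


# Hypothetical data: an HA pair in us-east-1 and a single Prometheus in eu-west-1.
series = [
    ({"__name__": "up", "job": "api", "region": "us-east-1", "replica": "a"}, [(1, 1.0), (2, 1.0)]),
    ({"__name__": "up", "job": "api", "region": "us-east-1", "replica": "b"}, [(2, 1.0)]),
    ({"__name__": "up", "job": "api", "region": "eu-west-1", "replica": "a"}, [(1, 1.0), (2, 0.0)]),
]
deduped = dedup_replicas(series)
print(latest_sum_by(deduped, "job"))     # healthy api targets across all regions
print(latest_sum_by(deduped, "region"))  # per-region breakdown
```

Keeping the region label while dropping only the replica label is what lets a single global query still break results down per region.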


Erik Osterman (Cloud Posse)
banzaicloud/banzai-charts

Curated list of Banzai Cloud Helm charts used by the Pipeline Platform - banzaicloud/banzai-charts

Erik Osterman (Cloud Posse)

Chart looks pretty straightforward to deploy

kskewes

Cheers. We’re using kube-prometheus (jsonnet), and that project supports Thanos as a first-class extension, so it should be fine. Just waiting for S3. Then, if we can move our logs from Elastic to Loki, we’re laughing: object storage instead of managing redundancy at the block layer.

Jeremy G (Cloud Posse)

I notice that the CoreOS Prometheus lists Thanos as a write-only backend.

Erik Osterman (Cloud Posse)

@Jeremy G (Cloud Posse)

Jeremy G (Cloud Posse)

@Erik Osterman (Cloud Posse) I have to wonder how good the performance is and how expensive (in real money) it is to run against an S3 back end, but otherwise it looks good on paper. Maybe get @webb to try it to solve the Kubecost history storage problem.

webb

@Jeremy G (Cloud Posse) @asmito we did a deep dive ~2 months ago. Our view was… a very promising project, but we felt that some of the scaling issues were going to be hard for us to get past. We’re ingesting 100k+ metrics per minute. We plan to revisit it soon. Happy to share more detail if it would be helpful.

Jeremy G (Cloud Posse)

Yes, please do share some details. Is the bottleneck the performance of S3 or something else? Did you find a threshold rate of metrics that went from acceptable performance to not?

webb

I’m sorry – I just tried to reference our notes from this experiment and I may have been mistaken actually… while we don’t have exact results on hand today, it looks like our notes show that we needed a more expressive query language for the range/scale of data we were querying. We had a general question mark around scale given that Thanos is a sandbox project, but it looks like there are no specific notes around hitting bottlenecks. My apologies. I expect we’ll revisit this soon, but for now we’re using the Postgres adapter.
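For a rough sense of scale behind the 100k+ metrics per minute figure, the Prometheus storage docs give a sizing rule: disk ≈ retention_time_seconds × ingested_samples_per_second × bytes_per_sample, with roughly 1 to 2 bytes per sample after compression. A minimal sketch, assuming the figure means 100,000 samples ingested per minute and taking the pessimistic 2 bytes per sample; it covers compressed samples only and ignores index and WAL overhead. The function name is illustrative.

```python
SECONDS_PER_DAY = 24 * 60 * 60


def prometheus_disk_bytes(samples_per_minute, retention_days, bytes_per_sample=2.0):
    """Sizing rule from the Prometheus storage docs:
    retention_time_seconds * ingested_samples_per_second * bytes_per_sample."""
    retention_seconds = retention_days * SECONDS_PER_DAY
    samples_per_second = samples_per_minute / 60
    return retention_seconds * samples_per_second * bytes_per_sample


# Assumed reading of "100k+ metrics per min": 100,000 samples ingested per minute.
# 15 days is Prometheus's default local retention; 365 days stands in for long-term history.
for days in (15, 365):
    gb = prometheus_disk_bytes(100_000, days) / 1e9
    print(f"{days:>3} days of retention = about {gb:.1f} GB")
```

Under these assumptions a year of history lands on the order of 100 GB, which is the kind of volume that makes object storage attractive for long-term retention, while request costs against S3 remain a separate question.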

2019-09-14

Erik Osterman (Cloud Posse)
spotahome/service-level-operator

Manage application’s SLI and SLO’s easily with the application lifecycle inside a Kubernetes cluster - spotahome/service-level-operator

kskewes

Great share! Looks very interesting. Neat to have multiple burn rates defined too. There’s a semi-recent SoundCloud blog post about how they do it with vanilla Prometheus using recording rules, etc.
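On the multiple burn rates: a burn rate is the observed error ratio divided by the error budget (1 − SLO), so 1.0 spends the budget exactly over the SLO period. The sketch below follows the multi-window, multi-burn-rate scheme from Google's SRE Workbook (30-day SLO, thresholds 14.4 / 6 / 1); that is a swapped-in reference and not necessarily what service-level-operator or the SoundCloud post implement. The 99.9% SLO, the error ratios, and the function names are made up for illustration.

```python
# Multi-window, multi-burn-rate alerting, following the SRE Workbook's 30-day example.
SLO = 0.999

# (long window, short window, burn-rate threshold, action)
ALERT_WINDOWS = [
    ("1h", "5m", 14.4, "page"),    # 2% of a 30-day budget spent in 1 hour
    ("6h", "30m", 6.0, "page"),    # 5% spent in 6 hours
    ("3d", "6h", 1.0, "ticket"),   # 10% spent in 3 days
]


def burn_rate(error_ratio, slo=SLO):
    """Budget consumption speed: 1.0 uses up the error budget exactly over the SLO period."""
    return error_ratio / (1.0 - slo)


def firing_alerts(error_ratios, slo=SLO):
    """error_ratios maps a window name to the observed error ratio over that window.
    An alert fires only when both its long and short windows exceed the threshold."""
    fired = []
    for long_w, short_w, threshold, action in ALERT_WINDOWS:
        if (burn_rate(error_ratios[long_w], slo) > threshold
                and burn_rate(error_ratios[short_w], slo) > threshold):
            fired.append((long_w, action))
    return fired


# Made-up error ratios: a sharp, recent spike.
ratios = {"5m": 0.02, "30m": 0.02, "1h": 0.016, "6h": 0.004, "3d": 0.0005}
print(firing_alerts(ratios))
```

The long window ensures a meaningful share of the budget has actually been spent before anyone is paged, while the short window lets the alert clear soon after the spike ends. In Prometheus the per-window error ratios would typically come from recording rules.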


Erik Osterman (Cloud Posse)

I’m eager to try this one out. Love how apps can easily define their own SLIs/SLOs with a custom resource (CRD).

2019-09-15

Erik Osterman (Cloud Posse)

@Daren

2019-09-16

Daren

Oh, that’s interesting, thanks for sharing!
