#sre (2020-05)

Prometheus, Prometheus Operator, Grafana, Kubernetes

Archive: https://archive.sweetops.com/monitoring/

2020-05-06

Jeremy G (Cloud Posse)

[thread] @kskewes @webb @asmito (and anyone else who wants to join in) We talked about Thanos 8 months ago as promising but not quite ready. They have of course progressed a lot since then, notably with an S3 backend that @kskewes said he was waiting for. I see support/interest from Banzai and Bitnami. There is experimental support for Thanos in the Prometheus Operator charts, and Prometheus itself has just (May 5) upgraded its Integrations listing of Thanos from “read” to “read and write”. We are looking to set up some kind of federated Prometheus to monitor multiple clusters, and Thanos still looks good to me. Do you have any pros/cons to share?

kskewes

We’re now on AWS, and Thanos is on our roadmap. All looks good.

joshmyers

Thanos++

Steve Boardwell

Nice!

webb

We have 50+ teams using our product with Thanos… overall it’s been a great experience. A number of memory issues were fixed in recent versions, though memory usage can still require tuning. We’ve been watching the Thanos Receiver closely, because the sidecar has a two-hour compaction window (i.e. an upload delay) by default. Happy to share lots more if you want to discuss!

Erik Osterman (Cloud Posse)

@Steve Boardwell @Jeremy G (Cloud Posse)

Erik Osterman (Cloud Posse)

Thanks @webb

Jeremy G (Cloud Posse)

How are you deploying Thanos? It looks like the Bitnami chart is the most popular and best maintained, but it also seems tied to their ecosystem.

joshmyers

Terraform module for Prometheus + Thanos

Jeremy G (Cloud Posse)

@joshmyers Terraform? I don’t see how you can do that. Is that code something you can share?

joshmyers

Not to k8s….

joshmyers

I’m talking about Prometheus, Alertmanager, Thanos and Pushgateway deployed to an ASG.

joshmyers

Every X mins the Prometheus service scoops up a load of config files from S3 and restarts. Every service pushes its Prometheus/Alertmanager config to the bucket. Works well for us, but probably not the CP way.
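A minimal sketch of the merge step in the pattern described above, assuming each service uploads a JSON file holding a list of Prometheus `scrape_configs` entries, and that the bucket has already been copied to a local directory (for example with `aws s3 sync`). The file layout, the JSON format and the helpers `merge_scrape_configs` / `load_fragments` are hypothetical, not joshmyers' actual code; rendering the result into `prometheus.yml` and restarting Prometheus are left out.

```python
import json
import sys
from pathlib import Path


def merge_scrape_configs(fragments):
    """Merge per-service lists of scrape_configs entries into one config dict.

    A later fragment overrides an earlier one with the same job_name, so a
    service that re-uploads its file replaces its own job instead of adding
    a duplicate (Prometheus refuses to load duplicate job names).
    """
    jobs = {}
    for fragment in fragments:
        for job in fragment:
            jobs[job["job_name"]] = job
    # Sort by job name so the rendered config is stable between runs.
    return {"scrape_configs": [jobs[name] for name in sorted(jobs)]}


def load_fragments(directory):
    """Read every *.json file in the synced directory, in file-name order."""
    return [
        json.loads(path.read_text())
        for path in sorted(Path(directory).glob("*.json"))
    ]


if __name__ == "__main__" and len(sys.argv) > 1:
    # Hypothetical usage: python merge_prom_config.py /path/to/synced/bucket
    merged = merge_scrape_configs(load_fragments(sys.argv[1]))
    print(json.dumps(merged, indent=2))
```

Keying on `job_name` makes each service the owner of its own jobs, so a periodic re-sync is idempotent: running it twice with the same bucket contents produces the same config.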

joshmyers

Terratest tests the Packer AMI and the modules extensively. None of this “can you apply cleanly on the happy path” stuff.

Jeremy G (Cloud Posse)

Right. Thanks!

2020-05-07

2020-05-08

2020-05-13

Jeremy G (Cloud Posse)

Does anyone have any experience with Epsagon they would like to share?

End-to-End Observability for Microservice Environments | Epsagon

Monitor, troubleshoot and fix problems in seconds with payload visibility. Get started for free and experience Epsagon’s automated tracing solutions today!

Mike F.

Disclaimer: I worked on and helped build a competitor that got absorbed into New Relic. Going into PoCs it was always us vs Epsagon, rarely any of the other competitors. After that product was shut down, my personal second choice is absolutely Epsagon. I really like the interface: it's clean and makes it easy to bubble up interesting data points in the system, and the alerting features are pretty good. On the serverless side of things, the tracing is very helpful to me when troubleshooting any complex Lambda issue. The interdependence maps it shows are cool. From a deployment perspective it's relatively straightforward. I do not like the CloudFormation stack AWS integration; I preferred the AWS account dependency from my old product. Setting up the Epsagon libraries to get automated off-box call tracing is easy enough. Compared to some of the others (Thundra/Dashbird, even the big names), I think Epsagon wins.

2020-05-14

Erik Osterman (Cloud Posse)

Thanks @Mike F. for the details! Very helpful…

Mike F.

@Erik Osterman (Cloud Posse) No problem! I love talking about Lambda/microservice observability. Hopefully one day soon we can implement some of this in the DevOps work I am doing with @Erick.


2020-05-22

Jeremy G (Cloud Posse)

Has anyone gotten Grafana’s new OAuth role mapping to work? And by work, I mean be useful. I have the role mapping working (users are assigned roles based on their OAuth attributes), but they are each put into their own organization, so the role mapping is not very helpful. I want users to be added to Main Org. with the mapped role, but have not been able to get that to work. Either they are added with the auto_assign_org_role or they are not added at all.

https://grafana.com/docs/grafana/latest/guides/whats-new-in-v6-5/#generic-oauth-role-mapping

What's new in Grafana v6.5

Feature and improvement highlights for Grafana v6.5

Jeremy G (Cloud Posse)

Turns out I only thought I had role mapping working. There are some bugs in the parser.
