#sre (2020-05)
Prometheus, Prometheus Operator, Grafana, Kubernetes
Archive: https://archive.sweetops.com/monitoring/
2020-05-06
[thread] @kskewes @webb @asmito (and anyone else who wants to join in) We talked about Thanos 8 months ago as promising but not quite ready. They have of course progressed a lot since then, notably with an S3 backend that @kskewes said he was waiting for. I see support/interest from Banzai and Bitnami. There is experimental support for Thanos in the Prometheus Operator charts, and Prometheus itself has just (May 5) upgraded its Integration listing of Thanos from “read” to “read and write”. We are looking to do some kind of Federated Prometheus to monitor multiple clusters and Thanos still looks good to me. Do you have any pros/cons to share?
@Steve Boardwell
We’re now on AWS and Thanos on roadmap. All looks good.
Thanos++
Nice!
We have 50+ teams using our product with Thanos… overall it’s been a great experience. There were a number of memory issues fixed in recent versions. Memory management can still require configuration. We’ve been watching the Thanos receiver closely, because the sidecar has a two-hour compaction window / delay by default. Happy to share lots more if you want to discuss!
@Steve Boardwell @Jeremy G (Cloud Posse)
How are you deploying Thanos? Looks like the bitnami chart is the most popular and well-maintained, but it also seems tied to their ecosystem.
Terraform module for Prometheus + Thanos
@joshmyers Terraform? I don’t see how you can do that. Is that code something you can share?
Not to k8s….
I’m talking prometheus, alertmanager, thanos, push gateway deployed to an ASG
Every X mins prom service scoops up a load of config files from S3 and restarts. Every service pushes prom/alert manager config to the bucket. Works well for us, but probably not the CP way
terratest tests the packer AMI and the modules, extensively. None of this “can you apply clean on a happy path” stuff
Right. Thanks!
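For readers who want to picture the pull-and-restart loop described above, here is a minimal sketch of one polling cycle. It is not the code from that setup: the function names, the fragment layout (one `scrape_configs` list per service), and the skip-restart-when-unchanged check are all assumptions made for illustration. Downloading the fragments from the bucket and actually restarting the service are left out.

```python
import hashlib
import json


def merge_fragments(fragments):
    """Combine per-service scrape_configs lists into one config dict.

    `fragments` maps a (hypothetical) service name to the config fragment
    that service pushed to the shared bucket.
    """
    scrape_configs = []
    for name in sorted(fragments):  # sorted so the merged result is stable
        scrape_configs.extend(fragments[name].get("scrape_configs", []))
    return {"scrape_configs": scrape_configs}


def config_digest(config):
    """Hash a canonical JSON rendering so identical configs compare equal."""
    canonical = json.dumps(config, sort_keys=True).encode("utf-8")
    return hashlib.sha256(canonical).hexdigest()


def sync_once(fragments, previous_digest):
    """Run one polling cycle.

    Returns (merged_config, digest, needs_restart). The restart is only
    flagged when the merged config differs from the previous cycle, which
    is an assumption of this sketch rather than something the thread states.
    """
    merged = merge_fragments(fragments)
    digest = config_digest(merged)
    return merged, digest, digest != previous_digest
```

A scheduler (cron, a systemd timer, or similar) would call `sync_once` every X minutes, write the merged config to disk, and restart the service when `needs_restart` is true.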
2020-05-07
2020-05-08
2020-05-13
Does anyone have any experience with Epsagon they would like to share?
Disclaimer: I worked on and helped build a competitor that got absorbed into New Relic. Going into PoCs it was always us vs Epsagon, rarely any of the other competitors. After the shutdown of that product, my personal second choice absolutely is Epsagon. I really like the interface; it’s clean and makes it easy to bubble up interesting data points in the system, and the alerting features are pretty good. On the serverless side of things, the tracing is very helpful to me when troubleshooting any complex Lambda issue. The interdependence maps it shows are cool. From a deployment perspective it’s relatively straightforward. I do not like the CloudFormation stack AWS integration; I enjoyed the AWS account dependency from my old product. Setting up the Epsagon libraries is easy enough to get automated off-box call tracing. Compared to some of the others (Thundra/Dashbird, even the big names) I think Epsagon wins.
2020-05-14
@Erik Osterman (Cloud Posse) No problem! Love talking about Lambda/Microservice observability. Hopefully one day soon we can implement some of this with the DevOps stuff I am doing with @Erick
2020-05-22
Has anyone gotten Grafana’s new OAuth role mapping to work? And by work, I mean be useful? I have the role mapping working (users are assigned roles based on their OAuth) but they are each put into their own organization, so the role mapping is not very helpful. I want users to be added to Main Org. with the mapped role, but have not been able to get that to work. Either they are added with the auto_assign_org_role or they are not added at all.
https://grafana.com/docs/grafana/latest/guides/whats-new-in-v6-5/#generic-oauth-role-mapping
Feature and improvement highlights for Grafana v6.5
Turns out I only thought I had role mapping working. There are some bugs in the parser.