#sre (2020-03)
Prometheus, Prometheus Operator, Grafana, Kubernetes
Archive: https://archive.sweetops.com/monitoring/
2020-03-17
![Erik Osterman (Cloud Posse) avatar](https://secure.gravatar.com/avatar/88c480d4f73b813904e00a5695a454cb.jpg?s=72&d=https%3A%2F%2Fa.slack-edge.com%2Fdf10d%2Fimg%2Favatars%2Fava_0023-72.png)
![attachment image](https://zdnet4.cbsistatic.com/hub/i/r/2019/11/13/3d552e15-169a-4a9a-8483-1cb9e8d9e0f0/thumbnail/770x578/95786ec225143f47f2118c3ea48410a1/peopleserversistock-879720282.jpg)
Google says crushed rack wheels busted a cooling system, causing CPU performance to be throttled.
2020-03-25
![btai avatar](https://avatars.slack-edge.com/2019-09-04/736463433650_34701761239ea7ba8207_72.jpg)
prometheus-operator users: how much memory have you seen your prometheus operator consume?
![Erik Osterman (Cloud Posse) avatar](https://secure.gravatar.com/avatar/88c480d4f73b813904e00a5695a454cb.jpg?s=72&d=https%3A%2F%2Fa.slack-edge.com%2Fdf10d%2Fimg%2Favatars%2Fava_0023-72.png)
![Vincent Fiset avatar](https://secure.gravatar.com/avatar/e02dd8d73faab221d616ee4e920ec71d.jpg?s=72&d=https%3A%2F%2Fa.slack-edge.com%2Fdf10d%2Fimg%2Favatars%2Fava_0021-72.png)
On my side it's 3Gi on a small cluster… I guess it depends on the cluster size and the number of metrics generated
![btai avatar](https://avatars.slack-edge.com/2019-09-04/736463433650_34701761239ea7ba8207_72.jpg)
cool thanks guys. I think I may end up having it on its own k8s worker node
![btai avatar](https://avatars.slack-edge.com/2019-09-04/736463433650_34701761239ea7ba8207_72.jpg)
still a ton cheaper than the ~$4k a month we spend on sysdig
2020-03-26
![Vincent Fiset avatar](https://secure.gravatar.com/avatar/e02dd8d73faab221d616ee4e920ec71d.jpg?s=72&d=https%3A%2F%2Fa.slack-edge.com%2Fdf10d%2Fimg%2Favatars%2Fava_0021-72.png)
Hi folks, what’s the right way to handle the KubeletDown
alerts that come with prometheus operator on a public cloud where nodes get replaced at times?
- alert: KubeletDown
  annotations:
    message: Kubelet has disappeared from Prometheus target discovery.
    runbook_url: https://github.com/kubernetes-monitoring/kubernetes-mixin/tree/master/runbook.md#alert-name-kubeletdown
  expr: |
    absent(up{job="kubelet", metrics_path="/metrics"} == 1)
  for: 15m
  labels:
    severity: critical
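A note on how the rule above evaluates, since it bears on the node-replacement question: `absent(...)` returns a result only when the inner selector matches no series at all, so `absent(up{...} == 1)` is true only when no kubelet target reports `up == 1`. A single replaced node should not satisfy it while other kubelets are still being scraped. The Python below is a rough model of that logic, not Prometheus code: the node names are made up, and the `for: 15m` clause is simplified to "the condition held on the last N consecutive evaluations".

```python
from typing import Dict, List


def absent_up_equals_one(samples: Dict[str, float]) -> bool:
    """Model of absent(up{job="kubelet"} == 1).

    `samples` maps a (hypothetical) target name to its `up` value.
    True only when no target at all reports up == 1.
    """
    return not any(value == 1 for value in samples.values())


def alert_fires(history: List[Dict[str, float]], for_evaluations: int) -> bool:
    """Simplified `for:` clause: the condition must have held on each of
    the last `for_evaluations` consecutive rule evaluations."""
    if len(history) < for_evaluations:
        return False
    return all(absent_up_equals_one(s) for s in history[-for_evaluations:])


# One node being replaced while another is still scraped: condition is false.
one_node_replaced = {"node-a": 1.0, "node-b": 0.0}
# Every kubelet target down, or none discovered at all: condition is true.
all_down = {"node-a": 0.0, "node-b": 0.0}
none_discovered: Dict[str, float] = {}

if __name__ == "__main__":
    print(absent_up_equals_one(one_node_replaced))
    print(absent_up_equals_one(all_down))
    print(alert_fires([all_down, all_down, one_node_replaced], 3))
```

If the alert is firing during routine node replacement, this model suggests checking whether every kubelet target is dropping out of discovery at once, rather than just the replaced node.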
2020-03-27
![Erik Osterman (Cloud Posse) avatar](https://secure.gravatar.com/avatar/88c480d4f73b813904e00a5695a454cb.jpg?s=72&d=https%3A%2F%2Fa.slack-edge.com%2Fdf10d%2Fimg%2Favatars%2Fava_0023-72.png)
Adding @discourse_forum bot
![discourse_forum avatar](https://avatars.slack-edge.com/2020-03-26/1029663249525_451a74d3463357c40dbf_72.png)
@discourse_forum has joined the channel
2020-03-28
![sheldonh avatar](https://secure.gravatar.com/avatar/b909e5a82474e9853ff6a6c6111cf0cf.jpg?s=72&d=https%3A%2F%2Fa.slack-edge.com%2Fdf10d%2Fimg%2Favatars%2Fava_0020-72.png)
What’s your preferred APM platform (no AppDynamics)? It needs container support, .NET, Java, and more. I want to consolidate telemetry and monitoring metrics into a central service and give the business a self-service source for telemetry metrics, so it’s all centralized.
Ideally, I want a system that automatically pulls in AWS tags on instances too, so I can stop writing complicated Chocolatey packages for configuring the app.
Right now my gut feeling is that SignalFx (which can be managed with Terraform too) and Datadog are the promising solutions.