#sre (2020-03)
Prometheus, Prometheus Operator, Grafana, Kubernetes
Archive: https://archive.sweetops.com/monitoring/
2020-03-17
Google says crushed rack wheels busted a cooling system, causing CPU performance to be throttled.
2020-03-25
prometheus-operator users: how much memory have you seen your prometheus operator consume?
On my side it’s 3Gi on a small cluster… I guess it depends on the cluster size and the amount of metrics generated
cool thanks guys. I think I may end up having it on its own k8s worker node
still a ton cheaper than the ~$4k a month we spend on sysdig
2020-03-26
Hi folks, what’s the right way to handle the KubeletDown alerts that come with prometheus operator on a public cloud where nodes get replaced at times?
- alert: KubeletDown
  annotations:
    message: Kubelet has disappeared from Prometheus target discovery.
    runbook_url: https://github.com/kubernetes-monitoring/kubernetes-mixin/tree/master/runbook.md#alert-name-kubeletdown
  expr: |
    absent(up{job="kubelet", metrics_path="/metrics"} == 1)
  for: 15m
  labels:
    severity: critical
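For anyone reading the rule above: absent(...) only returns a result when the inner selector matches nothing, so this alert is about every kubelet target being gone or down, not about a single node being replaced. Below is a rough Python sketch of that behaviour, not Prometheus code. The function names, the 60s evaluation interval, and the node names are all made up for illustration, and label matching on job and metrics_path is ignored.

```python
from typing import Dict, List


def kubelet_absent(up_by_node: Dict[str, int]) -> bool:
    """Toy model of absent(up{job="kubelet"} == 1).

    True only when no kubelet target reports up == 1, either because
    there are no targets at all or because every target is down.
    """
    return not any(value == 1 for value in up_by_node.values())


def alert_fires(evaluations: List[Dict[str, int]],
                interval_s: int = 60,   # assumed rule evaluation interval
                for_s: int = 900) -> bool:  # for: 15m
    """True if the condition held continuously for at least for_s seconds."""
    active_since = None
    for i, up_by_node in enumerate(evaluations):
        now = i * interval_s
        if kubelet_absent(up_by_node):
            if active_since is None:
                active_since = now  # alert becomes pending
            if now - active_since >= for_s:
                return True  # pending long enough, alert fires
        else:
            active_since = None  # condition cleared, reset pending state
    return False


# Hypothetical node replacement: node-a leaves, node-c joins, node-b stays up.
rolling_replace = [
    {"node-a": 1, "node-b": 1},
    {"node-b": 1},
    {"node-b": 1, "node-c": 0},
    {"node-b": 1, "node-c": 1},
]
# Hypothetical total outage: no kubelet targets for 16 evaluations in a row.
full_outage = [{} for _ in range(16)]
# Hypothetical short blip: 10 minutes with no targets, then recovery.
short_blip = [{} for _ in range(10)] + [{"node-a": 1}]
```

In this model a rolling replacement never trips the condition as long as one kubelet stays up. If the alert still fires during node churn, the cause is more likely that all kubelet targets dropped out of discovery, or that the job or metrics_path labels no longer match.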
2020-03-27
Adding @discourse_forum bot
2020-03-28
What’s your preferred APM platform (no AppDynamics)? I need container support, .NET, Java, and more. I want to consolidate telemetry and monitoring metrics in a central service and give the business a self-service metrics source, so it’s all centralized.
Ideally I want a system that automatically pulls in AWS tags on instances too, so I can stop writing complicated Chocolatey packages for configuring the app.
Right now my gut feeling is that SignalFx (which can be managed with Terraform too) and Datadog are the promising solutions.