#sre (2020-03)
Prometheus, Prometheus Operator, Grafana, Kubernetes
Archive: https://archive.sweetops.com/monitoring/
2020-03-17
Erik Osterman (Cloud Posse)
Google says crushed rack wheels busted a cooling system, causing CPU performance to be throttled.
2020-03-25
btai
prometheus-operator users: how much memory have you seen your prometheus operator consume?
Erik Osterman (Cloud Posse)
Vincent Fiset
On my side it’s 3Gi on a small cluster… I guess it depends on the cluster size and the amount of metrics generated
btai
cool thanks guys. I think I may end up having it on its own k8s worker node
btai
still a ton cheaper than the ~$4k a month we spend on sysdig
2020-03-26
Vincent Fiset
Hi folks, what’s the right way to handle the KubeletDown alerts that come with prometheus operator on a public cloud where nodes get replaced at times?
- alert: KubeletDown
  annotations:
    message: Kubelet has disappeared from Prometheus target discovery.
    runbook_url: https://github.com/kubernetes-monitoring/kubernetes-mixin/tree/master/runbook.md#alert-name-kubeletdown
  expr: |
    absent(up{job="kubelet", metrics_path="/metrics"} == 1)
  for: 15m
  labels:
    severity: critical
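A note on how the rule above evaluates, since it bears on the node-replacement question: `absent(...)` only returns a result when no series at all matches the selector, so as written the expression is true only when not a single kubelet target reports `up == 1`, and `for: 15m` requires that to hold continuously for 15 minutes. The Python below is a rough, simplified model of that behavior, not how Prometheus is implemented. The function names, the one-minute evaluation interval, and the node names are all assumptions made up for illustration.

```python
# Simplified model of:
#   expr: absent(up{job="kubelet", metrics_path="/metrics"} == 1)
#   for: 15m
# Illustrative only; names and the 1-minute interval are assumptions.

def absent_fires(up_values):
    """True when no kubelet target reports up == 1 (absent() returns a result)."""
    return not any(value == 1 for value in up_values.values())


def alert_fires(evaluations, for_minutes=15, interval_minutes=1):
    """evaluations: one {node_name: up_value} dict per rule evaluation.

    The alert fires once the condition has been true on consecutive
    evaluations spanning at least `for_minutes`.
    """
    consecutive = 0
    for up_values in evaluations:
        if absent_fires(up_values):
            consecutive += 1
            elapsed = (consecutive - 1) * interval_minutes
            if elapsed >= for_minutes:
                return True
        else:
            consecutive = 0  # condition cleared, pending state resets
    return False


# A node is replaced while another kubelet stays up: the condition never holds.
rolling_replacement = (
    [{"node-a": 1, "node-b": 1}] * 5
    + [{"node-b": 1}] * 20
    + [{"node-b": 1, "node-c": 1}] * 5
)

# Every kubelet is down (or no targets are discovered) for the whole window.
all_down = [{"node-a": 0, "node-b": 0}] * 16
no_targets = [{}] * 16
```

Under this model, `alert_fires(rolling_replacement)` is false while `alert_fires(all_down)` and `alert_fires(no_targets)` are true, which suggests that replacing individual nodes should not trip this particular rule on its own as long as at least one kubelet is still being scraped successfully.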
2020-03-27
Erik Osterman (Cloud Posse)
Adding @discourse_forum bot
discourse_forum
@discourse_forum has joined the channel
2020-03-28
sheldonh
What’s your preferred APM platform (no AppDynamics)? Need container support, .NET, Java, and more. I want to simplify telemetry and monitoring metrics into a central service and give the business a self-service telemetry metrics source so it’s all centralized.
I want a system that ideally pulls in AWS tags on instances automatically too, so I can stop writing complicated Chocolatey packages for configuring the app.
Right now my gut feeling is that SignalFx (can manage it with Terraform too) and Datadog are the promising solutions.