#monitoring (2020-03)

Prometheus, Prometheus Operator, Grafana, Kubernetes

Archive: https://archive.sweetops.com/monitoring/


sheldonh

What’s your preferred APM platform (no AppDynamics)? I need container support, .NET, Java, and more. I want to simplify telemetry and monitoring by sending metrics to a central service, giving the business a self-service, centralized source of telemetry metrics.

Ideally I want a system that automatically pulls in AWS tags on instances too, so I can stop writing complicated Chocolatey packages for configuring the app.

Right now my gut feeling is that SignalFx (which can be managed with Terraform too) and Datadog are the most promising solutions.


Erik Osterman

Adding @ bot

discourse_forum
10:05:07 PM

@ has joined the channel


Vincent Fiset

Hi folks, what’s the right way to handle the KubeletDown alert that comes with Prometheus Operator on a public cloud where nodes get replaced at times?

    - alert: KubeletDown
      annotations:
        message: Kubelet has disappeared from Prometheus target discovery.
        runbook_url: https://github.com/kubernetes-monitoring/kubernetes-mixin/tree/master/runbook.md#alert-name-kubeletdown
      expr: |
        absent(up{job="kubelet", metrics_path="/metrics"} == 1)
      for: 15m
      labels:
        severity: critical
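
For context on the rule quoted above: PromQL's `absent()` returns a result only when its input vector is empty, so the expression is true only when no kubelet target at all reports `up == 1`, and `for: 15m` then requires that to hold for 15 minutes. A toy Python model of that condition, assuming hypothetical node names and greatly simplified semantics (this is not Prometheus code), might look like:

```python
# Toy model of PromQL's absent(): it yields a result only when the
# input vector is empty. A simplification for illustration only.

def up_equals_one(targets):
    """Mimic `up{job="kubelet"} == 1`: keep targets whose last scrape succeeded."""
    return [name for name, up in targets.items() if up == 1]


def kubelet_down_condition(targets):
    """Mimic `absent(... == 1)`: true only when no kubelet target is up."""
    return len(up_equals_one(targets)) == 0


# Hypothetical node names; 1 means the scrape succeeded, 0 means it failed.
one_node_replaced = {"node-a": 1, "node-b": 0, "node-c": 1}
all_nodes_down = {"node-a": 0, "node-b": 0}
nothing_discovered = {}

print(kubelet_down_condition(one_node_replaced))   # False
print(kubelet_down_condition(all_nodes_down))      # True
print(kubelet_down_condition(nothing_discovered))  # True
```

Under this reading, a single node being replaced should not satisfy the expression as long as at least one other kubelet is still scraped successfully.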


btai

prometheus-operator users: how much memory have you seen your prometheus operator consume?

Erik Osterman

A lot! I think we have allocated 14-16G

Vincent Fiset

On my side it’s 3Gi on a small cluster… I guess it depends on the cluster size and the amount of metrics generated.

btai

cool thanks guys. I think I may end up having it on its own k8s worker node

btai

still a ton cheaper than the ~$4k a month we spend on sysdig


Erik Osterman
https://news.google.com/articles/CBMiYWh0dHBzOi8vd3d3LnpkbmV0LmNvbS9hcnRpY2xlL2dvb2dsZS10aGlzLWlzLXdoYXQtY2F1c2VkLWNwdS10aHJvdHRsaW5nLWF0LW91ci1jbG91ZC1kYXRhLWNlbnRlci_SAWxodHRwczovL3d3dy56ZG5ldC5jb20vZ29vZ2xlLWFtcC9hcnRpY2xlL2dvb2dsZS10aGlzLWlzLXdoYXQtY2F1c2VkLWNwdS10aHJvdHRsaW5nLWF0LW91ci1jbG91ZC1kYXRhLWNlbnRlci8?hl=en-US&gl=US&ceid=US%3Aen
Google: This is what caused CPU throttling at our cloud data center | ZDNet

Google says crushed rack wheels busted a cooling system, causing CPU performance to be throttled.


Jawwad Yunus

Hi, I have an urgent requirement: I need to send Nagios alerts to multiple different Slack channels. Currently, all alerts go to just one channel. Has anyone set something like this up before?
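
One possible approach to the question above, sketched under assumptions: a small notification script that picks a Slack incoming webhook per host group and posts the alert text to it. The routing table, the placeholder webhook URLs, and all function names below are hypothetical. The sketch also assumes Nagios passes standard macros such as `$HOSTNAME$`, `$SERVICEDESC$`, `$SERVICESTATE$`, `$SERVICEOUTPUT$`, and `$HOSTGROUPNAMES$` as arguments to the notification command.

```python
import json
import urllib.request

# Hypothetical routing table: Nagios host group name -> Slack incoming-webhook
# URL (placeholders; each webhook would be bound to a different channel).
ROUTES = {
    "database": "https://hooks.slack.com/services/PLACEHOLDER/DB",
    "web": "https://hooks.slack.com/services/PLACEHOLDER/WEB",
}
DEFAULT_WEBHOOK = "https://hooks.slack.com/services/PLACEHOLDER/DEFAULT"


def pick_webhooks(hostgroups, routes=ROUTES, default=DEFAULT_WEBHOOK):
    """Return webhook URLs for a comma-separated host group list."""
    names = [g.strip() for g in hostgroups.split(",") if g.strip()]
    matched = [routes[g] for g in names if g in routes]
    # Drop duplicates while keeping order; fall back to the default channel.
    return list(dict.fromkeys(matched)) or [default]


def build_payload(host, service, state, output):
    """Build the JSON body that Slack incoming webhooks accept."""
    return {"text": f"[{state}] {host}/{service}: {output}"}


def send(url, payload):
    """POST the payload to one webhook and return the HTTP status code."""
    data = json.dumps(payload).encode("utf-8")
    req = urllib.request.Request(
        url, data=data, headers={"Content-Type": "application/json"}
    )
    with urllib.request.urlopen(req, timeout=10) as resp:
        return resp.status


def notify(host, service, state, output, hostgroups):
    """Send one alert to every channel matched by its host groups."""
    payload = build_payload(host, service, state, output)
    return [send(url, payload) for url in pick_webhooks(hostgroups)]
```

A Nagios notification command could then call `notify(...)` with the macro values. Alternatively, separate Nagios contacts, each with its own notification command and webhook, can achieve the same routing without a lookup table.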