#sre (2020-03)

Prometheus, Prometheus Operator, Grafana, Kubernetes

Archive: https://archive.sweetops.com/monitoring/


btai

prometheus-operator users: how much memory have you seen your prometheus operator consume?

Erik Osterman (Cloud Posse)

A lot! I think we have allocated 14-16G

Vincent Fiset

On my side it's 3Gi on a small cluster… I guess it depends on the cluster size and the volume of metrics generated

btai

cool thanks guys. I think I may end up having it on its own k8s worker node

btai

still a ton cheaper than the ~$4k a month we spend on sysdig


Vincent Fiset

Hi folks, what’s the right way to handle the KubeletDown alert that comes with prometheus operator on a public cloud where nodes get replaced at times?

    - alert: KubeletDown
      annotations:
        message: Kubelet has disappeared from Prometheus target discovery.
        runbook_url: https://github.com/kubernetes-monitoring/kubernetes-mixin/tree/master/runbook.md#alert-name-kubeletdown
      expr: |
        absent(up{job="kubelet", metrics_path="/metrics"} == 1)
      for: 15m
      labels:
        severity: critical
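
For reference, here is a rough Python model of how that expression behaves. It is only a sketch: `absent(up{...} == 1)` is true when no kubelet target at all reports `up == 1`, and `for: 15m` requires that to hold continuously. The function names, the node names, and the idea of counting evaluation steps instead of wall-clock minutes are assumptions made for illustration, not Prometheus internals.

```python
from typing import Dict, List


def absent_up_equals_one(up_samples: Dict[str, int]) -> bool:
    """Model one evaluation of absent(up{job="kubelet"} == 1).

    `up_samples` maps a (hypothetical) node name to its `up` value.
    The result is True only when no target has up == 1.
    """
    healthy = [node for node, value in up_samples.items() if value == 1]
    return len(healthy) == 0


def alert_fires(evaluations: List[Dict[str, int]], for_steps: int) -> bool:
    """Simplified stand-in for `for:`; the condition must hold for
    `for_steps` consecutive evaluations before the alert fires."""
    streak = 0
    for samples in evaluations:
        streak = streak + 1 if absent_up_equals_one(samples) else 0
        if streak >= for_steps:
            return True
    return False


# node-a is replaced by node-c while node-b stays up the whole time.
rolling = [{"node-a": 1, "node-b": 1}, {"node-b": 1}, {"node-b": 1, "node-c": 1}]
# No kubelet target is up (or even discovered) for three evaluations in a row.
outage = [{}, {"node-a": 0}, {}]

print(alert_fires(rolling, for_steps=3))  # False
print(alert_fires(outage, for_steps=3))   # True
```

Under this model a single node being replaced does not satisfy the expression as long as at least one other kubelet target stays up.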


Erik Osterman (Cloud Posse)

Adding @discourse_forum bot



sheldonh

What’s your preferred APM platform (no AppDynamics)? I need container support, .NET, Java, and more. I want to consolidate telemetry and monitoring metrics into a central service and give the business a self-service source for telemetry metrics, so it’s all centralized.

Ideally I want a system that automatically pulls in AWS tags on instances too, so I can stop writing complicated Chocolatey packages for configuring the app.

Right now my gut feeling is that SignalFx (which can be managed with Terraform too) and Datadog are the promising solutions.