#sre (2020-01)
Prometheus, Prometheus Operator, Grafana, Kubernetes
Archive: https://archive.sweetops.com/monitoring/
2020-01-02
2020-01-15
data:image/s3,"s3://crabby-images/d74ce/d74ce93b772bba387ee55532ef960af18f10908e" alt="Christopher avatar"
Sorry, newbie question… If I wanted to diagnose where a memory usage is going in a PHP application, is that something a tool like New Relic APM can do for me?
data:image/s3,"s3://crabby-images/662c3/662c3185b944a7d273fbaa7d61c4a971edb10194" alt="Pierre Humberdroz avatar"
New Relic APM does not really show this kind of thing, but it gives you time spans for how long certain things take to load.
data:image/s3,"s3://crabby-images/d74ce/d74ce93b772bba387ee55532ef960af18f10908e" alt="Christopher avatar"
Ahh, okay. Thanks, that probably won’t help me then. I found another tool, blackfire.io, which looks like it might help instead.
data:image/s3,"s3://crabby-images/662c3/662c3185b944a7d273fbaa7d61c4a971edb10194" alt="Pierre Humberdroz avatar"
Google’s Stackdriver is the only tool I know of that profiles your applications on the fly, and even then it only works if you have a single request in a timespan to check.
data:image/s3,"s3://crabby-images/d74ce/d74ce93b772bba387ee55532ef960af18f10908e" alt="Christopher avatar"
I see. I might have to come at this a different way then. I have no idea where the problem is coming from, so I cannot currently isolate it to a single request.
data:image/s3,"s3://crabby-images/662c3/662c3185b944a7d273fbaa7d61c4a971edb10194" alt="Pierre Humberdroz avatar"
Maybe you can explain what is happening, and I might get an idea of how you could tackle this issue.
data:image/s3,"s3://crabby-images/d74ce/d74ce93b772bba387ee55532ef960af18f10908e" alt="Christopher avatar"
Essentially, we have a WordPress website running on a VPS… And, every day, the log files are full of “exhausted memory” errors. It only happens in production (probably because of the number of requests, or workload). There is very little to indicate what has caused it; the only mention is a file called wp-db.php, which is a PHP class to interact with MySQL.
Increasing the memory limit for the process does not solve it, it just eats up all of that too.
data:image/s3,"s3://crabby-images/d74ce/d74ce93b772bba387ee55532ef960af18f10908e" alt="Christopher avatar"
I’m mostly a front end developer, so I’m quite far out of my depth here, so sorry about that haha
data:image/s3,"s3://crabby-images/662c3/662c3185b944a7d273fbaa7d61c4a971edb10194" alt="Pierre Humberdroz avatar"
did you verify that the memory limit is higher ? with phpinfo or the likes?
data:image/s3,"s3://crabby-images/d74ce/d74ce93b772bba387ee55532ef960af18f10908e" alt="Christopher avatar"
Yep
data:image/s3,"s3://crabby-images/d74ce/d74ce93b772bba387ee55532ef960af18f10908e" alt="Christopher avatar"
It was previously 512M, upped it to 1G
data:image/s3,"s3://crabby-images/9a0f8/9a0f8d41476ffe9065fbe0b98227d0cdcaa0cd11" alt="Erik Osterman (Cloud Posse) avatar"
so wp-db.php is the ORM for WP. Wouldn’t surprise me if some query is trying to load all records into memory.
data:image/s3,"s3://crabby-images/9a0f8/9a0f8d41476ffe9065fbe0b98227d0cdcaa0cd11" alt="Erik Osterman (Cloud Posse) avatar"
How large is the blog? Are we truly certain that 1gb is enough for a poorly written query?
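A minimal sketch of why a "load all records" query can exhaust memory while a batched read does not. This is an analogy in Python with an in-memory SQLite table, not WordPress or WooCommerce code; the `orders` table, its columns, and the function names are hypothetical.

```python
import sqlite3


def make_db(n_rows):
    """Build a throwaway in-memory database with a hypothetical orders table."""
    conn = sqlite3.connect(":memory:")
    conn.execute("CREATE TABLE orders (id INTEGER PRIMARY KEY, total REAL)")
    conn.executemany(
        "INSERT INTO orders (total) VALUES (?)",
        ((float(i % 100),) for i in range(n_rows)),
    )
    conn.commit()
    return conn


def sum_all_at_once(conn):
    """Materialise every row in a list first: memory grows with table size."""
    rows = conn.execute("SELECT total FROM orders").fetchall()
    return sum(row[0] for row in rows)


def sum_in_batches(conn, batch_size=1000):
    """Read a bounded number of rows at a time: memory stays roughly constant."""
    cursor = conn.execute("SELECT total FROM orders")
    total = 0.0
    while True:
        batch = cursor.fetchmany(batch_size)
        if not batch:
            break
        total += sum(row[0] for row in batch)
    return total


if __name__ == "__main__":
    conn = make_db(5000)
    print(sum_all_at_once(conn), sum_in_batches(conn, batch_size=500))
```

Both functions return the same result; only the peak number of rows held in memory differs. If a rogue query or report job follows the first pattern, raising the memory limit may only delay the failure.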
data:image/s3,"s3://crabby-images/9a0f8/9a0f8d41476ffe9065fbe0b98227d0cdcaa0cd11" alt="Erik Osterman (Cloud Posse) avatar"
You mention that it only happens in production. Does staging have an equal dataset?
data:image/s3,"s3://crabby-images/d74ce/d74ce93b772bba387ee55532ef960af18f10908e" alt="Christopher avatar"
The amount of content between staging & production is quite similar. It’s an e-commerce store using WooCommerce. There’s about 500 orders per day. With a total of about 180,000 orders on the site.
It’s possible that 1G is not big enough. I could up this to 4G and see if this helps. I can’t imagine it needs to be more than 4G. I’ll see if that helps the situation shortly. I’ll also look to see if I can find any rogue queries loading everything in.
Thanks everyone btw
data:image/s3,"s3://crabby-images/662c3/662c3185b944a7d273fbaa7d61c4a971edb10194" alt="Pierre Humberdroz avatar"
I ran into a similar issue: there was a cron job trying to generate some kind of report across all orders, but this was 5 years ago.
data:image/s3,"s3://crabby-images/d74ce/d74ce93b772bba387ee55532ef960af18f10908e" alt="Christopher avatar"
Alright, 4G wasn’t enough either!
I’ll see if I can find some information around the queries. Perhaps I can find a pattern
data:image/s3,"s3://crabby-images/662c3/662c3185b944a7d273fbaa7d61c4a971edb10194" alt="Pierre Humberdroz avatar"
are you on nginx or apache2
data:image/s3,"s3://crabby-images/d74ce/d74ce93b772bba387ee55532ef960af18f10908e" alt="Christopher avatar"
Litespeed
data:image/s3,"s3://crabby-images/662c3/662c3185b944a7d273fbaa7d61c4a971edb10194" alt="Pierre Humberdroz avatar"
does it put out logs compatible with nginx or apache?
data:image/s3,"s3://crabby-images/d74ce/d74ce93b772bba387ee55532ef960af18f10908e" alt="Christopher avatar"
Honestly, I have no idea. If I paste you one of the lines from the log file, would that help?
data:image/s3,"s3://crabby-images/d74ce/d74ce93b772bba387ee55532ef960af18f10908e" alt="Christopher avatar"
2020-01-16T10:39:19+00:00 CRITICAL Out of memory (allocated 1556828160) (tried to allocate 4096 bytes) in /home/website/public_html/releases/1579104310/web/wp/wp-includes/wp-db.php on line 2007
2020-01-16T10:41:19+00:00 CRITICAL Out of memory (allocated 1561346048) (tried to allocate 58720264 bytes) in /home/website/public_html/releases/1579104310/web/wp/wp-includes/wp-db.php on line 2007
data:image/s3,"s3://crabby-images/662c3/662c3185b944a7d273fbaa7d61c4a971edb10194" alt="Pierre Humberdroz avatar"
looks like apache logs
data:image/s3,"s3://crabby-images/662c3/662c3185b944a7d273fbaa7d61c4a971edb10194" alt="Pierre Humberdroz avatar"
did you ever look at this with something like kibana and analyse the logs that way?
data:image/s3,"s3://crabby-images/d74ce/d74ce93b772bba387ee55532ef960af18f10908e" alt="Christopher avatar"
That’s getting way beyond my knowledge of this, and what’s set up.
I inherited this project, mostly do front-end development work, and only get to spend 2 days a month maximum on it.
data:image/s3,"s3://crabby-images/d74ce/d74ce93b772bba387ee55532ef960af18f10908e" alt="Christopher avatar"
I mostly hoped there was an easy solution if I am honest.
I appreciate all your help by the way, i’m learning a lot of new stuff based on your comments.
data:image/s3,"s3://crabby-images/9a0f8/9a0f8d41476ffe9065fbe0b98227d0cdcaa0cd11" alt="Erik Osterman (Cloud Posse) avatar"
By the looks of it, raising the memory limit did not help either because it didn’t apply (there are many ways to set it in PHP), because it was unset or changed by something else, or because the server simply doesn’t have enough RAM.
data:image/s3,"s3://crabby-images/9a0f8/9a0f8d41476ffe9065fbe0b98227d0cdcaa0cd11" alt="Erik Osterman (Cloud Posse) avatar"
Because it crashed at 1.5gb total requested, which is lower than the limit you set
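To check the arithmetic behind that comparison, here is a small sketch that pulls the byte counts out of one of the log lines quoted above and compares them with the limits mentioned in the thread (512M, 1G, 4G). The helper names are hypothetical, and the regular expression only assumes the line format shown in the thread.

```python
import re

# Matches the "Out of memory" lines quoted in the thread.
OOM_RE = re.compile(
    r"Out of memory \(allocated (?P<allocated>\d+)\) "
    r"\(tried to allocate (?P<requested>\d+) bytes\)"
)


def parse_oom(line):
    """Return (allocated_bytes, requested_bytes), or None if the line does not match."""
    match = OOM_RE.search(line)
    if match is None:
        return None
    return int(match.group("allocated")), int(match.group("requested"))


def parse_limit(value):
    """Convert PHP shorthand such as '512M' or '1G' to bytes."""
    units = {"K": 2**10, "M": 2**20, "G": 2**30}
    value = value.strip().upper()
    if value and value[-1] in units:
        return int(value[:-1]) * units[value[-1]]
    return int(value)


def to_mib(n_bytes):
    """Convert bytes to mebibytes."""
    return n_bytes / 2**20


if __name__ == "__main__":
    line = (
        "2020-01-16T10:39:19+00:00 CRITICAL Out of memory "
        "(allocated 1556828160) (tried to allocate 4096 bytes) in "
        "/home/website/public_html/releases/1579104310/web/wp/wp-includes/wp-db.php "
        "on line 2007"
    )
    allocated, requested = parse_oom(line)
    print(f"allocated: {to_mib(allocated):.1f} MiB")
    for limit in ("512M", "1G", "4G"):
        over = allocated + requested > parse_limit(limit)
        print(f"{limit}: {'over' if over else 'under'} the limit")
```

The first quoted line works out to roughly 1,485 MiB (about 1.45 GiB) allocated, which sits above a 1G limit and below a 4G one.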
data:image/s3,"s3://crabby-images/9dcb2/9dcb21fc8b97bc99c54633f8353227f74ec9ba10" alt="Roderik van der Veer avatar"
New Relic has a PHP agent which is actually very useful. It will also show you slow queries etc: https://docs.newrelic.com/docs/agents/php-agent/getting-started/introduction-new-relic-php
data:image/s3,"s3://crabby-images/9dcb2/9dcb21fc8b97bc99c54633f8353227f74ec9ba10" alt="Roderik van der Veer avatar"
Switched technologies and i’m still missing this level of tracing in nodejs
data:image/s3,"s3://crabby-images/662c3/662c3185b944a7d273fbaa7d61c4a971edb10194" alt="Pierre Humberdroz avatar"
@Roderik van der Veer Elastic APM has that level of tracing for nodejs
data:image/s3,"s3://crabby-images/9dcb2/9dcb21fc8b97bc99c54633f8353227f74ec9ba10" alt="Roderik van der Veer avatar"
It has the same limitations as the NR one. In PHP, it can show you, for each request, which call (written by you, a dependency or, and this is important, a PHP built-in function) takes how long. The nodejs APM solutions can show you the route or a database call, but not, for example, a call to nodejs crypto which takes forever.
data:image/s3,"s3://crabby-images/9dcb2/9dcb21fc8b97bc99c54633f8353227f74ec9ba10" alt="Roderik van der Veer avatar"
but will give it a go, because it does look nice TBH
data:image/s3,"s3://crabby-images/662c3/662c3185b944a7d273fbaa7d61c4a971edb10194" alt="Pierre Humberdroz avatar"
you can define custom spans in that case.
data:image/s3,"s3://crabby-images/d74ce/d74ce93b772bba387ee55532ef960af18f10908e" alt="Christopher avatar"
@Erik Osterman (Cloud Posse) oh, sorry i just posted an old line from the logs as an example of what they look like. It definitely did increase to 4GB as I have some entries in the log that exhausted all of that. Sorry for the confusion.
2020-01-16
2020-01-17
2020-01-27
data:image/s3,"s3://crabby-images/e471b/e471bc22e77bf7730ed2046efb99c305a4f8df4f" alt="btai avatar"
whats the difference between kube-prometheus and prometheus-operator? I’m assuming prom operator is completely bare bones while kube-prometheus has a default baseline of dashboards and monitors?
data:image/s3,"s3://crabby-images/aa44c/aa44cfbda773005898cc0ffb184ad365bfbec8cd" alt="Santiago Campuzano avatar"
Prometheus Operator creates/configures/manages Prometheus clusters atop Kubernetes - coreos/prometheus-operator
data:image/s3,"s3://crabby-images/e471b/e471bc22e77bf7730ed2046efb99c305a4f8df4f" alt="btai avatar"
The stable/prometheus-operator helm chart provides a similar feature set to kube-prometheus.
data:image/s3,"s3://crabby-images/e471b/e471bc22e77bf7730ed2046efb99c305a4f8df4f" alt="btai avatar"
this tells me kube-prometheus is probably not super useful anymore?
data:image/s3,"s3://crabby-images/aa44c/aa44cfbda773005898cc0ffb184ad365bfbec8cd" alt="Santiago Campuzano avatar"
To be honest, I’ve been working with Prometheus Operator Helm Chart
data:image/s3,"s3://crabby-images/aa44c/aa44cfbda773005898cc0ffb184ad365bfbec8cd" alt="Santiago Campuzano avatar"
Which is Amazing !
data:image/s3,"s3://crabby-images/aa44c/aa44cfbda773005898cc0ffb184ad365bfbec8cd" alt="Santiago Campuzano avatar"
It installs a full-fledged Prometheus + Grafana + Alertmanager stack, ready to monitor a K8S cluster
data:image/s3,"s3://crabby-images/aa44c/aa44cfbda773005898cc0ffb184ad365bfbec8cd" alt="Santiago Campuzano avatar"
I’d recommend going that way
data:image/s3,"s3://crabby-images/e471b/e471bc22e77bf7730ed2046efb99c305a4f8df4f" alt="btai avatar"
@Santiago Campuzano might be a stupid question, does it come w/ default dashboards?
data:image/s3,"s3://crabby-images/aa44c/aa44cfbda773005898cc0ffb184ad365bfbec8cd" alt="Santiago Campuzano avatar"
It’s not a stupid question
data:image/s3,"s3://crabby-images/aa44c/aa44cfbda773005898cc0ffb184ad365bfbec8cd" alt="Santiago Campuzano avatar"
It comes… and they are amazing
data:image/s3,"s3://crabby-images/e471b/e471bc22e77bf7730ed2046efb99c305a4f8df4f" alt="btai avatar"
i dont see any within the grafana ui
data:image/s3,"s3://crabby-images/aa44c/aa44cfbda773005898cc0ffb184ad365bfbec8cd" alt="Santiago Campuzano avatar"
There may be something with Prom Operator config
data:image/s3,"s3://crabby-images/9a0f8/9a0f8d41476ffe9065fbe0b98227d0cdcaa0cd11" alt="Erik Osterman (Cloud Posse) avatar"
kube-prometheus pre-dated prometheus-operator. it was developed originally by TicketMaster, then given to CoreOS, then given to the community.
data:image/s3,"s3://crabby-images/9a0f8/9a0f8d41476ffe9065fbe0b98227d0cdcaa0cd11" alt="Erik Osterman (Cloud Posse) avatar"
prometheus-operator is the way to go today
data:image/s3,"s3://crabby-images/e471b/e471bc22e77bf7730ed2046efb99c305a4f8df4f" alt="btai avatar"
thanks @Santiago Campuzano @Erik Osterman (Cloud Posse) thats what I was trying to figure out
data:image/s3,"s3://crabby-images/e471b/e471bc22e77bf7730ed2046efb99c305a4f8df4f" alt="btai avatar"
i noticed everything kube-prometheus was not as frequently maintained anymore
data:image/s3,"s3://crabby-images/e471b/e471bc22e77bf7730ed2046efb99c305a4f8df4f" alt="btai avatar"
but this from the README confused me a little:
kube-prometheus combines the Prometheus Operator with a collection of manifests to help getting started with monitoring Kubernetes itself and applications running on top of it.
data:image/s3,"s3://crabby-images/aa44c/aa44cfbda773005898cc0ffb184ad365bfbec8cd" alt="Santiago Campuzano avatar"
Yep… actually.. there’s an open issue for that
data:image/s3,"s3://crabby-images/aa44c/aa44cfbda773005898cc0ffb184ad365bfbec8cd" alt="Santiago Campuzano avatar"
I am a noob who try to setup some monitoring for my cluster & apps. I lost 2 days of work trying to use kube-prometheus because of these lines: https://github.com/coreos/prometheus-operator/blo…
data:image/s3,"s3://crabby-images/aa44c/aa44cfbda773005898cc0ffb184ad365bfbec8cd" alt="Santiago Campuzano avatar"
You’re not the only one
data:image/s3,"s3://crabby-images/e471b/e471bc22e77bf7730ed2046efb99c305a4f8df4f" alt="btai avatar"
almost makes it sound like kube-prometheus builds on top of prometheus-operator, which made me think it was providing possibly default dashboards specific to kube
data:image/s3,"s3://crabby-images/e471b/e471bc22e77bf7730ed2046efb99c305a4f8df4f" alt="btai avatar"
haha thanks @Santiago Campuzano
data:image/s3,"s3://crabby-images/aa44c/aa44cfbda773005898cc0ffb184ad365bfbec8cd" alt="Santiago Campuzano avatar"
YW !
data:image/s3,"s3://crabby-images/2495f/2495fe62d3d2920120f045143fcc0623b2457a90" alt="kskewes avatar"
Um. Kube-prometheus is a jsonnet based project that bundles Prometheus operator and a ton of dashboards and alerts, the whole stack. It’s very much alive and maintainers are also maintainers of Prometheus etc.
data:image/s3,"s3://crabby-images/2495f/2495fe62d3d2920120f045143fcc0623b2457a90" alt="kskewes avatar"
Use Prometheus to monitor Kubernetes and applications running on Kubernetes - coreos/kube-prometheus
data:image/s3,"s3://crabby-images/2495f/2495fe62d3d2920120f045143fcc0623b2457a90" alt="kskewes avatar"
It was moved recently so that it could have its own releases, though running master is suggested (apps are versioned).
data:image/s3,"s3://crabby-images/c4007/c4007ac3f2ea7b77860a98a8551d584856b49862" alt="Zachary Loeber avatar"
I use both (maybe incorrectly?). I install the operator first, then install kube-prometheus, as it includes a bunch of bundled exporters, a decent starter Prometheus config and a good set of default alerts.
data:image/s3,"s3://crabby-images/c4007/c4007ac3f2ea7b77860a98a8551d584856b49862" alt="Zachary Loeber avatar"
kube-prometheus helm chart is like never updated. The expectation is that you will get into the heady realm of jsonnet and build your own custom deployment or something.
data:image/s3,"s3://crabby-images/c4007/c4007ac3f2ea7b77860a98a8551d584856b49862" alt="Zachary Loeber avatar"
Personally ive had to clone and make minor edits to both projects to get a stable deployment for AKS clusters, yuk.
data:image/s3,"s3://crabby-images/e471b/e471bc22e77bf7730ed2046efb99c305a4f8df4f" alt="btai avatar"
@Zachary Loeber I run clusters in aws and azure (AKS). can you elaborate why you needed to make yucky changes to prom-operator specifically for AKS?
data:image/s3,"s3://crabby-images/2495f/2495fe62d3d2920120f045143fcc0623b2457a90" alt="kskewes avatar"
The Helm chart is not maintained by the project, so I imagine it’s like all the other charts…
Jsonnet is a big step. But I like that you can change stuff. There are no limits, and eventually weird differences between environments require ad hoc changes. Having a shared base, then minimal patches and secrets files per environment, seems to work. There’s some great work done in mixins that are bundled in and also available for including yourself, so it’s pretty complete. If some vendor ever manages to offer something as complete at a decent price, that would be a good thing!
2020-01-29
data:image/s3,"s3://crabby-images/c4007/c4007ac3f2ea7b77860a98a8551d584856b49862" alt="Zachary Loeber avatar"
So I’m sending alertmanager alerts to a webhook that triggers an MS Teams notification (though it could be slack or any other route) and I want to also send along an autogenerated link to kibana logs for the namespace. I know how to generate the link but I don’t know the best way to get the cluster specific external dns zone passed through the alerts to construct the link with.
data:image/s3,"s3://crabby-images/c4007/c4007ac3f2ea7b77860a98a8551d584856b49862" alt="Zachary Loeber avatar"
(so something like “kibana.<cluster.custom.internal.domain>”)
data:image/s3,"s3://crabby-images/c4007/c4007ac3f2ea7b77860a98a8551d584856b49862" alt="Zachary Loeber avatar"
Am I forced to do label rewrites and appending to make this happen or does alertmanager have any kind of situational awareness lookups it can do that I can tap into for such things?
data:image/s3,"s3://crabby-images/0704f/0704fa2c4de34bfc92a8ecd50096a4fa8404549a" alt="joshmyers avatar"
AFAIK - rewrites
data:image/s3,"s3://crabby-images/c4007/c4007ac3f2ea7b77860a98a8551d584856b49862" alt="Zachary Loeber avatar"
Thanks for confirming what I kinda suspected was the case @joshmyers
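A rough sketch of the label-based approach on the receiving side, assuming each cluster's Prometheus attaches its DNS zone to every alert as a label (for example through `external_labels` in the Prometheus global config) and the webhook receiver builds the link. The label name `cluster_domain`, the Kibana URL shape, and the `kubernetes.namespace_name` field are all hypothetical placeholders; only the `alerts` and `labels` keys come from the Alertmanager webhook payload format.

```python
from urllib.parse import quote


def kibana_link(alert, domain_label="cluster_domain"):
    """Build a Kibana link for one alert, or return None if labels are missing.

    `domain_label` is a hypothetical label carrying the cluster DNS zone.
    The URL path and query field below are placeholders, not Kibana's real scheme.
    """
    labels = alert.get("labels", {})
    domain = labels.get(domain_label)
    namespace = labels.get("namespace")
    if not domain or not namespace:
        return None
    query = quote(f'kubernetes.namespace_name:"{namespace}"')
    return f"https://kibana.{domain}/app/kibana#/discover?query={query}"


def links_for_payload(payload):
    """Return one link (or None) per alert in an Alertmanager webhook payload."""
    return [kibana_link(alert) for alert in payload.get("alerts", [])]


if __name__ == "__main__":
    # Trimmed example payload; real payloads carry more fields.
    payload = {
        "status": "firing",
        "alerts": [
            {
                "status": "firing",
                "labels": {
                    "alertname": "PodCrashLooping",
                    "namespace": "prod",
                    "cluster_domain": "cluster1.example.internal",
                },
            },
            {"status": "firing", "labels": {"alertname": "NoNamespace"}},
        ],
    }
    for link in links_for_payload(payload):
        print(link)
```

Keeping the zone in a label means the receiver needs no per-cluster configuration, at the cost of one extra label on every alert.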