#monitoring (2020-01)

Prometheus, Prometheus Operator, Grafana, Kubernetes

Archive: https://archive.sweetops.com/monitoring/

2020-01-30

2020-01-29

Zachary Loeber

So I’m sending alertmanager alerts to a webhook that triggers an MS Teams notification (though it could be slack or any other route) and I want to also send along an autogenerated link to kibana logs for the namespace. I know how to generate the link but I don’t know the best way to get the cluster specific external dns zone passed through the alerts to construct the link with.

Zachary Loeber

(so something like ‘kibana.<cluster.custom.internal.domain>’)

Zachary Loeber

Am I forced to do label rewrites and appending to make this happen or does alertmanager have any kind of situational awareness lookups it can do that I can tap into for such things?

joshmyers

AFAIK - rewrites

Zachary Loeber

Thanks for confirming what I kinda suspected was the case @joshmyers
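
One way to act on the rewrite approach, sketched under assumptions: if each cluster's Prometheus attaches its DNS zone to every alert as a label (for example through external labels or alert relabeling), the webhook receiver can build the Kibana link itself. In the sketch below the label name `cluster_domain`, the example domain, and the Kibana path and query format are all hypothetical; only the `alerts` list and the per-alert `labels` map come from Alertmanager's webhook payload.

```python
from typing import List, Optional
from urllib.parse import quote

# Hypothetical label name; it has to be attached on the Prometheus side.
DOMAIN_LABEL = "cluster_domain"


def kibana_link(alert: dict) -> Optional[str]:
    """Build a per-namespace Kibana URL from one alert's labels, or None."""
    labels = alert.get("labels", {})
    domain = labels.get(DOMAIN_LABEL)
    namespace = labels.get("namespace")
    if not domain or not namespace:
        return None
    # The path and query format are placeholders; adjust to your Kibana setup.
    query = quote(f'kubernetes.namespace:"{namespace}"')
    return f"https://kibana.{domain}/app/kibana#/discover?q={query}"


def links_for_payload(payload: dict) -> List[str]:
    """Collect the distinct links for every alert in a webhook payload."""
    links: List[str] = []
    for alert in payload.get("alerts", []):
        link = kibana_link(alert)
        if link and link not in links:
            links.append(link)
    return links


# Example payload trimmed to the fields this sketch reads.
example = {
    "alerts": [
        {"labels": {"cluster_domain": "prod.example.internal",
                    "namespace": "payments"}},
    ]
}
print(links_for_payload(example))
```

Alerts that lack either label are skipped rather than given a broken link, so the notification can fall back to sending no link at all.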

2020-01-27

btai

what's the difference between kube-prometheus and prometheus-operator? I’m assuming prom operator is completely bare bones while kube-prometheus has a default baseline of dashboards and monitors?

Santiago Campuzano
coreos/prometheus-operator

Prometheus Operator creates/configures/manages Prometheus clusters atop Kubernetes - coreos/prometheus-operator

btai

The stable/prometheus-operator helm chart provides a similar feature set to kube-prometheus.

this tells me kube-prometheus is probably not super useful anymore?

Santiago Campuzano

To be honest, I’ve been working with Prometheus Operator Helm Chart

Santiago Campuzano

Which is Amazing !

Santiago Campuzano

It installs a full-fledged Prometheus + Grafana + Alertmanager stack, ready to monitor a K8S cluster

Santiago Campuzano

I’d recommend you going that way

btai

@Santiago Campuzano might be a stupid question, does it come w/ default dashboards?

Santiago Campuzano

It’s not a stupid question

Santiago Campuzano

It comes… and they are amazing

btai

i don't see any within the grafana ui

Santiago Campuzano

There may be something with Prom Operator config

Erik Osterman (Cloud Posse)

kube-prometheus pre-dated prometheus-operator. it was developed originally by TicketMaster, then given to CoreOS, then given to the community.

Erik Osterman (Cloud Posse)

prometheus-operator is the way to go today

btai

thanks @Santiago Campuzano @Erik Osterman (Cloud Posse), that's what I was trying to figure out

btai

i noticed everything kube-prometheus is not maintained as frequently anymore

btai

but this from the README confused me a little:

kube-prometheus combines the Prometheus Operator with a collection of manifests to help getting started with monitoring Kubernetes itself and applications running on top of it.
Santiago Campuzano

Yep… actually.. there’s an open issue for that

Santiago Campuzano
Confusing doc prometheus-operator vs kube-prometheus · Issue #2619 · coreos/prometheus-operator

I am a noob who try to setup some monitoring for my cluster & apps. I lost 2 days of work trying to use kube-prometheus because of these lines: https://github.com/coreos/prometheus-operator/blo

Santiago Campuzano

You’re not the only one

btai

almost makes it sound like kube-prometheus builds on top of prometheus-operator, which made me think it was providing possibly default dashboards specific to kube

btai

haha thanks @Santiago Campuzano

Santiago Campuzano

YW !

kskewes

Um. Kube-prometheus is a jsonnet-based project that bundles Prometheus Operator and a ton of dashboards and alerts, the whole stack. It’s very much alive, and its maintainers are also maintainers of Prometheus etc.

kskewes
coreos/kube-prometheus

Use Prometheus to monitor Kubernetes and applications running on Kubernetes - coreos/kube-prometheus

kskewes

It was moved recently so that it could have its own releases, though running master is suggested (apps are versioned).

Zachary Loeber

I use both (maybe incorrectly?). I install the operator first, then install kube-prometheus, as it includes a bunch of bundled exporters, a decent starter Prometheus config, and a good set of default alerts.

Zachary Loeber

The kube-prometheus helm chart is like never updated. The expectation is that you will get into the heady realm of jsonnet and build your own custom deployment or something.

Zachary Loeber

Personally I’ve had to clone and make minor edits to both projects to get a stable deployment for AKS clusters, yuck.

btai

@Zachary Loeber I run clusters in aws and azure (AKS). can you elaborate why you needed to make yucky changes to prom-operator specifically for AKS?

kskewes

The Helm chart is not maintained by the project, so I imagine it’s like all the other charts…

Jsonnet is a big step. But I like that you can change stuff. There are no limits, and eventually weird differences between environments require ad hoc changes. Having a shared base, then minimal patches and secrets files per environment, seems to work. There’s some great work done in the mixins that are bundled in and also available for including yourself, so it’s pretty complete. If some vendor ever manages to offer something as complete at a decent price, that would be a good thing!

2020-01-17

2020-01-16

2020-01-15

Christopher

Sorry, newbie question… If I wanted to diagnose where the memory usage is going in a PHP application, is that something a tool like New Relic APM can do for me?

Pierre Humberdroz

New Relic APM does not really show this kind of thing, but it gives you time spans for how long certain things take to load.

Christopher

Ahh, okay.. Thanks, that probably won’t help me then. I found another tool, blackfire.io (http://blackfire.io), which looks like it might help instead.

Pierre Humberdroz

Google’s Stackdriver is the only tool I know of that profiles your applications on the fly, and even then it only works if you have a single request in a timespan to check.

Christopher

I see. I might have to come at this a different way then .. I have no idea where the problem is coming from, so I cannot currently isolate it to a single request.

Pierre Humberdroz

maybe you can explain what is happening and I can get an idea of how you could tackle this issue

Christopher

Essentially, we have a WordPress website running on a VPS… And, every day, the log files are full of “exhausted memory” errors. It only happens in production (probably because of the number of requests, or the workload). There is very little to indicate what has caused it; the only mention is a file called wp-db.php, which is a PHP class for interacting with MySQL.

Increasing the memory limit for the process does not solve it, it just eats up all of that too.

Christopher

I’m mostly a front end developer, so I’m quite far out of my depth here, so sorry about that haha

Pierre Humberdroz

did you verify that the memory limit is actually higher? with phpinfo or the like?

Christopher

Yep

Christopher

It was previously 512M, upped it to 1G

Erik Osterman (Cloud Posse)

so wp-db.php is the ORM for WP. Wouldn’t surprise me if some query is trying to load all records into memory.

Erik Osterman (Cloud Posse)

How large is the blog? Are we truly certain that 1gb is enough for a poorly written query?

Erik Osterman (Cloud Posse)

You mention that it only happens in production. Does staging have an equal dataset?

Christopher

The amount of content in staging and production is quite similar. It’s an e-commerce store using WooCommerce. There are about 500 orders per day, with a total of about 180,000 orders on the site.

It’s possible that 1G is not big enough. I could up this to 4G and see if this helps. I can’t imagine it needs to be more than 4G. I’ll see if that helps the situation shortly. I’ll also look to see if I can find any rogue queries loading everything in.

Thanks everyone btw

Pierre Humberdroz

I ran into a similar issue: there was a cron job trying to generate some kind of report across all orders, but this was 5 years ago

Christopher

Alright, 4G wasn’t enough either!

I’ll see if I can find some information around the queries. Perhaps I can find a pattern

Pierre Humberdroz

are you on nginx or apache2

Christopher

Litespeed

Pierre Humberdroz

does it put out logs compatible with nginx or apache?

Christopher

Honestly, I have no idea. If I paste you one of the lines from the log file, would that help?

Christopher
2020-01-16T10:39:19+00:00 CRITICAL Out of memory (allocated 1556828160) (tried to allocate 4096 bytes) in /home/website/public_html/releases/1579104310/web/wp/wp-includes/wp-db.php on line 2007

2020-01-16T10:41:19+00:00 CRITICAL Out of memory (allocated 1561346048) (tried to allocate 58720264 bytes) in /home/website/public_html/releases/1579104310/web/wp/wp-includes/wp-db.php on line 2007
Pierre Humberdroz

looks like apache logs

Pierre Humberdroz

did you ever look at this with something like kibana and analyse the logs that way?

Christopher

That’s getting way beyond my knowledge of this, and what’s set up.

I inherited this project, mostly do front-end development work, and only get to spend 2 days a month maximum on it.

Christopher

I mostly hoped there was an easy solution if I am honest.

I appreciate all your help by the way, i’m learning a lot of new stuff based on your comments.

Erik Osterman (Cloud Posse)

By the looks of it, raising the memory limit did not help either because it didn’t apply (there are many ways to set it in PHP), because it was unset or changed by something else, or because the server simply doesn’t have enough RAM.

Erik Osterman (Cloud Posse)

Because it crashed at 1.5gb total requested, which is lower than the limit you set
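
That comparison can be checked mechanically. A minimal sketch, assuming log lines shaped like the two pasted in this thread: it pulls the allocated byte count out of a line and compares it with whatever limit is believed to be configured. The function names are illustrative, and the 4 GiB value is just the figure mentioned in this conversation.

```python
import re
from typing import Optional, Tuple

# Matches the "Out of memory" lines pasted in the thread.
OOM_PATTERN = re.compile(
    r"Out of memory \(allocated (\d+)\) \(tried to allocate (\d+) bytes\)"
)

GIB = 1024 ** 3


def parse_oom(line: str) -> Optional[Tuple[int, int]]:
    """Return (allocated_bytes, requested_bytes), or None if no match."""
    match = OOM_PATTERN.search(line)
    if match is None:
        return None
    return int(match.group(1)), int(match.group(2))


def died_below_limit(line: str, limit_bytes: int) -> Optional[bool]:
    """True if the failed request would still have fit under the limit."""
    parsed = parse_oom(line)
    if parsed is None:
        return None
    allocated, requested = parsed
    return allocated + requested < limit_bytes


sample = (
    "2020-01-16T10:39:19+00:00 CRITICAL Out of memory (allocated 1556828160) "
    "(tried to allocate 4096 bytes) in /home/website/public_html/releases/"
    "1579104310/web/wp/wp-includes/wp-db.php on line 2007"
)
allocated, requested = parse_oom(sample)
print(f"allocated {allocated / GIB:.2f} GiB, requested {requested} bytes")
print("below a 4 GiB limit:", died_below_limit(sample, 4 * GIB))
```

A line that reports far less than the configured limit points toward the limit not applying or the host running out of RAM, rather than toward the limit itself being too small.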

Roderik van der Veer

New Relic has a PHP agent which is actually very useful. It will also show you slow queries etc: https://docs.newrelic.com/docs/agents/php-agent/getting-started/introduction-new-relic-php

Introduction to New Relic for PHP | New Relic Documentation

For an overview of New Relic’s PHP agent (compatibility, requirements, installation, configuration, troubleshooting, known issues), start here.

Roderik van der Veer

Switched technologies and i’m still missing this level of tracing in nodejs

Pierre Humberdroz

Elastic APM has that level of tracing for nodejs, @Roderik van der Veer

Roderik van der Veer

It has the same limitations as the NR one. In PHP, it can show you, for each request, how long each call takes (whether written by you, by a dependency or, and this is important, a PHP built-in function). The nodejs APM solutions can show you the route or a database call, but not, for example, a call to nodejs crypto which takes forever.

Roderik van der Veer

but will give it a go, because it does look nice TBH

Pierre Humberdroz

you can define custom spans in that case.

Christopher

@Erik Osterman (Cloud Posse) oh, sorry i just posted an old line from the logs as an example of what they look like. It definitely did increase to 4GB as I have some entries in the log that exhausted all of that. Sorry for the confusion.


2020-01-02
