#sre (2020-04)

Prometheus, Prometheus Operator, Grafana, Kubernetes

Archive: https://archive.sweetops.com/monitoring/

2020-04-05

btai

Prometheus + EFS users: so NFS isn’t considered supported storage for Prometheus, I guess? Have you had any problems with data corruption or data loss?

Erik Osterman (Cloud Posse)

Dirty shutdowns will leave WAL files around

Erik Osterman (Cloud Posse)

Also, I imagine if you had two Prometheus operators writing to the exact same file system, you would have corruption

btai

Oof. #2 could happen when I do cluster cutovers

btai

When I spin up a new cluster, there’s a short period of time when the new cluster and the old cluster both have prom-operator talking to the same EFS

Erik Osterman (Cloud Posse)

Yup, that would be my guess. That’s going to lead to corruption.
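
The cutover problem above comes down to two writers sharing one data directory. As a minimal sketch of a single-writer guard, assuming a POSIX host and a file system where advisory locks behave reliably (something to verify on any NFS/EFS mount), a process could refuse to start unless it holds an exclusive lock. The function name and the `writer.lock` file name are hypothetical and chosen for illustration; this is not how prometheus-operator handles it.

```python
import fcntl
import os
import tempfile


def try_lock_data_dir(data_dir):
    """Return an open lock-file handle if we hold the exclusive lock, else None."""
    lock_path = os.path.join(data_dir, "writer.lock")  # hypothetical file name
    handle = open(lock_path, "w")
    try:
        # LOCK_NB makes the call fail immediately instead of waiting.
        fcntl.flock(handle, fcntl.LOCK_EX | fcntl.LOCK_NB)
    except OSError:
        # Another open handle already holds the lock.
        handle.close()
        return None
    return handle  # keep this open for as long as the writer runs


if __name__ == "__main__":
    with tempfile.TemporaryDirectory() as data_dir:
        first = try_lock_data_dir(data_dir)
        second = try_lock_data_dir(data_dir)
        print("first writer got lock:", first is not None)
        print("second writer got lock:", second is not None)
        if first is not None:
            first.close()
```

The lock is released when the handle is closed or the process exits, so a second writer would only proceed after the first one has gone away.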

2020-04-15

Erik Osterman (Cloud Posse)
tricksterproxy/trickster

Open Source HTTP Reverse Proxy Cache and Time Series Dashboard Accelerator - tricksterproxy/trickster

Erik Osterman (Cloud Posse)

Learned about this today in the Kubernetes office hours

Erik Osterman (Cloud Posse)
rchakode/kube-opex-analytics

Kubernetes Cost Allocation and Capacity Planning Analytics Tool. Built-in hourly, daily, monthly reports - Prometheus exporter - Grafana dashboard. - rchakode/kube-opex-analytics

2020-04-16

Abel Luck

Anyone know of any projects out there that would support longer-term prometheus metrics storage for small deployments?

Thanos is much too complex for us. TimescaleDB seems promising, but cannot be used with RDS.

We don’t need HA.

Historical rollups like datadog would be a huge plus.
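
To make "historical rollups" concrete, here is a minimal sketch that collapses raw samples into one average per hour. It assumes samples arrive as `(unix_seconds, value)` pairs; the function name and the choice of a plain mean are illustrative assumptions, not how Datadog or any Prometheus storage backend does it.

```python
from collections import defaultdict
from statistics import mean


def hourly_rollup(samples):
    """Collapse (unix_seconds, value) samples into one average per hour."""
    buckets = defaultdict(list)
    for timestamp, value in samples:
        # Truncate the timestamp to the start of its hour.
        hour_start = timestamp - (timestamp % 3600)
        buckets[hour_start].append(value)
    # Keys are hour-start timestamps, in ascending order.
    return {hour: mean(values) for hour, values in sorted(buckets.items())}


if __name__ == "__main__":
    raw = [(0, 1.0), (1800, 3.0), (3600, 10.0)]
    print(hourly_rollup(raw))
```

Older data could be stored only in this rolled-up form, trading resolution for disk space.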

Erik Osterman (Cloud Posse)

EFS works like a charm

Joe Presley

What’s EFS? Google only turns up ways to monitor AWS’s EFS.

Zach
Amazon Elastic File System (EFS) | Cloud File Storage

Amazon Elastic File System (Amazon EFS) provides simple, scalable, elastic file storage for use with AWS Cloud services and on-premises resources. It scales elastically on demand without disrupting applications, growing and shrinking automatically as you add and remove files. Amazon EFS file systems are distributed across an unconstrained number of storage servers, enabling file systems to grow to petabyte-scale providing simultaneous access to your data from Amazon EC2 instances and on-premises servers.

Joe Presley

I understand what AWS’s EFS is, but @Erik Osterman (Cloud Posse) seems to be referring to a monitoring application.

Joe Presley

It’s possible I misunderstood and that he meant that EFS works well for storing metrics.

Erik Osterman (Cloud Posse)

We use prometheus-operator on EKS with EFS (Amazon’s managed NFS offering).

Abel Luck

EFS has worked fine? I remember the Prometheus team recommending against NFS/EFS due to certain POSIX non-compliance issues

Erik Osterman (Cloud Posse)

But EFS can’t be put in the same bucket as general NFS.

Erik Osterman (Cloud Posse)

It’s actually POSIX compliant

Erik Osterman (Cloud Posse)

Plus, you can easily scale IOPS.

2020-04-17

Abel Luck

We’re ready to move from a homegrown alert system to a more “proper” service like pagerduty/victorops/opsgenie. Does this group have any strong feelings one way or another about one of these (or another) services? Team of 4-8 engineers geo-distributed. We use prometheus/alertmanager and cloudwatch.

I find it kind of silly that in order to do end-to-end tests of the alerting system you have to add another SaaS like Dead Man’s Snitch.

joshmyers

I think it comes down to price offerings

joshmyers

PD/VictorOps/Opsgenie are all pretty similar in terms of offerings, with PD probably being the fullest featured

joshmyers

Do they all have decent APIs and client tooling for automation?

sheldonh

I will say the UI and notes in PagerDuty were pretty disappointing. I kind of wanted basic formatting, even Markdown, for my notes to make a log of the steps, and my first experiment with it wasn’t very impressive.

These are minor quibbles; I’m just saying I was hoping for a little more polish in logging and notes on issues.

kskewes

Yes, and only Dead Man’s Snitch. Surprised alert systems like PagerDuty don’t offer this.
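
The idea behind a dead man's switch for end-to-end alert testing can be sketched as follows: the alerting pipeline sends a regular heartbeat, and an independent checker pages when the heartbeat stops arriving. This is a minimal sketch under stated assumptions; the class name, the timeout, and the in-memory state are hypothetical and do not reflect how Dead Man’s Snitch or any paging service is implemented.

```python
import time


class DeadMansSwitch:
    """Trips when no heartbeat has arrived within timeout_seconds."""

    def __init__(self, timeout_seconds):
        self.timeout_seconds = timeout_seconds
        self.last_heartbeat = None  # no heartbeat seen yet

    def heartbeat(self, now=None):
        # Called each time the always-firing alert is delivered.
        self.last_heartbeat = time.time() if now is None else now

    def is_tripped(self, now=None):
        now = time.time() if now is None else now
        if self.last_heartbeat is None:
            return True  # never heard from the pipeline at all
        return now - self.last_heartbeat > self.timeout_seconds


if __name__ == "__main__":
    switch = DeadMansSwitch(timeout_seconds=60)
    switch.heartbeat(now=100)
    print(switch.is_tripped(now=150))  # within the timeout
    print(switch.is_tripped(now=200))  # heartbeat is overdue
```

The checker has to run outside the monitored stack, since a switch that dies together with the alerting system cannot report the failure.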

joshmyers

What are people using to monitor the apps on Fargate? APM solution like NewRelic/DataDog?

Erik Osterman (Cloud Posse)

What did you end up doing?

joshmyers

Not much yet, looking like DD. Biggest requirement is JVM metrics (Scala) and application profiling, which AWS doesn’t offer, AFAIK.

2020-04-18

2020-04-20

2020-04-21

Erik Osterman (Cloud Posse)
Add RSS feeds to Slack

Have a favorite blog or news site? You can use Slack to subscribe to both RSS and Atom feeds and get updates in the Slack channel of your choice. Note: If you get an error when trying to add a fee…


2020-04-24

James

Anyone here using http://timber.io/ ?

Timber.io | Log Better

We make incredible logging tools for developers that help them debug Node, Elixir, Ruby, Python and Go applications.

James

I’m not so sure, but it looks like a dead product

James

The documentation is outdated, and there are some broken links too

James

But what they offer doesn’t seem bad.
