#sre (2020-04)
Prometheus, Prometheus Operator, Grafana, Kubernetes
Archive: https://archive.sweetops.com/monitoring/
2020-04-05
btai:
Prometheus + EFS users: so NFS isn’t considered a supported storage for Prometheus, I guess? Have you guys had any problems with data corruption/data loss?
Erik Osterman (Cloud Posse):
Dirty shutdowns will leave WAL files around
Erik Osterman (Cloud Posse):
Also, I imagine if you had two Prometheus operators writing to the same exact file system you would have corruption
btai:
Oof. #2 could happen when I do cluster cutovers
btai:
When I spin up a new cluster, there’s a short period of time when the new cluster and the old cluster both have prom-operator talking to the same EFS
Erik Osterman (Cloud Posse):
Yup, that would be my guess. That’s going to lead to corruption.
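A minimal sketch of the "only one writer per directory" idea discussed above, assuming a generic marker file rather than anything Prometheus or prometheus-operator actually does. The file name `writer.lock` and the helpers `claim_data_dir` / `release_data_dir` are hypothetical and exist only for illustration. A cutover script could claim the shared directory before starting the new writer and refuse to proceed while the old cluster still holds it.

```python
import os
import tempfile


class DirectoryInUse(Exception):
    """Raised when another writer already holds the marker file."""


def claim_data_dir(data_dir: str, owner: str) -> str:
    """Atomically create a marker file naming the writer that owns data_dir."""
    marker = os.path.join(data_dir, "writer.lock")  # hypothetical file name
    try:
        # O_CREAT | O_EXCL fails if the file already exists, so only one caller wins.
        fd = os.open(marker, os.O_CREAT | os.O_EXCL | os.O_WRONLY, 0o644)
    except FileExistsError:
        with open(marker) as f:
            holder = f.read().strip()
        raise DirectoryInUse(f"{data_dir} is already claimed by {holder!r}")
    with os.fdopen(fd, "w") as f:
        f.write(owner)
    return marker


def release_data_dir(marker: str) -> None:
    """Remove the marker so the next writer can claim the directory."""
    os.remove(marker)


if __name__ == "__main__":
    shared = tempfile.mkdtemp()  # stands in for the shared EFS mount
    held = claim_data_dir(shared, "old-cluster")
    try:
        claim_data_dir(shared, "new-cluster")
    except DirectoryInUse as err:
        print(f"cutover blocked: {err}")
    release_data_dir(held)
    claim_data_dir(shared, "new-cluster")
    print("new-cluster now owns the directory")
```

One caveat with this kind of marker: a dirty shutdown leaves it behind, so a stale claim has to be cleared by hand or with some expiry rule.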
2020-04-15
Erik Osterman (Cloud Posse):
Open Source HTTP Reverse Proxy Cache and Time Series Dashboard Accelerator - tricksterproxy/trickster
Erik Osterman (Cloud Posse):
Learned about this today in the Kubernetes office hours
Erik Osterman (Cloud Posse):
Kubernetes Cost Allocation and Capacity Planning Analytics Tool. Built-in hourly, daily, monthly reports - Prometheus exporter - Grafana dashboard. - rchakode/kube-opex-analytics
2020-04-16
Abel Luck:
Anyone know of any projects out there that would support longer-term Prometheus metrics storage for small deployments?
Thanos is much too complex for us. TimescaleDB seems promising, but cannot be used with RDS.
We don’t need HA.
Historical rollups like Datadog would be a huge plus.
Erik Osterman (Cloud Posse):
EFS works like a charm
Joe Presley:
What’s EFS? Google only turns up ways to monitor AWS’s EFS.
Zach:
Amazon Elastic File System (Amazon EFS) provides simple, scalable, elastic file storage for use with AWS Cloud services and on-premises resources. It scales elastically on demand without disrupting applications, growing and shrinking automatically as you add and remove files. Amazon EFS file systems are distributed across an unconstrained number of storage servers, enabling file systems to grow to petabyte-scale providing simultaneous access to your data from Amazon EC2 instances and on-premises servers.
Joe Presley:
I understand what AWS’s EFS is, but @Erik Osterman (Cloud Posse) seems to be referring to a monitoring application.
Joe Presley:
It’s possible I misunderstood and that he meant that EFS works well for storing metrics.
Erik Osterman (Cloud Posse):
We use prometheus-operator on EKS with EFS (Amazon’s managed NFS offering).
Abel Luck:
EFS has worked fine? I remember the Prometheus team recommending avoiding NFS/EFS due to certain POSIX non-compliance issues
Erik Osterman (Cloud Posse):
But EFS can’t be put in the same bucket as general NFS.
Erik Osterman (Cloud Posse):
It’s actually POSIX compliant
Erik Osterman (Cloud Posse):
Plus, you can easily scale IOPS.
2020-04-17
Abel Luck:
We’re ready to move from a homegrown alert system to a more “proper” service like PagerDuty/VictorOps/Opsgenie. Does this group have any strong feelings one way or another about one of these (or another) services? Team of 4-8 engineers, geo-distributed. We use Prometheus/Alertmanager and CloudWatch.
I find it kind of silly that in order to do end-to-end tests of the alerting system you have to add another SaaS like Dead Man’s Snitch.
joshmyers:
I think it comes down to price offerings
joshmyers:
PD/VictorOps/Opsgenie are all pretty similar in terms of offerings, with PD probably being the fullest featured
joshmyers:
Do they all have decent APIs and client tooling for automation?
sheldonh:
I will say the UI and notes and all in PagerDuty were pretty disappointing. Kinda wanted basic formatting, even markdown, for my notes to make a log of the steps, and my first experiment with it wasn’t very impressive.
These are minor quibbles, just saying I was hoping for a little more polish in logging and notes on issues.
kskewes:
Yes, and only Dead Man’s Snitch. Surprised alert systems like PagerDuty don’t offer this.
joshmyers:
What are people using to monitor the apps on Fargate? APM solution like NewRelic/DataDog?
Erik Osterman (Cloud Posse):
What did you end up doing?
joshmyers:
Not much yet, looking like DD. Biggest requirement is JVM metrics (Scala) and application profiling, which AWS doesn’t offer AFAIK….
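For anyone unfamiliar with the pattern being discussed: a dead man's switch inverts the usual alert flow. The monitoring pipeline sends a heartbeat on a schedule, and an external watcher pages someone only when the heartbeats stop, which is what exercises the alerting path end to end. Below is a rough sketch of the watcher side, assuming a hypothetical `DeadMansSwitch` class and a five-minute timeout chosen purely for illustration. It is not how Dead Man's Snitch or any paging vendor implements it.

```python
from datetime import datetime, timedelta, timezone
from typing import Optional


class DeadMansSwitch:
    """Trips when no heartbeat has arrived within `timeout`."""

    def __init__(self, timeout: timedelta) -> None:
        self.timeout = timeout
        self.last_heartbeat: Optional[datetime] = None

    def heartbeat(self, now: datetime) -> None:
        """Record that the alerting pipeline checked in at `now`."""
        self.last_heartbeat = now

    def is_tripped(self, now: datetime) -> bool:
        """Return True when the pipeline has been silent for too long."""
        if self.last_heartbeat is None:
            return True  # never heard from the pipeline at all
        return now - self.last_heartbeat > self.timeout


if __name__ == "__main__":
    start = datetime(2020, 4, 17, 12, 0, tzinfo=timezone.utc)
    switch = DeadMansSwitch(timeout=timedelta(minutes=5))
    switch.heartbeat(start)
    for minutes in (4, 6):
        now = start + timedelta(minutes=minutes)
        state = "PAGE SOMEONE" if switch.is_tripped(now) else "ok"
        print(f"+{minutes} min: {state}")
```

The watcher has to run somewhere that does not share a failure domain with the monitoring stack, which is why people end up paying for an external service for it.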
2020-04-18
2020-04-20
2020-04-21
Erik Osterman (Cloud Posse):
Pro tip: subscribe to the RSS feed for status pages you depend on. Send those updates to a slack channel. https://slack.com/help/articles/218688467-Add-RSS-feeds-to-Slack e.g.
• https://www.githubstatus.com/history.rss
• https://status.aws.amazon.com/rss/ec2-us-east-1.rss
Have a favorite blog or news site? You can use Slack to subscribe to both RSS and Atom feeds and get updates in the Slack channel of your choice. Note: If you get an error when trying to add a fee…
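The Slack RSS integration linked above needs no code at all. If you would rather filter or reformat status updates yourself, the general shape is roughly the following sketch: parse the feed items, then build a message body for a Slack incoming webhook. The sample feed, the `status.example.com` URLs, and the helper names `parse_items` and `to_slack_payload` are made up for illustration, and the HTTP fetch and POST steps are deliberately left out.

```python
import json
import xml.etree.ElementTree as ET
from typing import Dict, List

# Hypothetical RSS 2.0 document standing in for a real status-page feed.
SAMPLE_FEED = """<rss version="2.0">
  <channel>
    <title>Example Status</title>
    <item>
      <title>Elevated error rates</title>
      <link>https://status.example.com/incidents/1</link>
      <pubDate>Tue, 21 Apr 2020 15:00:00 +0000</pubDate>
    </item>
    <item>
      <title>Resolved: elevated error rates</title>
      <link>https://status.example.com/incidents/1#resolved</link>
      <pubDate>Tue, 21 Apr 2020 16:00:00 +0000</pubDate>
    </item>
  </channel>
</rss>
"""


def parse_items(feed_xml: str) -> List[Dict[str, str]]:
    """Extract title, link and publication date from each RSS <item>."""
    root = ET.fromstring(feed_xml)
    items = []
    for item in root.iter("item"):
        items.append({
            "title": item.findtext("title", default=""),
            "link": item.findtext("link", default=""),
            "published": item.findtext("pubDate", default=""),
        })
    return items


def to_slack_payload(item: Dict[str, str]) -> str:
    """Build a JSON body with a `text` field, as incoming webhooks accept."""
    return json.dumps({"text": f"{item['title']}\n{item['link']}"})


if __name__ == "__main__":
    for entry in parse_items(SAMPLE_FEED):
        print(to_slack_payload(entry))
```

A real poller would also need to remember which items it has already posted, for example by tracking links it has seen.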
2020-04-24
James:
Anyone here using http://timber.io/
We make incredible logging tools for developers that help them debug Node, Elixir, Ruby, Python and Go applications.
James:
I am not so sure, but it looks like a dead product
James:
Documentation is outdated, some broken links also
James:
But what they offer seems not bad.