#sre (2020-04)
Prometheus, Prometheus Operator, Grafana, Kubernetes
Archive: https://archive.sweetops.com/monitoring/
2020-04-05
Prometheus + EFS users: so NFS isn’t considered a supported storage for Prometheus, I guess? Have you guys had any problems with data corruption/data loss?
Dirty shutdowns will leave WAL files around
Also, I imagine if you had two Prometheus operators writing to the same exact file system you would have corruption
Oof. #2 could happen when I do cluster cutovers
When I spin up a new cluster, there’s a short period of time when the new cluster and the old cluster both have prom-operator talking to the same EFS
Yup, that would be my guess. That’s going to lead to corruption.
2020-04-15
Open Source HTTP Reverse Proxy Cache and Time Series Dashboard Accelerator - tricksterproxy/trickster
Learned about this today in the kubernetes office hours
Kubernetes Cost Allocation and Capacity Planning Analytics Tool. Built-in hourly, daily, monthly reports - Prometheus exporter - Grafana dashboard. - rchakode/kube-opex-analytics
2020-04-16
Anyone know of any projects out there that would support longer-term prometheus metrics storage for small deployments?
Thanos is much too complex for us. TimescaleDB seems promising, but cannot be used with RDS.
We don’t need HA.
Historical rollups like datadog would be a huge plus.
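For anyone unsure what a "historical rollup" means here, a minimal sketch: raw samples are averaged into fixed-width time buckets so that old data takes less space. The function name `rollup`, the one-hour bucket width, and the sample data are all made up for illustration; this is not how Datadog or any Prometheus long-term store implements it.

```python
from collections import defaultdict
from statistics import mean


def rollup(samples, bucket_seconds=3600):
    """Average (unix_timestamp, value) samples into fixed-width time buckets."""
    buckets = defaultdict(list)
    for ts, value in samples:
        # Align each timestamp to the start of its bucket.
        bucket_start = int(ts) - int(ts) % bucket_seconds
        buckets[bucket_start].append(value)
    # One averaged point per bucket, ordered by time.
    return [(start, mean(values)) for start, values in sorted(buckets.items())]


# Hypothetical raw samples: two in the first hour, three in the second.
samples = [(0, 1.0), (1800, 3.0), (3600, 10.0), (5400, 20.0), (7199, 30.0)]
hourly = rollup(samples)
```

Averaging is only one possible aggregation; min, max, and count per bucket are often kept alongside it so that spikes are not lost.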
EFS works like a charm
What’s EFS? Google only turns up ways to monitor AWS’s EFS.
Amazon Elastic File System (Amazon EFS) provides simple, scalable, elastic file storage for use with AWS Cloud services and on-premises resources. It scales elastically on demand without disrupting applications, growing and shrinking automatically as you add and remove files. Amazon EFS file systems are distributed across an unconstrained number of storage servers, enabling file systems to grow to petabyte-scale providing simultaneous access to your data from Amazon EC2 instances and on-premises servers.
I understand what AWS’s EFS is, but @Erik Osterman (Cloud Posse) seems to be referring to a monitoring application.
It’s possible I misunderstood and that he meant that EFS works well for storing metrics.
We use prometheus-operator on EKS with EFS (Amazon’s managed NFS offering).
EFS has worked fine? I remember the prometheus team recommending avoiding NFS/EFS due to certain POSIX non-compliance issues
But EFS can’t be put in the same bucket as general NFS.
It’s actually POSIX compliant
Plus, you can easily scale IOPS.
2020-04-17
We’re ready to move from a homegrown alert system to a more “proper” service like pagerduty/victorops/opsgenie. Does this group have any strong feelings one way or another about one of these (or another) services? Team of 4-8 engineers geo-distributed. We use prometheus/alertmanager and cloudwatch.
I find it kind of silly that in order to do end-to-end tests of the alerting system you have to add another SaaS like Dead Man’s Snitch.
I think it comes down to price offerings
PD/VictorOps/Opsgenie are all pretty similar in terms of offerings, with PD probably being the fullest featured
Do they all have decent APIs and client tooling for automation?
I will say the UI and notes in PagerDuty were pretty disappointing. I kinda wanted basic formatting, even markdown, for my notes to make a log of the steps, and my first experiment with it wasn’t very impressive.
These are minor quibbles just saying I was hoping for a little more polish in logging and notes on issue.
Yes, and only Dead Man’s Snitch. Surprised alert systems like PagerDuty don’t offer this.
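The dead man's switch idea discussed above can be sketched roughly as follows: an always-firing alert pings an external monitor on a schedule, and the monitor raises the alarm only when the pings stop. The class name `DeadMansSwitch`, the 300-second timeout, and the injected clock are assumptions for illustration, not Dead Man’s Snitch’s actual API.

```python
import time


class DeadMansSwitch:
    """Trips when no ping has arrived within timeout_seconds."""

    def __init__(self, timeout_seconds, clock=time.monotonic):
        self.timeout_seconds = timeout_seconds
        self.clock = clock  # injectable so the logic can be tested without sleeping
        self.last_ping = clock()

    def ping(self):
        # Called each time the always-firing alert is delivered.
        self.last_ping = self.clock()

    def is_tripped(self):
        # True once the alerting pipeline has been silent for too long.
        return self.clock() - self.last_ping > self.timeout_seconds


# Usage with a fake clock (seconds), purely illustrative.
now = [0.0]
switch = DeadMansSwitch(timeout_seconds=300, clock=lambda: now[0])
now[0] = 200.0
switch.ping()                        # heartbeat arrives at t=200
now[0] = 400.0
healthy = not switch.is_tripped()    # 200 s of silence, within the timeout
now[0] = 600.0
tripped = switch.is_tripped()        # 400 s of silence, past the timeout
```

The point of running this outside the cluster is that the check still fires when Prometheus or Alertmanager itself is down.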
What are people using to monitor the apps on Fargate? APM solution like NewRelic/DataDog?
What did you end up doing?
Not much yet, looking like DD. Biggest requirement is JVM metrics (Scala) and application profiling, which AWS doesn’t offer, AFAIK.
2020-04-18
2020-04-20
2020-04-21
Pro tip: subscribe to the RSS feed for status pages you depend on. Send those updates to a slack channel. https://slack.com/help/articles/218688467-Add-RSS-feeds-to-Slack e.g.
• https://www.githubstatus.com/history.rss
• https://status.aws.amazon.com/rss/ec2-us-east-1.rss
Have a favorite blog or news site? You can use Slack to subscribe to both RSS and Atom feeds and get updates in the Slack channel of your choice. Note: If you get an error when trying to add a fee…
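Slack’s built-in RSS integration covers the tip above on its own. If you would rather filter updates before they reach a channel, a minimal sketch of the parsing side might look like this. The sample feed, the `status.example.com` URLs, and the function name `new_status_messages` are invented for illustration, and the sketch neither fetches a feed nor posts to Slack.

```python
import xml.etree.ElementTree as ET

# Hypothetical RSS 2.0 payload, shaped like a typical status-page feed.
SAMPLE_FEED = """<rss version="2.0">
  <channel>
    <title>Example Status</title>
    <item>
      <title>Elevated error rates</title>
      <link>https://status.example.com/incidents/1</link>
    </item>
    <item>
      <title>Resolved: elevated error rates</title>
      <link>https://status.example.com/incidents/2</link>
    </item>
  </channel>
</rss>"""


def new_status_messages(feed_xml, seen_links):
    """Return one message per feed item whose link has not been seen yet."""
    root = ET.fromstring(feed_xml)
    messages = []
    for item in root.iter("item"):
        link = item.findtext("link", default="")
        if link in seen_links:
            continue  # already announced on an earlier poll
        seen_links.add(link)
        title = item.findtext("title", default="(no title)")
        messages.append(f"{title} <{link}>")
    return messages


seen = set()
first_poll = new_status_messages(SAMPLE_FEED, seen)   # both items are new
second_poll = new_status_messages(SAMPLE_FEED, seen)  # nothing new to announce
```

Tracking seen links is what keeps a polling job from reposting the same incident every time it runs.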
2020-04-24
Anyone here using http://timber.io/?
We make incredible logging tools for developers that help them debug Node, Elixir, Ruby, Python and Go applications.
I’m not so sure, but it looks like a dead product.
The documentation is outdated, and there are some broken links too.
But what they offer doesn’t seem bad.