#sre (2020-08)
Prometheus, Prometheus Operator, Grafana, Kubernetes
Archive: https://archive.sweetops.com/monitoring/
2020-08-03
that’s awesome
2020-08-06
2020-08-07
On a recent #office-hours we talked about opsgenie automation. Today we released :tada: the first version of our module to manage it with terraform. We currently use it to manage most of our opsgenie setup.
https://github.com/cloudposse/terraform-opsgenie-incident-management
See recording: https://cloudposse.wistia.com/medias/9d4ase4qjy
2020-08-08
2020-08-16
Just installed OpenShift Dedicated on AWS. Any advice on the best way to install the Instana APM monitoring tool on it? I want to monitor applications with Instana.
2020-08-23
Hello all, how can I monitor EMR job failures by job name? For example, I would like to receive an alert only when a job whose name starts with “Prod-XXX” fails on the cluster.
I would look at EventBridge, you can get all the EMR cluster events off of it. Ship to a lambda or something, process the event and send another event to your alerting API.
Lists the AWS services and event types supported by Amazon EventBridge.
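A minimal sketch of the EventBridge-to-Lambda idea above, assuming an EventBridge rule already forwards EMR events to the function. The event fields used here (`source`, `detail-type`, and `name`, `state`, `clusterId`, `stepId` under `detail`) follow the EMR step status change events as I recall them from the AWS docs, so check them against a real event. `ALERT_URL` and `JOB_NAME_PREFIX` are hypothetical environment variables, and the alert payload shape is made up for illustration.

```python
import json
import os
import urllib.request

# Hypothetical settings: where to send alerts and which job names to watch.
ALERT_URL = os.environ.get("ALERT_URL", "")
NAME_PREFIX = os.environ.get("JOB_NAME_PREFIX", "Prod-")


def should_alert(event, prefix=NAME_PREFIX):
    """Return True only for a FAILED EMR step whose name starts with prefix."""
    if event.get("source") != "aws.emr":
        return False
    if event.get("detail-type") != "EMR Step Status Change":
        return False
    detail = event.get("detail", {})
    name = str(detail.get("name", ""))
    return detail.get("state") == "FAILED" and name.startswith(prefix)


def handler(event, context):
    """Lambda entry point: filter the event, then forward an alert."""
    if not should_alert(event):
        return {"alerted": False}

    detail = event["detail"]
    # Assumed payload shape; adapt it to whatever your alerting API expects.
    payload = json.dumps({
        "message": "EMR step failed: " + str(detail.get("name")),
        "clusterId": detail.get("clusterId"),
        "stepId": detail.get("stepId"),
    }).encode("utf-8")

    if ALERT_URL:
        request = urllib.request.Request(
            ALERT_URL,
            data=payload,
            headers={"Content-Type": "application/json"},
            method="POST",
        )
        with urllib.request.urlopen(request, timeout=5) as response:
            response.read()
    return {"alerted": True}
```

Filtering inside the function keeps the prefix easy to change. The same filter could probably be pushed into the EventBridge rule's event pattern instead, so the Lambda only runs for matching failures.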
Thanks @Zachary Loeber
In my last gig, we actually wrote something that would query the AWS API for the status of all of our EMR jobs and post events to Datadog based on the results. It was written 3 or 4 years ago, but it gave much better info about the status of our EMR jobs.
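For comparison, a rough sketch of that polling approach. This is not the tool described above, just an illustration under assumptions: `emr_client` is expected to be a boto3 EMR client (for example `boto3.client("emr")`), the function name and the `Prod-` prefix are hypothetical, and the returned dicts are only loosely shaped like Datadog events. Sending them to Datadog is left out, and pagination of the API responses is ignored for brevity.

```python
def failed_steps(emr_client, prefix="Prod-"):
    """Collect one event dict per FAILED step whose name starts with prefix.

    emr_client is assumed to behave like a boto3 EMR client, using its
    list_clusters and list_steps calls. Only the first page of each
    response is read, so large accounts would need pagination.
    """
    events = []
    clusters = emr_client.list_clusters(ClusterStates=["RUNNING", "WAITING"])
    for cluster in clusters.get("Clusters", []):
        steps = emr_client.list_steps(
            ClusterId=cluster["Id"],
            StepStates=["FAILED"],
        )
        for step in steps.get("Steps", []):
            name = step.get("Name", "")
            if not name.startswith(prefix):
                continue
            # Assumed event shape, loosely modeled on a Datadog event.
            events.append({
                "title": "EMR step failed: " + name,
                "text": "Cluster " + cluster["Id"] + ", step " + step["Id"],
                "alert_type": "error",
                "tags": ["cluster:" + cluster["Id"], "step:" + name],
            })
    return events


# Example use on a schedule (requires boto3 and AWS credentials):
#   import boto3
#   for event in failed_steps(boto3.client("emr")):
#       print(event)
```

Polling like this repeats the same failures on every run unless you track which step IDs were already reported, which is one reason the event-driven route can be simpler.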