#sre (2020-08)
Prometheus, Prometheus Operator, Grafana, Kubernetes
Archive: https://archive.sweetops.com/monitoring/
2020-08-03
Chris Fowles:
that’s awesome
2020-08-06
2020-08-07
Marcin Brański:
On a recent #office-hours we talked about Opsgenie automation. Today we released :tada: the first version of our module to manage it with Terraform. We currently use it to manage most of our Opsgenie setup.
https://github.com/cloudposse/terraform-opsgenie-incident-management
Erik Osterman (Cloud Posse):
See recording: https://cloudposse.wistia.com/medias/9d4ase4qjy
2020-08-08
2020-08-16
mado:
Just installed OpenShift Dedicated on AWS. Any advice on the best way to install the Instana APM monitoring tool on it? I want to monitor applications with Instana.
2020-08-23
msharma24:
Hello all, how can I monitor EMR job failures by job name? For example, I would like to receive an alert only when a job whose name starts with “Prod-XXX” fails on the cluster.
Zach:
I would look at EventBridge; you can get all the EMR cluster events off of it. Ship them to a Lambda or something, process the event, and send another event to your alerting API.
Lists the AWS services and event types supported by Amazon EventBridge.
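A rough sketch of the EventBridge-to-Lambda idea, in case it helps. It assumes an EventBridge rule already routes `aws.emr` events to the function, and that the incoming event looks like an "EMR Step Status Change" event with `name`, `state`, `clusterId`, `stepId`, and `message` under `detail` (worth checking against a real event from your cluster). `ALERT_URL`, `JOB_PREFIX`, and the alert payload keys are made-up placeholders, not part of any real alerting API.

```python
import json
import os
import urllib.request

# Hypothetical settings: the prefix comes from the question, the URL is a placeholder.
JOB_PREFIX = "Prod-"
ALERT_URL = os.environ.get("ALERT_URL", "")


def build_alert(event):
    """Return an alert payload for a failed step whose name matches JOB_PREFIX, else None."""
    if event.get("source") != "aws.emr":
        return None
    if event.get("detail-type") != "EMR Step Status Change":
        return None
    detail = event.get("detail", {})
    name = detail.get("name", "")
    if detail.get("state") != "FAILED" or not name.startswith(JOB_PREFIX):
        return None
    # The payload shape is an assumption; adapt it to whatever your alerting API expects.
    return {
        "job_name": name,
        "cluster_id": detail.get("clusterId"),
        "step_id": detail.get("stepId"),
        "message": detail.get("message", ""),
    }


def handler(event, context):
    """Lambda entry point: forward matching failures to the alerting endpoint."""
    alert = build_alert(event)
    if alert is None:
        return {"alerted": False}
    if ALERT_URL:  # skip the HTTP call when no endpoint is configured
        request = urllib.request.Request(
            ALERT_URL,
            data=json.dumps(alert).encode("utf-8"),
            headers={"Content-Type": "application/json"},
            method="POST",
        )
        with urllib.request.urlopen(request, timeout=5) as response:
            response.read()
    return {"alerted": True, "alert": alert}
```

Filtering on `state` and the name prefix could probably also be pushed into the EventBridge rule pattern itself, which would keep the Lambda from being invoked for every step event.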
msharma24:
Thanks @Zachary Loeber
Eric Berg:
In my last gig, we actually wrote something that would query the AWS API for the status of all of our EMR jobs and post events to Datadog based on the results. It was written 3 or 4 years ago, but it did give much better info about the status of our EMR jobs.
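For comparison, the polling approach might look roughly like the sketch below. This is not the tool described above, just an illustration under assumptions: it uses the EMR `list_steps` call (via a boto3 client passed in by the caller) and the Datadog v1 events endpoint, ignores pagination, and does not track which failures were already reported. The function names, tag names, and the `Prod-` default are hypothetical.

```python
import json
import urllib.request

DATADOG_EVENTS_URL = "https://api.datadoghq.com/api/v1/events"


def failed_steps(emr_client, cluster_id, prefix="Prod-"):
    """List failed steps on a cluster whose name starts with prefix (first page only)."""
    response = emr_client.list_steps(ClusterId=cluster_id, StepStates=["FAILED"])
    return [s for s in response.get("Steps", []) if s.get("Name", "").startswith(prefix)]


def to_datadog_event(cluster_id, step):
    """Build a Datadog event body for one failed step; the tag names are illustrative."""
    return {
        "title": f"EMR step failed: {step['Name']}",
        "text": f"Step {step['Id']} on cluster {cluster_id} is in state {step['Status']['State']}.",
        "alert_type": "error",
        "tags": [f"emr_cluster:{cluster_id}", f"emr_step:{step['Name']}"],
    }


def post_event(api_key, event):
    """POST one event to Datadog and return the HTTP status code."""
    request = urllib.request.Request(
        DATADOG_EVENTS_URL,
        data=json.dumps(event).encode("utf-8"),
        headers={"Content-Type": "application/json", "DD-API-KEY": api_key},
        method="POST",
    )
    with urllib.request.urlopen(request, timeout=5) as response:
        return response.status


def main(cluster_id, api_key):
    """Poll once and report failures; run this on a schedule (cron, scheduled Lambda, etc.)."""
    import boto3  # third-party; imported here so the helpers above stay stdlib-only

    emr = boto3.client("emr")
    for step in failed_steps(emr, cluster_id):
        post_event(api_key, to_datadog_event(cluster_id, step))
```

Compared with the EventBridge route, polling needs its own scheduling and de-duplication, but it can report on job state even when an event was missed.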