#sre (2020-08)

Prometheus, Prometheus Operator, Grafana, Kubernetes

Archive: https://archive.sweetops.com/monitoring/

2020-08-03

Chris Fowles

that’s awesome

2020-08-06

2020-08-07

Marcin Brański

In a recent #office-hours we talked about Opsgenie automation. Today we released :tada: the first version of our module for managing it with Terraform; we currently use it to manage most of our Opsgenie setup. https://github.com/cloudposse/terraform-opsgenie-incident-management

Zach

The links to the examples in the README lead to 404s.

2020-08-08

2020-08-16

mado

Just installed OpenShift Dedicated on AWS. Any advice on the best way to install the Instana APM monitoring tool on it? I want to monitor an application with Instana.

2020-08-23

msharma24

Hello all, how can I monitor EMR job failures by job name? For example, I would like to receive an alert only when a job whose name starts with “Prod-XXX” fails on the cluster.

Zach

I would look at EventBridge; you can get all the EMR cluster events off of it. Ship them to a Lambda or something, process the event, and send another event to your alerting API.
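
A rough sketch of the Lambda side of that idea, in Python. It assumes the EventBridge rule forwards EMR step events with `source` set to `aws.emr`, a `detail-type` of `EMR Step Status Change`, and `name`, `state`, `clusterId`, `stepId` and `message` fields under `detail`; check those names against a sample event from your own account. The `Prod-` prefix comes from the question above, and `send_alert` is a hypothetical placeholder for whatever alerting API you use.

```python
from typing import Any, Dict, Optional

JOB_NAME_PREFIX = "Prod-"  # assumption: prefix taken from the question above


def failed_prod_step(
    event: Dict[str, Any], prefix: str = JOB_NAME_PREFIX
) -> Optional[Dict[str, str]]:
    """Return alert details if the event is a FAILED EMR step whose name
    starts with the prefix; otherwise return None."""
    if event.get("source") != "aws.emr":
        return None
    if event.get("detail-type") != "EMR Step Status Change":
        return None
    detail = event.get("detail", {})
    if detail.get("state") != "FAILED":
        return None
    name = detail.get("name", "")
    if not name.startswith(prefix):
        return None
    return {
        "cluster_id": detail.get("clusterId", ""),
        "step_id": detail.get("stepId", ""),
        "step_name": name,
        "message": detail.get("message", ""),
    }


def send_alert(alert: Dict[str, str]) -> None:
    """Hypothetical placeholder: swap in a call to your alerting API."""
    print(f"ALERT: step {alert['step_name']} failed on {alert['cluster_id']}")


def lambda_handler(event: Dict[str, Any], context: Any) -> Dict[str, bool]:
    """Lambda entry point: alert only on matching failed steps."""
    alert = failed_prod_step(event)
    if alert is not None:
        send_alert(alert)
    return {"alerted": alert is not None}
```

Filtering on state in the EventBridge rule pattern itself would cut down on invocations; the prefix check is kept in code here so it is easy to see and change.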

msharma24

Thanks @Zachary Loeber

Eric Berg

In my last gig, we actually wrote something that queried the AWS API for the status of all of our EMR jobs and posted events to Datadog based on the results. It was written 3 or 4 years ago, but it gave much better info about the status of our EMR jobs.
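
For comparison, a minimal Python sketch of the polling approach. This is not the tool described above, only an illustration under assumptions: the client is expected to behave like a boto3 EMR client (`list_steps` accepting `ClusterId`, `StepStates` and `Marker`), and the event dictionaries use illustrative field names. Actually posting them to Datadog is left out.

```python
from typing import Any, Dict, Iterable, Iterator, List


def iter_failed_steps(emr_client: Any, cluster_id: str) -> Iterator[Dict[str, Any]]:
    """Yield every FAILED step on one cluster, following the pagination marker."""
    kwargs: Dict[str, Any] = {"ClusterId": cluster_id, "StepStates": ["FAILED"]}
    while True:
        page = emr_client.list_steps(**kwargs)
        yield from page.get("Steps", [])
        marker = page.get("Marker")
        if not marker:
            return
        kwargs["Marker"] = marker


def build_events(emr_client: Any, cluster_ids: Iterable[str]) -> List[Dict[str, str]]:
    """Build one event payload per failed step (field names are illustrative)."""
    events: List[Dict[str, str]] = []
    for cluster_id in cluster_ids:
        for step in iter_failed_steps(emr_client, cluster_id):
            events.append(
                {
                    "title": f"EMR step failed: {step['Name']}",
                    "text": f"cluster={cluster_id} step={step['Id']}",
                    "alert_type": "error",
                }
            )
    return events
```

With boto3 installed and credentials configured, `build_events(boto3.client("emr"), ["j-EXAMPLE"])` would be the call to run on a schedule, where `j-EXAMPLE` stands in for a real cluster ID. Passing the client in keeps the helpers testable without AWS access.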

2020-08-24
