#sre (2022-05)
Prometheus, Prometheus Operator, Grafana, Kubernetes
Archive: https://archive.sweetops.com/monitoring/
2022-05-26
I wish Datadog had put more effort into the core features of its incident tool. It's refreshingly simple, but it's missing some key things needed to be viable.
Was hoping to get folks on it as a “simple version of OpsGenie” but it’s missing:
• Dedicated app/alerting (at least for incident stuff)
• Easy Slack integration. If I open an incident in a channel, I can’t have all updates piped through. I have to have it create a dedicated channel, and at my place that’s not possible.
• Links to other things in Datadog don’t automatically prettify.
• No escalation policy/team schedule for handling.
So many things missing. Seems like it would be a really nice experience having everything in a single place, if it wasn’t just a barebones way to organize a chat.
Not to mention it’s missing Terraform support :-)
Is there an API to manage it?
Wow. Good point. I’m using Pulumi for Datadog monitors and didn’t think of that. Looking now.
Nope.
Really a fail.
I’m about to switch over to OpsGenie since my current $work doesn’t use PagerDuty. PagerDuty was a freaking nightmare with Terraform, so I’m hoping OpsGenie makes this a better experience.
Overall, what I find hardest is coming up with an opinion on how to manage it. There are a hundred ways to do things in OpsGenie.
No canonical way
Now we have that for our purposes with OpsGenie so I am happy
Would welcome any articles or tips.
I’m just trying to get a team that’s never done organized incident handling into some kind of process.
Right now my best option is just a Slack workflow with an incident thread. I want to expose the workflow issues much more clearly, though, and make it as simple as I can. OpsGenie seems like the best option right now for making things very trackable.
Limited Terraform support. I’m adding accounts automatically to Datadog with integration resources.
Something I hate is that you need a lot of stuff to ship logs out to Datadog. I wish there was something easier, although you can use Terraform or CloudFormation to set up log shipping.
But to be fair, their SIEM and CSPM are both awesome compared to New Relic. I will take a look at OpsGenie, though.