SRE Weekly Issue #360

A message from our sponsor, Rootly:

Manage incidents directly from Slack with Rootly 🚒.

Rootly automates manual tasks like creating an incident channel, Jira ticket and Zoom rooms, inviting responders, creating statuspage updates, postmortem timelines and more. Want to see why companies like Canva and Grammarly love us?

https://rootly.com/demo/

Articles

Another case of “pilot error” vs “systemic problems”. It’s interesting to me how the organizational pressures the pilots were facing mirror many stories I’ve seen in tech firms, especially startups.

  Admiral Cloudberg

This article recommends improving MTTA (mean time to assemble) by modeling our dispatch systems on the emergency services for a large city.

  Robert Ross

Lots of great stuff to aspire to, with a big emphasis on observability.

  Adriana Villela and Ana Margarita Medina — The New Stack
  Full disclosure: Honeycomb, my employer, is mentioned.

I really love the concept of “incident legalism” introduced in this article. I’ve definitely been there.

Anyone who has coordinated over Slack during an incident has felt the pain of the ambiguity of Slack messages.

But communicating with specificity has a cost.

  Lorin Hochstein

I remember this one! I was trying to listen to music at the time. Turns out it was DNS (and a git repo).

  Erik Lindblad — Spotify

If you’re gonna group your incidents, use tags, not exclusive groups.

  Lorin Hochstein
