SRE Weekly Issue #352

A message from our sponsor, Rootly:

Manage incidents directly from Slack with Rootly 🚒.

Rootly automates manual tasks like creating an incident channel, Jira ticket, and Zoom room, inviting responders, creating statuspage updates and postmortem timelines, and more. Want to see why companies like Canva and Grammarly love us?

https://rootly.com/demo/

Articles

Incident duration and severity are not related, and we have the in-depth data to prove it.

It’s time for another VOID report! I’m glad this project is still going strong.

  Courtney Nash — Verica

I haven’t been paying attention to the recent attempts to legislate cloud provider reliability, and this article was a great catch-up. There’s a lot going on here.

  Jeff Martens — Metrist

I’m still trying to figure out how I feel about this one, but I’m definitely glad I read it.

  Fred Hebert

FireHydrant published this report with statistics from over 50,000 incidents experienced by their customers.

  FireHydrant

Want to get a solid understanding of how Linux shells work, including file descriptors, process management, and sessions? This one goes really deep, with lots of example programs.

  Viacheslav Biriukov

Check it out: Google Search finally has a proper status page!

  Google

It’s one of those “awesome ___” repos on GitHub, this time for resources about writing SLOs.

  Steve Azzopardi (@steveazz)

If you’re going to classify incidents by “root cause”, try these on for size: “production pressure”, “goal conflicts”, and more in this article.

  Lorin Hochstein

Sure, the pilots were engaging in an activity that could be considered dubious. But what’s really worth digging into in this air accident is how surprise may have led them to forget their training on how to recover stable flight.

More on the same accident:

Wikipedia
Mentour Pilot

  Admiral Cloudberg
