SRE Weekly Issue #352

Articles

VOID 2022 Report Now Available

Incident duration and severity are not related, and we have the in-depth data to prove it.

It’s time for another VOID report! I’m glad this project is still going strong.

Courtney Nash — Verica

The US, UK, and EU Want to Regulate Cloud Reliability. Is That Necessary?

I haven’t been paying attention to the recent attempts to legislate cloud provider reliability, and this article was a great catch-up. There’s a lot going on here.

Jeff Martens — Metrist

The Law of Stretched [Cognitive] Systems

I’m still trying to figure out how I feel about this one, but I’m definitely glad I read it.

Fred Hebert

The Incident Benchmark Report from FireHydrant

FireHydrant published this report with statistics from over 50,000 incidents experienced by their customers.

FireHydrant

What every SRE should know about GNU/Linux shell related internals

Want a solid understanding of how Linux shells work, including file descriptors, process management, and sessions? This one goes really deep, with lots of example programs.

Viacheslav Biriukov

Introducing the Google Search Status Dashboard

Check it out, Google Search finally has a proper status page!

Google

Awesome SLOs

It’s one of those “awesome ___” repos on GitHub, this time for resources about writing SLOs.

Steve Azzopardi (@steveazz)

Incident categories I’d like to see

If you’re going to classify incidents by “root cause”, try these on for size: “production pressure”, “goal conflicts”, and more in this article.

Lorin Hochstein

The Four One Zero Club: The crash of Pinnacle Airlines flight 3701

Sure, the pilots were engaging in an activity that could be considered dubious. But what’s really worth digging into in this air accident is how surprise may have led them to forget their training on how to recover stable flight.
