SRE Weekly Issue #334


I’ll be on vacation starting next Sunday (yay!). That means the next two issues will be prepared in advance, so there won’t be an Outages section.

Articles

Handling third-party provider outages

Should you go multi-cloud? What should you do during an incident involving a third-party dependency? What about after? Read this one for all that and more.

Lisa Karlin Curtis — incident.io
Full disclosure: Fastly, my employer, is mentioned.

Common ground breakdown in Uvalde

An introduction to the concept of common ground breakdown, using the Uvalde shooting in the US as a case study.

Lorin Hochstein

r/sre – How do you handle weekly commitments during your on call rotation?

The comments section is full of some pretty great advice, including questions you can ask while interviewing to suss out whether the on-call culture is going to be livable.

u/dicksoutfoeharambe (and others) — reddit

Lessons from the TSB failure: a perfect storm of waterfall failures

From the archives, this is an analysis of a report on the 2018 major outage at TSB Bank in the UK.

Jon Stevens-Hall

What is Backoff For?

You can determine whether backoff will actually help your system, and this article does a great job of telling you how.

Marc Brooker

An Incident Command Training Handbook

I’ve read (and written) plenty of IC training guides, but this is the first time I’ve come across the concept of a “Hands-Off Update”. I’m definitely going to use that!

Dan Slimmon

No observability without theory

This is a really great explanation of observability from an angle I haven’t seen before.

“a metric dashboard only contributes to observability if its reader can interpret the curves they’re seeing within a theory of the system under study.”

Dan Slimmon

Outages

Twitter
Google Search

Did you catch the Google search outage? I’ve never seen one like it — that’s how rare they are. Google shared a tidbit of information about what went wrong — and it wasn’t the datacenter explosion folks speculated about.

Peloton

