SRE Weekly Issue #298

Email subscribers, my apologies for the double-send last week. I upgraded WordPress and subsequently further cemented my distrust of all version upgrades ever.

I carefully tested a fix in staging before rolling it out gradually in preparation for this week’s issue. Just kidding, I hacked on it live until I got it fixed. Sorry about all those testing tweets. #testinproduction #yolo #SREWeeklydoesnotpracticeSRE

Articles

Google Cloud Platform November 16th Outage Follow-up

This is Google’s detailed report from their outage last week. This one’s really worth a read; I promise you won’t be disappointed!

Google

OOPS writeups

I really like this guide and template for writing incident reports. Each section comes with an explanation of what goes there, along with examples.

Lorin Hochstein

How Reliability and Product Teams Collaborate at Booking.com

Booking.com developed their Reliability Collaboration Model to guide the engagement between SRE and product development teams and the responsibilities assigned to each.

Emmanuel Goossaert — Booking.com

Ably best-practices to optimize on-call shift rotations

Especially timely now, in the thick of the holiday on-call period.

James Frost — Ably

6 Steps SREs Should Take to Prepare for Black Friday and Cyber Monday 2021

Great tips. I hope your Black Friday / Cyber Monday is going well!

Quentin Rousseau — Rootly

This article is published by my sponsor, Rootly, but their sponsorship did not influence its inclusion in this issue.

What SRE is not

I thought it might be better to try a new approach: defining what SRE was by looking at what it’s not. Or to put it another way, what can you remove from SRE and have it still be SRE?

Niall Murphy

“What could we have done differently?”

Instead of asking that question, this article urges understanding what happened.

Another reason that imagining future scenarios is better than counterfactuals about past scenarios is that our system in the future is different from the one in the past.

Lorin Hochstein

Outages

GitHub

Coinbase