SRE Weekly Issue #323

A message from our sponsor, Rootly:

Manage incidents directly from Slack with Rootly 🚒. Automate manual admin tasks like creating an incident channel, Jira ticket, and Zoom meeting, paging and adding responders, building the postmortem timeline, setting up reminders, and more. Book a demo (+ get a snazzy Rootly lego set):
https://rootly.com/demo/

Articles

I chatted with Emily Arnott of Blameless for a solid hour about everything from the origins of this newsletter and how I make it, to my thoughts on SRE and where it’s going. Somehow she managed to fit it all into this article. Thanks, Emily!

  Emily Arnott — Blameless

The section on TTR (Time To Recovery) really caught my eye, both for confirming that MTTR is generally not a useful metric and for finding one case where TTR does seem to be predictive.

The Spotify engineering blog seems to be down as of this publishing, so here’s the archive.org version.

  Clint Byrum — Spotify

SRE concepts apply wonderfully well to compliance and governance. Each field has a lot to learn from the other.

  Jennifer Riggins — The New Stack

More than ever, we should all be focused on shipping great products, retaining high-demand engineers, and building trust with customers. And investing in a thoughtful incident management strategy is one way to get there. Let’s explore how.

  Robert Ross — FireHydrant

At this week’s DevOps Enterprise Summit (DOES) Europe, Vanguard talked about how they moved from a traditional architecture to running mostly in the cloud, adopted site reliability engineering, and even built their own customer-facing SaaS.

  Jennifer Riggins — The New Stack

This article has a great discussion of the risks of larger, less frequent deploys. It goes on to explain how they transitioned to smaller and more frequent deploys while focusing on safety.

  Will Sewell — Monzo

What makes this article special is its focus on addressing the common concerns that people have when you try to get them to own their code for its full lifecycle. It offers practical advice to win folks over.

  Martha Lambert — incident.io

Sounds like there were some pretty great talks at SRECon. I gotta admit, I’m kinda having some FOMO.

  Emily Arnott — Blameless

Outages

WhatsApp
Adidas Confirmed
Google Cloud Networking