SRE Weekly Issue #323

Articles

I chatted with Emily Arnott of Blameless for a solid hour about everything from the origins of this newsletter and how I make it, to my thoughts on SRE and where it’s going. Somehow she managed to fit it all into this article. Thanks, Emily!

Emily Arnott — Blameless

Failing Forward — How We Grow from Incidents

The section on TTR (Time To Recovery) really caught my eye, both for confirming that MTTR is generally not a useful metric and for finding one case where TTR does seem to be predictive.

The Spotify engineering blog seems to be down as of this publishing, so here’s the archive.org version.

Clint Byrum — Spotify

Can SRE Bring Governance and Compliance into the Future?

SRE concepts apply wonderfully well to compliance and governance. Each field has a lot to learn from the other.

Jennifer Riggins — The New Stack

The not-so-obvious positive outcomes of great incident management

More than ever, we should all be focused on shipping great products, retaining high-demand engineers, and building trust with customers. And investing in a thoughtful incident management strategy is one way to get there. Let’s explore how.

Robert Ross — FireHydrant

Vanguard’s Iterative Enterprise SRE Transformation

At this week’s DevOps Enterprise Summit (DOES) Europe, Vanguard talked about how they moved from a traditional architecture to running mostly in the cloud, adopted site reliability engineering, and even built their own customer-facing SaaS.

Jennifer Riggins — The New Stack

How we deploy to production over 100 times a day

This article has a great discussion of the risks of larger, less frequent deploys. It goes on to explain how they transitioned to smaller and more frequent deploys while focusing on safety.

Will Sewell — Monzo

How to empower your team to own incident response

What makes this article special is its focus on addressing the common concerns that people have when you try to get them to own their code for its full lifecycle. It offers practical advice to win folks over.

Martha Lambert — incident.io

SREcon 2022 Americas Wrap Up

Sounds like there were some pretty great talks at SREcon. I gotta admit, I’m kinda having some FOMO.

Emily Arnott — Blameless

Outages

WhatsApp
Adidas Confirmed
Google Cloud Networking