SRE Weekly Issue #319

We can learn a lot from the process another engineer follows to debug a problem. Too often, though, a ticket or problem description records only the answer and leaves out the process, which hampers learning.

Lorin Hochstein — The ReadME Project (GitHub)

The Merpay SRE Team: Past and future

We’re still not 100% there as a team, but I hope this article will serve as a reference for anyone who might create an SRE team in the future.

@tjun — Mercari

Incident Analysis 101: Techniques for Sharing Incident Findings

This article gives six ways to organize the findings from your retrospective so you can share them with different audiences.

Vanessa Huerta Granda — Jeli

Gyros and Gimbals, oh my! — The James Webb Space Telescope

There’s a great reliability story in the way that the Hubble telescope and the Apollo missions used gimbals — and in the way that the JWST doesn’t.

Robert Barron — IBM

Outages

Hulu
IRS

The US Internal Revenue Service’s systems went down on the due date for tax filing.

Instagram