SRE Weekly Issue #322

Bit of a short issue this week. This morning, I stepped on my phone, crushing it mightily beneath my bootheel. Unfortunately, a lot of my automation for reviewing articles is on there… thank goodness I have functioning backups.

Articles

SRE and Fighting Games

What? Actually, it’s a pretty good analogy.

Emily Arnott — Blameless

Embedded SRE at Mercari

Mercari follows up on their previous article about their embedded SRE team with more details on how their embedding model works.

Taichi Nakashima — Mercari

Tail Latency Might Matter More Than You Think

Interesting things happen when you combine tail latency with a microservice architecture.

Marc Brooker
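
As a rough illustration of why fan-out makes the tail matter (a back-of-the-envelope sketch, not taken from the article): assume one user request calls several backend services, each call independently has a 1% chance of landing in the slow tail, and the request is only as fast as its slowest call. The function name, the 1% figure, and the independence assumption are all hypothetical simplifications.

```python
def prob_any_slow(fan_out: int, per_call_slow_prob: float = 0.01) -> float:
    """Chance that at least one of `fan_out` independent calls is slow.

    Assumes calls are independent and each is slow with the same
    probability (hypothetical: 0.01, i.e. slower than the p99).
    """
    # P(all calls fast) = (1 - p) ** n, so P(at least one slow) is the complement.
    return 1.0 - (1.0 - per_call_slow_prob) ** fan_out


if __name__ == "__main__":
    # Print the share of requests that hit at least one slow call
    # for a few fan-out sizes.
    for n in (1, 10, 100):
        print(f"fan-out {n:>3}: {prob_any_slow(n):.1%} of requests see a slow call")
```

Under these assumptions, a latency that only 1 in 100 individual calls experiences becomes something a large share of user requests experience once the fan-out reaches the dozens.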

Reducing our pager load

Their starting point was paging for every single exception raised by their application. Here’s how they tempered that a bit to get a handle on their paging volume.

Lisa Karlin Curtis — incident.io

Google’s Site Reliability Engineering hierarchy (Remixed)

This article draws from the “SRE Hierarchy” in Google’s SRE book (which itself is a reference to Maslow’s hierarchy of needs). It recasts the SRE hierarchy as a path to maturity.

Ash P. — SREPath

Incident Report: Google Meet Livestream outage on April 25

Google posted this summary of an incident from late April. A configuration change had the unintended effect of causing livestream view requests to fail.

Google

Outages

Xbox

I don’t normally bother with game outages, but this one caught my eye. During the 4-day outage, customers were unable to play Xbox games that they had already purchased.

Twitter
Coinbase