Bit of a short issue this week. This morning, I stepped on my phone, crushing it mightily beneath my bootheel. Unfortunately, a lot of my automation for reviewing articles is on there… thank goodness I have functioning backups.
Articles
What? Actually, it’s a pretty good analogy.
Emily Arnott — Blameless
Mercari follows up their previous article on their embedded SRE team with more detail on how their embedding model works.
Taichi Nakashima — Mercari
Interesting things happen when you combine tail latency with a microservice architecture.
Marc Brooker
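For a feel of why tail latency and fan-out interact so badly, here is a back-of-the-envelope sketch (mine, not taken from the article). It assumes each backend call independently has the same chance of landing in the slow tail, which real systems rarely satisfy exactly; `prob_hits_tail` and its parameters are hypothetical names for illustration.

```python
def prob_hits_tail(fanout: int, tail_fraction: float = 0.01) -> float:
    """Chance that at least one of `fanout` calls lands in the slow tail.

    Assumes calls are independent and each has probability `tail_fraction`
    of being slow (0.01 means "slower than the p99").
    """
    # P(at least one slow) = 1 - P(all calls fast)
    return 1.0 - (1.0 - tail_fraction) ** fanout


if __name__ == "__main__":
    for n in (1, 10, 100):
        print(f"{n:>3} calls: {prob_hits_tail(n):.1%} chance of hitting the p99 tail")
```

Under those assumptions, a request that waits on 100 backend calls sees at least one p99-or-worse response roughly 63% of the time, so the "rare" tail becomes the common case.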
Their starting point was paging for every single exception raised by their application. Here’s how they tempered that a bit to get a handle on their paging volume.
Lisa Karlin Curtis — incident.io
This article draws from the “SRE Hierarchy” in Google’s SRE book (which itself is a reference to Maslow’s hierarchy of needs). It recasts the SRE hierarchy as a path to maturity.
Ash P. — SREPath
Google posted this summary of an incident from late April. A configuration change had the unintended effect of causing livestream view requests to fail.
Outages
I don’t normally bother with game outages, but this one caught my eye. During the 4-day outage, customers were unable to play Xbox games that they had already purchased.