The game Last Epoch launched in February and had a rocky start. This huge retrospective post tells the story of what happened and how the team fixed it.
EHG_Kain — Last Epoch
Cloudflare’s Phoenix system can find and recover failed servers, reducing toil.
Jet Mariscal, Aakash Shah, and Yilin Xiong — Cloudflare
More than just another glossary of SL*s, this one also has examples and best practices.
Sara Miteva — Checkly
Spurred by a question in the SRECon attendee survey, this one really gets you thinking: how does the current “generation” of SREs differ from those that came before?
Paige — PagerDuty
This one’s about finding out what execs need in incidents and figuring out how to get everyone’s needs met.
Chris Evans — incident.io
This post explains how Cloudflare gathers information about their alerts and improves them to benefit reliability and on-call health.
Monika Singh — Cloudflare
This one contains formulas for calculating compound SLOs when downstream dependencies are parallel or serial.
Alex Ewerlöf
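For a rough sense of the math involved, here is a minimal Python sketch of the textbook serial and parallel availability formulas. It assumes dependencies fail independently of each other, the function names are hypothetical, and the linked post's own formulas may differ.

```python
from math import prod


def serial_availability(availabilities):
    """Every dependency must succeed, so the availabilities multiply.

    Assumes failures are independent (an illustrative simplification).
    """
    return prod(availabilities)


def parallel_availability(availabilities):
    """Redundant dependencies: the request fails only if all of them fail.

    Assumes failures are independent (an illustrative simplification).
    """
    return 1 - prod(1 - a for a in availabilities)


if __name__ == "__main__":
    # Two serial dependencies at 99.9% each.
    print(serial_availability([0.999, 0.999]))
    # Two redundant dependencies at 99% each.
    print(parallel_availability([0.99, 0.99]))
```

Under these assumptions, two serial dependencies at 99.9% each land at about 99.8%, while two redundant dependencies at 99% each land at about 99.99%.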
SRE WEEKLY