This is a review of Marianne Bellotti’s Kill It With Fire, a book about modernizing legacy systems. It focuses heavily on operational concepts and “the system around the system”, with a strong SRE influence.
Laura Nolan — ;login:
Originally drafted in 2016, this blog post is even more relevant now. Beyond just the “why”, it has several ideas for interview questions to get you started.
Tell a good story, and you can make things happen.
As SREs, we often know what needs to be done, but convincing others is a hard-won skill.
In this video report of a commercial aviation accident, there’s a neat discussion of resiliency toward the end. There were several other layers of protection that (probably) would have caught and prevented this incident if the A320 captain hadn’t intervened. And even though no accident occurred, there was still a “near miss” investigation.
Although conversation about observability often ignores SREs, SREs have a central role to play in observability success.
Quentin Rousseau — Rootly
In a microservice architecture, having retries several levels deep can be a recipe for nastiness.
Oren Eini — RavenDB
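To see why nested retries get nasty, here is a minimal back-of-the-envelope sketch (not taken from the linked article). It assumes the worst case: every layer retries independently and every attempt fails, so the attempt counts multiply. The function name and the retry counts are hypothetical, chosen only for illustration.

```python
from math import prod


def worst_case_calls(retries_per_layer):
    """Upper bound on calls reaching the deepest service for one user request.

    retries_per_layer[i] is the number of retries (beyond the first attempt)
    that layer i makes against the layer below it. Assumes every attempt
    fails and no layer uses backoff budgets or circuit breakers.
    """
    return prod(1 + retries for retries in retries_per_layer)


# Hypothetical setup: three layers, each retrying twice on failure.
print(worst_case_calls([2, 2, 2]))
```

With three layers each making up to three attempts, a single failing request can turn into 3 × 3 × 3 = 27 calls against the bottom service, right when it is least able to handle them.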
This report has some detail on two major incidents experienced by GitHub last month.
Scott Sanders — GitHub
AWS (Japan region)
Google Cloud Pub/Sub