SRE Weekly Issue #286

A message from our sponsor, StackHawk:

Trying to scale AppSec across engineering is no joke. Check out the 3 main reasons developers struggle with AppSec and how to make it better.
https://sthwk.com/3-reasons

Articles

This is a review of Marianne Bellotti’s Kill It With Fire, a book about modernizing legacy systems. It focuses heavily on operational concepts and “the system around the system”, with a strong SRE influence.

Laura Nolan — ;login:

Originally drafted in 2016, this blog post is even more relevant now. Beyond just the “why”, it has several ideas for interview questions to get you started.

Charity Majors

Tell a good story, and you can make things happen.

As SREs, we often know what needs to be done, but convincing others is a hard-won skill.

Lorin Hochstein

In this video report of a commercial aviation accident, there’s a neat discussion of resiliency toward the end. There were several other layers of protection that (probably) would have caught and prevented this incident if the A320 captain hadn’t intervened. And even though no accident occurred, there was still a “near miss” investigation.

Mentor Pilot

Although conversation about observability often ignores SREs, SREs have a central role to play in observability success.

Quentin Rousseau — Rootly

In a microservice architecture, having retries several levels deep can be a recipe for nastiness.

Oren Eini — RavenDB
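
As a rough illustration of why nested retries get nasty (a sketch, not taken from the linked post): if every layer in a call chain makes up to a fixed number of attempts against the layer below, the worst-case load on the deepest service grows exponentially with depth. The function name and the figure of 3 attempts per layer are assumptions chosen for the example.

```python
def worst_case_calls(attempts_per_level: int, depth: int) -> int:
    """Worst-case requests hitting the deepest service for one user request.

    Assumes every layer makes up to `attempts_per_level` attempts
    (the original call plus its retries) and that every attempt fails,
    so each layer multiplies the traffic it passes down.
    """
    if attempts_per_level < 1 or depth < 0:
        raise ValueError("attempts_per_level must be >= 1 and depth >= 0")
    return attempts_per_level ** depth


if __name__ == "__main__":
    # Hypothetical chain: 3 attempts per layer, 1 to 4 layers deep.
    for depth in range(1, 5):
        print(f"depth={depth}: up to {worst_case_calls(3, depth)} calls")
```

Under these assumptions, a chain four layers deep can turn one failing user request into as many as 81 requests against the bottom service, which is why retrying at only one layer (or capping retries with a budget) is a common mitigation.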

This report has some detail on two major incidents experienced by GitHub last month.

Scott Sanders — GitHub

Outages

AWS (Japan region)
Instagram
Twitter
Google Cloud Pub/Sub