SRE Weekly Issue #401

A message from our sponsor, FireHydrant:

Join FireHydrant on Dec. 14 for a conversation about on-call culture and its effect on engineering organizations, featuring special guests from Outreach and Udemy. Gain a better understanding of what makes excellent on-call culture and how to implement practices to improve yours.

Maybe you’re thinking of skipping over “yet another article about blamelessness”? Don’t. This one has some great examples and stories and is well worth a read.

  Michael Hart

I’m definitely guilty of a couple of these.

  Code Reliant

New podcast relevant to our interests!

In this series, you’ll hear insightful conversations with engineers, product managers, co-founders and more, all about the debatable topic of incident management.

  Luis Gonzalez —

A puzzling performance regression in EBS volumes, seemingly reproducible across instances. Anyone else seeing anything like this?

  Dustin Brown — dolthub

This article presents a framework for scaling SRE teams by defining SRE processes, automating, and iterating.

  Stelios Manioudakis — DZone

Some tips on what makes a good alert and how to design your alerts to be actually useful, rather than just noise.

  Leon Adato — Kentik

Why would you want multiple different targets for the same SLO? Read this one to find out.

  Alex Ewerlöf

Conflict-free Replicated Data Types are powerful, but they come with downsides, which this article explains, so it’d be great to avoid them when possible.

  Zak Knill

