SRE Weekly Issue #389

A message from our sponsor, Rootly:

When incidents impact your customers, failing to communicate with them effectively can erode trust even further and compound an already difficult situation. Learn the essentials of customer-facing incident communication in Rootly’s latest blog post:
https://rootly.com/blog/the-medium-is-the-message-how-to-master-the-most-essential-incident-communication-channels

Articles

Here are four of the lessons I learned that should help you build a successful SRE organization.

Focus on Developer Training
Focus on the Right Abstractions
Focus on Self Service
Automate Yourself Out of a Job

  Sven Hans Knecht

In this blog post, we’ll talk about two incident management structure models, distributed and centralized, including the pros and cons of each and examples of what each structure looks like in our community.

  Robert Ross — FireHydrant

The Rasmussen model conceptualizes the limits of a system along three boundaries: Cost, System Performance, and Human Capacity.

  Nishant Modak — Last9

Wow, this is a really interesting incident. It has all the hallmarks of a nightmare sev1: time pressure, an unknown problem, new procedures invented on the spot, multiple teams and specialties having to work together, and more.

  Jorg Wenninger — CERN

What do you do when many engineers all need to take the same day off each week for religious reasons?

  TimeWeSp

Toyota recently halted production at its factories due to a problem in its order system, and it shared some interesting details about what went wrong.

  Toyota

Here’s a guidebook on how to handle being the first SRE at a company.

  Sven Hans Knecht
