SRE Weekly Issue #277

Articles

FINRA Orders Record Financial Penalties Against Robinhood Financial LLC

Remember all those Robinhood outages? FINRA, the US financial industry regulator, is making Robinhood repay customers for the losses they sustained as a result, and is also fining the company for other reasons.

Michelle Ong, Ray Pellecchia, Angelita Plemmer Williams, and Andrew DeSouza — FINRA

r/WallStreetBets Incident Anthology: More Data, More Problems

This is brilliant and I wish I’d thought of it years ago:

One of the things we’ve previously seen during database incidents is that a set of impacted tables can provide a unique fingerprint to identify a feature that’s triggering issues.

Courtney Wang — Reddit
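
To make the fingerprint idea concrete, here is a minimal sketch of how it might work. This is not Reddit's implementation: the feature-to-table mapping, the table names, and the `rank_features` helper are all hypothetical, and Jaccard similarity is just one simple way to score the overlap between the impacted tables and the tables each feature touches.

```python
def jaccard(a, b):
    """Overlap between two sets of table names, from 0.0 to 1.0."""
    a, b = set(a), set(b)
    union = a | b
    return len(a & b) / len(union) if union else 0.0


def rank_features(impacted_tables, feature_tables):
    """Rank features by how closely their tables match the impacted set.

    feature_tables is a hypothetical mapping of feature name -> tables it uses.
    Features with no overlap at all are dropped from the result.
    """
    scores = [
        (jaccard(impacted_tables, tables), feature)
        for feature, tables in feature_tables.items()
    ]
    # Highest score first; ties broken alphabetically by feature name.
    scores.sort(key=lambda s: (-s[0], s[1]))
    return [(feature, round(score, 2)) for score, feature in scores if score > 0]


# Hypothetical mapping, for illustration only.
FEATURE_TABLES = {
    "comments": {"comments", "users", "votes"},
    "chat": {"messages", "users"},
    "awards": {"awards", "users", "transactions"},
}

if __name__ == "__main__":
    print(rank_features({"comments", "votes"}, FEATURE_TABLES))
```

In a real incident, the impacted set would come from whatever your database monitoring reports, and the mapping would need to be kept current as features change.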

The Deeper Root Cause of the Fastly and Akamai Outages

The suggested root cause involves consolidation among cloud providers and the importance of DNS.

Alban Kwan — CircleID

Full disclosure: Fastly, my employer, is mentioned.

The normalization of deviance in healthcare delivery

This paper is about recognizing normalization of deviance and techniques for dealing with it. This tidbit really made me think:

[…] they might have been taught a system deviation without realizing that it was so […]

Bus Horiz

Elephant in the Blameless War Room: Accountability

Blameless incident analysis is often at odds with a desire to “hold people accountable”. This article explores that conflict and techniques for managing the needs involved.

Christina Tan and Emily Arnott — Blameless

Shipping on a Spent Error Budget

What can you do if you’re out of error budget but you still want to deliver new features? Get creative.

Paul Osman — Honeycomb

The SRE Incident Response game

I am going to go through the variation we use to up skill our on-call engineers we called “The Kobayashi Maru”, the name we borrowed from the Star Trek training exercise to test the character of Starfleet cadets.

Bruce Dominguez

Outages

Slack
Zimbabwe Shared Services (financial services)
Snapchat
Facebook
YouTube
Twitter
Nest
GCash