SRE Weekly Issue #347

Articles

Call for Proposals – Learning From Incidents

Check it out: a conference from the Learning From Incidents people!

Children of the Magenta (Automation Paradox, pt. 1)

Echoing Bainbridge’s Ironies of Automation, this article discusses the potential dangers of over-automation, using an air accident as a case study. I hadn’t been aware of the term “Children of the Magenta” before.

Katie Mingle — 99% Invisible

3 questions to ask in the build vs buy debate for incident response tooling

There’s more to it than just hacking together some Slack workflows.

Ryan McDonald — FireHydrant

Touching Grass With SLOs – Internal SLOs

Honeycomb doesn’t do its SLOs “by the book”.

The way Honeycomb defines SLOs is radically different from what I expected. Instead of the definitions I wrote about at the beginning of this post, I saw:

Reid Savage — Honeycomb
Full disclosure: Honeycomb is my employer.

Here’s how a Twitter engineer says it will break in the coming weeks

An anonymous Twitter engineer talks about what’s going on over there and how they think it might play out.

Chris Stokel-Walker — MIT Technology Review

Argo Rollouts at scale: Bringing Automated Rollbacks to 2,100+ services at Monzo

They rolled out automated rollbacks across a complex infrastructure, and in this article, they share the lessons they learned in the process.

Will Sewell and Joseph Pallamidessi — Monzo

The most important thing to understand about queues

Okay. Here’s the Important Thing:

As you approach maximum throughput, average queue size – and therefore average wait time – approaches infinity.

Dan Slimmon
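
To put rough numbers on that claim, here is a minimal sketch in Python. It assumes the textbook M/M/1 model (random arrivals, one server, exponentially distributed service times), which the quote above does not specify. The function name `mm1_stats` and the service rate of 100 jobs per second are hypothetical, chosen only for illustration.

```python
def mm1_stats(arrival_rate: float, service_rate: float) -> tuple[float, float]:
    """Return (average jobs in system, average time in system) for an M/M/1 queue.

    Assumption: M/M/1 model. Real systems may behave differently.
    """
    if not 0 <= arrival_rate < service_rate:
        raise ValueError("need 0 <= arrival_rate < service_rate for a stable queue")
    utilization = arrival_rate / service_rate
    # L = rho / (1 - rho): grows without bound as utilization nears 1.
    avg_jobs = utilization / (1 - utilization)
    # W = 1 / (mu - lambda): average time a job spends waiting plus being served.
    avg_time = 1 / (service_rate - arrival_rate)
    return avg_jobs, avg_time


if __name__ == "__main__":
    service_rate = 100.0  # hypothetical capacity, in jobs per second
    for utilization in (0.5, 0.9, 0.99, 0.999):
        jobs, seconds = mm1_stats(utilization * service_rate, service_rate)
        print(f"utilization {utilization:.1%}: "
              f"{jobs:.1f} jobs in system, {seconds * 1000:.0f} ms per job")
```

Under these assumptions, moving from 50% to 99% utilization takes the average from about 1 job in the system to about 99, so the last few percent of throughput are by far the most expensive.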

Running on Empty: The crash of Hapag-Lloyd flight 3378

It was not clear to the pilots that the fuel estimation system was not designed to be used in the way they were using it.

Admiral Cloudberg

The Cold Laws of Winter: The crash of Air Florida flight 90

As is usually the case with air accidents, the crash of Air Florida flight 90 did not have a single cause. In fact, the accident was the result of the confluence of two proximate factors, each of which was itself the culmination of a long chain of errors.

Admiral Cloudberg
