This year’s VOID Report is out, and it’s well worth a read. The subtitle is “Exploring the Unintended Consequences of Automation in Software,” which is a really good way to get me to read something!
Courtney Nash — The VOID
A Terraform change deleted a critical resource, and reviewers missed it because the plan was so big. Now they use Atlantis and Open Policy Agent to guard against this kind of accidental deletion.
Lin Du — InfoQ
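To make the idea concrete, here is a rough sketch of the kind of check a policy gate performs on a plan. It is not the article's actual policy (which would be written for Open Policy Agent); it is a Python stand-in that reads the JSON form of a plan, as produced by `terraform show -json`, and flags deletions of resource types on a protected list. The `PROTECTED_TYPES` set, the function name, and the sample plan are all made up for illustration.

```python
# Hypothetical list of resource types we never want deleted without a
# deliberate override. This set is an assumption, not from the article.
PROTECTED_TYPES = {"aws_db_instance", "aws_s3_bucket"}


def find_protected_deletions(plan: dict) -> list[str]:
    """Return the addresses of protected resources the plan would delete.

    `plan` is the parsed JSON plan. Each entry in `resource_changes`
    carries an `address`, a `type`, and `change.actions`, a list such as
    ["delete"] or ["delete", "create"] for a replacement.
    """
    flagged = []
    for rc in plan.get("resource_changes", []):
        actions = rc.get("change", {}).get("actions", [])
        if "delete" in actions and rc.get("type") in PROTECTED_TYPES:
            flagged.append(rc["address"])
    return flagged


# Tiny hand-written plan fragment for illustration only.
sample_plan = {
    "resource_changes": [
        {
            "address": "aws_db_instance.main",
            "type": "aws_db_instance",
            "change": {"actions": ["delete"]},
        },
        {
            "address": "aws_instance.web",
            "type": "aws_instance",
            "change": {"actions": ["update"]},
        },
    ]
}

violations = find_protected_deletions(sample_plan)
if violations:
    print("Plan deletes protected resources:", ", ".join(violations))
```

The point of a gate like this is that it doesn't get tired: a deletion buried in a very large plan is flagged just as reliably as one in a small plan.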
When analyzing an incident, what can we learn when we assume that everyone did everything as well as possible?
Lorin Hochstein
onsite technicians performing this planned network maintenance inadvertently unplugged several fibers that were adjacent to those in the work order, but still in use for production traffic
There’s a huge difference between four and five nines. This article includes an especially interesting quote saying that Google doesn’t think five nines is attainable in a commercial service.
Diana Bocco — UptimeRobot
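For a sense of scale, here is a quick back-of-the-envelope calculation of the yearly downtime budget at each level. It assumes a 365-day year and counts every minute of the year as in scope, which is a simplification; real SLAs often define the measurement window differently. The function name is my own.

```python
MINUTES_PER_YEAR = 365 * 24 * 60  # 525,600 minutes, ignoring leap years


def downtime_budget_minutes(availability_percent: float) -> float:
    """Minutes of downtime per year allowed at the given availability."""
    return MINUTES_PER_YEAR * (1 - availability_percent / 100)


for label, pct in [("four nines", 99.99), ("five nines", 99.999)]:
    print(f"{label} ({pct}%): {downtime_budget_minutes(pct):.2f} minutes/year")
```

Four nines leaves roughly 52.6 minutes a year, while five nines leaves only about 5.3 minutes, which is less time than it takes most teams to get paged and log in.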
Here’s an interview with three SREs about what it’s like to be an SRE at IBM.
IBM
I’ve been hearing about Observability 2.0 but didn’t know what it was all about. This article explains what it is and how it can help with cost.
Charity Majors — Honeycomb
Full disclosure: Honeycomb is my employer.
A cute little video pep talk for SREs. The site is actually real, too!
Krazam
Like a mini Y2K, leap day came around again and left some technical glitches in its wake, as chronicled in this article.
Gergely Orosz — The Pragmatic Engineer
SRE WEEKLY