For every 9 you add to SLO, you’re making the system 10x more reliable but also 10x more expensive.
Alex Ewerlöf
In this incident story, the feature flags were served by the main application server. When a new feature caused the server to crash, there was no way to flip the flag back off to stop the crashes.
rachelbythebay
The author of a classification system for human error reflects 20 years later on the harm that such systems can cause by using deficit-based language.
Dr. Steven Shorrock
Here’s Fred Hebert’s analysis of Cloudflare’s write-up of their incident on November 2.
I’m hoping they’re going to do a more in-depth review.
Fred Hebert — VOID
In this post, we introduce a hybrid approach that seamlessly combines the precision of manual instrumentation with the comfort, efficiency, and performance of automatic instrumentation.
Ron Federman — Odigos
Change is not the problem. It’s unaddressed risk
Bruce Johnston — High Scalability
A shell script with a loop running a DB client can fill up your ephemeral ports in a hurry.
Oren Eini — RavenDB
When you get right down to it, it’s all human communication, even assembly code. It’s human factors all the way down.
Michael Hart
SRE WEEKLY