Articles
This episode of DisasterCast discusses what happens when attempts to make things safer backfire.
by trying to suppress small problems, we create a reservoir of danger waiting to burst out
Drew Rae
These images offer a glimpse into the visual patterns that appear in our variables and time-series, and the beauty that emerges from chaos. Some of the images in these galleries appeared during difficult rollouts, and some even during production incidents. All come from graphs generated by Google’s monitoring systems.
The popular slogan says “test in production”, but what if your business simply doesn’t allow it?
For any scenario where I expect to be causing client impact, I’d rather test in non-production than not test at all, since production is clearly off the table.
Christina Yakomin — InfoQ
There’s been a trend toward narrating our engineering work on company blogs, without which this newsletter probably wouldn’t exist.
Jordan Teicher — New York Times
My team recently moved databases from local files in the codebase to an online database.
It didn’t go quite as planned, but they got there in the end.
Kaustubh Hiware — Mercari
In Product Analytics we wanted to support our colleagues in SRE, so we created a model to predict the monetary costs of incidents affecting our conversion funnel.
Enrique Hernani Ros — HelloFresh
There’s some interesting detail here about multiple failed UPSes and an accidental voltage mismatch exacerbating the situation.
Laura Dobberstein — The Register
SRE WEEKLY