Articles
TLS can be such a headache.
This was an interesting situation. There was a valid path to the USERTrust RSA Certification Authority, and there was also an expired path. The browser was able to find the valid chain, but curl was not able to find it.
Adam Surak — Algolia
A well-researched article on shifting emphasis from incident prevention to learning and resilience.
Incidents cannot be prevented, because incidents are the inevitable result of success.
Alex Elman
This one’s worth reading through twice to let it sink in. It puts me in mind of this article by Will Gallego, which is another thoughtful critique of error budgets.
Here are the claims I’m going to make:
- Large incidents are much more costly to organizations than small ones, so we should work to reduce the risk of large incidents.
- Error budgets don’t help reduce risk of large incidents.
Lorin Hochstein
This is a review of a few of the chapters of the book of the same title by Emil Stolarsky and Jaime Woo.
Have you read it too? I’d love to read your take on it!
Dean Wilson
This one’s worth reading the next time you need to do an incident retrospective. The traps are:
- Counterfactual reasoning
- Normative language
- Mechanistic reasoning
John Allspaw — Adaptive Capacity Labs
The skill in question is glue work, and I sure appreciate a good gluer when I see one.
Emily Arnott — Blameless
This one starts out by defining SRE, then goes into how to define your team and fill it with people.
Julie Gunderson — PagerDuty
Outages
- Fastly
  - Fastly is my employer.
- Slack
- Tyro Payments
- Signal
- .ke TLD (Kenya)
- Microsoft Teams, Office 365 and OneDrive
SRE WEEKLY