The PIOSEE model is taught to pilots as a rubric for coming to a decision in a difficult aviation situation. As this article explains, we can also use it during IT incidents.
Francisco Melo Jr.
What is high cardinality in monitoring systems? Here’s an excellent explanation that includes tips on how to manage cardinality.
Ash P — SREPath
As Xero transitioned to a standard of “you build it, you run it”, more engineering teams suddenly became responsible for knowing about and implementing observability. They designed this maturity model to help teams understand what they were aiming for and how to get there.
Andrew Macdonald — Xero
With around 200 undersea fiber cuts worldwide per year, a fleet of ships is at the ready to pull up the cables and repair them.
Josh Dzieza — The Verge
Last year, Cloudflare suffered a control plane outage when one of their datacenters lost power. They have since done significant work to improve their resilience to power outages, and it was put to the test when the same datacenter lost power again.
Matthew Prince, John Graham-Cumming, and Jeremy Hartman — Cloudflare
Going from non-remote to remote was challenging, but here’s how our team changed as we began working from home.
Stefan Mikolajczyk — WeTransfer
Platform teams have a hugely important role to fill in the engineering organization. They are often the teams that enable other teams to move with more speed and safety. They can also quickly become disconnected from their customers.
Ross Brodbeck
When your system successfully serves a degraded response to the customer, how should you count that toward your SLO? Is it success? Failure? Something in between?
Niall Murphy
SRE WEEKLY