SRE Weekly Issue #391

A message from our sponsor, Rootly:

Rootly is proud to have been recognized by G2 as a High Performer and Enterprise Leader in Incident Management for the sixth consecutive quarter! In total, we received nine G2 awards in the Summer Report. As a thank-you to our community, we’re giving away some awesome Rootly swag. Read our CEO’s blog post and pick up some free swag here:
https://rootly.com/blog/celebrating-our-nine-new-g2-awards

Articles

Operating complex systems is about creating accurate mental models, and abstractions are a key ingredient.

  Code Reliant

Why is it hard to get an organization to focus on LFI (learning from incidents) rather than RCA (root cause analysis)? Here’s a really great explanation.

  Lorin Hochstein

It’s about more than just money: there’s also damaged engineer morale, slowed innovation, and lost customers.

  Aaron Lober — Blameless

A great primer on the CAP theorem with a real-world example scenario.

  Lohith Chittineni

It’s really interesting to see how they handled distributed queuing and throttling across a highly distributed cache network without sacrificing speed.

  George Thomas — Cloudflare

[…] LLMs are black boxes that produce nondeterministic outputs and cannot be debugged or tested using traditional software engineering techniques. Hooking these black boxes up to production introduces reliability and predictability problems that can be terrifying.

  Charity Majors — Honeycomb
  Full disclosure: Honeycomb is my employer.

Dig into and understand how enough things work, and eventually you’ll look like a wizard.

  Rachel By the Bay

As a rule of thumb, always set timeouts when making network calls. And if you build libraries, always set reasonable default timeouts and make them configurable for your clients.

  Roberto Vitillo
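
To make the rule of thumb above concrete, here is a minimal Python sketch using only the standard library. It is not taken from the linked article: the function name `fetch_bytes` and the 5-second default are hypothetical choices for illustration. It shows a library-style helper that always applies a timeout and lets the caller override it.

```python
import socket

# Hypothetical library default; pick a value that suits your own service.
DEFAULT_TIMEOUT_SECONDS = 5.0


def fetch_bytes(host, port, timeout=DEFAULT_TIMEOUT_SECONDS):
    """Connect to (host, port) and read up to 1024 bytes.

    Raises socket.timeout if the connect or the read takes longer
    than `timeout` seconds.
    """
    # create_connection applies `timeout` to the connect attempt and leaves
    # it set on the returned socket, so the recv below is bounded as well.
    with socket.create_connection((host, port), timeout=timeout) as conn:
        return conn.recv(1024)
```

A caller that needs a tighter bound can pass it explicitly, for example `fetch_bytes("127.0.0.1", 8080, timeout=0.5)`, and handle `socket.timeout` instead of hanging on an unresponsive peer.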
