SRE Weekly Issue #358

A message from our sponsor, Rootly:

Manage incidents directly from Slack with Rootly 🚒.

Rootly automates manual tasks like creating an incident channel, Jira ticket and Zoom rooms, inviting responders, creating statuspage updates, postmortem timelines and more. Want to see why companies like Canva and Grammarly love us?

https://rootly.com/demo/

Articles

A new spin on changing the engines on a jet in flight: using DNS request/response rewriting to transition an application over without modification.

  lainra — Mercari

How much additional capacity can you get for a dollar?

  Dan Slimmon

Dealing with the unknown, limited cognitive bandwidth, coordination patterns, psychological safety and feeding information back into the organization.

  Fred Hebert — The New Stack
  Full disclosure: Honeycomb is my employer.

How do you enable adoption of SRE principles at a large, mature company that has little capacity for innovation?

the value proposition of “SRE” is the idea that you can handle an exponentially growing business with a logarithmically growing payroll.

  Layer Aleph

Read this one to learn about four attributes of good alerting and how to ensure your SLO burn rate alerts are effective.

  Saheed Oladosu

There’s plenty of content out there telling you how to implement observability, or what good looks like. But what about bad observability? What are some anti-patterns to watch out for?

  Stephen Townshend — SquaredUp

This is an interview about on-call with Twilio’s VP of SRE, who also spent 17 years as an SRE at Google.

  Elena Boroda

They started with a (mostly) single-availability-zone Kafka deployment. Here’s how they transitioned to a multi-zone architecture that can survive a single AZ failure.

  Andrey Polyakov and Kamya Shethia — Etsy
