SRE Weekly Issue #336

A message from our sponsor, Rootly:

Manage incidents directly from Slack with Rootly 🚒. Automate manual admin tasks like creating incident channels, Jira and Zoom, paging and adding responders, postmortem timeline, setting up reminders, and more. Book a demo (+ get a snazzy Rootly lego set):


In this article, I will introduce several improvements being made by the Microservices SRE Team, embedded with other teams.

  MizumotoShota — Mercari

What really stood out to me in this article is the Service Info section. A dashboard will quickly atrophy and lose its meaning without an explanation of what it’s for.

  Ali Sattari

When things go wrong, who is in charge? And what does it feel like to do that role?

This is a summary of a forum discussion about incident command, in case you don’t have time to listen to the whole thing.

  Emily Arnott — Blameless

Complex systems are weird, and a traditional deterministic view such as in older ITIL iterations doesn’t capture the situation. We need to evolve our practices.

  Jon Stevens-Hall

How can you design and interpret metrics for systems optimized for latency or throughput?

  Dan Slimmon

You can optimize for latency or throughput in a given system, but not both, since the two are directly at odds.

  Dan Slimmon

