SRE Weekly Issue #336

Articles

What it’s like to work as an embedded microservices SRE

In this article, I will introduce several improvements being made by the Microservices SRE Team, embedded with other teams.

MizumotoShota — Mercari

What should be on a SLI dashboard

What really stood out to me in this article is the Service Info section. A dashboard will quickly atrophy and lose its meaning without an explanation of what it’s for.

Ali Sattari

SRE: From Theory to Practice: What’s Difficult About Incident Command?

When things go wrong, who is in charge? And what does it feel like to do that role?

This is a summary of a forum discussion about incident command, in case you don’t have time to listen to the whole thing.

Emily Arnott — Blameless

Complex Adaptive Systems and ITSM

Complex systems are weird, and a traditional deterministic view, such as the one in older ITIL iterations, doesn’t capture the situation. We need to evolve our practices.

Jon Stevens-Hall

Latency- and Throughput-Optimized Clusters Under Load

How can you design and interpret metrics for systems optimized for latency or throughput?

Dan Slimmon

The Latency/Throughput Tradeoff: Why Fast Services Are Slow And Vice Versa

You can optimize for latency or throughput in a given system, but not both, since the two are directly at odds.

Dan Slimmon
