SRE Weekly Issue #393

A message from our sponsor, Rootly:

Rootly is proud to have been recognized by G2 as a High Performer and Enterprise Leader in Incident Management for the sixth consecutive quarter! In total, we received nine G2 awards in the Summer Report. As a thank-you to our community, we’re giving away some awesome Rootly swag. Read our CEO’s blog post and pick up some free swag here:
https://rootly.com/blog/celebrating-our-nine-new-g2-awards

This repo contains a path to learn SRE, in the form of a list of concepts to familiarize oneself with.

  Teiva Harsanyi

How can we justify the (sometimes significant) expense of instilling observability into our systems?

  Nočnica Mellifera — SigNoz

It was DNS. Cloudflare’s 1.1.1.1 recursive DNS service failed this week, stemming from failure to parse the new ZONEMD record type.

  Ólafur Guðmundsson — Cloudflare

Rather than just dry theory, this article helps you understand what the CAP theorem means in practice as you choose a data store.

Note: this link was 504ing at time of publishing, so here’s the archive.org copy.

  Bala Kalavala — Open Source For U

A “blameless” culture can get in the way if it means you’re not allowed to make any mention of who was at the pointy end of your system when things blew up.

  incident.io

In this post, we will share how we formalized the LinkedIn Business Continuity & Resilience Program, how this new program helped increase our customers’ confidence in our operations, and the lessons that we learned as we attained ISO 22301 certification.

  Chau Vu — LinkedIn

This is the start of a 6-article series, with each going through one week along a path to prepare for SRE interviews.

We’ll spend each week focusing on building up your expertise in the key areas SREs need to know, like automation, monitoring, incident response, etc.

  Code Reliant

Beyond the CAP theorem, what actually happens during a partition?

“if there is a partition (P), how does the system trade off availability and consistency (A and C); else (E), when the system is running normally in the absence of partitions, how does the system trade off latency and consistency (L and C)?” [Daniel J. Abadi]

  Lohith Chittineni
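
The if/else shape of the PACELC question quoted above can be written down directly. What follows is only a minimal sketch, not anything from the linked article: the names PacelcProfile, PartitionChoice, and NormalChoice are hypothetical, and the example profile describes an imaginary data store rather than any real product.

```python
from dataclasses import dataclass
from enum import Enum


class PartitionChoice(Enum):
    """What the system favors while a partition (P) is in effect."""
    AVAILABILITY = "A"
    CONSISTENCY = "C"


class NormalChoice(Enum):
    """What the system favors otherwise (E), with no partition."""
    LATENCY = "L"
    CONSISTENCY = "C"


@dataclass(frozen=True)
class PacelcProfile:
    """Hypothetical description of one data store's PACELC trade-offs."""
    on_partition: PartitionChoice
    otherwise: NormalChoice

    def label(self) -> str:
        # Conventional shorthand, e.g. "PA/EL" or "PC/EC".
        return f"P{self.on_partition.value}/E{self.otherwise.value}"

    def favored(self, partitioned: bool) -> str:
        # The PACELC question itself: if partitioned, A vs. C; else, L vs. C.
        if partitioned:
            return self.on_partition.name.lower()
        return self.otherwise.name.lower()


# An imaginary store that stays available during partitions and
# prefers low latency the rest of the time.
example_store = PacelcProfile(PartitionChoice.AVAILABILITY, NormalChoice.LATENCY)
print(example_store.label())
print(example_store.favored(partitioned=True))
print(example_store.favored(partitioned=False))
```

Keeping the two choices as separate enums reflects the point of the quote: the trade-off made during a partition and the trade-off made in normal operation are independent decisions.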
