{"id":755,"date":"2023-09-04T01:41:31","date_gmt":"2023-09-04T01:41:31","guid":{"rendered":"https:\/\/fde.cat\/index.php\/2023\/09\/04\/sre-weekly-issue-388\/"},"modified":"2023-09-04T01:41:31","modified_gmt":"2023-09-04T01:41:31","slug":"sre-weekly-issue-388","status":"publish","type":"post","link":"https:\/\/fde.cat\/index.php\/2023\/09\/04\/sre-weekly-issue-388\/","title":{"rendered":"SRE Weekly Issue #388"},"content":{"rendered":"<p><a href=\"https:\/\/sreweekly.com\/sre-weekly-issue-388\/\" title=\"Permalink to SRE Weekly Issue #388\" class=\"email_only\">View on sreweekly.com<\/a><\/p>\n<div class=\"sreweekly-sponsor-message\">\n<h2>A message from our sponsor, <a href=\"https:\/\/rootly.com\/demo\/?utm_source=sreweekly\">Rootly<\/a>:<\/h2>\n<p>When incidents impact your customers, failing to communicate with them effectively can erode trust even further and compound an already difficult situation. Learn the essentials of customer-facing incident communication in Rootly\u2019s latest blog post:<br \/>\n<a href=\"https:\/\/rootly.com\/blog\/the-medium-is-the-message-how-to-master-the-most-essential-incident-communication-channels\">https:\/\/rootly.com\/blog\/the-medium-is-the-message-how-to-master-the-most-essential-incident-communication-channels<\/a><\/p>\n<\/div>\n<h2>Articles<\/h2>\n<div class=\"wp-block-group\">\n<div class=\"wp-block-group__inner-container\">\n<div class=\"sreweekly-entry\">\n<div class=\"sreweekly-title\"><a href=\"https:\/\/surfingcomplexity.blog\/2023\/08\/27\/operating-effectively-in-high-surprise-mode\/\" target=\"_blank\" rel=\"noopener\">Operating effectively in high surprise mode<\/a><\/div>\n<div class=\"sreweekly-description\">\n<p>This article makes a cool analogy between designing systems to operate well under unexpected load and designing socio-technical systems that operate well when the people are surprised by what the system is doing.<\/p>\n<p>\u00a0\u00a0<small>Lorin Hochstein<\/small><\/p>\n<\/div>\n<\/div>\n<div 
class=\"sreweekly-entry\">\n<div class=\"sreweekly-title\"><a href=\"https:\/\/incident.io\/blog\/service-level-agreement-best-practices\" target=\"_blank\" rel=\"noopener\">10 service level agreement practices you should implement<\/a><\/div>\n<div class=\"sreweekly-description\">\n<p>If you need to create SLAs, this article has some solid advice on how to go about it \u2014 and what to avoid.<\/p>\n<p>\u00a0\u00a0<small>incident.io<\/small><\/p>\n<\/div>\n<\/div>\n<div class=\"sreweekly-entry\">\n<div class=\"sreweekly-title\"><a href=\"https:\/\/utcc.utoronto.ca\/~cks\/space\/blog\/sysadmin\/PrometheusAlertsAndScrapeFailures\" target=\"_blank\" rel=\"noopener\">Prometheus scrape failures can cause alerts to be \u2018resolved\u2019<\/a><\/div>\n<div class=\"sreweekly-description\">\n<p>If Prometheus can\u2019t scrape your service, an alert can get resolved incorrectly \u2014 and that can happen exactly when your service is failing!<\/p>\n<p>\u00a0\u00a0<small>Chris Siebenmann<\/small><\/p>\n<\/div>\n<\/div>\n<div class=\"sreweekly-entry\">\n<div class=\"sreweekly-title\"><a href=\"https:\/\/medium.com\/@jpaulreed\/a-spectrum-of-actions-part-i-d768c56ed5f7\" target=\"_blank\" rel=\"noopener\">A Spectrum of Actions<\/a><\/div>\n<div class=\"sreweekly-description\">\n<p>A really nifty three-part exploration of action items in the aftermath of an incident. Rather than consider cost\/benefit, this article series proposes that we think about the likelihood of an action item being completed.<\/p>\n<p>\u00a0\u00a0<small>J. 
Paul Reed<\/small><\/p>\n<\/div>\n<\/div>\n<div class=\"sreweekly-entry\">\n<div class=\"sreweekly-title\"><a href=\"https:\/\/statusgator.com\/blog\/is-north-virginia-aws-region-the-least-reliable-and-why\/\" target=\"_blank\" rel=\"noopener\">Is Northern Virginia Really the Least Reliable AWS Region And Why?<\/a><\/div>\n<div class=\"sreweekly-description\">\n<p>Yes, as it turns out \u2014 and these folks have the receipts (along with some theories as to why).<\/p>\n<p>\u00a0\u00a0<small>Colin Bartlett<\/small><\/p>\n<\/div>\n<\/div>\n<div class=\"sreweekly-entry\">\n<div class=\"sreweekly-title\"><a href=\"https:\/\/www.learningfromincidents.io\/posts\/insight-and-incidents\" target=\"_blank\" rel=\"noopener\">Reader: Insight and Incidents<\/a><\/div>\n<div class=\"sreweekly-description\">\n<p>The \u201cwow\u201d moment in this article is under the heading, \u201cWhat can we learn from creative desperation?\u201d<\/p>\n<p>\u00a0\u00a0<small>Eric Dobbs \u2014 Learning From Incidents<\/small><\/p>\n<\/div>\n<\/div>\n<div class=\"sreweekly-entry\">\n<div class=\"sreweekly-title\"><a href=\"https:\/\/www.dolthub.com\/blog\/2023-08-30-how-to-create-automated-paging-on-call-at-your-startup\/\" target=\"_blank\" rel=\"noopener\">How to create automated paging and on-call at your startup<\/a><\/div>\n<div class=\"sreweekly-description\">\n<p>Before explaining how they set up their on-call, these folks share why they <em>avoided<\/em> it in the early stages of their startup, and what made them finally take the plunge.<\/p>\n<p>\u00a0\u00a0<small>Dustin Brown \u2014 DoltHub<\/small><\/p>\n<\/div>\n<\/div>\n<div class=\"sreweekly-entry\">\n<div class=\"sreweekly-title\"><a href=\"https:\/\/www.codereliant.io\/the-dark-side-of-sre\/\" target=\"_blank\" rel=\"noopener\">The Dark Side of SRE<\/a><\/div>\n<div class=\"sreweekly-description\">\n<p>For the good of the profession, the SRE community still needs to coalesce around more consistent job ladders, expectations, and 
competencies.<\/p>\n<p>\u00a0\u00a0<small>Code Reliant<\/small><\/p>\n<\/div>\n<\/div>\n<div class=\"sreweekly-entry\">\n<div class=\"sreweekly-title\"><a href=\"https:\/\/www.honeycomb.io\/incident-review-what-comes-up-must-first-go-down\" target=\"_blank\" rel=\"noopener\">Incident Review: What Comes Up Must First Go Down<\/a><\/div>\n<div class=\"sreweekly-description\">\n<p>Honeycomb had their worst incident ever at the end of July, and in their characteristic style, they\u2019ve posted an incredibly detailed analysis of what happened \u2014 and that\u2019s just the blog post.  Then you can click through for a 17-page PDF with lots more detail.<\/p>\n<p>\u00a0\u00a0<small>Fred Hebert \u2014 Honeycomb<\/small><br \/>\n\u00a0\u00a0<small><em>Full disclosure: Honeycomb is my employer.<\/em><\/small><\/p>\n<\/div>\n<\/div>\n<\/div>\n<\/div>\n<p>SRE WEEKLY<\/p>","protected":false},"excerpt":{"rendered":"<p>View on sreweekly.com A message from our sponsor, Rootly: When incidents impact your customers, failing to communicate with them effectively can erode trust even further and compound an already difficult situation. 
Learn the essentials of customer-facing incident communication in Rootly\u2019s latest blog post: https:\/\/rootly.com\/blog\/the-medium-is-the-message-how-to-master-the-most-essential-incident-communication-channels Articles Operating effectively in high surprise mode This article makes a&hellip; <a class=\"more-link\" href=\"https:\/\/fde.cat\/index.php\/2023\/09\/04\/sre-weekly-issue-388\/\">Continue reading <span class=\"screen-reader-text\">SRE Weekly Issue #388<\/span><\/a><\/p>\n","protected":false},"author":0,"featured_media":0,"comment_status":"","ping_status":"open","sticky":false,"template":"","format":"standard","meta":{"spay_email":"","footnotes":""},"categories":[8],"tags":[],"class_list":["post-755","post","type-post","status-publish","format-standard","hentry","category-sre","entry"],"jetpack_featured_media_url":"","jetpack-related-posts":[{"id":760,"url":"https:\/\/fde.cat\/index.php\/2023\/09\/11\/sre-weekly-issue-389\/","url_meta":{"origin":755,"position":0},"title":"SRE Weekly Issue #389","date":"September 11, 2023","format":false,"excerpt":"View on sreweekly.com A message from our sponsor, Rootly: When incidents impact your customers, failing to communicate with them effectively can erode trust even further and compound an already difficult situation. 
Learn the essentials of customer-facing incident communication in Rootly\u2019s latest blog post: https:\/\/rootly.com\/blog\/the-medium-is-the-message-how-to-master-the-most-essential-incident-communication-channels Articles Building a Successful SRE Team\u2026","rel":"","context":"In &quot;SRE&quot;","img":{"alt_text":"","src":"","width":0,"height":0},"classes":[]},{"id":746,"url":"https:\/\/fde.cat\/index.php\/2023\/08\/14\/sre-weekly-issue-385\/","url_meta":{"origin":755,"position":1},"title":"SRE Weekly Issue #385","date":"August 14, 2023","format":false,"excerpt":"View on sreweekly.com Many apologies to Matt Cooper at GitHub, who is the actual author of the article Scaling Merge-ort Across GitHub from last week. Sorry for the mis-credit, Matt! A message from our sponsor, Rootly: When incidents impact your customers, failing to communicate with them effectively can erode trust\u2026","rel":"","context":"In &quot;SRE&quot;","img":{"alt_text":"","src":"","width":0,"height":0},"classes":[]},{"id":749,"url":"https:\/\/fde.cat\/index.php\/2023\/08\/22\/sre-weekly-issue-386\/","url_meta":{"origin":755,"position":2},"title":"SRE Weekly Issue #386","date":"August 22, 2023","format":false,"excerpt":"View on sreweekly.com This issue was delayed a day while I was enjoying a much-needed vacation with my family. 
While I\u2019m on the subject, it\u2019s hot take time: vacations are important for the reliability of our sociotechnical systems, so good SREs should take vacations regularly and encourage others to as\u2026","rel":"","context":"In &quot;SRE&quot;","img":{"alt_text":"","src":"","width":0,"height":0},"classes":[]},{"id":763,"url":"https:\/\/fde.cat\/index.php\/2023\/09\/18\/sre-weekly-issue-390\/","url_meta":{"origin":755,"position":3},"title":"SRE Weekly Issue #390","date":"September 18, 2023","format":false,"excerpt":"View on sreweekly.com Many apologies to my email subscribers, who have seen two accidental re-sends of old issues recently due to a weird glitch in my automation. I think I\u2019ve gotten a handle on it, and I\u2019ll run an internal retrospective of this incident, of course. A message from our\u2026","rel":"","context":"In &quot;SRE&quot;","img":{"alt_text":"","src":"","width":0,"height":0},"classes":[]},{"id":740,"url":"https:\/\/fde.cat\/index.php\/2023\/08\/07\/sre-weekly-issue-384\/","url_meta":{"origin":755,"position":4},"title":"SRE Weekly Issue #384","date":"August 7, 2023","format":false,"excerpt":"View on sreweekly.com A message from our sponsor, Rootly: When incidents impact your customers, failing to communicate with them effectively can erode trust even further and compound an already difficult situation. 
Learn the essentials of customer-facing incident communication in Rootly\u2019s latest blog post: https:\/\/rootly.com\/blog\/the-medium-is-the-message-how-to-master-the-most-essential-incident-communication-channels Articles Scaling merge-ort across GitHub They\u2026","rel":"","context":"In &quot;SRE&quot;","img":{"alt_text":"","src":"","width":0,"height":0},"classes":[]},{"id":752,"url":"https:\/\/fde.cat\/index.php\/2023\/08\/28\/sre-weekly-issue-387\/","url_meta":{"origin":755,"position":5},"title":"SRE Weekly Issue #387","date":"August 28, 2023","format":false,"excerpt":"View on sreweekly.com A message from our sponsor, Rootly: When incidents impact your customers, failing to communicate with them effectively can erode trust even further and compound an already difficult situation. Learn the essentials of customer-facing incident communication in Rootly\u2019s latest blog post:https:\/\/rootly.com\/blog\/the-medium-is-the-message-how-to-master-the-most-essential-incident-communication-channels Articles Scaling Software Systems: 10 Key Factors\u2026","rel":"","context":"In 
&quot;SRE&quot;","img":{"alt_text":"","src":"","width":0,"height":0},"classes":[]}],"_links":{"self":[{"href":"https:\/\/fde.cat\/index.php\/wp-json\/wp\/v2\/posts\/755","targetHints":{"allow":["GET"]}}],"collection":[{"href":"https:\/\/fde.cat\/index.php\/wp-json\/wp\/v2\/posts"}],"about":[{"href":"https:\/\/fde.cat\/index.php\/wp-json\/wp\/v2\/types\/post"}],"replies":[{"embeddable":true,"href":"https:\/\/fde.cat\/index.php\/wp-json\/wp\/v2\/comments?post=755"}],"version-history":[{"count":0,"href":"https:\/\/fde.cat\/index.php\/wp-json\/wp\/v2\/posts\/755\/revisions"}],"wp:attachment":[{"href":"https:\/\/fde.cat\/index.php\/wp-json\/wp\/v2\/media?parent=755"}],"wp:term":[{"taxonomy":"category","embeddable":true,"href":"https:\/\/fde.cat\/index.php\/wp-json\/wp\/v2\/categories?post=755"},{"taxonomy":"post_tag","embeddable":true,"href":"https:\/\/fde.cat\/index.php\/wp-json\/wp\/v2\/tags?post=755"}],"curies":[{"name":"wp","href":"https:\/\/api.w.org\/{rel}","templated":true}]}}