{"id":749,"date":"2023-08-22T02:02:06","date_gmt":"2023-08-22T02:02:06","guid":{"rendered":"https:\/\/fde.cat\/index.php\/2023\/08\/22\/sre-weekly-issue-386\/"},"modified":"2023-08-22T02:02:06","modified_gmt":"2023-08-22T02:02:06","slug":"sre-weekly-issue-386","status":"publish","type":"post","link":"https:\/\/fde.cat\/index.php\/2023\/08\/22\/sre-weekly-issue-386\/","title":{"rendered":"SRE Weekly Issue #386"},"content":{"rendered":"<p><a href=\"https:\/\/sreweekly.com\/sre-weekly-issue-386\/\" title=\"Permalink to SRE Weekly Issue #386\" class=\"email_only\">View on sreweekly.com<\/a><\/p>\n<p>This issue was delayed a day while I was enjoying a much-needed vacation with my family.  While I\u2019m on the subject, it\u2019s hot take time: vacations are important for the reliability of our sociotechnical systems, so good SREs should take vacations regularly and encourage others to as well.<\/p>\n<div class=\"sreweekly-sponsor-message\">\n<h2>A message from our sponsor, <a href=\"https:\/\/rootly.com\/demo\/?utm_source=sreweekly\">Rootly<\/a>:<\/h2>\n<p>When incidents impact your customers, failing to communicate with them effectively can erode trust even further and compound an already difficult situation. 
Learn the essentials of customer-facing incident communication in Rootly\u2019s latest blog post:<br \/>\n<a href=\"https:\/\/rootly.com\/blog\/the-medium-is-the-message-how-to-master-the-most-essential-incident-communication-channels\">https:\/\/rootly.com\/blog\/the-medium-is-the-message-how-to-master-the-most-essential-incident-communication-channels<\/a><\/p>\n<\/div>\n<h2>Articles<\/h2>\n<div class=\"wp-block-group\">\n<div class=\"wp-block-group__inner-container\">\n<div class=\"sreweekly-entry\">\n<div class=\"sreweekly-title\"><a href=\"https:\/\/blog.alexewerlof.com\/p\/broken-ownership\" target=\"_blank\" rel=\"noopener\">Broken Ownership<\/a><\/div>\n<div class=\"sreweekly-description\">\n<p>If \u201cyou build it, you run it\u201d requires <strong>mandate<\/strong>, <strong>knowledge<\/strong>, and <strong>responsibility<\/strong>, what happens when one of those is missing?<\/p>\n<p>\u00a0\u00a0<small>Alex Ewerl\u00f6f<\/small><\/p>\n<\/div>\n<\/div>\n<div class=\"sreweekly-entry\">\n<div class=\"sreweekly-title\"><a href=\"https:\/\/slack.engineering\/service-delivery-index-a-driver-for-reliability\/\" target=\"_blank\" rel=\"noopener\">Service Delivery Index: A Driver for Reliability<\/a><\/div>\n<div class=\"sreweekly-description\">\n<p>Slack developed an all-encompassing metric for the user experience that goes beyond a simple SLO.<\/p>\n<p>\u00a0\u00a0<small>Matthew McKeen and Ryan Katkov<\/small><\/p>\n<\/div>\n<\/div>\n<div class=\"sreweekly-entry\">\n<div class=\"sreweekly-title\"><a href=\"https:\/\/wso2.com\/whitepapers\/transactions-in-a-microservice-world\/\" target=\"_blank\" rel=\"noopener\">Transactions in a Microservice World<\/a><\/div>\n<div class=\"sreweekly-description\">\n<p>This whitepaper delves deep into the ways a microservice architecture changes how transactions work.  
It presents a method of dealing with microservice transaction failures through application-specific compensation logic.<\/p>\n<p>\u00a0\u00a0<small>Frank Leymann \u2014 WSO2<\/small><\/p>\n<\/div>\n<\/div>\n<div class=\"sreweekly-entry\">\n<div class=\"sreweekly-title\"><a href=\"https:\/\/blog.bambulab.com\/cloud-temporary-outage-investigation\/?fbclid=IwAR2DgT9iRZYFloMcz-QwCsn4D3s8RWfTjet9KiNcjSF2FnSKFhisSBEREwc\" target=\"_blank\" rel=\"noopener\">Initial Investigation in the Bambu Cloud Temporary Outage<\/a><\/div>\n<div class=\"sreweekly-description\">\n<p>Bambu is a brand of 3D printers that are primarily cloud-based.  A problem in their cloud system resulted in printers running jobs unexpectedly, causing significant damage to some customers\u2019 printers.<\/p>\n<p>\u00a0\u00a0<small>Bambu Lab<\/small><\/p>\n<\/div>\n<\/div>\n<div class=\"sreweekly-entry\">\n<div class=\"sreweekly-title\"><a href=\"https:\/\/status.cloud.google.com\/incidents\/ZQFpiLgvHB7a7Ua7o26T\" target=\"_blank\" rel=\"noopener\">Google Cloud Hybrid Connectivity Incident Report<\/a><\/div>\n<div class=\"sreweekly-description\">\n<p>An interesting confluence of fiber optic line failures resulted in loss of connectivity on what should have been a redundant link.<\/p>\n<p>\u00a0\u00a0<small>Google<\/small><\/p>\n<\/div>\n<\/div>\n<div class=\"sreweekly-entry\">\n<div class=\"sreweekly-title\"><a href=\"https:\/\/www.codereliant.io\/slos-are-overrated\/\" target=\"_blank\" rel=\"noopener\">SLOs Are Overrated<\/a><\/div>\n<div class=\"sreweekly-description\">\n<p>I know the title looks like clickbait, but this article delivers with 7 well-thought-out critiques of SLOs.<\/p>\n<p>\u00a0\u00a0<small>Code Reliant<\/small><\/p>\n<\/div>\n<\/div>\n<div class=\"sreweekly-entry\">\n<div class=\"sreweekly-title\"><a href=\"https:\/\/github.com\/runbear-io\/awesome-runbook\" target=\"_blank\" rel=\"noopener\">GitHub \u2013 runbear-io\/awesome-runbook<\/a><\/div>\n<div 
class=\"sreweekly-description\">\n<p>This latest entry into the awesome-* arena is a curated list of runbooks and related resources for popular software.<\/p>\n<p>\u00a0\u00a0<small>Runbear<\/small><\/p>\n<\/div>\n<\/div>\n<div class=\"sreweekly-entry\">\n<div class=\"sreweekly-title\"><a href=\"https:\/\/surfingcomplexity.blog\/2023\/08\/20\/normal-incidents\/\" target=\"_blank\" rel=\"noopener\">Normal incidents<\/a><\/div>\n<div class=\"sreweekly-description\">\n<p>You shift from asking \u201cwhat was the abnormal work?\u201d to \u201chow did this incident happen even though everyone was doing normal work?\u201d<\/p>\n<p>This article immediately made me think of the <a href=\"https:\/\/www.youtube.com\/watch?v=R5CFhVvMTkk\">latest Mentour Pilot accident investigation<\/a> in which everyone acted nearly perfectly and yet still only narrowly avoided a mid-air collision.<\/p>\n<p>\u00a0\u00a0<small>Lorin Hochstein<\/small><\/p>\n<\/div>\n<\/div>\n<\/div>\n<\/div>\n<p>SRE WEEKLY<\/p>","protected":false},"excerpt":{"rendered":"<p>View on sreweekly.com This issue was delayed a day while I was enjoying a much-needed vacation with my family. While I\u2019m on the subject, it\u2019s hot take time: vacations are important for the reliability of our sociotechnical systems, so good SREs should take vacations regularly and encourage others to as well. 
A message from our&hellip; <a class=\"more-link\" href=\"https:\/\/fde.cat\/index.php\/2023\/08\/22\/sre-weekly-issue-386\/\">Continue reading <span class=\"screen-reader-text\">SRE Weekly Issue #386<\/span><\/a><\/p>\n","protected":false},"author":0,"featured_media":0,"comment_status":"","ping_status":"open","sticky":false,"template":"","format":"standard","meta":{"spay_email":"","footnotes":""},"categories":[8],"tags":[],"class_list":["post-749","post","type-post","status-publish","format-standard","hentry","category-sre","entry"],"jetpack_featured_media_url":"","jetpack-related-posts":[{"id":882,"url":"https:\/\/fde.cat\/index.php\/2024\/06\/17\/sre-weekly-issue-429\/","url_meta":{"origin":749,"position":0},"title":"SRE Weekly Issue #429","date":"June 17, 2024","format":false,"excerpt":"View on sreweekly.com A message from our sponsor, FireHydrant: We\u2019ve gone all out on our new integration with Microsoft Teams. If you\u2019re a MS Teams user, FireHydrant now supports the most comprehensive integration for incident management. Run the entire IM process without ever leaving the chat. https:\/\/firehydrant.com\/blog\/introducing-a-brand-new-microsoft-teams-integration\/ Virtualizing Our Storage\u2026","rel":"","context":"In &quot;SRE&quot;","img":{"alt_text":"","src":"","width":0,"height":0},"classes":[]},{"id":760,"url":"https:\/\/fde.cat\/index.php\/2023\/09\/11\/sre-weekly-issue-389\/","url_meta":{"origin":749,"position":1},"title":"SRE Weekly Issue #389","date":"September 11, 2023","format":false,"excerpt":"View on sreweekly.com A message from our sponsor, Rootly: When incidents impact your customers, failing to communicate with them effectively can erode trust even further and compound an already difficult situation. 
Learn the essentials of customer-facing incident communication in Rootly\u2019s latest blog post: https:\/\/rootly.com\/blog\/the-medium-is-the-message-how-to-master-the-most-essential-incident-communication-channels Articles Building a Successful SRE Team\u2026","rel":"","context":"In &quot;SRE&quot;","img":{"alt_text":"","src":"","width":0,"height":0},"classes":[]},{"id":737,"url":"https:\/\/fde.cat\/index.php\/2023\/07\/23\/sre-weekly-issue-382\/","url_meta":{"origin":749,"position":2},"title":"SRE Weekly Issue #382","date":"July 23, 2023","format":false,"excerpt":"View on sreweekly.com A message from our sponsor, Rootly: Eliminate the anxiety around declaring an incident for nebulous problems by introducing a triage phase into your incident management process. Our latest blog post dives into why the triage phase is so important, and how you can automate yours with Rootly.\u2026","rel":"","context":"In &quot;SRE&quot;","img":{"alt_text":"","src":"","width":0,"height":0},"classes":[]},{"id":740,"url":"https:\/\/fde.cat\/index.php\/2023\/08\/07\/sre-weekly-issue-384\/","url_meta":{"origin":749,"position":3},"title":"SRE Weekly Issue #384","date":"August 7, 2023","format":false,"excerpt":"View on sreweekly.com A message from our sponsor, Rootly: When incidents impact your customers, failing to communicate with them effectively can erode trust even further and compound an already difficult situation. 
Learn the essentials of customer-facing incident communication in Rootly\u2019s latest blog post: https:\/\/rootly.com\/blog\/the-medium-is-the-message-how-to-master-the-most-essential-incident-communication-channels Articles Scaling merge-ort across GitHub They\u2026","rel":"","context":"In &quot;SRE&quot;","img":{"alt_text":"","src":"","width":0,"height":0},"classes":[]},{"id":746,"url":"https:\/\/fde.cat\/index.php\/2023\/08\/14\/sre-weekly-issue-385\/","url_meta":{"origin":749,"position":4},"title":"SRE Weekly Issue #385","date":"August 14, 2023","format":false,"excerpt":"View on sreweekly.com Many apologies to Matt Cooper at GitHub, who is the actual author of the article Scaling Merge-ort Across GitHub from last week. Sorry for the mis-credit, Matt! A message from our sponsor, Rootly: When incidents impact your customers, failing to communicate with them effectively can erode trust\u2026","rel":"","context":"In &quot;SRE&quot;","img":{"alt_text":"","src":"","width":0,"height":0},"classes":[]},{"id":826,"url":"https:\/\/fde.cat\/index.php\/2024\/02\/18\/sre-weekly-issue-412\/","url_meta":{"origin":749,"position":5},"title":"SRE Weekly Issue #412","date":"February 18, 2024","format":false,"excerpt":"View on sreweekly.com A message from our sponsor, FireHydrant: FireHydrant\u2019s new and improved MTTX analytics dashboard is here! See which services are most affected by incidents, where they take the longest to detect (or acknowledge, mitigate, resolve \u2026 you name it); and how metrics and statistics change over time. 
https:\/\/firehydrant.com\/blog\/mttx-incident-analytics-to-drive-your-reliability-roadmap\/\u2026","rel":"","context":"In &quot;SRE&quot;","img":{"alt_text":"","src":"","width":0,"height":0},"classes":[]}],"_links":{"self":[{"href":"https:\/\/fde.cat\/index.php\/wp-json\/wp\/v2\/posts\/749","targetHints":{"allow":["GET"]}}],"collection":[{"href":"https:\/\/fde.cat\/index.php\/wp-json\/wp\/v2\/posts"}],"about":[{"href":"https:\/\/fde.cat\/index.php\/wp-json\/wp\/v2\/types\/post"}],"replies":[{"embeddable":true,"href":"https:\/\/fde.cat\/index.php\/wp-json\/wp\/v2\/comments?post=749"}],"version-history":[{"count":0,"href":"https:\/\/fde.cat\/index.php\/wp-json\/wp\/v2\/posts\/749\/revisions"}],"wp:attachment":[{"href":"https:\/\/fde.cat\/index.php\/wp-json\/wp\/v2\/media?parent=749"}],"wp:term":[{"taxonomy":"category","embeddable":true,"href":"https:\/\/fde.cat\/index.php\/wp-json\/wp\/v2\/categories?post=749"},{"taxonomy":"post_tag","embeddable":true,"href":"https:\/\/fde.cat\/index.php\/wp-json\/wp\/v2\/tags?post=749"}],"curies":[{"name":"wp","href":"https:\/\/api.w.org\/{rel}","templated":true}]}}