{"id":763,"date":"2023-09-18T01:10:24","date_gmt":"2023-09-18T01:10:24","guid":{"rendered":"https:\/\/fde.cat\/index.php\/2023\/09\/18\/sre-weekly-issue-390\/"},"modified":"2023-09-18T01:10:24","modified_gmt":"2023-09-18T01:10:24","slug":"sre-weekly-issue-390","status":"publish","type":"post","link":"https:\/\/fde.cat\/index.php\/2023\/09\/18\/sre-weekly-issue-390\/","title":{"rendered":"SRE Weekly Issue #390"},"content":{"rendered":"<p><a href=\"https:\/\/sreweekly.com\/sre-weekly-issue-390\/\" title=\"Permalink to SRE Weekly Issue #390\" class=\"email_only\">View on sreweekly.com<\/a><\/p>\n<p>Many apologies to my email subscribers, who have seen two accidental re-sends of old issues recently due to a weird glitch in my automation.  I think I\u2019ve gotten a handle on it, and I\u2019ll run an internal retrospective of this incident, of course.<\/p>\n<div class=\"sreweekly-sponsor-message\">\n<h2>A message from our sponsor, <a href=\"https:\/\/rootly.com\/demo\/?utm_source=sreweekly\">Rootly<\/a>:<\/h2>\n<p>When incidents impact your customers, failing to communicate with them effectively can erode trust even further and compound an already difficult situation. 
Learn the essentials of customer-facing incident communication in Rootly\u2019s latest blog post:<br \/><a href=\"https:\/\/rootly.com\/blog\/the-medium-is-the-message-how-to-master-the-most-essential-incident-communication-channels\">https:\/\/rootly.com\/blog\/the-medium-is-the-message-how-to-master-the-most-essential-incident-communication-channels<\/a><\/p>\n<\/div>\n<h2>Articles<\/h2>\n<div class=\"wp-block-group\">\n<div class=\"wp-block-group__inner-container\">\n<div class=\"sreweekly-entry\">\n<div class=\"sreweekly-title\"><a href=\"https:\/\/thenewstack.io\/sre-vs-platform-engineer-cant-we-all-just-get-along\/\" target=\"_blank\" rel=\"noopener\">SRE vs Platform Engineer: Can\u2019t We All Just Get Along?<\/a><\/div>\n<div class=\"sreweekly-description\">\n<p>Is it really SRE vs platform engineer? Or is there a way platforms can take site reliability to the next level?<\/p>\n<p>\u00a0\u00a0<small>Jennifer Riggins \u2014 The New Stack<\/small><\/p>\n<\/div>\n<\/div>\n<div class=\"sreweekly-entry\">\n<div class=\"sreweekly-title\"><a href=\"https:\/\/robpostonblog.wordpress.com\/2023\/09\/04\/our-prerequisites-are-never-enough-for-high-risk\/\" target=\"_blank\" rel=\"noopener\">Our Prerequisites are Never Enough for High Risk<\/a><\/div>\n<div class=\"sreweekly-description\">\n<p>A surgeon delves into the key component that allows a group of skilled individuals to work effectively and safely together, using the term \u201cheed\u201d to describe this special interaction.<\/p>\n<p>Sidenote: in a hilarious coincidence this article managed to spoil me on a movie I was in the middle of watching (Arrival) \u2014 but it also put me in a really cool mindset to watch the rest of the film.<\/p>\n<p>\u00a0\u00a0<small>Dr. 
Rob Poston<\/small><\/p>\n<\/div>\n<\/div>\n<div class=\"sreweekly-entry\">\n<div class=\"sreweekly-title\"><a href=\"https:\/\/www.issquareup.com\/incidents\/2trlsg0fbd9h\" target=\"_blank\" rel=\"noopener\">Degraded Performance: Square Services<\/a><\/div>\n<div class=\"sreweekly-description\">\n<p>More details on Square\u2019s outage from a couple weeks ago (it was DNS).<\/p>\n<p>\u00a0\u00a0<small>Square<\/small><\/p>\n<\/div>\n<\/div>\n<div class=\"sreweekly-entry\">\n<div class=\"sreweekly-title\"><a href=\"https:\/\/azure.status.microsoft\/en-us\/status\/history\/\" target=\"_blank\" rel=\"noopener\">Azure status history<\/a><\/div>\n<div class=\"sreweekly-description\">\n<p>Azure had an interesting outage in its Australia East region involving a power failure and the order in which cooling units were restored.<\/p>\n<p>\u00a0\u00a0<small>Microsoft Azure<\/small><\/p>\n<\/div>\n<\/div>\n<div class=\"sreweekly-entry\">\n<div class=\"sreweekly-title\"><a href=\"https:\/\/www.infoq.com\/presentations\/incidents-investigation\/\" target=\"_blank\" rel=\"noopener\">How Did It Make Sense at the Time? Understanding Incidents As They Occurred, Not as They Are Remembered<\/a><\/div>\n<div class=\"sreweekly-description\">\n<p>Asking this question is how you unlock the hidden essence of an incident. 
This talk compares two public incident reports to show what it looks like when you dig into this question and when you don\u2019t.<\/p>\n<p>\u00a0\u00a0<small>Jacob Scott \u2014 InfoQ<\/small><\/p>\n<\/div>\n<\/div>\n<div class=\"sreweekly-entry\">\n<div class=\"sreweekly-title\"><a href=\"https:\/\/admiralcloudberg.medium.com\/the-fallible-mind-the-crash-of-comair-flight-5191-cb80e005f73e\" target=\"_blank\" rel=\"noopener\">The Fallible Mind: The crash of Comair flight 5191<\/a><\/div>\n<div class=\"sreweekly-description\">\n<p>In this air accident, the pilots made a seemingly inexplicable mistake.<\/p>\n<p>This sentence really stood out to me, especially after reading the \u201cHow Did It Make Sense at the Time?\u201d article:<\/p>\n<p>When we inexplicably grab the wrong utensil when cooking or accidentally start taking our dirty dishes to the bathroom instead of the kitchen, we should be thankful that we aren\u2019t responsible for a plane full of people.<\/p>\n<p>\u00a0\u00a0<small>Admiral Cloudberg<\/small><\/p>\n<\/div>\n<\/div>\n<div class=\"sreweekly-entry\">\n<div class=\"sreweekly-title\"><a href=\"https:\/\/github.blog\/2023-09-13-github-availability-report-august-2023\/\" target=\"_blank\" rel=\"noopener\">GitHub Availability Report: August 2023<\/a><\/div>\n<div class=\"sreweekly-description\">\n<p>There\u2019s an interesting failure mode in this one that might stand out for the Kafka admins among us:<\/p>\n<p>The Kafka consumer ended up stuck in a loop, unable to stabilize fast enough before timing out and restarting the coordination process.<\/p>\n<p>\u00a0\u00a0<small>Jakub Oleksy \u2014 GitHub<\/small><\/p>\n<\/div>\n<\/div>\n<div class=\"sreweekly-entry\">\n<div class=\"sreweekly-title\"><a href=\"https:\/\/incident.io\/blog\/incident-management-and-problem-management\" target=\"_blank\" rel=\"noopener\">The connection between incident management and problem management<\/a><\/div>\n<div class=\"sreweekly-description\">\n<p>After explaining the 
difference between the ITIL terms \u201cincident management\u201d and \u201cproblem management\u201d, this article goes into a discussion of recent trends and whether it still makes sense to draw a distinction between the two.<\/p>\n<p>\u00a0\u00a0<small>Luis Gonzalez \u2014 incident.io<\/small><\/p>\n<\/div>\n<\/div>\n<\/div>\n<\/div>\n<p>SRE WEEKLY<\/p>","protected":false},"author":0,"featured_media":0,"comment_status":"","ping_status":"open","sticky":false,"template":"","format":"standard","meta":{"spay_email":"","footnotes":""},"categories":[8],"tags":[],"class_list":["post-763","post","type-post","status-publish","format-standard","hentry","category-sre","entry"],"jetpack_featured_media_url":""}