{"id":871,"date":"2024-05-27T02:18:03","date_gmt":"2024-05-27T02:18:03","guid":{"rendered":"https:\/\/fde.cat\/index.php\/2024\/05\/27\/sre-weekly-issue-426\/"},"modified":"2024-05-27T02:18:03","modified_gmt":"2024-05-27T02:18:03","slug":"sre-weekly-issue-426","status":"publish","type":"post","link":"https:\/\/fde.cat\/index.php\/2024\/05\/27\/sre-weekly-issue-426\/","title":{"rendered":"SRE Weekly Issue #426"},"content":{"rendered":"<p><a class=\"email_only\" href=\"https:\/\/sreweekly.com\/sre-weekly-issue-426\/\">View on sreweekly.com<\/a><\/p>\n<p>Got any burning questions to ask an experienced SRE?  I\u2019m gathering your questions in <a href=\"https:\/\/forms.gle\/hAd2vKHthNVFkAQ57\">this google form<\/a>, and I\u2019d love to hear from you.  I\u2019m hoping to use your questions to help inspire authors looking to write more great SRE-related content.<\/p>\n<div class=\"sreweekly-sponsor-message\">\n<h2>A message from our sponsor, <a href=\"https:\/\/firehydrant.com\/\">FireHydrant<\/a>:<\/h2>\n<p>FireHydrant is now AI-powered for faster, smarter incidents! 
Power up your incidents with auto-generated real-time summaries, retrospectives, and status page updates.<\/p>\n<p><a href=\"https:\/\/firehydrant.com\/blog\/ai-for-incident-management-is-here\/\">https:\/\/firehydrant.com\/blog\/ai-for-incident-management-is-here\/<\/a><\/p>\n<\/div>\n<div class=\"wp-block-group is-layout-flow wp-block-group-is-layout-flow\">\n<div class=\"wp-block-group__inner-container\">\n<div class=\"sreweekly-entry\">\n<div class=\"sreweekly-title\"><a href=\"https:\/\/hross.substack.com\/p\/the-rule-of-5-errors\" target=\"_blank\" rel=\"noopener\">The Rule of 5 Errors<\/a><\/div>\n<div class=\"sreweekly-description\">\n<p>If your overall request volume is low, single errors can have a big impact on your metrics \u2014 a phenomenon I\u2019ve experienced at work recently.<\/p>\n<p>\u00a0\u00a0<small>Ross Brodbeck<\/small><\/p>\n<\/div>\n<\/div>\n<div class=\"sreweekly-entry\">\n<div class=\"sreweekly-title\"><a href=\"https:\/\/www.gremlin.com\/blog\/how-reliability-differs-between-monolithic-and-microservice-based-architectures\" target=\"_blank\" rel=\"noopener\">How reliability differs between monolithic and microservice-based architectures<\/a><\/div>\n<div class=\"sreweekly-description\">\n<p>This article outlines five facets of microservice architectures that can have implications for reliability.<\/p>\n<p>\u00a0\u00a0<small>Andre Newman \u2014 Gremlin<\/small><\/p>\n<\/div>\n<\/div>\n<div class=\"sreweekly-entry\">\n<div class=\"sreweekly-title\"><a href=\"https:\/\/admiralcloudberg.medium.com\/children-of-the-magenta-the-crash-of-american-airlines-flight-965-b16f57c34cfe\" target=\"_blank\" rel=\"noopener\">Children of the Magenta: The crash of American Airlines flight 965<\/a><\/div>\n<div class=\"sreweekly-description\">\n<p>If this title sounds familiar, I\u2019ve linked to <a href=\"https:\/\/99percentinvisible.org\/episode\/children-of-the-magenta-automation-paradox-pt-1\/\">an article about the Children of the Magenta 
concept<\/a> before.  In this accident report, the pilots became confused about their location and course, and ultimately, their trust in the Flight Management System contributed to the disaster.<\/p>\n<p>\u00a0\u00a0<small>Kyra Dempsey (Admiral Cloudberg)<\/small><\/p>\n<\/div>\n<\/div>\n<div class=\"sreweekly-entry\">\n<div class=\"sreweekly-title\"><a href=\"https:\/\/www.honeycomb.io\/blog\/establishing-center-of-production-excellence-pt1\" target=\"_blank\" rel=\"noopener\">Establishing and Enabling a Center of Production Excellence<\/a><\/div>\n<div class=\"sreweekly-description\">\n<p>A Center of Production Excellence can be a powerful means for an organization to initiate transformations which foster resilience as it matures and its environment changes.<\/p>\n<p>\u00a0\u00a0<small>Nick Travaglini \u2014 Honeycomb<\/small><\/p>\n<p>\u00a0\u00a0<small><em>Full disclosure: Honeycomb is my employer.<\/em><\/small><\/p>\n<\/div>\n<\/div>\n<div class=\"sreweekly-entry\">\n<div class=\"sreweekly-title\"><a href=\"https:\/\/cloud.google.com\/blog\/products\/infrastructure\/details-of-google-cloud-gcve-incident\" target=\"_blank\" rel=\"noopener\">Details of Google Cloud GCVE incident<\/a><\/div>\n<div class=\"sreweekly-description\">\n<p>Last week, I shared a story about an outage at UniSuper that was caused by Google Cloud.  
This week, Google shared more details about what went wrong, and it\u2019s well worth a read.<\/p>\n<p>\u00a0\u00a0<small>Google<\/small><\/p>\n<\/div>\n<\/div>\n<div class=\"sreweekly-entry\">\n<div class=\"sreweekly-title\"><a href=\"https:\/\/status.heroku.com\/incidents\/2664\" target=\"_blank\" rel=\"noopener\">Heroku Incident #2664 Followup<\/a><\/div>\n<div class=\"sreweekly-description\">\n<p>This incident is intriguing because exponential backoff made the problem harder to detect.<\/p>\n<p>\u00a0\u00a0<small>Heroku<\/small><\/p>\n<\/div>\n<\/div>\n<div class=\"sreweekly-entry\">\n<div class=\"sreweekly-title\"><a href=\"https:\/\/blog.alexewerlof.com\/p\/service-level-pitfalls\" target=\"_blank\" rel=\"noopener\">Service level pitfalls<\/a><\/div>\n<div class=\"sreweekly-description\">\n<p>A discussion of what might get in the way of an organization implementing SLI\/SLO\/SLAs.<\/p>\n<p>Note that the second half of the article (overcoming those obstacles) is behind a <strong>paywall<\/strong>.  I don\u2019t often recommend pay-only content, but it\u2019s worth considering a subscription, because Alex is an excellent author whose work I\u2019ve featured here many times.<\/p>\n<p>\u00a0\u00a0<small>Alex Ewerl\u00f6f<\/small><\/p>\n<\/div>\n<\/div>\n<div class=\"sreweekly-entry\">\n<div class=\"sreweekly-title\"><a href=\"https:\/\/surfingcomplexity.blog\/2024\/05\/25\/the-error-term-isnt-pareto-distributed\/\" target=\"_blank\" rel=\"noopener\">The error term isn\u2019t Pareto distributed<\/a><\/div>\n<div class=\"sreweekly-description\">\n<p>if we look at a distribution of incidents by contributor (or cause, or component), we\u2019re unlikely to see any one of these stand out as being the source of a large number of incidents.<\/p>\n<p>\u00a0\u00a0<small>Lorin Hochstein<\/small><\/p>\n<\/div>\n<\/div>\n<\/div>\n<\/div>\n<p>SRE WEEKLY<\/p>","protected":false},"excerpt":{"rendered":"<p>View on sreweekly.com Got any burning questions to ask an experienced SRE? 
I\u2019m gathering your questions in this google form, and I\u2019d love to hear from you. I\u2019m hoping to use your questions to help inspire authors looking to write more great SRE-related content. A message from our sponsor, FireHydrant: FireHydrant is now AI-powered for&hellip; <a class=\"more-link\" href=\"https:\/\/fde.cat\/index.php\/2024\/05\/27\/sre-weekly-issue-426\/\">Continue reading <span class=\"screen-reader-text\">SRE Weekly Issue #426<\/span><\/a><\/p>\n","protected":false},"author":0,"featured_media":0,"comment_status":"","ping_status":"open","sticky":false,"template":"","format":"standard","meta":{"spay_email":"","footnotes":""},"categories":[8],"tags":[],"class_list":["post-871","post","type-post","status-publish","format-standard","hentry","category-sre","entry"],"jetpack_featured_media_url":"","jetpack-related-posts":[{"id":663,"url":"https:\/\/fde.cat\/index.php\/2022\/12\/19\/sre-weekly-issue-352\/","url_meta":{"origin":871,"position":0},"title":"SRE Weekly Issue #352","date":"December 19, 2022","format":false,"excerpt":"View on sreweekly.com A message from our sponsor, Rootly: Manage incidents directly from Slack with Rootly\u00a0\ud83d\ude92. Rootly automates manual tasks like creating an incident channel, Jira ticket and Zoom rooms, inviting responders, creating statuspage updates, postmortem timelines and more. Want to see why companies like Canva and Grammarly love us?:\u2026","rel":"","context":"In &quot;SRE&quot;","img":{"alt_text":"","src":"","width":0,"height":0},"classes":[]},{"id":798,"url":"https:\/\/fde.cat\/index.php\/2023\/12\/04\/sre-weekly-issue-401\/","url_meta":{"origin":871,"position":1},"title":"SRE Weekly Issue #401","date":"December 4, 2023","format":false,"excerpt":"View on sreweekly.com A message from our sponsor, FireHydrant: Join FireHydrant Dec.14 for a conversation about on-call culture and its effect on engineering organizations, featuring special guests from Outreach and Udemy. 
Gain a better understanding of what makes excellent on-call culture and how to implement practices to improve yours. https:\/\/app.livestorm.co\/firehydrant\/better-incidents-winter-bonfire-inside-on-call?type=detailed\u2026","rel":"","context":"In &quot;SRE&quot;","img":{"alt_text":"","src":"","width":0,"height":0},"classes":[]},{"id":864,"url":"https:\/\/fde.cat\/index.php\/2024\/05\/13\/sre-weekly-issue-424\/","url_meta":{"origin":871,"position":2},"title":"SRE Weekly Issue #424","date":"May 13, 2024","format":false,"excerpt":"View on sreweekly.com A message from our sponsor, FireHydrant: FireHydrant is now AI-powered for faster, smarter incidents! Power up your incidents with auto-generated real-time summaries, retrospectives, and status page updates. https:\/\/firehydrant.com\/blog\/ai-for-incident-management-is-here\/ My Availability Investment Playbook Here\u2019s an ultra-practical guide to pushing for reliability investments at your company, formatted as a\u2026","rel":"","context":"In &quot;SRE&quot;","img":{"alt_text":"","src":"","width":0,"height":0},"classes":[]},{"id":855,"url":"https:\/\/fde.cat\/index.php\/2024\/04\/15\/sre-weekly-issue-420\/","url_meta":{"origin":871,"position":3},"title":"SRE Weekly Issue #420","date":"April 15, 2024","format":false,"excerpt":"View on sreweekly.com A message from our sponsor, FireHydrant: FireHydrant is now AI-powered for faster, smarter incidents! Power up your incidents with auto-generated real-time summaries, retrospectives, and status page updates. https:\/\/firehydrant.com\/blog\/ai-for-incident-management-is-here\/ 1.0 Launch Retrospective The game Last Epoch launched in February, and they had a rocky start. 
This huge retrospective\u2026","rel":"","context":"In &quot;SRE&quot;","img":{"alt_text":"","src":"","width":0,"height":0},"classes":[]},{"id":823,"url":"https:\/\/fde.cat\/index.php\/2024\/02\/12\/sre-weekly-issue-411\/","url_meta":{"origin":871,"position":4},"title":"SRE Weekly Issue #411","date":"February 12, 2024","format":false,"excerpt":"View on sreweekly.com A message from our sponsor, FireHydrant: \u201cTo be honest, when can we switch?\u201d The first impressions are in. Check out what people are saying after seeing Signals, the new standard in alerting and on-call from FireHydrant, for the first time. https:\/\/firehydrant.com\/signals\/ Shared On-Call Is Where the SRE\u2026","rel":"","context":"In &quot;SRE&quot;","img":{"alt_text":"","src":"","width":0,"height":0},"classes":[]},{"id":847,"url":"https:\/\/fde.cat\/index.php\/2024\/04\/01\/sre-weekly-issue-418\/","url_meta":{"origin":871,"position":5},"title":"SRE Weekly Issue #418","date":"April 1, 2024","format":false,"excerpt":"View on sreweekly.com A message from our sponsor, FireHydrant: FireHydrant is now AI-powered for faster, smarter incidents! Power up your incidents with auto-generated real-time summaries, retrospectives, and status page updates. 
https:\/\/firehydrant.com\/blog\/ai-for-incident-management-is-here\/ Redefining Observability The observability waters have been muddy for a while, and this article does a great job of taking\u2026","rel":"","context":"In &quot;SRE&quot;","img":{"alt_text":"","src":"","width":0,"height":0},"classes":[]}],"_links":{"self":[{"href":"https:\/\/fde.cat\/index.php\/wp-json\/wp\/v2\/posts\/871","targetHints":{"allow":["GET"]}}],"collection":[{"href":"https:\/\/fde.cat\/index.php\/wp-json\/wp\/v2\/posts"}],"about":[{"href":"https:\/\/fde.cat\/index.php\/wp-json\/wp\/v2\/types\/post"}],"replies":[{"embeddable":true,"href":"https:\/\/fde.cat\/index.php\/wp-json\/wp\/v2\/comments?post=871"}],"version-history":[{"count":0,"href":"https:\/\/fde.cat\/index.php\/wp-json\/wp\/v2\/posts\/871\/revisions"}],"wp:attachment":[{"href":"https:\/\/fde.cat\/index.php\/wp-json\/wp\/v2\/media?parent=871"}],"wp:term":[{"taxonomy":"category","embeddable":true,"href":"https:\/\/fde.cat\/index.php\/wp-json\/wp\/v2\/categories?post=871"},{"taxonomy":"post_tag","embeddable":true,"href":"https:\/\/fde.cat\/index.php\/wp-json\/wp\/v2\/tags?post=871"}],"curies":[{"name":"wp","href":"https:\/\/api.w.org\/{rel}","templated":true}]}}