{"id":723,"date":"2023-06-12T00:41:02","date_gmt":"2023-06-12T00:41:02","guid":{"rendered":"https:\/\/fde.cat\/index.php\/2023\/06\/12\/sre-weekly-issue-376\/"},"modified":"2023-06-12T00:41:02","modified_gmt":"2023-06-12T00:41:02","slug":"sre-weekly-issue-376","status":"publish","type":"post","link":"https:\/\/fde.cat\/index.php\/2023\/06\/12\/sre-weekly-issue-376\/","title":{"rendered":"SRE Weekly Issue #376"},"content":{"rendered":"<p><a href=\"https:\/\/sreweekly.com\/sre-weekly-issue-376\/\" title=\"Permalink to SRE Weekly Issue #376\" class=\"email_only\">View on sreweekly.com<\/a><\/p>\n<div class=\"sreweekly-sponsor-message\">\n<h2>A message from our sponsor, <a href=\"https:\/\/rootly.com\/demo\/?utm_source=sreweekly\">Rootly<\/a>:<\/h2>\n<p>Curious how companies like Figma, Tripadvisor, and 100s of others leverage Rootly to manage incidents in Slack and unlock instant best practices?  Check out this lightning demo:<br \/>\n<a href=\"https:\/\/www.loom.com\/share\/051c4be0425a436e888dc0c3690855ad\">https:\/\/www.loom.com\/share\/051c4be0425a436e888dc0c3690855ad<\/a><\/p>\n<\/div>\n<h2>Articles<\/h2>\n<div class=\"wp-block-group\">\n<div class=\"wp-block-group__inner-container\">\n<div class=\"sreweekly-entry\">\n<div class=\"sreweekly-title\"><a href=\"https:\/\/www.datadoghq.com\/blog\/engineering\/2023-03-08-deep-dive-into-incident-response\/\" target=\"_blank\" rel=\"noopener\">2023 03 08 Incident: A Deep Dive into Our Incident Response<\/a><\/div>\n<div class=\"sreweekly-description\">\n<p>With 100 workstreams and over 500 engineers engaged, this was the biggest incident response I\u2019ve read about in years.<\/p>\n<p>We had to force ourselves to identify the facts on the ground instead of \u201cwhat ought to be,\u201d and overrule our instincts to look for data in the places we normally looked (since our own monitoring was impacted).<\/p>\n<p>\u00a0\u00a0<small>Laura de Vesine \u2014 Datadog<\/small><\/p>\n<\/div>\n<\/div>\n<div 
class=\"sreweekly-entry\">\n<div class=\"sreweekly-title\"><a href=\"https:\/\/thenewstack.io\/how-the-3-pillars-of-observability-miss-the-big-picture\/\" target=\"_blank\" rel=\"noopener\">How the \u20183 Pillars of Observability\u2019 Miss the Big Picture<\/a><\/div>\n<div class=\"sreweekly-description\">\n<p>When you unify these three \u201cpillars\u201d into one cohesive approach, a new ability to understand the full state of your system in several new ways also emerges.<\/p>\n<p>\u00a0\u00a0<small>Danyel Fisher \u2014 The New Stack<\/small><br \/>\n\u00a0\u00a0<small><em>Full disclosure: Honeycomb, my employer, is mentioned.<\/em><\/small><\/p>\n<\/div>\n<\/div>\n<div class=\"sreweekly-entry\">\n<div class=\"sreweekly-title\"><a href=\"https:\/\/status.dev.azure.com\/_event\/392143683\/post-mortem\" target=\"_blank\" rel=\"noopener\">Azure DevOps Outage in South Brazil<\/a><\/div>\n<div class=\"sreweekly-description\">\n<p>This report details the 10-hour incident response following the accidental deletion of live databases (rather than their snapshots, as intended).<\/p>\n<p>\u00a0\u00a0<small>Eric Mattingly \u2014 Azure<\/small><\/p>\n<\/div>\n<\/div>\n<div class=\"sreweekly-entry\">\n<div class=\"sreweekly-title\"><a href=\"https:\/\/news.ycombinator.com\/item?id=36223543\" target=\"_blank\" rel=\"noopener\">Show HN: Keep \u2013 Create production alerts from plain English<\/a><\/div>\n<div class=\"sreweekly-description\">\n<p>Neat trick: write your alerts in English and get GPT to convert them to real alert configurations.<\/p>\n<p>\u00a0\u00a0<small>Shahar and Tal \u2014 Keep (via HackerNews)<\/small><\/p>\n<\/div>\n<\/div>\n<div class=\"sreweekly-entry\">\n<div class=\"sreweekly-title\"><a href=\"https:\/\/utcc.utoronto.ca\/~cks\/space\/blog\/sysadmin\/DNSResolverQueryLimitsIssue\" target=\"_blank\" rel=\"noopener\">A potential issue with outstanding query limits in your DNS resolver<\/a><\/div>\n<div class=\"sreweekly-description\">\n<p>If your DNS 
resolver is responsible for handling queries for both internal and external domains, what happens when external DNS requests fail?  Can internal ones still proceed?<\/p>\n<p>\u00a0\u00a0<small>Chris Siebenmann<\/small><\/p>\n<\/div>\n<\/div>\n<div class=\"sreweekly-entry\">\n<div class=\"sreweekly-title\"><a href=\"https:\/\/davidkcaudill.medium.com\/delusion-soup-how-observability-got-here-and-what-we-can-do-about-it-21e3be942e9c?source=rss-6ae4c389b6bf------2\" target=\"_blank\" rel=\"noopener\">Delusion Soup: How Observability Got Here, and What We Can Do About It<\/a><\/div>\n<div class=\"sreweekly-description\">\n<p>This article explains potential pitfalls and downsides to observability tools and the ways vendors might try to get you to use them, along with tips for how to avoid the traps.<\/p>\n<p>\u00a0\u00a0<small>David Caudill<\/small><\/p>\n<\/div>\n<\/div>\n<div class=\"sreweekly-entry\">\n<div class=\"sreweekly-title\"><a href=\"https:\/\/surfingcomplexity.blog\/2023\/06\/10\/treating-uncertainty-as-a-first-class-concern\/\" target=\"_blank\" rel=\"noopener\">Treating uncertainty as a first-class concern<\/a><\/div>\n<div class=\"sreweekly-description\">\n<p>Too often, we dismiss the anomaly we just faced in an incident as a weird, one-off occurrence. And while that specific failure mode likely will be a one-off, we\u2019ll be faced with new anomalies in the future.<\/p>\n<p>\u00a0\u00a0<small>Lorin Hochstein \u2014 Surfing Complexity<\/small><\/p>\n<\/div>\n<\/div>\n<\/div>\n<\/div>\n<p>SRE WEEKLY<\/p>","protected":false},"excerpt":{"rendered":"<p>View on sreweekly.com A message from our sponsor, Rootly: Curious how companies like Figma, Tripadvisor, and 100s of others leverage Rootly to manage incidents in Slack and unlock instant best practices? 
Check out this lightning demo: https:\/\/www.loom.com\/share\/051c4be0425a436e888dc0c3690855ad Articles 2023 03 08 Incident: A Deep Dive into Our Incident Response With 100 workstreams and over 500&hellip; <a class=\"more-link\" href=\"https:\/\/fde.cat\/index.php\/2023\/06\/12\/sre-weekly-issue-376\/\">Continue reading <span class=\"screen-reader-text\">SRE Weekly Issue #376<\/span><\/a><\/p>\n","protected":false},"author":0,"featured_media":0,"comment_status":"","ping_status":"open","sticky":false,"template":"","format":"standard","meta":{"spay_email":"","footnotes":""},"categories":[8],"tags":[],"class_list":["post-723","post","type-post","status-publish","format-standard","hentry","category-sre","entry"],"jetpack_featured_media_url":"","jetpack-related-posts":[{"id":543,"url":"https:\/\/fde.cat\/index.php\/2022\/02\/21\/sre-weekly-issue-310\/","url_meta":{"origin":723,"position":0},"title":"SRE Weekly Issue #310","date":"February 21, 2022","format":false,"excerpt":"View on sreweekly.com A message from our sponsor, Rootly: Manage incidents directly from Slack with Rootly \ud83d\ude92. Automate manual admin tasks like creating incident channel, Jira and Zoom, paging the right team, postmortem timeline, setting up reminders, and more. Book a demo (+ get a snazzy Rootly shirt): https:\/\/rootly.com\/demo\/?utm_source=sreweekly Articles\u2026","rel":"","context":"In &quot;SRE&quot;","img":{"alt_text":"","src":"","width":0,"height":0},"classes":[]},{"id":535,"url":"https:\/\/fde.cat\/index.php\/2022\/01\/24\/sre-weekly-issue-306\/","url_meta":{"origin":723,"position":1},"title":"SRE Weekly Issue #306","date":"January 24, 2022","format":false,"excerpt":"View on sreweekly.com A message from our sponsor, Rootly: Manage incidents directly from Slack with Rootly \ud83d\ude92. Automate manual admin tasks like creating incident channel, Jira and Zoom, paging the right team, postmortem timeline, setting up reminders, and more. 
Book a demo (+ get a snazzy Rootly shirt): https:\/\/rootly.com\/demo\/?utm_source=sreweekly Articles\u2026","rel":"","context":"In &quot;SRE&quot;","img":{"alt_text":"","src":"","width":0,"height":0},"classes":[]},{"id":579,"url":"https:\/\/fde.cat\/index.php\/2022\/05\/30\/sre-weekly-issue-324\/","url_meta":{"origin":723,"position":2},"title":"SRE Weekly Issue #324","date":"May 30, 2022","format":false,"excerpt":"View on sreweekly.com A message from our sponsor, Rootly: Manage incidents directly from Slack with Rootly \ud83d\ude92. Automate manual admin tasks like creating incident channel, Jira and Zoom, paging and adding responders, postmortem timeline, setting up reminders, and more. Book a demo (+ get a snazzy Rootly lego set): https:\/\/rootly.com\/demo\/\u2026","rel":"","context":"In &quot;SRE&quot;","img":{"alt_text":"","src":"","width":0,"height":0},"classes":[]},{"id":546,"url":"https:\/\/fde.cat\/index.php\/2022\/03\/07\/sre-weekly-issue-312\/","url_meta":{"origin":723,"position":3},"title":"SRE Weekly Issue #312","date":"March 7, 2022","format":false,"excerpt":"View on sreweekly.com A message from our sponsor, Rootly: Manage incidents directly from Slack with Rootly \ud83d\ude92. Automate manual admin tasks like creating incident channel, Jira and Zoom, paging the right team, postmortem timeline, setting up reminders, and more. Book a demo (+ get a snazzy Rootly shirt): https:\/\/rootly.com\/demo\/?utm_source=sreweekly Articles\u2026","rel":"","context":"In &quot;SRE&quot;","img":{"alt_text":"","src":"","width":0,"height":0},"classes":[]},{"id":537,"url":"https:\/\/fde.cat\/index.php\/2022\/01\/31\/sre-weekly-issue-307\/","url_meta":{"origin":723,"position":4},"title":"SRE Weekly Issue #307","date":"January 31, 2022","format":false,"excerpt":"View on sreweekly.com A message from our sponsor, Rootly: Manage incidents directly from Slack with Rootly \ud83d\ude92. 
Automate manual admin tasks like creating incident channel, Jira and Zoom, paging the right team, postmortem timeline, setting up reminders, and more. Book a demo (+ get a snazzy Rootly shirt): https:\/\/rootly.com\/demo\/?utm_source=sreweekly Articles\u2026","rel":"","context":"In &quot;SRE&quot;","img":{"alt_text":"","src":"","width":0,"height":0},"classes":[]},{"id":603,"url":"https:\/\/fde.cat\/index.php\/2022\/07\/04\/sre-weekly-issue-329\/","url_meta":{"origin":723,"position":5},"title":"SRE Weekly Issue #329","date":"July 4, 2022","format":false,"excerpt":"View on sreweekly.com A message from our sponsor, Rootly: Manage incidents directly from Slack with Rootly \ud83d\ude92. Automate manual admin tasks like creating incident channel, Jira and Zoom, paging and adding responders, postmortem timeline, setting up reminders, and more. Book a demo (+ get a snazzy Rootly lego set): https:\/\/rootly.com\/demo\/\u2026","rel":"","context":"In &quot;SRE&quot;","img":{"alt_text":"","src":"","width":0,"height":0},"classes":[]}],"_links":{"self":[{"href":"https:\/\/fde.cat\/index.php\/wp-json\/wp\/v2\/posts\/723","targetHints":{"allow":["GET"]}}],"collection":[{"href":"https:\/\/fde.cat\/index.php\/wp-json\/wp\/v2\/posts"}],"about":[{"href":"https:\/\/fde.cat\/index.php\/wp-json\/wp\/v2\/types\/post"}],"replies":[{"embeddable":true,"href":"https:\/\/fde.cat\/index.php\/wp-json\/wp\/v2\/comments?post=723"}],"version-history":[{"count":0,"href":"https:\/\/fde.cat\/index.php\/wp-json\/wp\/v2\/posts\/723\/revisions"}],"wp:attachment":[{"href":"https:\/\/fde.cat\/index.php\/wp-json\/wp\/v2\/media?parent=723"}],"wp:term":[{"taxonomy":"category","embeddable":true,"href":"https:\/\/fde.cat\/index.php\/wp-json\/wp\/v2\/categories?post=723"},{"taxonomy":"post_tag","embeddable":true,"href":"https:\/\/fde.cat\/index.php\/wp-json\/wp\/v2\/tags?post=723"}],"curies":[{"name":"wp","href":"https:\/\/api.w.org\/{rel}","templated":true}]}}