{"id":832,"date":"2024-03-04T04:14:26","date_gmt":"2024-03-04T04:14:26","guid":{"rendered":"https:\/\/fde.cat\/index.php\/2024\/03\/04\/sre-weekly-issue-414\/"},"modified":"2024-03-04T04:14:26","modified_gmt":"2024-03-04T04:14:26","slug":"sre-weekly-issue-414","status":"publish","type":"post","link":"https:\/\/fde.cat\/index.php\/2024\/03\/04\/sre-weekly-issue-414\/","title":{"rendered":"SRE Weekly Issue #414"},"content":{"rendered":"<p><a href=\"https:\/\/sreweekly.com\/sre-weekly-issue-414\/\" title=\"Permalink to SRE Weekly Issue #414\" class=\"email_only\">View on sreweekly.com<\/a><\/p>\n<div class=\"sreweekly-sponsor-message\">\n<h2>A message from our sponsor, <a href=\"https:\/\/firehydrant.com\/\">FireHydrant<\/a>:<\/h2>\n<p>91% of engineering leaders say they want a better alerting tool. The other 9% couldn\u2019t take the survey on their Blackberry. Meet Signals: a new standard in alerting and on call, now available.<br \/>\n<a href=\"https:\/\/firehydrant.com\/blog\/alerting-and-on-call-scheduling-for-how-you-actually-work\/\">https:\/\/firehydrant.com\/blog\/alerting-and-on-call-scheduling-for-how-you-actually-work\/<\/a><\/p>\n<\/div>\n<div class=\"wp-block-group is-layout-flow wp-block-group-is-layout-flow\">\n<div class=\"wp-block-group__inner-container\">\n<div class=\"sreweekly-entry\">\n<div class=\"sreweekly-title\"><a href=\"https:\/\/www.thevoid.community\/report-2024#custom-code1\" target=\"_blank\" rel=\"noopener\">2024 VOID Report<\/a><\/div>\n<div class=\"sreweekly-description\">\n<p>This year\u2019s VOID Report is out, and it\u2019s well worth a read.  
The subtitle is \u201cExploring the Unintended Consequences of Automation in Software\u201d which is a really good way to get me to read something!<\/p>\n<p>\u00a0\u00a0<small>Courtney Nash \u2014 The VOID<\/small><\/p>\n<\/div>\n<\/div>\n<div class=\"sreweekly-entry\">\n<div class=\"sreweekly-title\"><a href=\"https:\/\/www.infoq.com\/presentations\/policy-automation-velocity-reliability\/\" target=\"_blank\" rel=\"noopener\">How DoorDash Ensures Velocity and Reliability through Policy Automation <\/a><\/div>\n<div class=\"sreweekly-description\">\n<p>A Terraform change deleted a critical resource, and reviewers missed it because the plan was so big.  Now they use Atlantis and Open Policy Agent to avoid accidental deletions of critical resources.<\/p>\n<p>\u00a0\u00a0<small>Lin Du \u2014 InfoQ<\/small><\/p>\n<\/div>\n<\/div>\n<div class=\"sreweekly-entry\">\n<div class=\"sreweekly-title\"><a href=\"https:\/\/surfingcomplexity.blog\/2024\/02\/17\/what-if-everybody-did-everything-right\/\" target=\"_blank\" rel=\"noopener\">What if everybody did everything right?<\/a><\/div>\n<div class=\"sreweekly-description\">\n<p>When analyzing an incident, what can we learn when we assume that everyone did everything as well as possible?<\/p>\n<p>\u00a0\u00a0<small>Lorin Hochstein<\/small><\/p>\n<\/div>\n<\/div>\n<div class=\"sreweekly-entry\">\n<div class=\"sreweekly-title\"><a href=\"https:\/\/status.cloud.google.com\/incidents\/McSxWsRNvAPn7SbWGGig\" target=\"_blank\" rel=\"noopener\">Google Cloud Incident Report: europe-west8-b partial outage <\/a><\/div>\n<div class=\"sreweekly-description\">\n<p>onsite technicians performing this planned network maintenance inadvertently unplugged several fibers that were adjacent to those in the work order, but still in use for production traffic<\/p>\n<p>\u00a0\u00a0<small>Google<\/small><\/p>\n<\/div>\n<\/div>\n<div class=\"sreweekly-entry\">\n<div class=\"sreweekly-title\"><a 
href=\"https:\/\/uptimerobot.com\/blog\/what-does-999-uptime-mean\/\" target=\"_blank\" rel=\"noopener\">What Does 99.999% Uptime Really Mean?<\/a><\/div>\n<div class=\"sreweekly-description\">\n<p>There\u2019s a huge difference between four and five nines. There\u2019s an especially interesting quote in this article that Google doesn\u2019t think five nines is attainable in a commercial service.<\/p>\n<p>\u00a0\u00a0<small>Diana Bocco \u2014 UptimeRobot<\/small><\/p>\n<\/div>\n<\/div>\n<div class=\"sreweekly-entry\">\n<div class=\"sreweekly-title\"><a href=\"https:\/\/www.ibm.com\/blogs\/jobs\/life-as-a-site-reliability-engineer-at-ibm\/\" target=\"_blank\" rel=\"noopener\">Life as a Site Reliability Engineer at IBM<\/a><\/div>\n<div class=\"sreweekly-description\">\n<p>Here\u2019s an interview with three SREs about what it\u2019s like to be an SRE at IBM.<\/p>\n<p>\u00a0\u00a0<small>IBM<\/small><\/p>\n<\/div>\n<\/div>\n<div class=\"sreweekly-entry\">\n<div class=\"sreweekly-title\"><a href=\"https:\/\/www.honeycomb.io\/blog\/cost-crisis-observability-tooling\" target=\"_blank\" rel=\"noopener\">The Cost Crisis in Observability Tooling<\/a><\/div>\n<div class=\"sreweekly-description\">\n<p>I\u2019ve been hearing about Observability 2.0 but didn\u2019t know what it was all about.  This article explains what it is and how it can help with cost.<\/p>\n<p>\u00a0\u00a0<small>Charity Majors \u2014 Honeycomb<\/small><br \/>\n\u00a0\u00a0<small><em>Full disclosure: Honeycomb is my employer.<\/em><\/small><\/p>\n<\/div>\n<\/div>\n<div class=\"sreweekly-entry\">\n<div class=\"sreweekly-title\"><a href=\"https:\/\/www.youtube.com\/watch?v=ia8Q51ouA_s\" target=\"_blank\" rel=\"noopener\">Positive Affirmations for Site Reliability Engineers<\/a><\/div>\n<div class=\"sreweekly-description\">\n<p>A cute little video pep talk for SREs.  
<a href=\"https:\/\/srenity.online\/\">The site<\/a> is actually real, too!<\/p>\n<p>\u00a0\u00a0<small>Krazam<\/small><\/p>\n<\/div>\n<\/div>\n<div class=\"sreweekly-entry\">\n<div class=\"sreweekly-title\"><a href=\"https:\/\/blog.pragmaticengineer.com\/happy-leap-day\/\" target=\"_blank\" rel=\"noopener\">Happy Leap Day!<\/a><\/div>\n<div class=\"sreweekly-description\">\n<p>Like a mini Y2K, leap day came around again and left some technical glitches in its wake, as chronicled in this article.<\/p>\n<p>\u00a0\u00a0<small>Gergely Orosz \u2014 The Pragmatic Engineer<\/small><\/p>\n<\/div>\n<\/div>\n<\/div>\n<\/div>\n<p>SRE WEEKLY<\/p>","protected":false},"excerpt":{"rendered":"<p>View on sreweekly.com A message from our sponsor, FireHydrant: 91% of engineering leaders say they want a better alerting tool. The other 9% couldn\u2019t take the survey on their Blackberry. Meet Signals: a new standard in alerting and on call, now available. https:\/\/firehydrant.com\/blog\/alerting-and-on-call-scheduling-for-how-you-actually-work\/ 2024 VOID Report This year\u2019s VOID Report is out, and it\u2019s well&hellip; <a class=\"more-link\" href=\"https:\/\/fde.cat\/index.php\/2024\/03\/04\/sre-weekly-issue-414\/\">Continue reading <span class=\"screen-reader-text\">SRE Weekly Issue #414<\/span><\/a><\/p>\n","protected":false},"author":0,"featured_media":0,"comment_status":"","ping_status":"open","sticky":false,"template":"","format":"standard","meta":{"spay_email":"","footnotes":""},"categories":[8],"tags":[],"class_list":["post-832","post","type-post","status-publish","format-standard","hentry","category-sre","entry"],"jetpack_featured_media_url":"","jetpack-related-posts":[{"id":504,"url":"https:\/\/fde.cat\/index.php\/2021\/11\/15\/sre-weekly-issue-296\/","url_meta":{"origin":832,"position":0},"title":"SRE Weekly Issue #296","date":"November 15, 2021","format":false,"excerpt":"View on sreweekly.com A message from our sponsor, Rootly: Manage incidents directly from Slack 
with Rootly \ud83d\ude92. Automate manual admin tasks like creating incident channel, Jira and Zoom, paging the right team, postmortem timeline, setting up reminders, and more. Book a demo: https:\/\/rootly.com\/?utm_source=sreweekly Articles Do you remember, the twenty fires\u2026","rel":"","context":"In &quot;SRE&quot;","img":{"alt_text":"","src":"","width":0,"height":0},"classes":[]},{"id":781,"url":"https:\/\/fde.cat\/index.php\/2023\/11\/06\/sre-weekly-issue-397\/","url_meta":{"origin":832,"position":1},"title":"SRE Weekly Issue #397","date":"November 6, 2023","format":false,"excerpt":"View on sreweekly.com A message from our sponsor, FireHydrant: Incident management platform FireHydrant is combining alerting and incident response in one ring-to-retro tool. Sign up for the early access waitlist and be the first to experience the power of alerting + incident response in one platform at last. https:\/\/firehydrant.com\/signals\/ Modern\u2026","rel":"","context":"In &quot;SRE&quot;","img":{"alt_text":"","src":"","width":0,"height":0},"classes":[]},{"id":778,"url":"https:\/\/fde.cat\/index.php\/2023\/10\/30\/sre-weekly-issue-396\/","url_meta":{"origin":832,"position":2},"title":"SRE Weekly Issue #396","date":"October 30, 2023","format":false,"excerpt":"View on sreweekly.com A message from our sponsor, FireHydrant: DevOps keeps evolving but alerting tools are stuck in the past. Any modern alerting tool should be built on these four principles: cost-efficiency, service catalog empowerment, easier scheduling and substitutions, and clear distinctions between incidents and alerts. 
https:\/\/firehydrant.com\/blog\/the-new-principles-of-incident-alerting-its-time-to-evolve\/ Translating Failures into\u2026","rel":"","context":"In &quot;SRE&quot;","img":{"alt_text":"","src":"","width":0,"height":0},"classes":[]},{"id":815,"url":"https:\/\/fde.cat\/index.php\/2024\/01\/22\/sre-weekly-issue-408\/","url_meta":{"origin":832,"position":3},"title":"SRE Weekly Issue #408","date":"January 22, 2024","format":false,"excerpt":"View on sreweekly.com A message from our sponsor, FireHydrant: It\u2019s time for a new world of alerting tools that prioritize engineer well-being and efficiency. The future lies in intelligent systems that are compatible with real life and use conditional rules to adapt and refine thresholds, reducing alert fatigue. https:\/\/firehydrant.com\/blog\/the-alert-fatigue-dilemma-a-call-for-change-in-how-we-manage-on-call\/ Tell\u2026","rel":"","context":"In &quot;SRE&quot;","img":{"alt_text":"","src":"","width":0,"height":0},"classes":[]},{"id":812,"url":"https:\/\/fde.cat\/index.php\/2024\/01\/15\/sre-weekly-issue-407\/","url_meta":{"origin":832,"position":4},"title":"SRE Weekly Issue #407","date":"January 15, 2024","format":false,"excerpt":"View on sreweekly.com A message from our sponsor, FireHydrant: Signals is now available in beta. Sign up to experience alerting for modern DevOps teams: Page teams, not services. Ingest inputs from any source. Bucket pricing based on usage. And one platform \u2014 ring to retro \u2014 finally. 
https:\/\/firehydrant.com\/blog\/signals-beta-live\/ On chains\u2026","rel":"","context":"In &quot;SRE&quot;","img":{"alt_text":"","src":"","width":0,"height":0},"classes":[]},{"id":775,"url":"https:\/\/fde.cat\/index.php\/2023\/10\/23\/sre-weekly-issue-395\/","url_meta":{"origin":832,"position":5},"title":"SRE Weekly Issue #395","date":"October 23, 2023","format":false,"excerpt":"View on sreweekly.com A message from our sponsor, FireHydrant: Incident management platform FireHydrant is combining alerting and incident response in one ring-to-retro tool. Sign up for the early access waitlist and be the first to experience the power of alerting + incident response in one platform at last. https:\/\/firehydrant.com\/signals\/ What\u2026","rel":"","context":"In &quot;SRE&quot;","img":{"alt_text":"","src":"","width":0,"height":0},"classes":[]}],"_links":{"self":[{"href":"https:\/\/fde.cat\/index.php\/wp-json\/wp\/v2\/posts\/832","targetHints":{"allow":["GET"]}}],"collection":[{"href":"https:\/\/fde.cat\/index.php\/wp-json\/wp\/v2\/posts"}],"about":[{"href":"https:\/\/fde.cat\/index.php\/wp-json\/wp\/v2\/types\/post"}],"replies":[{"embeddable":true,"href":"https:\/\/fde.cat\/index.php\/wp-json\/wp\/v2\/comments?post=832"}],"version-history":[{"count":0,"href":"https:\/\/fde.cat\/index.php\/wp-json\/wp\/v2\/posts\/832\/revisions"}],"wp:attachment":[{"href":"https:\/\/fde.cat\/index.php\/wp-json\/wp\/v2\/media?parent=832"}],"wp:term":[{"taxonomy":"category","embeddable":true,"href":"https:\/\/fde.cat\/index.php\/wp-json\/wp\/v2\/categories?post=832"},{"taxonomy":"post_tag","embeddable":true,"href":"https:\/\/fde.cat\/index.php\/wp-json\/wp\/v2\/tags?post=832"}],"curies":[{"name":"wp","href":"https:\/\/api.w.org\/{rel}","templated":true}]}}