{"id":838,"date":"2024-03-18T01:15:08","date_gmt":"2024-03-18T01:15:08","guid":{"rendered":"https:\/\/fde.cat\/index.php\/2024\/03\/18\/sre-weekly-issue-416\/"},"modified":"2024-03-18T01:15:08","modified_gmt":"2024-03-18T01:15:08","slug":"sre-weekly-issue-416","status":"publish","type":"post","link":"https:\/\/fde.cat\/index.php\/2024\/03\/18\/sre-weekly-issue-416\/","title":{"rendered":"SRE Weekly Issue #416"},"content":{"rendered":"<p><a href=\"https:\/\/sreweekly.com\/sre-weekly-issue-416\/\" title=\"Permalink to SRE Weekly Issue #416\" class=\"email_only\">View on sreweekly.com<\/a><\/p>\n<div class=\"sreweekly-sponsor-message\">\n<h2>A message from our sponsor, <a href=\"https:\/\/firehydrant.com\/\">FireHydrant<\/a>:<\/h2>\n<p>We need tools that help us show our value, enhance understanding of our systems, and free time for us to expand our skills. In this article, FireHydrant lays out three questions to ask vendors as you evaluate DevOps tools.<br \/>\n<a href=\"https:\/\/firehydrant.com\/blog\/3-questions-to-ask-of-any-devops-tool-in-2024\/\">https:\/\/firehydrant.com\/blog\/3-questions-to-ask-of-any-devops-tool-in-2024\/<\/a><\/p>\n<\/div>\n<div class=\"wp-block-group is-layout-flow wp-block-group-is-layout-flow\">\n<div class=\"wp-block-group__inner-container\">\n<div class=\"sreweekly-entry\">\n<div class=\"sreweekly-title\"><a href=\"https:\/\/thehackernews.com\/2024\/03\/4-instructive-postmortems-on-data.html\" target=\"_blank\" rel=\"noopener\">4 Instructive Postmortems on Data Downtime and Loss<\/a><\/div>\n<div class=\"sreweekly-description\">\n<p>What can we, in turn, learn from some of the most honest and blameless\u2014and public\u2014postmortems of the last few years?<\/p>\n<p>They cover incidents from GitLab, Tarsnap, Roblox, and Cloudflare with great summaries and takeaways.<\/p>\n<p>\u00a0\u00a0<small>The Hacker News<\/small><\/p>\n<\/div>\n<\/div>\n<div class=\"sreweekly-entry\">\n<div class=\"sreweekly-title\"><a 
href=\"https:\/\/www.infoq.com\/podcasts\/incident-management-resilience\/\" target=\"_blank\" rel=\"noopener\">Resilience and Incident Management with Vanessa Huerta Granda <\/a><\/div>\n<div class=\"sreweekly-description\">\n<p>My favorite part of this interview is when Vanessa describes parenting twin babies as constant incident response.<\/p>\n<p>\u00a0\u00a0<small>Shane Hastie \u2014 InfoQ<\/small><\/p>\n<\/div>\n<\/div>\n<div class=\"sreweekly-entry\">\n<div class=\"sreweekly-title\"><a href=\"https:\/\/scalex.dev\/blog\/2024-02-28_beyond-the-beep-and-saving-sleep--optimizing-the-on-call-experience-84776f2e513c\/\" target=\"_blank\" rel=\"noopener\">Beyond the beep and saving sleep: optimizing the On-Call experience<\/a><\/div>\n<div class=\"sreweekly-description\">\n<p>Here follow some lessons I\u2019ve learned from the trenches in small start-ups and larger engineering teams, to improve your on-call shift experience and remediation time for production issues and make sure you\u2019re spending on-call efforts on what has the most impact.<\/p>\n<p>\u00a0\u00a0<small>Alex Wauters<\/small><\/p>\n<\/div>\n<\/div>\n<div class=\"sreweekly-entry\">\n<div class=\"sreweekly-title\"><a href=\"https:\/\/www.gremlin.com\/blog\/the-case-for-fault-injection-testing-in-production\" target=\"_blank\" rel=\"noopener\">The case for Fault Injection testing in Production<\/a><\/div>\n<div class=\"sreweekly-description\">\n<p>Doing your chaos experiments in a non-production environment can feel safer, but what are you giving up?<\/p>\n<p>\u00a0\u00a0<small>Sam Rossoff \u2014 Gremlin<\/small><\/p>\n<\/div>\n<\/div>\n<div class=\"sreweekly-entry\">\n<div class=\"sreweekly-title\"><a href=\"https:\/\/certomodo.substack.com\/p\/in-defense-of-shell-scripts\" target=\"_blank\" rel=\"noopener\">In Defense of Shell Scripts<\/a><\/div>\n<div class=\"sreweekly-description\">\n<p>Sometimes, shell is just the right tool for the job.<\/p>\n<p>\u00a0\u00a0<small>Amin Astaneh \u2014 Certo 
Modo<\/small><\/p>\n<\/div>\n<\/div>\n<div class=\"sreweekly-entry\">\n<div class=\"sreweekly-title\"><a href=\"https:\/\/www.csb.gov\/file.aspx?DocumentId=6120\" target=\"_blank\" rel=\"noopener\">Tank Explosions at Midland Resource Recovery<\/a><\/div>\n<div class=\"sreweekly-description\">\n<p><a href=\"https:\/\/mastodon.social\/@whitequark\">Catherine from Mastodon<\/a> summarized this incident report beautifully:<\/p>\n<p>this is one of the most violently unhinged CSB reports i\u2019ve ever read [\u2026]<\/p>\n<p>while investigating an explosion at a facility, CSB staff tried to prevent another explosion of the same kind in the same facility, and being unable to convince the workers to not cause it, ended up hiding behind a shipping container<\/p>\n<p>\u00a0\u00a0<small>U.S. Chemical Safety and Hazard Investigation Board<\/small><\/p>\n<\/div>\n<\/div>\n<div class=\"sreweekly-entry\">\n<div class=\"sreweekly-title\"><a href=\"https:\/\/www.checklyhq.com\/blog\/broken-windows-why-the-single-pane-of-glass-is-imp\/\" target=\"_blank\" rel=\"noopener\">Broken windows: why the \u2018Single Pane of Glass\u2019 is impossible<\/a><\/div>\n<div class=\"sreweekly-description\">\n<p>This one\u2019s about why people tend to want a \u201cSPoG\u201d and what we should want instead.  Bonus points for the Star Trek reference.<\/p>\n<p>\u00a0\u00a0<small>No\u010dnica Mellifera \u2014 Checkly<\/small><\/p>\n<\/div>\n<\/div>\n<div class=\"sreweekly-entry\">\n<div class=\"sreweekly-title\"><a href=\"https:\/\/medium.com\/doctolib\/how-we-built-our-infrastructure-fail-over-checklist-5b31d4623136\" target=\"_blank\" rel=\"noopener\">How we built our infrastructure fail-over checklist<\/a><\/div>\n<div class=\"sreweekly-description\">\n<p>Right in the middle of migrating from one datacenter to an HA pair of new datacenters, one of the new ones failed.  
They had to quickly do a partial rollback of the migration to ride out the outage.<\/p>\n<p>\u00a0\u00a0<small>Gauthier Fran\u00e7ois \u2014 Doctolib<\/small><\/p>\n<\/div>\n<\/div>\n<div class=\"sreweekly-entry\">\n<div class=\"sreweekly-title\"><a href=\"https:\/\/netflixtechblog.com\/announcing-bpftop-streamlining-ebpf-performance-optimization-6a727c1ae2e5\" target=\"_blank\" rel=\"noopener\">Announcing bpftop: Streamlining eBPF performance optimization<\/a><\/div>\n<div class=\"sreweekly-description\">\n<p>Today, we are thrilled to announce the release of bpftop, a command-line tool designed to streamline the performance optimization and monitoring of eBPF programs.<\/p>\n<p>\u00a0\u00a0<small>Jose Fernandez \u2014 Netflix<\/small><\/p>\n<\/div>\n<\/div>\n<\/div>\n<\/div>\n<p>SRE WEEKLY<\/p>","protected":false},"excerpt":{"rendered":"<p>View on sreweekly.com A message from our sponsor, FireHydrant: We need tools that help us show our value, enhance understanding of our systems, and free time for us to expand our skills. In this article, FireHydrant lays out three questions to ask vendors as you evaluate DevOps tools. 
https:\/\/firehydrant.com\/blog\/3-questions-to-ask-of-any-devops-tool-in-2024\/ 4 Instructive Postmortems on Data Downtime&hellip; <a class=\"more-link\" href=\"https:\/\/fde.cat\/index.php\/2024\/03\/18\/sre-weekly-issue-416\/\">Continue reading <span class=\"screen-reader-text\">SRE Weekly Issue #416<\/span><\/a><\/p>\n","protected":false},"author":0,"featured_media":0,"comment_status":"","ping_status":"open","sticky":false,"template":"","format":"standard","meta":{"spay_email":"","footnotes":""},"categories":[8],"tags":[],"class_list":["post-838","post","type-post","status-publish","format-standard","hentry","category-sre","entry"],"jetpack_featured_media_url":"","jetpack-related-posts":[{"id":844,"url":"https:\/\/fde.cat\/index.php\/2024\/03\/25\/sre-weekly-issue-417\/","url_meta":{"origin":838,"position":0},"title":"SRE Weekly Issue #417","date":"March 25, 2024","format":false,"excerpt":"View on sreweekly.com A message from our sponsor, FireHydrant: Join FireHydrant this Thursday for a conversation about on-call burnout and how to prevent it. Get a better understanding of what makes a fatigue-free on-call culture, including real-world examples from your incident management peers. No sales, just shop talk. https:\/\/app.livestorm.co\/firehydrant\/better-incidents-spring-bonfire-secrets-to-fatigue-free-on-call-in-2024 Harnessing\u2026","rel":"","context":"In &quot;SRE&quot;","img":{"alt_text":"","src":"","width":0,"height":0},"classes":[]},{"id":835,"url":"https:\/\/fde.cat\/index.php\/2024\/03\/11\/sre-weekly-issue-415\/","url_meta":{"origin":838,"position":1},"title":"SRE Weekly Issue #415","date":"March 11, 2024","format":false,"excerpt":"View on sreweekly.com A message from our sponsor, FireHydrant: Join FireHydrant and talk shop with your DevOps peers on March 28! You\u2019ll gain a better understanding of what makes a fatigue-free on-call culture and how to implement practices to improve yours at this free, virtual roundtable. 
https:\/\/app.livestorm.co\/firehydrant\/better-incidents-spring-bonfire-secrets-to-fatigue-free-on-call-in-2024 The Wrong Way\u2026","rel":"","context":"In &quot;SRE&quot;","img":{"alt_text":"","src":"","width":0,"height":0},"classes":[]},{"id":318,"url":"https:\/\/fde.cat\/index.php\/2021\/08\/31\/sre-weekly-issue-274\/","url_meta":{"origin":838,"position":2},"title":"SRE Weekly Issue #274","date":"August 31, 2021","format":false,"excerpt":"View on sreweekly.com A message from our sponsor, StackHawk: Join the GraphQL Security Testing Learning Lab on June 29 at 9 AM PT. Learn how to run automated security testing against your GraphQL APIs so you can find and fix vulnerabilities fast. http:\/\/sthwk.com\/graphql-learning-lab Articles Chicken Soup for the SLO The\u2026","rel":"","context":"In &quot;SRE&quot;","img":{"alt_text":"","src":"","width":0,"height":0},"classes":[]},{"id":771,"url":"https:\/\/fde.cat\/index.php\/2023\/10\/15\/sre-weekly-issue-394\/","url_meta":{"origin":838,"position":3},"title":"SRE Weekly Issue #394","date":"October 15, 2023","format":false,"excerpt":"View on sreweekly.com A warm welcome to my new sponsor, FireHydrant! A message from our sponsor, FireHydrant: The 2023 DORA report has two conclusions with big impacts on incident management: incremental steps matter, and good culture contributes to performance. Dig into both topics and explore ideas for how to start\u2026","rel":"","context":"In &quot;SRE&quot;","img":{"alt_text":"","src":"","width":0,"height":0},"classes":[]},{"id":809,"url":"https:\/\/fde.cat\/index.php\/2024\/01\/08\/sre-weekly-issue-406\/","url_meta":{"origin":838,"position":4},"title":"SRE Weekly Issue #406","date":"January 8, 2024","format":false,"excerpt":"View on sreweekly.com A message from our sponsor, FireHydrant: Signals is now available in beta. Sign up to experience alerting for modern DevOps teams: Page teams, not services. Ingest inputs from any source. Bucket pricing based on usage. 
And one platform \u2014 ring to retro \u2014 finally. https:\/\/firehydrant.com\/blog\/signals-beta-live\/ How to\u2026","rel":"","context":"In &quot;SRE&quot;","img":{"alt_text":"","src":"","width":0,"height":0},"classes":[]},{"id":737,"url":"https:\/\/fde.cat\/index.php\/2023\/07\/23\/sre-weekly-issue-382\/","url_meta":{"origin":838,"position":5},"title":"SRE Weekly Issue #382","date":"July 23, 2023","format":false,"excerpt":"View on sreweekly.com A message from our sponsor, Rootly: Eliminate the anxiety around declaring an incident for nebulous problems by introducing a triage phase into your incident management process. Our latest blog post dives into why the triage phase is so important, and how you can automate yours with Rootly.\u2026","rel":"","context":"In &quot;SRE&quot;","img":{"alt_text":"","src":"","width":0,"height":0},"classes":[]}],"_links":{"self":[{"href":"https:\/\/fde.cat\/index.php\/wp-json\/wp\/v2\/posts\/838","targetHints":{"allow":["GET"]}}],"collection":[{"href":"https:\/\/fde.cat\/index.php\/wp-json\/wp\/v2\/posts"}],"about":[{"href":"https:\/\/fde.cat\/index.php\/wp-json\/wp\/v2\/types\/post"}],"replies":[{"embeddable":true,"href":"https:\/\/fde.cat\/index.php\/wp-json\/wp\/v2\/comments?post=838"}],"version-history":[{"count":0,"href":"https:\/\/fde.cat\/index.php\/wp-json\/wp\/v2\/posts\/838\/revisions"}],"wp:attachment":[{"href":"https:\/\/fde.cat\/index.php\/wp-json\/wp\/v2\/media?parent=838"}],"wp:term":[{"taxonomy":"category","embeddable":true,"href":"https:\/\/fde.cat\/index.php\/wp-json\/wp\/v2\/categories?post=838"},{"taxonomy":"post_tag","embeddable":true,"href":"https:\/\/fde.cat\/index.php\/wp-json\/wp\/v2\/tags?post=838"}],"curies":[{"name":"wp","href":"https:\/\/api.w.org\/{rel}","templated":true}]}}