{"id":815,"date":"2024-01-22T03:05:54","date_gmt":"2024-01-22T03:05:54","guid":{"rendered":"https:\/\/fde.cat\/index.php\/2024\/01\/22\/sre-weekly-issue-408\/"},"modified":"2024-01-22T03:05:54","modified_gmt":"2024-01-22T03:05:54","slug":"sre-weekly-issue-408","status":"publish","type":"post","link":"https:\/\/fde.cat\/index.php\/2024\/01\/22\/sre-weekly-issue-408\/","title":{"rendered":"SRE Weekly Issue #408"},"content":{"rendered":"<p><a href=\"https:\/\/sreweekly.com\/sre-weekly-issue-408\/\" title=\"Permalink to SRE Weekly Issue #408\" class=\"email_only\">View on sreweekly.com<\/a><\/p>\n<div class=\"sreweekly-sponsor-message\">\n<h2>A message from our sponsor, <a href=\"https:\/\/firehydrant.com\/\">FireHydrant<\/a>:<\/h2>\n<p>It\u2019s time for a new world of alerting tools that prioritize engineer well-being and efficiency. The future lies in intelligent systems that are compatible with real life and use conditional rules to adapt and refine thresholds, reducing alert fatigue.<br \/>\n<a href=\"https:\/\/firehydrant.com\/blog\/the-alert-fatigue-dilemma-a-call-for-change-in-how-we-manage-on-call\/\">https:\/\/firehydrant.com\/blog\/the-alert-fatigue-dilemma-a-call-for-change-in-how-we-manage-on-call\/<\/a><\/p>\n<\/div>\n<div class=\"wp-block-group is-layout-flow wp-block-group-is-layout-flow\">\n<div class=\"wp-block-group__inner-container\">\n<div class=\"sreweekly-entry\">\n<div class=\"sreweekly-title\"><a href=\"https:\/\/surfingcomplexity.blog\/2023\/12\/24\/tell-me-about-a-time\/\" target=\"_blank\" rel=\"noopener\">Tell me about a time\u2026<\/a><\/div>\n<div class=\"sreweekly-description\">\n<p>This is either a set of SRE interview topics or the squares for the SRE bingo card.<\/p>\n<p>\u00a0\u00a0<small>Lorin Hochstein<\/small><\/p>\n<\/div>\n<\/div>\n<div class=\"sreweekly-entry\">\n<div class=\"sreweekly-title\"><a href=\"https:\/\/willgallego.com\/2024\/01\/10\/blame-awareness-is-universal\/\" target=\"_blank\" rel=\"noopener\">Blame Awareness 
is Universal<\/a><\/div>\n<div class=\"sreweekly-description\">\n<p>Blame awareness only works if you work towards blame awareness with all incidents, not just the ones that affect you.<\/p>\n<p>\u00a0\u00a0<small>Will Gallego<\/small><\/p>\n<\/div>\n<\/div>\n<div class=\"sreweekly-entry\">\n<div class=\"sreweekly-title\"><a href=\"https:\/\/netflixtechblog.com\/rebuilding-netflix-video-processing-pipeline-with-microservices-4e5e6310e359?source=rss----2615bd06b42e---4\" target=\"_blank\" rel=\"noopener\">Rebuilding Netflix Video Processing Pipeline with Microservices<\/a><\/div>\n<div class=\"sreweekly-description\">\n<p>a brief history of our pipeline and the platforms, why the rebuilding was necessary, what these new services look like, and how they are being used for Netflix businesses.<\/p>\n<p>\u00a0\u00a0<small>Liwei Guo, Anush Moorthy, Li-Heng Chen, Vinicius Carvalho, Aditya Mavlankar, Agata Opalach, Adithya Prakash, Kyle Swanson, Jessica Tweneboah, Subbu Venkatrav, Lishan Zhu \u2014 Netflix<\/small><\/p>\n<\/div>\n<\/div>\n<div class=\"sreweekly-entry\">\n<div class=\"sreweekly-title\"><a href=\"https:\/\/www.datadoghq.com\/blog\/best-practices-to-prevent-alert-fatigue\/\" target=\"_blank\" rel=\"noopener\">Best practices to prevent alert fatigue<\/a><\/div>\n<div class=\"sreweekly-description\">\n<p>Here are five concrete tips to fix your alerts and reduce alert fatigue.<\/p>\n<p>\u00a0 <small>Candace Shamieh, Daljeet Sandu, and Nicolas Narbais <\/small><small>\u2014<\/small> Datadog<\/p>\n<\/div>\n<\/div>\n<div class=\"sreweekly-entry\">\n<div class=\"sreweekly-title\"><a href=\"https:\/\/medium.com\/site-reliability-engineering-leadership\/sre-governance-d852342a329c\" target=\"_blank\" rel=\"noopener\">SRE Governance<\/a><\/div>\n<div class=\"sreweekly-description\">\n<p>This article contains guidelines for many kinds of reviews and activities SRE can do to improve reliability, such as SLO reviews, dependency reviews, and 
more.<\/p>\n<p>\u00a0\u00a0<small>Jamie Allen<\/small><\/p>\n<\/div>\n<\/div>\n<div class=\"sreweekly-entry\">\n<div class=\"sreweekly-title\"><a href=\"https:\/\/www.honeycomb.io\/blog\/alerts-are-fundamentally-messy\" target=\"_blank\" rel=\"noopener\">Alerts Are Fundamentally Messy<\/a><\/div>\n<div class=\"sreweekly-description\">\n<p>However, the reality of alerting in a socio-technical system must cater not only to the mess around the signal, but also to the longer term interpretation of alerts by people and automation acting on them. This post will expand on this messiness and why Honeycomb favors an iterative approach to setting our alerts.<\/p>\n<p>\u00a0\u00a0<small>Fred Hebert \u2014 Honeycomb<\/small><br \/>\n\u00a0\u00a0<small><em>Full disclosure: Honeycomb is my employer.<\/em><\/small><\/p>\n<\/div>\n<\/div>\n<div class=\"sreweekly-entry\">\n<div class=\"sreweekly-title\"><a href=\"https:\/\/www.srepath.com\/danger-of-unreliable-platform-engineering\/\" target=\"_blank\" rel=\"noopener\">#23 \u2013 The Danger of Unreliable Platforms (with Jade Rubick)<\/a><\/div>\n<div class=\"sreweekly-description\">\n<p>This far-ranging conversation covers many aspects of developing a reliable platform for engineering.  
There\u2019s a text summary if audio\u2019s not your thing.<\/p>\n<p>\u00a0\u00a0<small>Ash Patel \u2014 SREPath<\/small><\/p>\n<\/div>\n<\/div>\n<div class=\"sreweekly-entry\">\n<div class=\"sreweekly-title\"><a href=\"https:\/\/slack.engineering\/slacks-migration-to-a-cellular-architecture\/\" target=\"_blank\" rel=\"noopener\">Slack\u2019s Migration to a Cellular Architecture<\/a><\/div>\n<div class=\"sreweekly-description\">\n<p>Spurred by a single-AZ outage that took down their service, Slack set out to break their system into isolated segments so that an AZ can be drained of traffic quickly and without impacting customers.<\/p>\n<p>\u00a0\u00a0<small>Cooper Bethea \u2014 Slack<\/small><\/p>\n<\/div>\n<\/div>\n<\/div>\n<\/div>\n<p>SRE WEEKLY<\/p>","protected":false},"excerpt":{"rendered":"<p>View on sreweekly.com A message from our sponsor, FireHydrant: It\u2019s time for a new world of alerting tools that prioritize engineer well-being and efficiency. The future lies in intelligent systems that are compatible with real life and use conditional rules to adapt and refine thresholds, reducing alert fatigue. 
https:\/\/firehydrant.com\/blog\/the-alert-fatigue-dilemma-a-call-for-change-in-how-we-manage-on-call\/ Tell me about a time\u2026 This&hellip; <a class=\"more-link\" href=\"https:\/\/fde.cat\/index.php\/2024\/01\/22\/sre-weekly-issue-408\/\">Continue reading <span class=\"screen-reader-text\">SRE Weekly Issue #408<\/span><\/a><\/p>\n","protected":false},"author":0,"featured_media":0,"comment_status":"","ping_status":"open","sticky":false,"template":"","format":"standard","meta":{"spay_email":"","footnotes":""},"categories":[8],"tags":[],"class_list":["post-815","post","type-post","status-publish","format-standard","hentry","category-sre","entry"],"jetpack_featured_media_url":"","jetpack-related-posts":[{"id":531,"url":"https:\/\/fde.cat\/index.php\/2022\/01\/17\/sre-weekly-issue-305\/","url_meta":{"origin":815,"position":0},"title":"SRE Weekly Issue #305","date":"January 17, 2022","format":false,"excerpt":"View on sreweekly.com A message from our sponsor, Rootly: Manage incidents directly from Slack with Rootly \ud83d\ude92. Automate manual admin tasks like creating incident channel, Jira and Zoom, paging the right team, postmortem timeline, setting up reminders, and more. Book a demo (+ get a snazzy Rootly shirt): https:\/\/rootly.com\/demo\/?utm_source=sreweekly Articles\u2026","rel":"","context":"In &quot;SRE&quot;","img":{"alt_text":"","src":"","width":0,"height":0},"classes":[]},{"id":817,"url":"https:\/\/fde.cat\/index.php\/2024\/01\/29\/sre-weekly-issue-409\/","url_meta":{"origin":815,"position":1},"title":"SRE Weekly Issue #409","date":"January 29, 2024","format":false,"excerpt":"View on sreweekly.com A message from our sponsor, FireHydrant: It\u2019s time for a new world of alerting tools that prioritize engineer well-being and efficiency. The future lies in intelligent systems that are compatible with real life and use conditional rules to adapt and refine thresholds, reducing alert fatigue. 
https:\/\/firehydrant.com\/blog\/the-alert-fatigue-dilemma-a-call-for-change-in-how-we-manage-on-call\/ Executing\u2026","rel":"","context":"In &quot;SRE&quot;","img":{"alt_text":"","src":"","width":0,"height":0},"classes":[]},{"id":847,"url":"https:\/\/fde.cat\/index.php\/2024\/04\/01\/sre-weekly-issue-418\/","url_meta":{"origin":815,"position":2},"title":"SRE Weekly Issue #418","date":"April 1, 2024","format":false,"excerpt":"View on sreweekly.com A message from our sponsor, FireHydrant: FireHydrant is now AI-powered for faster, smarter incidents! Power up your incidents with auto-generated real-time summaries, retrospectives, and status page updates. https:\/\/firehydrant.com\/blog\/ai-for-incident-management-is-here\/ Redefining Observability The observability waters have been muddy for awhile, and this article does a great job of taking\u2026","rel":"","context":"In &quot;SRE&quot;","img":{"alt_text":"","src":"","width":0,"height":0},"classes":[]},{"id":255,"url":"https:\/\/fde.cat\/index.php\/2021\/08\/31\/sre-weekly-issue-252\/","url_meta":{"origin":815,"position":3},"title":"SRE Weekly Issue #252","date":"August 31, 2021","format":false,"excerpt":"View on sreweekly.com A message from our sponsor, StackHawk: Interested in how you can automate application security testing with GitHub Actions? Check out this on demand webinar from StackHawk and Snyk and see how simple it is to get started. https:\/\/sthwk.com\/stackhawk-snyk Articles Building On-Call Culture at GitHub Their on-call started\u2026","rel":"","context":"In &quot;SRE&quot;","img":{"alt_text":"","src":"","width":0,"height":0},"classes":[]},{"id":835,"url":"https:\/\/fde.cat\/index.php\/2024\/03\/11\/sre-weekly-issue-415\/","url_meta":{"origin":815,"position":4},"title":"SRE Weekly Issue #415","date":"March 11, 2024","format":false,"excerpt":"View on sreweekly.com A message from our sponsor, FireHydrant: Join FireHydrant and talk shop with your DevOps peers on March 28! 
You\u2019ll gain a better understanding of what makes a fatigue-free on-call culture and how to implement practices to improve yours at this free, virtual roundtable. https:\/\/app.livestorm.co\/firehydrant\/better-incidents-spring-bonfire-secrets-to-fatigue-free-on-call-in-2024 The Wrong Way\u2026","rel":"","context":"In &quot;SRE&quot;","img":{"alt_text":"","src":"","width":0,"height":0},"classes":[]},{"id":546,"url":"https:\/\/fde.cat\/index.php\/2022\/03\/07\/sre-weekly-issue-312\/","url_meta":{"origin":815,"position":5},"title":"SRE Weekly Issue #312","date":"March 7, 2022","format":false,"excerpt":"View on sreweekly.com A message from our sponsor, Rootly: Manage incidents directly from Slack with Rootly \ud83d\ude92. Automate manual admin tasks like creating incident channel, Jira and Zoom, paging the right team, postmortem timeline, setting up reminders, and more. Book a demo (+ get a snazzy Rootly shirt): https:\/\/rootly.com\/demo\/?utm_source=sreweekly Articles\u2026","rel":"","context":"In 
&quot;SRE&quot;","img":{"alt_text":"","src":"","width":0,"height":0},"classes":[]}],"_links":{"self":[{"href":"https:\/\/fde.cat\/index.php\/wp-json\/wp\/v2\/posts\/815","targetHints":{"allow":["GET"]}}],"collection":[{"href":"https:\/\/fde.cat\/index.php\/wp-json\/wp\/v2\/posts"}],"about":[{"href":"https:\/\/fde.cat\/index.php\/wp-json\/wp\/v2\/types\/post"}],"replies":[{"embeddable":true,"href":"https:\/\/fde.cat\/index.php\/wp-json\/wp\/v2\/comments?post=815"}],"version-history":[{"count":0,"href":"https:\/\/fde.cat\/index.php\/wp-json\/wp\/v2\/posts\/815\/revisions"}],"wp:attachment":[{"href":"https:\/\/fde.cat\/index.php\/wp-json\/wp\/v2\/media?parent=815"}],"wp:term":[{"taxonomy":"category","embeddable":true,"href":"https:\/\/fde.cat\/index.php\/wp-json\/wp\/v2\/categories?post=815"},{"taxonomy":"post_tag","embeddable":true,"href":"https:\/\/fde.cat\/index.php\/wp-json\/wp\/v2\/tags?post=815"}],"curies":[{"name":"wp","href":"https:\/\/api.w.org\/{rel}","templated":true}]}}