{"id":778,"date":"2023-10-30T00:54:07","date_gmt":"2023-10-30T00:54:07","guid":{"rendered":"https:\/\/fde.cat\/index.php\/2023\/10\/30\/sre-weekly-issue-396\/"},"modified":"2023-10-30T00:54:07","modified_gmt":"2023-10-30T00:54:07","slug":"sre-weekly-issue-396","status":"publish","type":"post","link":"https:\/\/fde.cat\/index.php\/2023\/10\/30\/sre-weekly-issue-396\/","title":{"rendered":"SRE Weekly Issue #396"},"content":{"rendered":"<p><a href=\"https:\/\/sreweekly.com\/sre-weekly-issue-396\/\" title=\"Permalink to SRE Weekly Issue #396\" class=\"email_only\">View on sreweekly.com<\/a><\/p>\n<div class=\"sreweekly-sponsor-message\">\n<h2>A message from our sponsor, <a href=\"https:\/\/firehydrant.com\/\">FireHydrant<\/a>:<\/h2>\n<p>DevOps keeps evolving but alerting tools are stuck in the past. Any modern alerting tool should be built on these four principles: cost-efficiency, service catalog empowerment, easier scheduling and substitutions, and clear distinctions between incidents and alerts.<br \/>\n<a href=\"https:\/\/firehydrant.com\/blog\/the-new-principles-of-incident-alerting-its-time-to-evolve\/\">https:\/\/firehydrant.com\/blog\/the-new-principles-of-incident-alerting-its-time-to-evolve\/<\/a><\/p>\n<\/div>\n<div class=\"wp-block-group\">\n<div class=\"wp-block-group__inner-container\">\n<div class=\"sreweekly-entry\">\n<div class=\"sreweekly-title\"><a href=\"https:\/\/thenewstack.io\/translating-failures-into-service-level-objectives\/\" target=\"_blank\" rel=\"noopener\">Translating Failures into Service-Level Objectives<\/a><\/div>\n<div class=\"sreweekly-description\">\n<p>Using 3 high-profile incidents from the past year, this article explores how to define SLOs that might catch similar problems, with a special focus on keeping the SLI close to the user experience.<\/p>\n<p>\u00a0\u00a0<small> Adriana Villela and Ana Margarita Medina \u2014 The New Stack<\/small><\/p>\n<\/div>\n<\/div>\n<div class=\"sreweekly-entry\">\n<div class=\"sreweekly-title\"><a href=\"https:\/\/robertovitillo.com\/costs-of-microservices\/\" target=\"_blank\" rel=\"noopener\">The costs of microservices<\/a><\/div>\n<div class=\"sreweekly-description\">\n<p>Microservices can have some great benefits, but if you want to build with them, you\u2019re going to have to solve a whole pile of new problems.<\/p>\n<p>\u00a0\u00a0<small>Roberto Vitillo<\/small><\/p>\n<\/div>\n<\/div>\n<div class=\"sreweekly-entry\">\n<div class=\"sreweekly-title\"><a href=\"https:\/\/robertovitillo.com\/how-distributed-systems-fail\/\" target=\"_blank\" rel=\"noopener\">How distributed systems fail<\/a><\/div>\n<div class=\"sreweekly-description\">\n<p>To protect your application against failures, you first need to know what can go wrong. [\u2026] the most common failures you will encounter are caused by single points of failure, the network being unreliable, slow processes, and unexpected load.<\/p>\n<p>\u00a0\u00a0<small>Roberto Vitillo<\/small><\/p>\n<\/div>\n<\/div>\n<div class=\"sreweekly-entry\">\n<div class=\"sreweekly-title\"><a href=\"https:\/\/medium.com\/@letathenasleep\/alerting-the-dos-and-don-ts-for-effective-observability-139db9fb49d1\" target=\"_blank\" rel=\"noopener\">Sofia\u2019s Observability Odyssey: The Do\u2019s and Don\u2019ts for Effective Observability<\/a><\/div>\n<div class=\"sreweekly-description\">\n<p>I love how this article keeps things interesting by starting with a fictional (but realistic) story about the dangers of over-alerting before continuing on to give direct advice.<\/p>\n<p>\u00a0\u00a0<small>Adso<\/small><\/p>\n<\/div>\n<\/div>\n<div class=\"sreweekly-entry\">\n<div class=\"sreweekly-title\"><a href=\"https:\/\/www.codereliant.io\/retries-backoff-jitter\/\" target=\"_blank\" rel=\"noopener\">Retries, Backoff and Jitter<\/a><\/div>\n<div class=\"sreweekly-description\">\n<p>I especially enjoy the section on the potential pitfalls and challenges with retries and how you can avoid them.<\/p>\n<p>\u00a0\u00a0<small>CodeReliant<\/small><\/p>\n<\/div>\n<\/div>\n<div class=\"sreweekly-entry\">\n<div class=\"sreweekly-title\"><a href=\"https:\/\/www.reddit.com\/r\/sre\/comments\/177ob10\/as_an_sre_how_often_are_you_directly_involved\/\" target=\"_blank\" rel=\"noopener\">As an SRE, how often are you directly involved with application code \/ logic?<\/a><\/div>\n<div class=\"sreweekly-description\">\n<p>This reddit thread is a goldmine, including this gem:<\/p>\n<p>I actively avoid getting involved with software subject matter expertise, because it robs the engineering team of self-reliance, which is itself a reliability issue.<\/p>\n<p>\u00a0\u00a0<small>u\/bv8z and others \u2014 reddit<\/small><\/p>\n<\/div>\n<\/div>\n<div class=\"sreweekly-entry\">\n<div class=\"sreweekly-title\"><a href=\"https:\/\/blog.rust-lang.org\/inside-rust\/2023\/07\/21\/crates-io-postmortem.html\" target=\"_blank\" rel=\"noopener\">crates.io Postmortem: Broken Crate Downloads<\/a><\/div>\n<div class=\"sreweekly-description\">\n<p>There\u2019s a pretty cool \u201cFive Whys\u201d-style analysis that goes past \u201cdev pushed unreviewed code with incomplete tests to production\u201d and to the sociotechnical challenges underlying that.<\/p>\n<p>\u00a0\u00a0<small>Tobias Bieniek \u2014 crates.io<\/small><\/p>\n<\/div>\n<\/div>\n<\/div>\n<\/div>\n<p>SRE WEEKLY<\/p>","protected":false},"excerpt":{"rendered":"<p>View on sreweekly.com A message from our sponsor, FireHydrant: DevOps keeps evolving but alerting tools are stuck in the past. Any modern alerting tool should be built on these four principles: cost-efficiency, service catalog empowerment, easier scheduling and substitutions, and clear distinctions between incidents and alerts. https:\/\/firehydrant.com\/blog\/the-new-principles-of-incident-alerting-its-time-to-evolve\/ Translating Failures into Service-Level Objectives Using 3 high-profile&hellip; <a class=\"more-link\" href=\"https:\/\/fde.cat\/index.php\/2023\/10\/30\/sre-weekly-issue-396\/\">Continue reading <span class=\"screen-reader-text\">SRE Weekly Issue #396<\/span><\/a><\/p>\n","protected":false},"author":0,"featured_media":0,"comment_status":"","ping_status":"open","sticky":false,"template":"","format":"standard","meta":{"spay_email":"","footnotes":""},"categories":[8],"tags":[],"class_list":["post-778","post","type-post","status-publish","format-standard","hentry","category-sre","entry"],"jetpack_featured_media_url":"","jetpack-related-posts":[{"id":350,"url":"https:\/\/fde.cat\/index.php\/2021\/08\/31\/sre-weekly-issue-284\/","url_meta":{"origin":778,"position":0},"title":"SRE Weekly Issue #284","date":"August 31, 2021","format":false,"excerpt":"View on sreweekly.com Like last week, I prepared this week\u2019s issue in advance, so no Outages section.\u00a0 Have a great week! A message from our sponsor, StackHawk: Trying to automate application and API security testing? See how StackHawk and Burp Suite Enterprise stack up: https:\/\/sthwk.com\/burp-enterprise Articles Alerting on SLOs like\u2026","rel":"","context":"In &quot;SRE&quot;","img":{"alt_text":"","src":"","width":0,"height":0},"classes":[]},{"id":832,"url":"https:\/\/fde.cat\/index.php\/2024\/03\/04\/sre-weekly-issue-414\/","url_meta":{"origin":778,"position":1},"title":"SRE Weekly Issue #414","date":"March 4, 2024","format":false,"excerpt":"View on sreweekly.com A message from our sponsor, FireHydrant: 91% of engineering leaders say they want a better alerting tool. The other 9% couldn\u2019t take the survey on their Blackberry. Meet Signals: a new standard in alerting and on call, now available. https:\/\/firehydrant.com\/blog\/alerting-and-on-call-scheduling-for-how-you-actually-work\/ 2024 VOID Report This year\u2019s VOID Report\u2026","rel":"","context":"In &quot;SRE&quot;","img":{"alt_text":"","src":"","width":0,"height":0},"classes":[]},{"id":819,"url":"https:\/\/fde.cat\/index.php\/2024\/02\/05\/sre-weekly-issue-410\/","url_meta":{"origin":778,"position":2},"title":"SRE Weekly Issue #410","date":"February 5, 2024","format":false,"excerpt":"View on sreweekly.com A message from our sponsor, FireHydrant: How many seats are you paying for in your legacy alerting tool that rarely get paged? With Signals\u2019 bucket pricing, you only pay for what you use. Join the beta for a better tool at a better price. https:\/\/firehydrant.com\/blog\/signals-beta-live\/ Staying in\u2026","rel":"","context":"In &quot;SRE&quot;","img":{"alt_text":"","src":"","width":0,"height":0},"classes":[]},{"id":815,"url":"https:\/\/fde.cat\/index.php\/2024\/01\/22\/sre-weekly-issue-408\/","url_meta":{"origin":778,"position":3},"title":"SRE Weekly Issue #408","date":"January 22, 2024","format":false,"excerpt":"View on sreweekly.com A message from our sponsor, FireHydrant: It\u2019s time for a new world of alerting tools that prioritize engineer well-being and efficiency. The future lies in intelligent systems that are compatible with real life and use conditional rules to adapt and refine thresholds, reducing alert fatigue. https:\/\/firehydrant.com\/blog\/the-alert-fatigue-dilemma-a-call-for-change-in-how-we-manage-on-call\/ Tell\u2026","rel":"","context":"In &quot;SRE&quot;","img":{"alt_text":"","src":"","width":0,"height":0},"classes":[]},{"id":602,"url":"https:\/\/fde.cat\/index.php\/2022\/06\/27\/sre-weekly-issue-328\/","url_meta":{"origin":778,"position":4},"title":"SRE Weekly Issue #328","date":"June 27, 2022","format":false,"excerpt":"View on sreweekly.com A message from our sponsor, Rootly: Manage incidents directly from Slack with Rootly \ud83d\ude92. Automate manual admin tasks like creating incident channel, Jira and Zoom, paging and adding responders, postmortem timeline, setting up reminders, and more. Book a demo (+ get a snazzy Rootly lego set): https:\/\/rootly.com\/demo\/\u2026","rel":"","context":"In &quot;SRE&quot;","img":{"alt_text":"","src":"","width":0,"height":0},"classes":[]},{"id":673,"url":"https:\/\/fde.cat\/index.php\/2023\/02\/06\/sre-weekly-issue-358\/","url_meta":{"origin":778,"position":5},"title":"SRE Weekly Issue #358","date":"February 6, 2023","format":false,"excerpt":"View on sreweekly.com A message from our sponsor, Rootly: Manage incidents directly from Slack with Rootly\u00a0\ud83d\ude92. Rootly automates manual tasks like creating an incident channel, Jira ticket and Zoom rooms, inviting responders, creating statuspage updates, postmortem timelines and more. Want to see why companies like Canva and Grammarly love us?:\u2026","rel":"","context":"In &quot;SRE&quot;","img":{"alt_text":"","src":"","width":0,"height":0},"classes":[]}],"_links":{"self":[{"href":"https:\/\/fde.cat\/index.php\/wp-json\/wp\/v2\/posts\/778","targetHints":{"allow":["GET"]}}],"collection":[{"href":"https:\/\/fde.cat\/index.php\/wp-json\/wp\/v2\/posts"}],"about":[{"href":"https:\/\/fde.cat\/index.php\/wp-json\/wp\/v2\/types\/post"}],"replies":[{"embeddable":true,"href":"https:\/\/fde.cat\/index.php\/wp-json\/wp\/v2\/comments?post=778"}],"version-history":[{"count":0,"href":"https:\/\/fde.cat\/index.php\/wp-json\/wp\/v2\/posts\/778\/revisions"}],"wp:attachment":[{"href":"https:\/\/fde.cat\/index.php\/wp-json\/wp\/v2\/media?parent=778"}],"wp:term":[{"taxonomy":"category","embeddable":true,"href":"https:\/\/fde.cat\/index.php\/wp-json\/wp\/v2\/categories?post=778"},{"taxonomy":"post_tag","embeddable":true,"href":"https:\/\/fde.cat\/index.php\/wp-json\/wp\/v2\/tags?post=778"}],"curies":[{"name":"wp","href":"https:\/\/api.w.org\/{rel}","templated":true}]}}