{"id":794,"date":"2023-11-20T02:41:45","date_gmt":"2023-11-20T02:41:45","guid":{"rendered":"https:\/\/fde.cat\/index.php\/2023\/11\/20\/sre-weekly-issue-399\/"},"modified":"2023-11-20T02:41:45","modified_gmt":"2023-11-20T02:41:45","slug":"sre-weekly-issue-399","status":"publish","type":"post","link":"https:\/\/fde.cat\/index.php\/2023\/11\/20\/sre-weekly-issue-399\/","title":{"rendered":"SRE Weekly Issue #399"},"content":{"rendered":"<p><a href=\"https:\/\/sreweekly.com\/sre-weekly-issue-399\/\" title=\"Permalink to SRE Weekly Issue #399\" class=\"email_only\">View on sreweekly.com<\/a><\/p>\n<div class=\"sreweekly-sponsor-message\">\n<h2>A message from our sponsor, <a href=\"https:\/\/firehydrant.com\/\">FireHydrant<\/a>:<\/h2>\n<p>Severity levels help responders and stakeholders understand the incident impact and set expectations for the level of response. This can mean jumping into action faster. But first, you have to ensure severity is actually being set. Here\u2019s one way.<br \/>\n<a href=\"https:\/\/firehydrant.com\/blog\/incident-severity-why-you-need-it-and-how-to-ensure-its-set\/\">https:\/\/firehydrant.com\/blog\/incident-severity-why-you-need-it-and-how-to-ensure-its-set\/ <\/a><\/p>\n<\/div>\n<div class=\"wp-block-group\">\n<div class=\"wp-block-group__inner-container\">\n<div class=\"sreweekly-entry\">\n<div class=\"sreweekly-title\"><a href=\"https:\/\/ferd.ca\/notes\/paper-how-in-the-world-did-we-ever-get-into-that-mode.html\" target=\"_blank\" rel=\"noopener\">Paper: How in the World Did We Ever Get into That Mode?<\/a><\/div>\n<div class=\"sreweekly-description\">\n<p>This research paper summary goes into Mode Error and the dangers of adding more features to a system in the form of modes, especially if the system can change modes on its own.<\/p>\n<p>\u00a0\u00a0<small>Fred Hebert (summary)<\/small><br \/>\n\u00a0\u00a0<small>Dr. Nadine B. 
Sarter (original paper)<\/small><\/p>\n<\/div>\n<\/div>\n<div class=\"sreweekly-entry\">\n<div class=\"sreweekly-title\"><a href=\"http:\/\/blog.cloudflare.com\/post-mortem-on-cloudflare-control-plane-and-analytics-outage\/\" target=\"_blank\" rel=\"noopener\">Post Mortem on Cloudflare Control Plane and Analytics Outage<\/a><\/div>\n<div class=\"sreweekly-description\">\n<p>Cloudflare suffered a power outage in one of the datacenters housing their control and data planes. The outage itself is intriguing, and in its aftermath, Cloudflare learned that their system wasn\u2019t as HA as they thought.<\/p>\n<p>Lots of great lessons here, and if you want more, they posted <a href=\"http:\/\/blog.cloudflare.com\/cloudflare-incident-on-october-30-2023\/\">another incident writeup<\/a> recently.<\/p>\n<p>\u00a0\u00a0<small> Matthew Prince \u2014 Cloudflare<\/small><\/p>\n<\/div>\n<\/div>\n<div class=\"sreweekly-entry\">\n<div class=\"sreweekly-title\"><a href=\"https:\/\/lab.scub.net\/command-query-responsibility-segregation-cqrs-93e35d1929ec\" target=\"_blank\" rel=\"noopener\">Architecture Patterns : Command Query Responsibility Segregation (CQRS)<\/a><\/div>\n<div class=\"sreweekly-description\">\n<p>Separating write from read workloads can increase complexity but also open the door to greater scalability, as this article explains.<\/p>\n<p>\u00a0\u00a0<small>Pier-Jean Malandrino<\/small><\/p>\n<\/div>\n<\/div>\n<div class=\"sreweekly-entry\">\n<div class=\"sreweekly-title\"><a href=\"https:\/\/www.codereliant.io\/load-shedding\/\" target=\"_blank\" rel=\"noopener\">Load Shedding for High Traffic Systems<\/a><\/div>\n<div class=\"sreweekly-description\">\n<p>Covers four strategies for load shedding, with code examples:<\/p>\n<p>Random Shedding<br \/>\nPriority-Based Shedding<br \/>\nResource-Based Shedding<br \/>\nNode Isolation<\/p>\n<p>\u00a0\u00a0<small>Code Reliant<\/small><\/p>\n<\/div>\n<\/div>\n<div class=\"sreweekly-entry\">\n<div class=\"sreweekly-title\"><a 
href=\"https:\/\/blog.pragmaticengineer.com\/aws-azure-and-gcp-regional-outages\/\" target=\"_blank\" rel=\"noopener\">Handling a Regional Outage: Comparing the Response From AWS, Azure and GCP<\/a><\/div>\n<div class=\"sreweekly-description\">\n<p>Lots of juicy details about the three outages, including a link to AWS\u2019s write-up of their Lambda outage in June.<\/p>\n<p>\u00a0\u00a0<small>Gergely Orosz<\/small><\/p>\n<\/div>\n<\/div>\n<div class=\"sreweekly-entry\">\n<div class=\"sreweekly-title\"><a href=\"https:\/\/lab.scub.net\/architecture-patterns-the-circuit-breaker-8f79280771f1\" target=\"_blank\" rel=\"noopener\">Architecture Patterns : The Circuit-Breaker<\/a><\/div>\n<div class=\"sreweekly-description\">\n<p>The diagrams in this article are especially useful for understanding how the circuit-breaker pattern works.<\/p>\n<p>\u00a0\u00a0<small>Pier-Jean Malandrino<\/small><\/p>\n<\/div>\n<\/div>\n<div class=\"sreweekly-entry\">\n<div class=\"sreweekly-title\"><a href=\"https:\/\/hart-michael.medium.com\/how-to-be-on-call-034e3a202729\" target=\"_blank\" rel=\"noopener\">How to be on-call<\/a><\/div>\n<div class=\"sreweekly-description\">\n<p>This one\u2019s about how on-call can go bad, and how to structure your team\u2019s on-call so that it is livable and sustainable.<\/p>\n<p>\u00a0\u00a0<small>Michael Hart<\/small><\/p>\n<\/div>\n<\/div>\n<div class=\"sreweekly-entry\">\n<div class=\"sreweekly-title\"><a href=\"https:\/\/rootly.com\/blog\/working-effectively-with-executives-during-an-incident\" target=\"_blank\" rel=\"noopener\">Working Effectively With Executives During an Incident<\/a><\/div>\n<div class=\"sreweekly-description\">\n<p>Execs cast a big shadow in an incident, so it\u2019s important to have a plan for how to communicate with them, as this article explains.<\/p>\n<p>\u00a0\u00a0<small>Ashley Sawatsky \u2014 Rootly<\/small><\/p>\n<\/div>\n<\/div>\n<\/div>\n<\/div>\n<p>SRE WEEKLY<\/p>","protected":false},"excerpt":{"rendered":"<p>View on
sreweekly.com A message from our sponsor, FireHydrant: Severity levels help responders and stakeholders understand the incident impact and set expectations for the level of response. This can mean jumping into action faster. But first, you have to ensure severity is actually being set. Here\u2019s one way. https:\/\/firehydrant.com\/blog\/incident-severity-why-you-need-it-and-how-to-ensure-its-set\/ Paper: How in the World Did&hellip; <a class=\"more-link\" href=\"https:\/\/fde.cat\/index.php\/2023\/11\/20\/sre-weekly-issue-399\/\">Continue reading <span class=\"screen-reader-text\">SRE Weekly Issue #399<\/span><\/a><\/p>\n","protected":false},"author":0,"featured_media":0,"comment_status":"","ping_status":"open","sticky":false,"template":"","format":"standard","meta":{"spay_email":"","footnotes":""},"categories":[8],"tags":[],"class_list":["post-794","post","type-post","status-publish","format-standard","hentry","category-sre","entry"],"jetpack_featured_media_url":"","jetpack-related-posts":[{"id":732,"url":"https:\/\/fde.cat\/index.php\/2023\/07\/10\/sre-weekly-issue-380\/","url_meta":{"origin":794,"position":0},"title":"SRE Weekly Issue #380","date":"July 10, 2023","format":false,"excerpt":"View on sreweekly.com A message from our sponsor, Rootly: Curious how companies like Elastic, Tripadvisor, and 100s of others leverage Rootly to manage incidents in Slack and unlock instant best practices? 
Check out this lightning demo: https:\/\/www.loom.com\/share\/051c4be0425a436e888dc0c3690855ad Articles Amazon Prime Video\u2019s Microservices Move Doesn\u2019t Lead to a Monolith after All\u2026","rel":"","context":"In &quot;SRE&quot;","img":{"alt_text":"","src":"","width":0,"height":0},"classes":[]},{"id":320,"url":"https:\/\/fde.cat\/index.php\/2021\/08\/31\/sre-weekly-issue-275\/","url_meta":{"origin":794,"position":1},"title":"SRE Weekly Issue #275","date":"August 31, 2021","format":false,"excerpt":"View on sreweekly.com A message from our sponsor, StackHawk: Join ZAP Founder & Project Lead Simon Bennetts on June 30 for a live AMA where he will be answering questions on all things open source and AppSec. Register: http:\/\/sthwk.com\/Simon-AMA Articles Practical Guide to SRE: Incident Severity Levels Here\u2019s a take\u2026","rel":"","context":"In &quot;SRE&quot;","img":{"alt_text":"","src":"","width":0,"height":0},"classes":[]},{"id":797,"url":"https:\/\/fde.cat\/index.php\/2023\/11\/27\/sre-weekly-issue-400\/","url_meta":{"origin":794,"position":2},"title":"SRE Weekly Issue #400","date":"November 27, 2023","format":false,"excerpt":"View on sreweekly.com A message from our sponsor, FireHydrant: How is FireHydrant building its alerting tool, Signals, to be robust, lightning-fast, and configurable to how YOU work? In this edition, of their Captain\u2019s Log, they dive into CEL and how they\u2019re using it to handle routing and logic. https:\/\/firehydrant.com\/blog\/captains-log-how-were-leveraging-cel\/ A\u2026","rel":"","context":"In &quot;SRE&quot;","img":{"alt_text":"","src":"","width":0,"height":0},"classes":[]},{"id":543,"url":"https:\/\/fde.cat\/index.php\/2022\/02\/21\/sre-weekly-issue-310\/","url_meta":{"origin":794,"position":3},"title":"SRE Weekly Issue #310","date":"February 21, 2022","format":false,"excerpt":"View on sreweekly.com A message from our sponsor, Rootly: Manage incidents directly from Slack with Rootly \ud83d\ude92. 
Automate manual admin tasks like creating incident channel, Jira and Zoom, paging the right team, postmortem timeline, setting up reminders, and more. Book a demo (+ get a snazzy Rootly shirt): https:\/\/rootly.com\/demo\/?utm_source=sreweekly Articles\u2026","rel":"","context":"In &quot;SRE&quot;","img":{"alt_text":"","src":"","width":0,"height":0},"classes":[]},{"id":545,"url":"https:\/\/fde.cat\/index.php\/2022\/02\/28\/sre-weekly-issue-311\/","url_meta":{"origin":794,"position":4},"title":"SRE Weekly Issue #311","date":"February 28, 2022","format":false,"excerpt":"View on sreweekly.com I\u2019m dedicating this issue to the people of Ukraine, and also those in Russia that are protesting the invasion. A message from our sponsor, Rootly: Manage incidents directly from Slack with Rootly \ud83d\ude92. Automate manual admin tasks like creating incident channel, Jira and Zoom, paging the right\u2026","rel":"","context":"In &quot;SRE&quot;","img":{"alt_text":"","src":"","width":0,"height":0},"classes":[]},{"id":885,"url":"https:\/\/fde.cat\/index.php\/2024\/06\/24\/sre-weekly-issue-430\/","url_meta":{"origin":794,"position":5},"title":"SRE Weekly Issue #430","date":"June 24, 2024","format":false,"excerpt":"View on sreweekly.com A message from our sponsor, FireHydrant: We\u2019ve gone all out on our new integration with Microsoft Teams. If you\u2019re a MS Teams user, FireHydrant now supports the most comprehensive integration for incident management. Run the entire IM process without ever leaving the chat. 
https:\/\/firehydrant.com\/blog\/introducing-a-brand-new-microsoft-teams-integration\/ r\/sre: Senior SRE\u2026","rel":"","context":"In &quot;SRE&quot;","img":{"alt_text":"","src":"","width":0,"height":0},"classes":[]}],"_links":{"self":[{"href":"https:\/\/fde.cat\/index.php\/wp-json\/wp\/v2\/posts\/794","targetHints":{"allow":["GET"]}}],"collection":[{"href":"https:\/\/fde.cat\/index.php\/wp-json\/wp\/v2\/posts"}],"about":[{"href":"https:\/\/fde.cat\/index.php\/wp-json\/wp\/v2\/types\/post"}],"replies":[{"embeddable":true,"href":"https:\/\/fde.cat\/index.php\/wp-json\/wp\/v2\/comments?post=794"}],"version-history":[{"count":0,"href":"https:\/\/fde.cat\/index.php\/wp-json\/wp\/v2\/posts\/794\/revisions"}],"wp:attachment":[{"href":"https:\/\/fde.cat\/index.php\/wp-json\/wp\/v2\/media?parent=794"}],"wp:term":[{"taxonomy":"category","embeddable":true,"href":"https:\/\/fde.cat\/index.php\/wp-json\/wp\/v2\/categories?post=794"},{"taxonomy":"post_tag","embeddable":true,"href":"https:\/\/fde.cat\/index.php\/wp-json\/wp\/v2\/tags?post=794"}],"curies":[{"name":"wp","href":"https:\/\/api.w.org\/{rel}","templated":true}]}}