{"id":616,"date":"2022-08-01T02:28:34","date_gmt":"2022-08-01T02:28:34","guid":{"rendered":"https:\/\/fde.cat\/index.php\/2022\/08\/01\/sre-weekly-issue-332\/"},"modified":"2022-08-01T02:28:34","modified_gmt":"2022-08-01T02:28:34","slug":"sre-weekly-issue-332","status":"publish","type":"post","link":"https:\/\/fde.cat\/index.php\/2022\/08\/01\/sre-weekly-issue-332\/","title":{"rendered":"SRE Weekly Issue #332"},"content":{"rendered":"<p><a href=\"https:\/\/sreweekly.com\/sre-weekly-issue-332\/\" title=\"Permalink to SRE Weekly Issue #332\" class=\"email_only\">View on sreweekly.com<\/a><\/p>\n<div class=\"sreweekly-sponsor-message\">\n<h2>A message from our sponsor, <a href=\"https:\/\/rootly.com\/demo\/?utm_source=sreweekly\">Rootly<\/a>:<\/h2>\n<p>Manage incidents directly from Slack with Rootly \ud83d\ude92. Automate manual admin tasks like creating incident channel, Jira and Zoom, paging and adding responders, postmortem timeline, setting up reminders, and more. Book a demo (+ get a snazzy Rootly lego set):<br \/>\n<a href=\"https:\/\/rootly.com\/demo\/\">https:\/\/rootly.com\/demo\/<\/a><\/p>\n<\/div>\n<h2>Articles<\/h2>\n<div class=\"wp-block-group\">\n<div class=\"wp-block-group__inner-container\">\n<div class=\"sreweekly-entry\">\n<div class=\"sreweekly-title\"><a href=\"https:\/\/engineering.razorpay.com\/how-razorpays-notification-service-handles-increasing-load-f787623a490f\" target=\"_blank\" rel=\"noopener\">How Razorpay\u2019s Notification Service Handles Increasing Load<\/a><\/div>\n<div class=\"sreweekly-description\">\n<p>Their notification service had complex load characteristics that made scaling up a tricky proposition.<\/p>\n<p>\u00a0\u00a0Anand Prakash \u2014 Razorpay<\/p>\n<\/div>\n<\/div>\n<div class=\"sreweekly-entry\">\n<div class=\"sreweekly-title\"><a href=\"https:\/\/about.gitlab.com\/blog\/2022\/07\/19\/reducing-pager-fatigue-and-improving-on-call-life\/\" target=\"_blank\" rel=\"noopener\">How we improved on-call life by reducing pager noise<\/a><\/div>\n<div class=\"sreweekly-description\">\n<p>Coalescing alerts and adding dependencies in AlertManager were the key to reducing this team\u2019s excessive pager load.<\/p>\n<p>\u00a0\u00a0steveazz \u2014 GitLab<\/p>\n<\/div>\n<\/div>\n<div class=\"sreweekly-entry\">\n<div class=\"sreweekly-title\"><a href=\"https:\/\/surfingcomplexity.blog\/2022\/07\/23\/whats-allowed-to-count-as-a-cause-alerrt-edition\/\" target=\"_blank\" rel=\"noopener\">What\u2019s allowed to count as a cause: ALERRT edition<\/a><\/div>\n<div class=\"sreweekly-description\">\n<p>Lorin Hochstein has started a <a href=\"https:\/\/surfingcomplexity.blog\/2022\/07\/23\/uvalde\/\">series<\/a> of blog posts on what we can learn about incident response from the Uvalde school shooting tragedy in the US.  This article looks at how an organization\u2019s perspective can affect their retrospective incident analysis.<\/p>\n<p>\u00a0\u00a0Lorin Hochstein<\/p>\n<\/div>\n<\/div>\n<div class=\"sreweekly-entry\">\n<div class=\"sreweekly-title\"><a href=\"https:\/\/surfingcomplexity.blog\/2022\/07\/23\/the-fog-of-war-in-uvalde\/\" target=\"_blank\" rel=\"noopener\">The fog of war in Uvalde<\/a><\/div>\n<div class=\"sreweekly-description\">\n<p>My claim here is that we should assume the officer is telling the truth and was acting reasonably if we want to understand how these types of failure modes can happen.<\/p>\n<p>Every retrospective ever:<\/p>\n<p>We must assume that a person can act reasonably and still come to the wrong conclusion in order to make progress.<\/p>\n<p>\u00a0\u00a0Lorin Hochstein<\/p>\n<\/div>\n<\/div>\n<div class=\"sreweekly-entry\">\n<div class=\"sreweekly-title\"><a href=\"https:\/\/jakub-m.github.io\/2022\/07\/17\/laport-clocks-formal.html\" target=\"_blank\" rel=\"noopener\">User settings, Lamport clocks and lightweight formal methods<\/a><\/div>\n<div class=\"sreweekly-description\">\n<p>How do you synchronize state between multiple browsers and a backend, and ensure that everyone\u2019s state will eventually converge?  These folks explain how they did it, and a bug they found through testing.<\/p>\n<p>\u00a0\u00a0Jakub Mikians \u2014 Airspace Intelligence<\/p>\n<\/div>\n<\/div>\n<div class=\"sreweekly-entry\">\n<div class=\"sreweekly-title\"><a href=\"https:\/\/blog.danslimmon.com\/2014\/09\/03\/mttr-lower-isnt-always-better\/\" target=\"_blank\" rel=\"noopener\">MTTR: lower isn\u2019t always better<\/a><\/div>\n<div class=\"sreweekly-description\">\n<p>MTTR is a mean, so it doesn\u2019t tell you anything about the <em>number<\/em> of incidents, among other potential pitfalls.<\/p>\n<p>\u00a0\u00a0Dan Slimmon<\/p>\n<\/div>\n<\/div>\n<div class=\"sreweekly-entry\">\n<div class=\"sreweekly-title\"><a href=\"https:\/\/status.cloud.google.com\/incidents\/fmEL9i2fArADKawkZAa2\" target=\"_blank\" rel=\"noopener\">Google Cloud Platform outage report: europe-west2 cooling failure<\/a><\/div>\n<div class=\"sreweekly-description\">\n<p>Last week, I included a GCP outage in europe-west2.  This week, Google posted this report about what went wrong, and it\u2019s got layers.<\/p>\n<p>Bonus: <a href=\"https:\/\/status.cloud.google.com\/incidents\/vLsxuKoRvykNHW3nnhsJ\">another GCP outage report<\/a><\/p>\n<p>\u00a0\u00a0Google<\/p>\n<\/div>\n<\/div>\n<div class=\"sreweekly-entry\">\n<div class=\"sreweekly-title\"><a href=\"https:\/\/engineering.fb.com\/2022\/07\/25\/production-engineering\/its-time-to-leave-the-leap-second-in-the-past\/\" target=\"_blank\" rel=\"noopener\">It\u2019s time to leave the leap second in the past<\/a><\/div>\n<div class=\"sreweekly-description\">\n<p>Meta wants to do away with leap seconds, because they make it especially difficult to create reliable systems.<\/p>\n<p>\u00a0\u00a0Oleg Obleukhov and Ahmad Byagowi \u2014 Meta<\/p>\n<\/div>\n<\/div>\n<div class=\"sreweekly-entry\">\n<div class=\"sreweekly-title\"><a href=\"https:\/\/incident.io\/blog\/pitfalls-of-post-mortems\" target=\"_blank\" rel=\"noopener\">3 common pitfalls of post-mortems<\/a><\/div>\n<div class=\"sreweekly-description\">\n<p>If you\u2019re anywhere near incident analysis in your organization, you need to read this list.<\/p>\n<p>\u00a0\u00a0Milly Leadley \u2014 incident.io<\/p>\n<\/div>\n<\/div>\n<\/div>\n<\/div>\n<h2>Outages<\/h2>\n<p><a href=\"https:\/\/www.stackstatus.net\/incidents\/8d533a56-bacb-48e3-baa5-8266d93f08c3\">Stack Exchange<\/a><br \/>\n<a href=\"https:\/\/www.bleepingcomputer.com\/news\/microsoft\/microsoft-365-outage-knocks-down-admin-center-in-north-america\/\">Microsoft 365<\/a><br \/>\n<a href=\"https:\/\/status.slack.com\/\/2022-07\/1387a8fab7d2e2ec\">Slack<\/a><br \/>\n<a href=\"https:\/\/trello.status.atlassian.com\/incidents\/z0b0ljxlqktv\">Trello<\/a><br \/>\nSRE WEEKLY<\/p>","protected":false},"excerpt":{"rendered":"<p>View on sreweekly.com A message from our sponsor, Rootly: Manage incidents directly from Slack with Rootly \ud83d\ude92. Automate manual admin tasks like creating incident channel, Jira and Zoom, paging and adding responders, postmortem timeline, setting up reminders, and more. Book a demo (+ get a snazzy Rootly lego set): https:\/\/rootly.com\/demo\/ Articles How Razorpay\u2019s Notification Service&hellip; <a class=\"more-link\" href=\"https:\/\/fde.cat\/index.php\/2022\/08\/01\/sre-weekly-issue-332\/\">Continue reading <span class=\"screen-reader-text\">SRE Weekly Issue #332<\/span><\/a><\/p>\n","protected":false},"author":0,"featured_media":0,"comment_status":"","ping_status":"open","sticky":false,"template":"","format":"standard","meta":{"spay_email":"","footnotes":""},"categories":[8],"tags":[],"class_list":["post-616","post","type-post","status-publish","format-standard","hentry","category-sre","entry"],"jetpack_featured_media_url":"","jetpack-related-posts":[{"id":519,"url":"https:\/\/fde.cat\/index.php\/2021\/12\/20\/sre-weekly-issue-301\/","url_meta":{"origin":616,"position":0},"title":"SRE Weekly Issue #301","date":"December 20, 2021","format":false,"excerpt":"View on sreweekly.com A message from our sponsor, Rootly: Manage incidents directly from Slack with Rootly \ud83d\ude92. Automate manual admin tasks like creating incident channel, Jira and Zoom, paging the right team, postmortem timeline, setting up reminders, and more. Book a demo: https:\/\/rootly.com\/demo\/?utm_source=sreweekly Articles BadgerDAO Exploit Technical Post Mortem This\u2026","rel":"","context":"In &quot;SRE&quot;","img":{"alt_text":"","src":"","width":0,"height":0},"classes":[]},{"id":543,"url":"https:\/\/fde.cat\/index.php\/2022\/02\/21\/sre-weekly-issue-310\/","url_meta":{"origin":616,"position":1},"title":"SRE Weekly Issue #310","date":"February 21, 2022","format":false,"excerpt":"View on sreweekly.com A message from our sponsor, Rootly: Manage incidents directly from Slack with Rootly \ud83d\ude92. Automate manual admin tasks like creating incident channel, Jira and Zoom, paging the right team, postmortem timeline, setting up reminders, and more. Book a demo (+ get a snazzy Rootly shirt): https:\/\/rootly.com\/demo\/?utm_source=sreweekly Articles\u2026","rel":"","context":"In &quot;SRE&quot;","img":{"alt_text":"","src":"","width":0,"height":0},"classes":[]},{"id":579,"url":"https:\/\/fde.cat\/index.php\/2022\/05\/30\/sre-weekly-issue-324\/","url_meta":{"origin":616,"position":2},"title":"SRE Weekly Issue #324","date":"May 30, 2022","format":false,"excerpt":"View on sreweekly.com A message from our sponsor, Rootly: Manage incidents directly from Slack with Rootly \ud83d\ude92. Automate manual admin tasks like creating incident channel, Jira and Zoom, paging and adding responders, postmortem timeline, setting up reminders, and more. Book a demo (+ get a snazzy Rootly lego set): https:\/\/rootly.com\/demo\/\u2026","rel":"","context":"In &quot;SRE&quot;","img":{"alt_text":"","src":"","width":0,"height":0},"classes":[]},{"id":535,"url":"https:\/\/fde.cat\/index.php\/2022\/01\/24\/sre-weekly-issue-306\/","url_meta":{"origin":616,"position":3},"title":"SRE Weekly Issue #306","date":"January 24, 2022","format":false,"excerpt":"View on sreweekly.com A message from our sponsor, Rootly: Manage incidents directly from Slack with Rootly \ud83d\ude92. Automate manual admin tasks like creating incident channel, Jira and Zoom, paging the right team, postmortem timeline, setting up reminders, and more. Book a demo (+ get a snazzy Rootly shirt): https:\/\/rootly.com\/demo\/?utm_source=sreweekly Articles\u2026","rel":"","context":"In &quot;SRE&quot;","img":{"alt_text":"","src":"","width":0,"height":0},"classes":[]},{"id":537,"url":"https:\/\/fde.cat\/index.php\/2022\/01\/31\/sre-weekly-issue-307\/","url_meta":{"origin":616,"position":4},"title":"SRE Weekly Issue #307","date":"January 31, 2022","format":false,"excerpt":"View on sreweekly.com A message from our sponsor, Rootly: Manage incidents directly from Slack with Rootly \ud83d\ude92. Automate manual admin tasks like creating incident channel, Jira and Zoom, paging the right team, postmortem timeline, setting up reminders, and more. Book a demo (+ get a snazzy Rootly shirt): https:\/\/rootly.com\/demo\/?utm_source=sreweekly Articles\u2026","rel":"","context":"In &quot;SRE&quot;","img":{"alt_text":"","src":"","width":0,"height":0},"classes":[]},{"id":546,"url":"https:\/\/fde.cat\/index.php\/2022\/03\/07\/sre-weekly-issue-312\/","url_meta":{"origin":616,"position":5},"title":"SRE Weekly Issue #312","date":"March 7, 2022","format":false,"excerpt":"View on sreweekly.com A message from our sponsor, Rootly: Manage incidents directly from Slack with Rootly \ud83d\ude92. Automate manual admin tasks like creating incident channel, Jira and Zoom, paging the right team, postmortem timeline, setting up reminders, and more. Book a demo (+ get a snazzy Rootly shirt): https:\/\/rootly.com\/demo\/?utm_source=sreweekly Articles\u2026","rel":"","context":"In &quot;SRE&quot;","img":{"alt_text":"","src":"","width":0,"height":0},"classes":[]}],"_links":{"self":[{"href":"https:\/\/fde.cat\/index.php\/wp-json\/wp\/v2\/posts\/616","targetHints":{"allow":["GET"]}}],"collection":[{"href":"https:\/\/fde.cat\/index.php\/wp-json\/wp\/v2\/posts"}],"about":[{"href":"https:\/\/fde.cat\/index.php\/wp-json\/wp\/v2\/types\/post"}],"replies":[{"embeddable":true,"href":"https:\/\/fde.cat\/index.php\/wp-json\/wp\/v2\/comments?post=616"}],"version-history":[{"count":0,"href":"https:\/\/fde.cat\/index.php\/wp-json\/wp\/v2\/posts\/616\/revisions"}],"wp:attachment":[{"href":"https:\/\/fde.cat\/index.php\/wp-json\/wp\/v2\/media?parent=616"}],"wp:term":[{"taxonomy":"category","embeddable":true,"href":"https:\/\/fde.cat\/index.php\/wp-json\/wp\/v2\/categories?post=616"},{"taxonomy":"post_tag","embeddable":true,"href":"https:\/\/fde.cat\/index.php\/wp-json\/wp\/v2\/tags?post=616"}],"curies":[{"name":"wp","href":"https:\/\/api.w.org\/{rel}","templated":true}]}}