{"id":611,"date":"2022-07-25T01:10:03","date_gmt":"2022-07-25T01:10:03","guid":{"rendered":"https:\/\/fde.cat\/index.php\/2022\/07\/25\/sre-weekly-issue-331\/"},"modified":"2022-07-25T01:10:03","modified_gmt":"2022-07-25T01:10:03","slug":"sre-weekly-issue-331","status":"publish","type":"post","link":"https:\/\/fde.cat\/index.php\/2022\/07\/25\/sre-weekly-issue-331\/","title":{"rendered":"SRE Weekly Issue #331"},"content":{"rendered":"<p><a href=\"https:\/\/sreweekly.com\/sre-weekly-issue-331\/\" title=\"Permalink to SRE Weekly Issue #331\" class=\"email_only\">View on sreweekly.com<\/a><\/p>\n<div class=\"sreweekly-sponsor-message\">\n<h2>A message from our sponsor, <a href=\"https:\/\/rootly.com\/demo\/?utm_source=sreweekly\">Rootly<\/a>:<\/h2>\n<p>Manage incidents directly from Slack with Rootly \ud83d\ude92. Automate manual admin tasks like creating incident channel, Jira and Zoom, paging and adding responders, postmortem timeline, setting up reminders, and more. Book a demo (+ get a snazzy Rootly lego set):<br \/>\n<a href=\"https:\/\/rootly.com\/demo\/\">https:\/\/rootly.com\/demo\/<\/a><\/p>\n<\/div>\n<h2>Articles<\/h2>\n<div class=\"wp-block-group\">\n<div class=\"wp-block-group__inner-container\">\n<div class=\"sreweekly-entry\">\n<div class=\"sreweekly-title\"><a href=\"http:\/\/disastercast.co.uk\/wp\/\" target=\"_blank\" rel=\"noopener\">DisasterCast \u2013 A podcast about scary things and how to stop them happening<\/a><\/div>\n<div class=\"sreweekly-description\">\n<p>I\u2019ve been listening to this podcast this week and I love it!  Each episode covers a disaster, safety theory, and other topics \u2014 with no ads.  
Their site is down right now, but the podcast is available on the usual platforms.<\/p>\n<p>\u00a0\u00a0Drew Rae \u2014 DisasterCast<\/p>\n<\/div>\n<\/div>\n<div class=\"sreweekly-entry\">\n<div class=\"sreweekly-title\"><a href=\"https:\/\/betterprogramming.pub\/the-path-of-getting-comfortable-in-production-c88456146f00\" target=\"_blank\" rel=\"noopener\">An 8 Step Guide to Go From a Clueless to a Production-aware Software Engineer<\/a><\/div>\n<div class=\"sreweekly-description\">\n<p>If we want to get folks to own their code in production, we need to teach them how to think like an SRE.<\/p>\n<p>\u00a0\u00a0Boris Cherkasky<\/p>\n<\/div>\n<\/div>\n<div class=\"sreweekly-entry\">\n<div class=\"sreweekly-title\"><a href=\"https:\/\/firehydrant.com\/blog\/3-mistakes-ive-made-at-the-beginning-of-an-incident-and-how-not-to-make-them\/\" target=\"_blank\" rel=\"noopener\">3 mistakes I\u2019ve made at the beginning of an incident (and how not to make them)<\/a><\/div>\n<div class=\"sreweekly-description\">\n<p>Let\u2019s look at three mistakes I\u2019ve made during those stressful moments during the beginning of an incident \u2014 and discuss how you can avoid making them.<\/p>\n<p>The mistakes are:<\/p>\n<p>Mistake 1: We didn\u2019t have a plan.<br \/>\nMistake 2: We weren\u2019t production ready.<br \/>\nMistake 3: We fell down a cognitive tunnel. <\/p>\n<p>\u00a0\u00a0Robert Ross \u2014 FireHydrant<\/p>\n<\/div>\n<\/div>\n<div class=\"sreweekly-entry\">\n<div class=\"sreweekly-title\"><a href=\"https:\/\/blixhavn.dev\/when-to-kill-the-canary\/\" target=\"_blank\" rel=\"noopener\">When to kill the canary<\/a><\/div>\n<div class=\"sreweekly-description\">\n<p>At what point does your canary test indicate failure?  
Should the criteria be the same as your normal production alerting?<\/p>\n<p>\u00a0\u00a0\u00d8ystein Blixhavn<\/p>\n<\/div>\n<\/div>\n<div class=\"sreweekly-entry\">\n<div class=\"sreweekly-title\"><a href=\"https:\/\/www.honeycomb.io\/blog\/counting-alerts\/\" target=\"_blank\" rel=\"noopener\">On Counting Alerts<\/a><\/div>\n<div class=\"sreweekly-description\">\n<p>This is a follow-up to a <a href=\"https:\/\/www.honeycomb.io\/blog\/tracking-on-call-health\/\">previous article<\/a> about on-call health. In this one, the author shares metrics about the number of alerts and discusses whether this number is useful.<\/p>\n<p>\u00a0\u00a0Fred Hebert \u2014 Honeycomb<\/p>\n<\/div>\n<\/div>\n<div class=\"sreweekly-entry\">\n<div class=\"sreweekly-title\"><a href=\"https:\/\/engineering.razorpay.com\/high-availability-on-razorpay-payments-dashboard-c4a09f66aa61?source=rss----6407ad2e59af---4\" target=\"_blank\" rel=\"noopener\">High Availability on Razorpay Payments Dashboard<\/a><\/div>\n<div class=\"sreweekly-description\">\n<p>Their dashboard crashed for 50% of user sessions, so they had a lot of work ahead of them. Find out how they got crash-free sessions to 99.9% and improved their time to respond to incidents.<\/p>\n<p>\u00a0\u00a0Sandesh Damkondwar \u2014 Razorpay<\/p>\n<\/div>\n<\/div>\n<div class=\"sreweekly-entry\">\n<div class=\"sreweekly-title\"><a href=\"https:\/\/twitter.com\/atoonk\/status\/1550896347691134977?s=21&amp;t=esTnMEIkcgxlQCEuUb2RSg\" target=\"_blank\" rel=\"noopener\">@atoonk on Twitter summarizing the Rogers Communications outage<\/a><\/div>\n<div class=\"sreweekly-description\">\n<p>Rogers Communications, a major telecom in Canada, had a country-wide outage earlier this month.  
I don\u2019t normally include telecom outages in the Outages section because they rarely share information that we can learn from.<\/p>\n<p>This time, Rogers released a (redacted) report on their outage, and this Twitter thread summarizes the key points.<\/p>\n<p>\u00a0\u00a0@atoonk on Twitter<\/p>\n<\/div>\n<\/div>\n<\/div>\n<\/div>\n<h2>Outages<\/h2>\n<p><a href=\"https:\/\/www.straitstimes.com\/tech\/tech-news\/global-users-of-microsoft-teams-hit-by-outage-since-9am-office-365-also-affected\">Microsoft Teams and Office 365<\/a><br \/>\n<a href=\"https:\/\/www.theregister.com\/2022\/07\/21\/teams_outage\/\">Microsoft blames storage error for Teams outage<\/a><br \/>\n<a href=\"https:\/\/status.cloud.google.com\/incidents\/vLsxuKoRvykNHW3nnhsJ\">Google Cloud Storage<\/a><br \/>\n<a href=\"https:\/\/status.cloud.google.com\/incidents\/XVq5om2XEDSqLtJZUvcH\">Google Cloud europe-west2 region<\/a><\/p>\n<p>Preliminary root cause has been identified as multiple concurrent failures to our redundant cooling systems within one of the buildings that hosts the europe-west2-a zone for the europe-west2 region.<\/p>\n<p>SRE WEEKLY<\/p>","protected":false},"excerpt":{"rendered":"<p>View on sreweekly.com A message from our sponsor, Rootly: Manage incidents directly from Slack with Rootly \ud83d\ude92. Automate manual admin tasks like creating incident channel, Jira and Zoom, paging and adding responders, postmortem timeline, setting up reminders, and more. 
Book a demo (+ get a snazzy Rootly lego set): https:\/\/rootly.com\/demo\/ Articles DisasterCast \u2013 A podcast&hellip; <a class=\"more-link\" href=\"https:\/\/fde.cat\/index.php\/2022\/07\/25\/sre-weekly-issue-331\/\">Continue reading <span class=\"screen-reader-text\">SRE Weekly Issue #331<\/span><\/a><\/p>\n","protected":false},"author":0,"featured_media":0,"comment_status":"","ping_status":"open","sticky":false,"template":"","format":"standard","meta":{"spay_email":"","footnotes":""},"categories":[8],"tags":[],"class_list":["post-611","post","type-post","status-publish","format-standard","hentry","category-sre","entry"],"jetpack_featured_media_url":"","jetpack-related-posts":[],"_links":{"self":[{"href":"https:\/\/fde.cat\/index.php\/wp-json\/wp\/v2\/posts\/611","targetHints":{"allow":["GET"]}}],"collection":[{"href":"https:\/\/fde.cat\/index.php\/wp-json\/wp\/v2\/posts"}],"about":[{"href":"https:\/\/fde.cat\/index.php\/wp-json\/wp\/v2\/types\/post"}],"replies":[{"embeddable":true,"href":"https:\/\/fde.cat\/index.php\/wp-json\/wp\/v2\/comments?post=611"}],"version-history":[{"count":0,"href":"https:\/\/fde.cat\/index.php\/wp-json\/wp\/v2\/posts\/611\/revisions"}],"wp:attachment":[{"href":"https:\/\/fde.cat\/index.php\/wp-json\/wp\/v2\/media?parent=611"}],"wp:term":[{"taxonomy":"category","embeddable":true,"href":"https:\/\/fde.cat\/index.php\/wp-json\/wp\/v2\/categories?post=611"},{"taxonomy":"post_tag","embeddable":true,"href":"https:\/\/fde.cat\/index.php\/wp-json\/wp\/v2\/tags?post=611"}],"curies":[{"name":"wp","href":"https:\/\/api.w.org\/{rel}","templated":true}]}}