{"id":500,"date":"2021-11-08T01:31:52","date_gmt":"2021-11-08T01:31:52","guid":{"rendered":"https:\/\/fde.cat\/index.php\/2021\/11\/08\/sre-weekly-issue-295\/"},"modified":"2021-11-08T01:31:52","modified_gmt":"2021-11-08T01:31:52","slug":"sre-weekly-issue-295","status":"publish","type":"post","link":"https:\/\/fde.cat\/index.php\/2021\/11\/08\/sre-weekly-issue-295\/","title":{"rendered":"SRE Weekly Issue #295"},"content":{"rendered":"<p><a href=\"https:\/\/sreweekly.com\/sre-weekly-issue-295\/\" title=\"Permalink to SRE Weekly Issue #295\" class=\"email_only\">View on sreweekly.com<\/a><\/p>\n<div class=\"sreweekly-sponsor-message\">\n<h2>A message from our sponsor, <a href=\"https:\/\/rootly.com\/?utm_source=sreweekly\">Rootly<\/a>:<\/h2>\n<p>Manage incidents directly from Slack with Rootly \ud83d\ude92. Automate manual admin tasks like creating incident channel, Jira and Zoom, paging the right team, postmortem timeline, setting up reminders, and more. Book a demo:<br \/>\n<a href=\"https:\/\/rootly.com\/?utm_source=sreweekly\">https:\/\/rootly.com\/?utm_source=sreweekly<\/a><\/p>\n<\/div>\n<h2>Articles<\/h2>\n<div class=\"sreweekly-entry\">\n<div class=\"sreweekly-title\"><a href=\"https:\/\/www.verica.io\/blog\/mttr-is-a-misleading-metric-now-what\/\">MTTR is a Misleading Metric\u2014Now What?<\/a><\/div>\n<div class=\"sreweekly-description\">\n<p>I love this crystal clear argument based on statistics and research. 
MTTR as a metric is simply meaningless.<\/p>\n<p>Courtney Nash \u2014 Verica<\/p>\n<\/div>\n<\/div>\n<div class=\"sreweekly-entry\">\n<div class=\"sreweekly-title\"><a href=\"https:\/\/incident.io\/blog\/five-steps-to-better-customer-comms\">Five steps to better customer communication<\/a><\/div>\n<div class=\"sreweekly-description\">\n<p>Their steps for better communication during an outage:<\/p>\n<p>Provide context to minimise speculation<br \/>\nExplain what you\u2019re doing to demonstrate you\u2019re \u2018on it\u2019<br \/>\nSet some expectations for when things will return to normal<br \/>\nTell people what they should do<br \/>\nLet folks know when you\u2019ll be updating them next<\/p>\n<p>Chris Evans \u2014 incident.io<\/p>\n<\/div>\n<\/div>\n<div class=\"sreweekly-entry\">\n<div class=\"sreweekly-title\"><a href=\"https:\/\/status.heroku.com\/incidents\/2365\">Heroku Incident 2365 Follow-Up<\/a><\/div>\n<div class=\"sreweekly-description\">\n<p>Despite checking in advance to be sure their systems would support the new Let\u2019s Encrypt certificate chain, they ran into trouble.<\/p>\n<p>[\u2026] we discovered that several HTTP client libraries our systems use were using their own vendored root certificates.<\/p>\n<p>Heroku<\/p>\n<\/div>\n<\/div>\n<div class=\"sreweekly-entry\">\n<div class=\"sreweekly-title\"><a href=\"https:\/\/cloudpundit.com\/2021\/10\/14\/multicloud-failover-is-almost-always-a-terrible-idea\/\">Multicloud failover is almost always a terrible idea<\/a><\/div>\n<div class=\"sreweekly-description\">\n<p>This is the best case I\u2019ve seen yet against multi-cloud infrastructure. 
I really like the airline analogy.<\/p>\n<p>Lydia Leong<\/p>\n<\/div>\n<\/div>\n<div class=\"sreweekly-entry\">\n<div class=\"sreweekly-title\"><a href=\"https:\/\/blog.roblox.com\/2021\/10\/update-recent-service-outage\/\">An Update on Our Outage \u2013 Roblox<\/a><\/div>\n<div class=\"sreweekly-description\">\n<p>Roblox had a major, several-day outage starting on October 28. I don\u2019t usually include game outages in the Outages section since they\u2019re so common and there\u2019s not usually much information to learn from, but I sure do like a good post-incident report. Thanks, folks!<\/p>\n<p>David Baszucki \u2014 Roblox<\/p>\n<\/div>\n<\/div>\n<div class=\"sreweekly-entry\">\n<div class=\"sreweekly-title\"><a href=\"https:\/\/vorner.github.io\/2020\/11\/06\/40-ms-bug.html\">40 Ms Bug<\/a><\/div>\n<div class=\"sreweekly-description\">\n<p>When you\u2019re sending small TCP packets, two optimizations can conspire to introduce an artificial 40 millisecond (not megasecond\u2026) delay.<\/p>\n<p>Vorner<\/p>\n<\/div>\n<\/div>\n<div class=\"sreweekly-entry\">\n<div class=\"sreweekly-title\"><a href=\"https:\/\/www.google.com\/appsstatus\/dashboard\/incidents\/k71P8nHp32hgcMSsC3mR\">Google Incident report \u2014 Meet<\/a><\/div>\n<div class=\"sreweekly-description\">\n<p>Here\u2019s Google\u2019s follow-up report for their October 25-26 Meet outage.<\/p>\n<\/div>\n<\/div>\n<div class=\"sreweekly-entry\">\n<div class=\"sreweekly-title\"><a href=\"https:\/\/www.reddit.com\/r\/sre\/comments\/qkj8xd\/how_to_deal_with_retries_in_slis\/\">\/r\/sre \u2014 How to deal with retries in SLIs<\/a><\/div>\n<div class=\"sreweekly-description\">\n<p>Should you count failed requests toward your SLI if the client retries and succeeds? 
A good argument can be made on either side.<\/p>\n<p>u\/Sufficient_Tree4275 and other Reddit users<\/p>\n<\/div>\n<\/div>\n<div class=\"sreweekly-entry\">\n<div class=\"sreweekly-title\"><a href=\"https:\/\/engineering.mercari.com\/en\/blog\/entry\/20210129-embedded-sre\/\">What the SRE team wants to achieve with the development team<\/a><\/div>\n<div class=\"sreweekly-description\">\n<p>Mercari restructured its SRE team, moving toward an embedded model to adapt to their growing microservice architecture.<\/p>\n<p>ShibuyaMitsuhiro \u2014 Mercari<\/p>\n<\/div>\n<\/div>\n<div class=\"sreweekly-entry\">\n<div class=\"sreweekly-title\"><a href=\"https:\/\/podcast.thevoid.community\/1793843\/9471681-episode-1-honeycomb-and-the-kafka-migration\">Episode 1: Honeycomb and the Kafka Migration \u2013 The VOID<\/a><\/div>\n<div class=\"sreweekly-description\">\n<p>There\u2019s a really great discussion in this episode about leaving slack in the system in the form of bits of capacity and inefficiency that can be drawn upon to buy time during an outage.<\/p>\n<p>Courtney Nash, with guests Liz Fong-Jones and Fred Hebert \u2014 Verica<\/p>\n<\/div>\n<\/div>\n<div class=\"sreweekly-entry\">\n<div class=\"sreweekly-title\"><a href=\"https:\/\/www.transposit.com\/blog\/why-a-reliability-mindset-must-be-adopted-beyond-sre\/\">Why a \u2018Reliability Mindset\u2019 Must Be Adopted Beyond SRE<\/a><\/div>\n<div class=\"sreweekly-description\">\n<p>Here\u2019s how non-SREs can use SRE principles to improve their systems.<\/p>\n<p>Laurel Frazier \u2014 Transposit<\/p>\n<\/div>\n<\/div>\n<h2>Outages<\/h2>\n<p><a href=\"https:\/\/7news.com.au\/technology\/facebook-messenger-and-instagram-down-metas-suite-of-social-media-products-facing-network-outages-c-4425965\">Facebook, Messenger and Instagram<\/a><\/p>\n<p>Or Meta or whatever.<\/p>\n<p><a href=\"https:\/\/9to5google.com\/2021\/11\/03\/google-nest-outage-offline\/\">Google Nest<\/a><br \/>\nSRE 
WEEKLY<\/p>","protected":false},"excerpt":{"rendered":"<p>View on sreweekly.com A message from our sponsor, Rootly: Manage incidents directly from Slack with Rootly \ud83d\ude92. Automate manual admin tasks like creating incident channel, Jira and Zoom, paging the right team, postmortem timeline, setting up reminders, and more. Book a demo: https:\/\/rootly.com\/?utm_source=sreweekly Articles MTTR is a Misleading Metric\u2014Now What? I love this crystal clear&hellip; <a class=\"more-link\" href=\"https:\/\/fde.cat\/index.php\/2021\/11\/08\/sre-weekly-issue-295\/\">Continue reading <span class=\"screen-reader-text\">SRE Weekly Issue #295<\/span><\/a><\/p>\n","protected":false},"author":0,"featured_media":0,"comment_status":"","ping_status":"open","sticky":false,"template":"","format":"standard","meta":{"spay_email":"","footnotes":""},"categories":[8],"tags":[],"class_list":["post-500","post","type-post","status-publish","format-standard","hentry","category-sre","entry"],"jetpack_featured_media_url":"","jetpack-related-posts":[{"id":640,"url":"https:\/\/fde.cat\/index.php\/2022\/10\/17\/sre-weekly-issue-343\/","url_meta":{"origin":500,"position":0},"title":"SRE Weekly Issue #343","date":"October 17, 2022","format":false,"excerpt":"View on sreweekly.com Bit of a short one this week as I recover from my third bout of COVID. Fortunately, this is another relatively mild one (thank you, vaccine!). Good luck everyone, and get your boosters. A message from our sponsor, Rootly: Manage incidents directly from Slack with Rootly\u00a0\ud83d\ude92. 
Rootly\u2026","rel":"","context":"In &quot;SRE&quot;","img":{"alt_text":"","src":"","width":0,"height":0},"classes":[]},{"id":616,"url":"https:\/\/fde.cat\/index.php\/2022\/08\/01\/sre-weekly-issue-332\/","url_meta":{"origin":500,"position":1},"title":"SRE Weekly Issue #332","date":"August 1, 2022","format":false,"excerpt":"View on sreweekly.com A message from our sponsor, Rootly: Manage incidents directly from Slack with Rootly \ud83d\ude92. Automate manual admin tasks like creating incident channel, Jira and Zoom, paging and adding responders, postmortem timeline, setting up reminders, and more. Book a demo (+ get a snazzy Rootly lego set): https:\/\/rootly.com\/demo\/\u2026","rel":"","context":"In &quot;SRE&quot;","img":{"alt_text":"","src":"","width":0,"height":0},"classes":[]},{"id":578,"url":"https:\/\/fde.cat\/index.php\/2022\/05\/23\/sre-weekly-issue-323\/","url_meta":{"origin":500,"position":2},"title":"SRE Weekly Issue #323","date":"May 23, 2022","format":false,"excerpt":"View on sreweekly.com A message from our sponsor, Rootly: Manage incidents directly from Slack with Rootly \ud83d\ude92. Automate manual admin tasks like creating incident channel, Jira and Zoom, paging and adding responders, postmortem timeline, setting up reminders, and more. Book a demo (+ get a snazzy Rootly lego set): https:\/\/rootly.com\/demo\/\u2026","rel":"","context":"In &quot;SRE&quot;","img":{"alt_text":"","src":"","width":0,"height":0},"classes":[]},{"id":711,"url":"https:\/\/fde.cat\/index.php\/2023\/05\/08\/sre-weekly-issue-371\/","url_meta":{"origin":500,"position":3},"title":"SRE Weekly Issue #371","date":"May 8, 2023","format":false,"excerpt":"View on sreweekly.com A message from our sponsor, Rootly: Rootly is hiring for a Sr. Developer Relations Advocate to continue helping more world-class companies like Figma, NVIDIA, Squarespace, accelerate their incident management journey. 
Looking for previous on-call engineers with a passion for making the world a more reliable place.\u00a0 Learn\u2026","rel":"","context":"In &quot;SRE&quot;","img":{"alt_text":"","src":"","width":0,"height":0},"classes":[]},{"id":557,"url":"https:\/\/fde.cat\/index.php\/2022\/03\/28\/sre-weekly-issue-315\/","url_meta":{"origin":500,"position":4},"title":"SRE Weekly Issue #315","date":"March 28, 2022","format":false,"excerpt":"View on sreweekly.com I\u2019m going on vacation, so I\u2019m going to prepare next week\u2019s issue in advance. It\u2019ll look much like most issues, except there won\u2019t be an Outages section. See you all in two weeks! A message from our sponsor, Rootly: Manage incidents directly from Slack with Rootly \ud83d\ude92.\u2026","rel":"","context":"In &quot;SRE&quot;","img":{"alt_text":"","src":"","width":0,"height":0},"classes":[]},{"id":579,"url":"https:\/\/fde.cat\/index.php\/2022\/05\/30\/sre-weekly-issue-324\/","url_meta":{"origin":500,"position":5},"title":"SRE Weekly Issue #324","date":"May 30, 2022","format":false,"excerpt":"View on sreweekly.com A message from our sponsor, Rootly: Manage incidents directly from Slack with Rootly \ud83d\ude92. Automate manual admin tasks like creating incident channel, Jira and Zoom, paging and adding responders, postmortem timeline, setting up reminders, and more. 
Book a demo (+ get a snazzy Rootly lego set): https:\/\/rootly.com\/demo\/\u2026","rel":"","context":"In &quot;SRE&quot;","img":{"alt_text":"","src":"","width":0,"height":0},"classes":[]}],"_links":{"self":[{"href":"https:\/\/fde.cat\/index.php\/wp-json\/wp\/v2\/posts\/500","targetHints":{"allow":["GET"]}}],"collection":[{"href":"https:\/\/fde.cat\/index.php\/wp-json\/wp\/v2\/posts"}],"about":[{"href":"https:\/\/fde.cat\/index.php\/wp-json\/wp\/v2\/types\/post"}],"replies":[{"embeddable":true,"href":"https:\/\/fde.cat\/index.php\/wp-json\/wp\/v2\/comments?post=500"}],"version-history":[{"count":0,"href":"https:\/\/fde.cat\/index.php\/wp-json\/wp\/v2\/posts\/500\/revisions"}],"wp:attachment":[{"href":"https:\/\/fde.cat\/index.php\/wp-json\/wp\/v2\/media?parent=500"}],"wp:term":[{"taxonomy":"category","embeddable":true,"href":"https:\/\/fde.cat\/index.php\/wp-json\/wp\/v2\/categories?post=500"},{"taxonomy":"post_tag","embeddable":true,"href":"https:\/\/fde.cat\/index.php\/wp-json\/wp\/v2\/tags?post=500"}],"curies":[{"name":"wp","href":"https:\/\/api.w.org\/{rel}","templated":true}]}}