{"id":734,"date":"2023-07-17T01:04:12","date_gmt":"2023-07-17T01:04:12","guid":{"rendered":"https:\/\/fde.cat\/index.php\/2023\/07\/17\/sre-weekly-issue-381\/"},"modified":"2023-07-17T01:04:12","modified_gmt":"2023-07-17T01:04:12","slug":"sre-weekly-issue-381","status":"publish","type":"post","link":"https:\/\/fde.cat\/index.php\/2023\/07\/17\/sre-weekly-issue-381\/","title":{"rendered":"SRE Weekly Issue #381"},"content":{"rendered":"<p><a href=\"https:\/\/sreweekly.com\/sre-weekly-issue-381\/\" title=\"Permalink to SRE Weekly Issue #381\" class=\"email_only\">View on sreweekly.com<\/a><\/p>\n<div class=\"sreweekly-sponsor-message\">\n<h2>A message from our sponsor, <a href=\"https:\/\/rootly.com\/demo\/?utm_source=sreweekly\">Rootly<\/a>:<\/h2>\n<p>Curious how companies like Elastic, Tripadvisor, and 100s of others leverage Rootly to manage incidents in Slack and unlock instant best practices?  Check out this lightning demo:<br \/>\n<a href=\"https:\/\/www.loom.com\/share\/051c4be0425a436e888dc0c3690855ad\">https:\/\/www.loom.com\/share\/051c4be0425a436e888dc0c3690855ad<\/a><\/p>\n<\/div>\n<h2>Articles<\/h2>\n<div class=\"wp-block-group\">\n<div class=\"wp-block-group__inner-container\">\n<div class=\"sreweekly-entry\">\n<div class=\"sreweekly-title\"><a href=\"https:\/\/dev.to\/karelvandenbussche\/the-pyramid-of-alerting-1g48\" target=\"_blank\" rel=\"noopener\">The Pyramid of Alerting<\/a><\/div>\n<div class=\"sreweekly-description\">\n<p>The Pyramid introduced in this article is three levels of monitoring: Operational, Data Validation, and Business Assumptions.  These roughly correspond to questions like: is the system up?  Is the right amount of data flowing through it?  
Is that data <em>correct<\/em>?<\/p>\n<p>\u00a0\u00a0<small>Karel Vanden Bussche \u2014 DEV<\/small><\/p>\n<\/div>\n<\/div>\n<div class=\"sreweekly-entry\">\n<div class=\"sreweekly-title\"><a href=\"https:\/\/gitlab.com\/gitlab-com\/gl-infra\/production\/-\/issues\/15999\" target=\"_blank\" rel=\"noopener\">Incident Review for Site-wide Outage for GitLab.com \u2013 Stale Terraform Pipeline #15997 (#15999) \u00b7 Issues \u00b7 GitLab.com \/ GitLab Infrastructure Team \/ production<\/a><\/div>\n<div class=\"sreweekly-description\">\n<p>Extremely powerful tools can become extremely powerful footguns, for example Terraform.<\/p>\n<p>\u00a0\u00a0<small>Dave Smith \u2014 GitLab<\/small><\/p>\n<\/div>\n<\/div>\n<div class=\"sreweekly-entry\">\n<div class=\"sreweekly-title\"><a href=\"https:\/\/igor.io\/latency\/\" target=\"_blank\" rel=\"noopener\">latency: a primer<\/a><\/div>\n<div class=\"sreweekly-description\">\n<p>Sure, you know what latency is, but do you really know what a percentile is?  A histogram?  A heatmap?<\/p>\n<p>\u00a0\u00a0<small>igor<\/small><\/p>\n<\/div>\n<\/div>\n<div class=\"sreweekly-entry\">\n<div class=\"sreweekly-title\"><a href=\"https:\/\/dzone.com\/articles\/cdn-observability-why-you-must-monitor-your-extend\" target=\"_blank\" rel=\"noopener\">CDN Observability<\/a><\/div>\n<div class=\"sreweekly-description\">\n<p>If you\u2019re using a CDN, you need to keep an eye on it.  Here\u2019s a primer on what to watch for.<\/p>\n<p>\u00a0\u00a0<small> Or Hillel \u2014 DZone<\/small><\/p>\n<\/div>\n<\/div>\n<div class=\"sreweekly-entry\">\n<div class=\"sreweekly-title\"><a href=\"https:\/\/www.codereliant.io\/principles-of-reliable-software-design-part-1\/\" target=\"_blank\" rel=\"noopener\">Principles of Reliable Software Design<\/a><\/div>\n<div class=\"sreweekly-description\">\n<p>This article series covers 12 aspects important in the design of reliable systems.  
Some of the aspects, such as modularity, loose coupling, graceful degradation, and redundancy, are covered in depth.<\/p>\n<p>\u00a0\u00a0<small>Code Reliant<\/small><\/p>\n<\/div>\n<\/div>\n<div class=\"sreweekly-entry\">\n<div class=\"sreweekly-title\"><a href=\"https:\/\/github.blog\/2023-07-12-github-availability-report-june-2023\/\" target=\"_blank\" rel=\"noopener\">GitHub Availability Report: June 2023<\/a><\/div>\n<div class=\"sreweekly-description\">\n<p>A couple weeks back, GitHub was hard down, even including its status page at times.  This report goes into that in detail, and the cause is pretty interesting.<\/p>\n<p>\u00a0\u00a0<small>Jakub Oleksy \u2013 GitHub<\/small><\/p>\n<\/div>\n<\/div>\n<div class=\"sreweekly-entry\">\n<div class=\"sreweekly-title\"><a href=\"https:\/\/blog.alexewerlof.com\/p\/failover\" target=\"_blank\" rel=\"noopener\">Failover<\/a><\/div>\n<div class=\"sreweekly-description\">\n<p>An in-depth look at different kinds of failover, including each kind\u2019s methodology and purposes.<\/p>\n<p>\u00a0\u00a0<small>Alex Ewerl\u00f6f<\/small><\/p>\n<\/div>\n<\/div>\n<div class=\"sreweekly-entry\">\n<div class=\"sreweekly-title\"><a href=\"https:\/\/admiralcloudberg.medium.com\/finding-fault-the-crash-of-korean-air-cargo-flight-8509-36a4c1b7b58e\" target=\"_blank\" rel=\"noopener\">Finding Fault: The crash of Korean Air Cargo flight 8509<\/a><\/div>\n<div class=\"sreweekly-description\">\n<p>This one is especially interesting for the controversial and baseless conclusions popularized in the media about a supposed cause rooted in Korean culture.  
It\u2019s a good reminder that we need to be careful to ensure the validity of the lessons we learn from incidents.<\/p>\n<p>\u00a0\u00a0<small>Admiral Cloudberg<\/small><\/p>\n<\/div>\n<\/div>\n<\/div>\n<\/div>\n<p>SRE WEEKLY<\/p>","protected":false},"excerpt":{"rendered":"<p>View on sreweekly.com A message from our sponsor, Rootly: Curious how companies like Elastic, Tripadvisor, and 100s of others leverage Rootly to manage incidents in Slack and unlock instant best practices? Check out this lightning demo: https:\/\/www.loom.com\/share\/051c4be0425a436e888dc0c3690855ad Articles The Pyramid of Alerting The Pyramid introduced in this article is three levels of monitoring: Operational, Data&hellip; <a class=\"more-link\" href=\"https:\/\/fde.cat\/index.php\/2023\/07\/17\/sre-weekly-issue-381\/\">Continue reading <span class=\"screen-reader-text\">SRE Weekly Issue #381<\/span><\/a><\/p>\n","protected":false},"author":0,"featured_media":0,"comment_status":"","ping_status":"open","sticky":false,"template":"","format":"standard","meta":{"spay_email":"","footnotes":""},"categories":[8],"tags":[],"class_list":["post-734","post","type-post","status-publish","format-standard","hentry","category-sre","entry"],"jetpack_featured_media_url":"","jetpack-related-posts":[{"id":739,"url":"https:\/\/fde.cat\/index.php\/2023\/07\/30\/sre-weekly-issue-383\/","url_meta":{"origin":734,"position":0},"title":"SRE Weekly Issue #383","date":"July 30, 2023","format":false,"excerpt":"View on sreweekly.com A message from our sponsor, Rootly: Eliminate the anxiety around declaring an incident for nebulous problems by introducing a triage phase into your incident management process. 
Our latest blog post dives into why the triage phase is so important, and how you can automate yours with Rootly.\u2026","rel":"","context":"In &quot;SRE&quot;","img":{"alt_text":"","src":"","width":0,"height":0},"classes":[]},{"id":708,"url":"https:\/\/fde.cat\/index.php\/2023\/05\/01\/sre-weekly-issue-370\/","url_meta":{"origin":734,"position":1},"title":"SRE Weekly Issue #370","date":"May 1, 2023","format":false,"excerpt":"View on sreweekly.com A message from our sponsor, Rootly: Manage incidents directly from Slack with Rootly\u00a0\ud83d\ude92. Rootly automates manual tasks like creating an incident channel, Jira ticket and Zoom rooms, inviting responders, creating statuspage updates, postmortem timelines and more. Want to see why companies like Canva and Grammarly love us?:\u2026","rel":"","context":"In &quot;SRE&quot;","img":{"alt_text":"","src":"","width":0,"height":0},"classes":[]},{"id":543,"url":"https:\/\/fde.cat\/index.php\/2022\/02\/21\/sre-weekly-issue-310\/","url_meta":{"origin":734,"position":2},"title":"SRE Weekly Issue #310","date":"February 21, 2022","format":false,"excerpt":"View on sreweekly.com A message from our sponsor, Rootly: Manage incidents directly from Slack with Rootly \ud83d\ude92. Automate manual admin tasks like creating incident channel, Jira and Zoom, paging the right team, postmortem timeline, setting up reminders, and more. Book a demo (+ get a snazzy Rootly shirt): https:\/\/rootly.com\/demo\/?utm_source=sreweekly Articles\u2026","rel":"","context":"In &quot;SRE&quot;","img":{"alt_text":"","src":"","width":0,"height":0},"classes":[]},{"id":579,"url":"https:\/\/fde.cat\/index.php\/2022\/05\/30\/sre-weekly-issue-324\/","url_meta":{"origin":734,"position":3},"title":"SRE Weekly Issue #324","date":"May 30, 2022","format":false,"excerpt":"View on sreweekly.com A message from our sponsor, Rootly: Manage incidents directly from Slack with Rootly \ud83d\ude92. 
Automate manual admin tasks like creating incident channel, Jira and Zoom, paging and adding responders, postmortem timeline, setting up reminders, and more. Book a demo (+ get a snazzy Rootly lego set): https:\/\/rootly.com\/demo\/\u2026","rel":"","context":"In &quot;SRE&quot;","img":{"alt_text":"","src":"","width":0,"height":0},"classes":[]},{"id":535,"url":"https:\/\/fde.cat\/index.php\/2022\/01\/24\/sre-weekly-issue-306\/","url_meta":{"origin":734,"position":4},"title":"SRE Weekly Issue #306","date":"January 24, 2022","format":false,"excerpt":"View on sreweekly.com A message from our sponsor, Rootly: Manage incidents directly from Slack with Rootly \ud83d\ude92. Automate manual admin tasks like creating incident channel, Jira and Zoom, paging the right team, postmortem timeline, setting up reminders, and more. Book a demo (+ get a snazzy Rootly shirt): https:\/\/rootly.com\/demo\/?utm_source=sreweekly Articles\u2026","rel":"","context":"In &quot;SRE&quot;","img":{"alt_text":"","src":"","width":0,"height":0},"classes":[]},{"id":537,"url":"https:\/\/fde.cat\/index.php\/2022\/01\/31\/sre-weekly-issue-307\/","url_meta":{"origin":734,"position":5},"title":"SRE Weekly Issue #307","date":"January 31, 2022","format":false,"excerpt":"View on sreweekly.com A message from our sponsor, Rootly: Manage incidents directly from Slack with Rootly \ud83d\ude92. Automate manual admin tasks like creating incident channel, Jira and Zoom, paging the right team, postmortem timeline, setting up reminders, and more. 
Book a demo (+ get a snazzy Rootly shirt): https:\/\/rootly.com\/demo\/?utm_source=sreweekly Articles\u2026","rel":"","context":"In &quot;SRE&quot;","img":{"alt_text":"","src":"","width":0,"height":0},"classes":[]}],"_links":{"self":[{"href":"https:\/\/fde.cat\/index.php\/wp-json\/wp\/v2\/posts\/734","targetHints":{"allow":["GET"]}}],"collection":[{"href":"https:\/\/fde.cat\/index.php\/wp-json\/wp\/v2\/posts"}],"about":[{"href":"https:\/\/fde.cat\/index.php\/wp-json\/wp\/v2\/types\/post"}],"replies":[{"embeddable":true,"href":"https:\/\/fde.cat\/index.php\/wp-json\/wp\/v2\/comments?post=734"}],"version-history":[{"count":0,"href":"https:\/\/fde.cat\/index.php\/wp-json\/wp\/v2\/posts\/734\/revisions"}],"wp:attachment":[{"href":"https:\/\/fde.cat\/index.php\/wp-json\/wp\/v2\/media?parent=734"}],"wp:term":[{"taxonomy":"category","embeddable":true,"href":"https:\/\/fde.cat\/index.php\/wp-json\/wp\/v2\/categories?post=734"},{"taxonomy":"post_tag","embeddable":true,"href":"https:\/\/fde.cat\/index.php\/wp-json\/wp\/v2\/tags?post=734"}],"curies":[{"name":"wp","href":"https:\/\/api.w.org\/{rel}","templated":true}]}}