{"id":861,"date":"2024-04-29T00:28:25","date_gmt":"2024-04-29T00:28:25","guid":{"rendered":"https:\/\/fde.cat\/index.php\/2024\/04\/29\/sre-weekly-issue-422\/"},"modified":"2024-04-29T00:28:25","modified_gmt":"2024-04-29T00:28:25","slug":"sre-weekly-issue-422","status":"publish","type":"post","link":"https:\/\/fde.cat\/index.php\/2024\/04\/29\/sre-weekly-issue-422\/","title":{"rendered":"SRE Weekly Issue #422"},"content":{"rendered":"<p><a href=\"https:\/\/sreweekly.com\/sre-weekly-issue-422\/\" title=\"Permalink to SRE Weekly Issue #422\" class=\"email_only\">View on sreweekly.com<\/a><\/p>\n<div class=\"sreweekly-sponsor-message\">\n<h2>A message from our sponsor, <a href=\"https:\/\/firehydrant.com\/\">FireHydrant<\/a>:<\/h2>\n<p>FireHydrant is now AI-powered for faster, smarter incidents! Power up your incidents with auto-generated real-time summaries, retrospectives, and status page updates.<br \/>\n<a href=\"https:\/\/firehydrant.com\/blog\/ai-for-incident-management-is-here\/\">https:\/\/firehydrant.com\/blog\/ai-for-incident-management-is-here\/<\/a><\/p>\n<\/div>\n<div class=\"wp-block-group is-layout-flow wp-block-group-is-layout-flow\">\n<div class=\"wp-block-group__inner-container\">\n<div class=\"sreweekly-entry\">\n<div class=\"sreweekly-title\"><a href=\"https:\/\/franciscomelojr.ca\/2023\/01\/09\/piosee-decision-model-as-troubleshooting-methodology\/\" target=\"_blank\" rel=\"noopener\">PIOSEE Decision Model and preparations for critical situations<\/a><\/div>\n<div class=\"sreweekly-description\">\n<p>The PIOSEE model is taught to pilots as a rubric for coming to a decision in a difficult aviation situation.  As this article explains, we can also use it during IT incidents. 
<\/p>\n<p>\u00a0\u00a0<small>Francisco Melo Jr.<\/small><\/p>\n<\/div>\n<\/div>\n<div class=\"sreweekly-entry\">\n<div class=\"sreweekly-title\"><a href=\"https:\/\/srepath.substack.com\/p\/observability-cardinality-conundrum\" target=\"_blank\" rel=\"noopener\">Solving Observability\u2019s Cardinality Conundrum<\/a><\/div>\n<div class=\"sreweekly-description\">\n<p>What is high cardinality in monitoring systems?  Here\u2019s an excellent explanation that includes tips on how to manage cardinality.<\/p>\n<p>\u00a0\u00a0<small>Ash P \u2014 SREPath<\/small><\/p>\n<\/div>\n<\/div>\n<div class=\"sreweekly-entry\">\n<div class=\"sreweekly-title\"><a href=\"https:\/\/devblog.xero.com\/building-a-customer-focused-observability-maturity-model-7b890aa11cb5\" target=\"_blank\" rel=\"noopener\">Building a customer-focused Observability Maturity Model<\/a><\/div>\n<div class=\"sreweekly-description\">\n<p>As Xero transitioned to a standard of \u201cyou build it you run it\u201d, suddenly more engineering teams were responsible for knowing about and implementing observability.  
They designed this maturity model to help teams understand what they were aiming for and how to get there.<\/p>\n<p>\u00a0\u00a0<small>Andrew Macdonald \u2014 Xero<\/small><\/p>\n<\/div>\n<\/div>\n<div class=\"sreweekly-entry\">\n<div class=\"sreweekly-title\"><a href=\"https:\/\/www.theverge.com\/c\/24070570\/internet-cables-undersea-deep-repair-ships\" target=\"_blank\" rel=\"noopener\">The invisible seafaring industry that keeps the internet afloat<\/a><\/div>\n<div class=\"sreweekly-description\">\n<p>With around 200 undersea fiber cuts worldwide per year, a fleet of ships is at the ready to pull up the cables and repair them.<\/p>\n<p>\u00a0\u00a0<small>Josh Dzieza \u2014 The Verge<\/small><\/p>\n<\/div>\n<\/div>\n<div class=\"sreweekly-entry\">\n<div class=\"sreweekly-title\"><a href=\"https:\/\/blog.cloudflare.com\/major-data-center-power-failure-again-cloudflare-code-orange-tested\" target=\"_blank\" rel=\"noopener\">Major data center power failure (again): Cloudflare Code Orange tested<\/a><\/div>\n<div class=\"sreweekly-description\">\n<p>Last year, Cloudflare suffered a control plane outage when one of their datacenters lost power.  
They have since done significant work to improve their resilience to power outages, and it was put to the test when the same datacenter lost power <em>again<\/em>.<\/p>\n<p>\u00a0\u00a0<small>Matthew Prince, John Graham-Cumming, and Jeremy Hartman \u2014 Cloudflare<\/small><\/p>\n<\/div>\n<\/div>\n<div class=\"sreweekly-entry\">\n<div class=\"sreweekly-title\"><a href=\"https:\/\/wetransfer.com\/engineering\/how-the-platform-team-became-effective-remote\/\" target=\"_blank\" rel=\"noopener\">How the Platform team became effective in working remotely<\/a><\/div>\n<div class=\"sreweekly-description\">\n<p>Going from non-remote to remote was challenging, but here\u2019s how our team changed as we began working from home.<\/p>\n<p>\u00a0\u00a0<small>Stefan Mikolajczyk \u2014 WeTransfer<\/small><\/p>\n<\/div>\n<\/div>\n<div class=\"sreweekly-entry\">\n<div class=\"sreweekly-title\"><a href=\"https:\/\/hross.substack.com\/p\/the-platform-empathy-gap\" target=\"_blank\" rel=\"noopener\">The Platform Empathy Gap<\/a><\/div>\n<div class=\"sreweekly-description\">\n<p>Platform teams have a hugely important role to fill in the engineering organization. They are often the teams that enable other teams to move with more speed and safety. They can also quickly become disconnected from their customers.<\/p>\n<p>\u00a0\u00a0<small>Ross Brodbeck<\/small><\/p>\n<\/div>\n<\/div>\n<div class=\"sreweekly-entry\">\n<div class=\"sreweekly-title\"><a href=\"https:\/\/blog.relyabilit.ie\/graceful-degradation-and-slos\/\" target=\"_blank\" rel=\"noopener\">Graceful Degradation and SLOs<\/a><\/div>\n<div class=\"sreweekly-description\">\n<p>When your system <em>successfully<\/em> serves a degraded response to the customer, how should you count that toward your SLO?  Is it success?  Failure?  
Something in between?<\/p>\n<p>\u00a0\u00a0<small>Niall Murphy<\/small><\/p>\n<\/div>\n<\/div>\n<\/div>\n<\/div>\n<p>SRE WEEKLY<\/p>","protected":false},"excerpt":{"rendered":"<p>View on sreweekly.com A message from our sponsor, FireHydrant: FireHydrant is now AI-powered for faster, smarter incidents! Power up your incidents with auto-generated real-time summaries, retrospectives, and status page updates. https:\/\/firehydrant.com\/blog\/ai-for-incident-management-is-here\/ PIOSEE Decision Model and preparations for critical situations The PIOSEE model is taught to pilots as a rubric for coming to a decision in&hellip; <a class=\"more-link\" href=\"https:\/\/fde.cat\/index.php\/2024\/04\/29\/sre-weekly-issue-422\/\">Continue reading <span class=\"screen-reader-text\">SRE Weekly Issue #422<\/span><\/a><\/p>\n","protected":false},"author":0,"featured_media":0,"comment_status":"","ping_status":"open","sticky":false,"template":"","format":"standard","meta":{"spay_email":"","footnotes":""},"categories":[8],"tags":[],"class_list":["post-861","post","type-post","status-publish","format-standard","hentry","category-sre","entry"],"jetpack_featured_media_url":"","jetpack-related-posts":[{"id":663,"url":"https:\/\/fde.cat\/index.php\/2022\/12\/19\/sre-weekly-issue-352\/","url_meta":{"origin":861,"position":0},"title":"SRE Weekly Issue #352","date":"December 19, 2022","format":false,"excerpt":"View on sreweekly.com A message from our sponsor, Rootly: Manage incidents directly from Slack with Rootly\u00a0\ud83d\ude92. Rootly automates manual tasks like creating an incident channel, Jira ticket and Zoom rooms, inviting responders, creating statuspage updates, postmortem timelines and more. 
Want to see why companies like Canva and Grammarly love us?:\u2026","rel":"","context":"In &quot;SRE&quot;","img":{"alt_text":"","src":"","width":0,"height":0},"classes":[]},{"id":855,"url":"https:\/\/fde.cat\/index.php\/2024\/04\/15\/sre-weekly-issue-420\/","url_meta":{"origin":861,"position":1},"title":"SRE Weekly Issue #420","date":"April 15, 2024","format":false,"excerpt":"View on sreweekly.com A message from our sponsor, FireHydrant: FireHydrant is now AI-powered for faster, smarter incidents! Power up your incidents with auto-generated real-time summaries, retrospectives, and status page updates. https:\/\/firehydrant.com\/blog\/ai-for-incident-management-is-here\/ 1.0 Launch Retrospective The game Last Epoch launched in February, and they had a rocky start. This huge retrospective\u2026","rel":"","context":"In &quot;SRE&quot;","img":{"alt_text":"","src":"","width":0,"height":0},"classes":[]},{"id":844,"url":"https:\/\/fde.cat\/index.php\/2024\/03\/25\/sre-weekly-issue-417\/","url_meta":{"origin":861,"position":2},"title":"SRE Weekly Issue #417","date":"March 25, 2024","format":false,"excerpt":"View on sreweekly.com A message from our sponsor, FireHydrant: Join FireHydrant this Thursday for a conversation about on-call burnout and how to prevent it. Get a better understanding of what makes a fatigue-free on-call culture, including real-world examples from your incident management peers. No sales, just shop talk. 
https:\/\/app.livestorm.co\/firehydrant\/better-incidents-spring-bonfire-secrets-to-fatigue-free-on-call-in-2024 Harnessing\u2026","rel":"","context":"In &quot;SRE&quot;","img":{"alt_text":"","src":"","width":0,"height":0},"classes":[]},{"id":798,"url":"https:\/\/fde.cat\/index.php\/2023\/12\/04\/sre-weekly-issue-401\/","url_meta":{"origin":861,"position":3},"title":"SRE Weekly Issue #401","date":"December 4, 2023","format":false,"excerpt":"View on sreweekly.com A message from our sponsor, FireHydrant: Join FireHydrant Dec.14 for a conversation about on-call culture and its effect on engineering organizations, featuring special guests from Outreach and Udemy. Gain a better understanding of what makes excellent on-call culture and how to implement practices to improve yours. https:\/\/app.livestorm.co\/firehydrant\/better-incidents-winter-bonfire-inside-on-call?type=detailed\u2026","rel":"","context":"In &quot;SRE&quot;","img":{"alt_text":"","src":"","width":0,"height":0},"classes":[]},{"id":835,"url":"https:\/\/fde.cat\/index.php\/2024\/03\/11\/sre-weekly-issue-415\/","url_meta":{"origin":861,"position":4},"title":"SRE Weekly Issue #415","date":"March 11, 2024","format":false,"excerpt":"View on sreweekly.com A message from our sponsor, FireHydrant: Join FireHydrant and talk shop with your DevOps peers on March 28! You\u2019ll gain a better understanding of what makes a fatigue-free on-call culture and how to implement practices to improve yours at this free, virtual roundtable. 
https:\/\/app.livestorm.co\/firehydrant\/better-incidents-spring-bonfire-secrets-to-fatigue-free-on-call-in-2024 The Wrong Way\u2026","rel":"","context":"In &quot;SRE&quot;","img":{"alt_text":"","src":"","width":0,"height":0},"classes":[]},{"id":823,"url":"https:\/\/fde.cat\/index.php\/2024\/02\/12\/sre-weekly-issue-411\/","url_meta":{"origin":861,"position":5},"title":"SRE Weekly Issue #411","date":"February 12, 2024","format":false,"excerpt":"View on sreweekly.com A message from our sponsor, FireHydrant: \u201cTo be honest, when can we switch?\u201d The first impressions are in. Check out what people are saying after seeing Signals, the new standard in alerting and on-call from FireHydrant, for the first time. https:\/\/firehydrant.com\/signals\/ Shared On-Call Is Where the SRE\u2026","rel":"","context":"In &quot;SRE&quot;","img":{"alt_text":"","src":"","width":0,"height":0},"classes":[]}],"_links":{"self":[{"href":"https:\/\/fde.cat\/index.php\/wp-json\/wp\/v2\/posts\/861","targetHints":{"allow":["GET"]}}],"collection":[{"href":"https:\/\/fde.cat\/index.php\/wp-json\/wp\/v2\/posts"}],"about":[{"href":"https:\/\/fde.cat\/index.php\/wp-json\/wp\/v2\/types\/post"}],"replies":[{"embeddable":true,"href":"https:\/\/fde.cat\/index.php\/wp-json\/wp\/v2\/comments?post=861"}],"version-history":[{"count":0,"href":"https:\/\/fde.cat\/index.php\/wp-json\/wp\/v2\/posts\/861\/revisions"}],"wp:attachment":[{"href":"https:\/\/fde.cat\/index.php\/wp-json\/wp\/v2\/media?parent=861"}],"wp:term":[{"taxonomy":"category","embeddable":true,"href":"https:\/\/fde.cat\/index.php\/wp-json\/wp\/v2\/categories?post=861"},{"taxonomy":"post_tag","embeddable":true,"href":"https:\/\/fde.cat\/index.php\/wp-json\/wp\/v2\/tags?post=861"}],"curies":[{"name":"wp","href":"https:\/\/api.w.org\/{rel}","templated":true}]}}