{"id":849,"date":"2024-04-08T01:58:52","date_gmt":"2024-04-08T01:58:52","guid":{"rendered":"https:\/\/fde.cat\/index.php\/2024\/04\/08\/sre-weekly-issue-419\/"},"modified":"2024-04-08T01:58:52","modified_gmt":"2024-04-08T01:58:52","slug":"sre-weekly-issue-419","status":"publish","type":"post","link":"https:\/\/fde.cat\/index.php\/2024\/04\/08\/sre-weekly-issue-419\/","title":{"rendered":"SRE Weekly Issue #419"},"content":{"rendered":"<p><a href=\"https:\/\/sreweekly.com\/sre-weekly-issue-419\/\" title=\"Permalink to SRE Weekly Issue #419\" class=\"email_only\">View on sreweekly.com<\/a><\/p>\n<div class=\"sreweekly-sponsor-message\">\n<h2>A message from our sponsor, <a href=\"https:\/\/firehydrant.com\/\">FireHydrant<\/a>:<\/h2>\n<p>FireHydrant is now AI-powered for faster, smarter incidents! Power up your incidents with auto-generated real-time summaries, retrospectives, and status page updates.<br \/>\n<a href=\"https:\/\/firehydrant.com\/blog\/ai-for-incident-management-is-here\/\">https:\/\/firehydrant.com\/blog\/ai-for-incident-management-is-here\/<\/a><\/p>\n<\/div>\n<div class=\"wp-block-group is-layout-flow wp-block-group-is-layout-flow\">\n<div class=\"wp-block-group__inner-container\">\n<div class=\"sreweekly-entry\">\n<div class=\"sreweekly-title\"><a href=\"https:\/\/www.figma.com\/blog\/how-figmas-databases-team-lived-to-tell-the-scale\/\" target=\"_blank\" rel=\"noopener\">How Figma\u2019s Databases Team Lived to Tell the Scale<\/a><\/div>\n<div class=\"sreweekly-description\">\n<p>Our nine month journey to horizontally shard Figma\u2019s Postgres stack, and the key to unlocking (nearly) infinite scalability.<\/p>\n<p>Retrofitting sharding is a huge undertaking.<\/p>\n<p>\u00a0\u00a0<small>Sammy Steele \u2014 Figma<\/small><\/p>\n<\/div>\n<\/div>\n<div class=\"sreweekly-entry\">\n<div class=\"sreweekly-title\"><a href=\"https:\/\/graphite.dev\/blog\/staging-environment\" target=\"_blank\" rel=\"noopener\">Moving fast breaks things: the importance of a 
staging environment<\/a><\/div>\n<div class=\"sreweekly-description\">\n<p>Ride along as this company evolves from constantly shipping directly to production to a robust staging and internal canary deployment system.<\/p>\n<p>\u00a0\u00a0<small>Greg Foster \u2014 Graphite<\/small><\/p>\n<\/div>\n<\/div>\n<div class=\"sreweekly-entry\">\n<div class=\"sreweekly-title\"><a href=\"https:\/\/graphite.dev\/blog\/post-mortem-12-06-23\" target=\"_blank\" rel=\"noopener\">Post mortem: we took 124 seconds from you, here\u2019s 378 back<\/a><\/div>\n<div class=\"sreweekly-description\">\n<p>A lighthearted but still detail-filled take on a post-incident analysis for a short production outage.<\/p>\n<p>\u00a0\u00a0<small>Greg Foster \u2014 Graphite<\/small><\/p>\n<\/div>\n<\/div>\n<div class=\"sreweekly-entry\">\n<div class=\"sreweekly-title\"><a href=\"https:\/\/temporal.io\/blog\/building-application-reliability-on-top-of-infrastructure-unreliability\" target=\"_blank\" rel=\"noopener\">Building Application Reliability on Top of Infrastructure Unreliability<\/a><\/div>\n<div class=\"sreweekly-description\">\n<p>This one has an interesting discussion of the nature of reliability and the impact of multiple services on overall reliability, including possible mathematical models to use.<\/p>\n<p>\u00a0\u00a0<small>Fitz \u2014 Temporal<\/small><\/p>\n<\/div>\n<\/div>\n<div class=\"sreweekly-entry\">\n<div class=\"sreweekly-title\"><a href=\"https:\/\/www.srepath.com\/clearing-observability-delusions\/\" target=\"_blank\" rel=\"noopener\">#30 Clearing Delusions in Observability (with David Caudill)<\/a><\/div>\n<div class=\"sreweekly-description\">\n<p>This episode of the SREPath Podcast covers a variety of themes around observability and SLOs.  
There\u2019s a great text-based summary if that\u2019s your preference.<\/p>\n<p>\u00a0\u00a0<small>Ash Patel \u2014 SREPath<\/small><\/p>\n<\/div>\n<\/div>\n<div class=\"sreweekly-entry\">\n<div class=\"sreweekly-title\"><a href=\"http:\/\/www.brendangregg.com\/blog\/\/2024-03-24\/linux-crisis-tools.html\" target=\"_blank\" rel=\"noopener\">Linux Crisis Tools<\/a><\/div>\n<div class=\"sreweekly-description\">\n<p>This piece argues that you should install system debugging tools on your production systems now, because it\u2019s going to be really hard to do it live when you need them.<\/p>\n<p>\u00a0\u00a0<small>Brendan Gregg<\/small><\/p>\n<\/div>\n<\/div>\n<div class=\"sreweekly-entry\">\n<div class=\"sreweekly-title\"><a href=\"https:\/\/hross.substack.com\/p\/how-much-are-their-9s-worth\" target=\"_blank\" rel=\"noopener\">How much are their 9\u2019s worth?<\/a><\/div>\n<div class=\"sreweekly-description\">\n<p>Following on from a <a href=\"https:\/\/hross.substack.com\/p\/how-much-are-your-9s-worth\">previous article<\/a> about the squiggliness of availability numbers, this article evaluates SLAs from 4 major companies to try to divine what they actually mean.<\/p>\n<p>\u00a0\u00a0<small>Ross Brodbeck<\/small><\/p>\n<\/div>\n<\/div>\n<div class=\"sreweekly-entry\">\n<div class=\"sreweekly-title\"><a href=\"https:\/\/mkaz.me\/blog\/2024\/slo-formulas-implementation-in-promql-step-by-step\/\" target=\"_blank\" rel=\"noopener\">SLO formulas implementation in PromQL step by step<\/a><\/div>\n<div class=\"sreweekly-description\">\n<p>I want to present real-life examples of both availability and latency SLOs, as they are more nuanced than they may initially appear. 
Also, I find it worthwhile sharing a detailed guide as it showcases uncommon uses of PromQL and demonstrates the language\u2019s versatility.<\/p>\n<p>\u00a0\u00a0<small>Micha\u0142 Ka\u017amierczak<\/small><\/p>\n<\/div>\n<\/div>\n<\/div>\n<\/div>\n<p>SRE WEEKLY<\/p>","protected":false},"excerpt":{"rendered":"<p>View on sreweekly.com A message from our sponsor, FireHydrant: FireHydrant is now AI-powered for faster, smarter incidents! Power up your incidents with auto-generated real-time summaries, retrospectives, and status page updates. https:\/\/firehydrant.com\/blog\/ai-for-incident-management-is-here\/ How Figma\u2019s Databases Team Lived to Tell the Scale Our nine month journey to horizontally shard Figma\u2019s Postgres stack, and the key to unlocking&hellip; <a class=\"more-link\" href=\"https:\/\/fde.cat\/index.php\/2024\/04\/08\/sre-weekly-issue-419\/\">Continue reading <span class=\"screen-reader-text\">SRE Weekly Issue #419<\/span><\/a><\/p>\n","protected":false},"author":0,"featured_media":0,"comment_status":"","ping_status":"open","sticky":false,"template":"","format":"standard","meta":{"spay_email":"","footnotes":""},"categories":[8],"tags":[],"class_list":["post-849","post","type-post","status-publish","format-standard","hentry","category-sre","entry"],"jetpack_featured_media_url":"","jetpack-related-posts":[{"id":663,"url":"https:\/\/fde.cat\/index.php\/2022\/12\/19\/sre-weekly-issue-352\/","url_meta":{"origin":849,"position":0},"title":"SRE Weekly Issue #352","date":"December 19, 2022","format":false,"excerpt":"View on sreweekly.com A message from our sponsor, Rootly: Manage incidents directly from Slack with Rootly\u00a0\ud83d\ude92. Rootly automates manual tasks like creating an incident channel, Jira ticket and Zoom rooms, inviting responders, creating statuspage updates, postmortem timelines and more. 
Want to see why companies like Canva and Grammarly love us?:\u2026","rel":"","context":"In &quot;SRE&quot;","img":{"alt_text":"","src":"","width":0,"height":0},"classes":[]},{"id":855,"url":"https:\/\/fde.cat\/index.php\/2024\/04\/15\/sre-weekly-issue-420\/","url_meta":{"origin":849,"position":1},"title":"SRE Weekly Issue #420","date":"April 15, 2024","format":false,"excerpt":"View on sreweekly.com A message from our sponsor, FireHydrant: FireHydrant is now AI-powered for faster, smarter incidents! Power up your incidents with auto-generated real-time summaries, retrospectives, and status page updates. https:\/\/firehydrant.com\/blog\/ai-for-incident-management-is-here\/ 1.0 Launch Retrospective The game Last Epoch launched in February, and they had a rocky start. This huge retrospective\u2026","rel":"","context":"In &quot;SRE&quot;","img":{"alt_text":"","src":"","width":0,"height":0},"classes":[]},{"id":844,"url":"https:\/\/fde.cat\/index.php\/2024\/03\/25\/sre-weekly-issue-417\/","url_meta":{"origin":849,"position":2},"title":"SRE Weekly Issue #417","date":"March 25, 2024","format":false,"excerpt":"View on sreweekly.com A message from our sponsor, FireHydrant: Join FireHydrant this Thursday for a conversation about on-call burnout and how to prevent it. Get a better understanding of what makes a fatigue-free on-call culture, including real-world examples from your incident management peers. No sales, just shop talk. 
https:\/\/app.livestorm.co\/firehydrant\/better-incidents-spring-bonfire-secrets-to-fatigue-free-on-call-in-2024 Harnessing\u2026","rel":"","context":"In &quot;SRE&quot;","img":{"alt_text":"","src":"","width":0,"height":0},"classes":[]},{"id":798,"url":"https:\/\/fde.cat\/index.php\/2023\/12\/04\/sre-weekly-issue-401\/","url_meta":{"origin":849,"position":3},"title":"SRE Weekly Issue #401","date":"December 4, 2023","format":false,"excerpt":"View on sreweekly.com A message from our sponsor, FireHydrant: Join FireHydrant Dec.14 for a conversation about on-call culture and its effect on engineering organizations, featuring special guests from Outreach and Udemy. Gain a better understanding of what makes excellent on-call culture and how to implement practices to improve yours. https:\/\/app.livestorm.co\/firehydrant\/better-incidents-winter-bonfire-inside-on-call?type=detailed\u2026","rel":"","context":"In &quot;SRE&quot;","img":{"alt_text":"","src":"","width":0,"height":0},"classes":[]},{"id":835,"url":"https:\/\/fde.cat\/index.php\/2024\/03\/11\/sre-weekly-issue-415\/","url_meta":{"origin":849,"position":4},"title":"SRE Weekly Issue #415","date":"March 11, 2024","format":false,"excerpt":"View on sreweekly.com A message from our sponsor, FireHydrant: Join FireHydrant and talk shop with your DevOps peers on March 28! You\u2019ll gain a better understanding of what makes a fatigue-free on-call culture and how to implement practices to improve yours at this free, virtual roundtable. 
https:\/\/app.livestorm.co\/firehydrant\/better-incidents-spring-bonfire-secrets-to-fatigue-free-on-call-in-2024 The Wrong Way\u2026","rel":"","context":"In &quot;SRE&quot;","img":{"alt_text":"","src":"","width":0,"height":0},"classes":[]},{"id":823,"url":"https:\/\/fde.cat\/index.php\/2024\/02\/12\/sre-weekly-issue-411\/","url_meta":{"origin":849,"position":5},"title":"SRE Weekly Issue #411","date":"February 12, 2024","format":false,"excerpt":"View on sreweekly.com A message from our sponsor, FireHydrant: \u201cTo be honest, when can we switch?\u201d The first impressions are in. Check out what people are saying after seeing Signals, the new standard in alerting and on-call from FireHydrant, for the first time. https:\/\/firehydrant.com\/signals\/ Shared On-Call Is Where the SRE\u2026","rel":"","context":"In &quot;SRE&quot;","img":{"alt_text":"","src":"","width":0,"height":0},"classes":[]}],"_links":{"self":[{"href":"https:\/\/fde.cat\/index.php\/wp-json\/wp\/v2\/posts\/849","targetHints":{"allow":["GET"]}}],"collection":[{"href":"https:\/\/fde.cat\/index.php\/wp-json\/wp\/v2\/posts"}],"about":[{"href":"https:\/\/fde.cat\/index.php\/wp-json\/wp\/v2\/types\/post"}],"replies":[{"embeddable":true,"href":"https:\/\/fde.cat\/index.php\/wp-json\/wp\/v2\/comments?post=849"}],"version-history":[{"count":0,"href":"https:\/\/fde.cat\/index.php\/wp-json\/wp\/v2\/posts\/849\/revisions"}],"wp:attachment":[{"href":"https:\/\/fde.cat\/index.php\/wp-json\/wp\/v2\/media?parent=849"}],"wp:term":[{"taxonomy":"category","embeddable":true,"href":"https:\/\/fde.cat\/index.php\/wp-json\/wp\/v2\/categories?post=849"},{"taxonomy":"post_tag","embeddable":true,"href":"https:\/\/fde.cat\/index.php\/wp-json\/wp\/v2\/tags?post=849"}],"curies":[{"name":"wp","href":"https:\/\/api.w.org\/{rel}","templated":true}]}}