A message from our sponsor, FireHydrant: FireHydrant is now AI-powered for faster, smarter incidents! Power up your incidents with auto-generated real-time summaries, retrospectives, and status page updates. https://firehydrant.com/blog/ai-for-incident-management-is-here/ My Availability Investment Playbook: Here’s an ultra-practical guide to pushing for reliability investments at your company, formatted as a runbook with a set of… Continue reading SRE Weekly Issue #424
Category: SRE
Posts related to Site Reliability Engineering
SRE Weekly Issue #423
How to Fight Alert Fatigue with Synthetic Monitoring: This one’s full of great advice about making sure alerts are actionable, including alerting on flows… Continue reading SRE Weekly Issue #423
SRE Weekly Issue #422
PIOSEE Decision Model and preparations for critical situations: The PIOSEE model is taught to pilots as a rubric for coming to a decision in… Continue reading SRE Weekly Issue #422
SRE Weekly Issue #421
Last week, I mistakenly attributed [an article](https://www.paigerduty.com/sre-biggest-problem/) to PagerDuty. Actually, it was by Paige Cruz, whose clever blog name I didn’t pay anywhere near close enough attention to! Thanks to the several readers who nudged me gently about my error. A message from our sponsor, FireHydrant: FireHydrant is now AI-powered for faster, smarter… Continue reading SRE Weekly Issue #421
SRE Weekly Issue #420
1.0 Launch Retrospective: The game Last Epoch launched in February, and they had a rocky start. This huge retrospective post tells the story of… Continue reading SRE Weekly Issue #420
SRE Weekly Issue #419
How Figma’s Databases Team Lived to Tell the Scale: Our nine-month journey to horizontally shard Figma’s Postgres stack, and the key to unlocking… Continue reading SRE Weekly Issue #419
SRE Weekly Issue #418
Redefining Observability: The observability waters have been muddy for a while, and this article does a great job of taking a step back and building… Continue reading SRE Weekly Issue #418
SRE Weekly Issue #417
A message from our sponsor, FireHydrant: Join FireHydrant this Thursday for a conversation about on-call burnout and how to prevent it. Get a better understanding of what makes a fatigue-free on-call culture, including real-world examples from your incident management peers. No sales, just shop talk. https://app.livestorm.co/firehydrant/better-incidents-spring-bonfire-secrets-to-fatigue-free-on-call-in-2024 Harnessing chaos in Cloudflare offices: Remember… Continue reading SRE Weekly Issue #417
SRE Weekly Issue #416
A message from our sponsor, FireHydrant: We need tools that help us show our value, enhance understanding of our systems, and free time for us to expand our skills. In this article, FireHydrant lays out three questions to ask vendors as you evaluate DevOps tools. https://firehydrant.com/blog/3-questions-to-ask-of-any-devops-tool-in-2024/ 4 Instructive Postmortems on Data Downtime… Continue reading SRE Weekly Issue #416
SRE Weekly Issue #415
A message from our sponsor, FireHydrant: Join FireHydrant and talk shop with your DevOps peers on March 28! You’ll gain a better understanding of what makes a fatigue-free on-call culture and how to implement practices to improve yours at this free, virtual roundtable. https://app.livestorm.co/firehydrant/better-incidents-spring-bonfire-secrets-to-fatigue-free-on-call-in-2024 The Wrong Way to Use DORA Metrics… Continue reading SRE Weekly Issue #415