A message from our sponsor, Rootly: Manage incidents directly from Slack with Rootly 🚒. Automate manual admin tasks like creating incident channel, Jira and Zoom, paging the right team, postmortem timeline, setting up reminders, and more. Book a demo: https://rootly.io/?utm_source=sreweekly Articles: Understanding How Facebook Disappeared from the Internet. Facebook’s outage caused significantly… Continue reading SRE Weekly Issue #291
Month: October 2021
Inside the Lab: Expanding connectivity by sea, land, and air
The post Inside the Lab: Expanding connectivity by sea, land, and air appeared first on Facebook Engineering.
Evolution of Region Assignment in the Apache HBase Architecture — Part 3
Authors: Viraj Jasani, Andrew Purtell, 张铎 (Duo Zhang). In the second part of this blog post series, we provided an overview of how the redesigned AssignmentManager in HBase 2 efficiently and reliably manages the process of region assignment. In this third entry in this blog post series, we… Continue reading Evolution of Region Assignment in the Apache HBase Architecture — Part 3
More details about the October 4 outage
Now that our platforms are up and running as usual after yesterday’s outage, I thought it would be worth sharing a little more detail on what happened and why — and most importantly, how we’re learning from it. This outage was triggered by the system that manages our global backbone network capacity. The backbone is… Continue reading More details about the October 4 outage
Lessons Learned using Spring Data Redis
Context: Our Commerce Cloud team, which is in charge of the Omnichannel Inventory service, uses Redis as a remote cache to store data that lends itself to caching. The remote cache allows our multiple processes to get a synchronized and single view of the cached data. (See our previous blog post, Coordinated Rate Limiting in… Continue reading Lessons Learned using Spring Data Redis
Update about the October 4th outage
To all the people and businesses around the world who depend on us, we are sorry for the inconvenience caused by today’s outage across our platforms. We’ve been working as hard as we can to restore access, and our systems are now back up and running. The underlying cause of this outage also impacted many… Continue reading Update about the October 4th outage
SRE Weekly Issue #290
Articles: Postmortem: Partial RavenDB Cloud outage. Despite carefully testing how they would… Continue reading SRE Weekly Issue #290