We’ve made architecture changes to Meta’s event-driven asynchronous computing platform that have enabled easy integration with multiple event sources. We’re sharing our learnings from handling various workloads and how to tackle the trade-offs that come with certain design choices in building the platform. Asynchronous computing is a paradigm where the user does not expect a workload… Continue reading Asynchronous computing at Meta: Overview and learnings
Month: January 2023
SRE Weekly Issue #357
A message from our sponsor, Rootly: Manage incidents directly from Slack with Rootly 🚒. Rootly automates manual tasks like creating an incident channel, Jira ticket and Zoom rooms, inviting responders, creating statuspage updates, postmortem timelines and more. Want to see why companies like Canva and Grammarly love us?: https://rootly.com/demo/ Articles 3 tips for… Continue reading SRE Weekly Issue #357
Watch Meta’s engineers discuss optimizing large-scale networks
Managing network solutions amidst a growing scale inherently brings challenges around performance, deployment, and operational complexities. At Meta, we’ve found that these challenges broadly fall into three themes: 1.) Data center networking: Over the past decade, on the physical front, we have seen a rise in vendor-specific hardware that comes with heterogeneous feature and… Continue reading Watch Meta’s engineers discuss optimizing large-scale networks
Tulip: Modernizing Meta’s data platform
The technical journey discusses the motivations, challenges, and technical solutions employed for warehouse schematization, especially a change to the wire serialization format employed in Meta’s data platform for data interchange related to Warehouse Analytics Logging. Here, we discuss the engineering, scaling, and nontechnical challenges of modernizing Meta’s exabyte-scale data platform by migrating to the new… Continue reading Tulip: Modernizing Meta’s data platform
SRE Weekly Issue #356
Thanks to all of you who took the time to share your ideas about choosing incidents to investigate! I got some great answers and I’m looking forward to pulling them together into an article. I decided to give this GPT-3 thing a spin. It turns out that it absolutely can assemble a… Continue reading SRE Weekly Issue #356
SRE Weekly Issue #355
Articles Which incidents aren’t… Continue reading SRE Weekly Issue #355
SRE Weekly Issue #354
Articles DisasterCast Episode 31… Continue reading SRE Weekly Issue #354
No issue this week
Happy new year! I’m taking a break this week and I’ll be back with a new issue next week. See you in issue #354!