Meta is introducing Velox, an open source unified execution engine aimed at accelerating data management systems and streamlining their development. Velox is under active development. Experimental results from our paper published at the International Conference on Very Large Data Bases (VLDB) 2022 show how Velox improves efficiency and consistency in data management systems. Velox helps…
Month: August 2022
Hyperpacks: Using Buildpacks to Build Hyperforce
At Salesforce we regularly use our own products and services to scale our business. One example is Buildpacks, which we created nearly a decade ago and which is now part of Hyperforce. Hyperpacks are an innovative new way of using Cloud Native Buildpacks (CNB) to manage our public cloud infrastructure. Buildpacks were created to help…
Improving Meta’s SLO workflows with data annotations
When we focus on minimizing errors and downtime here at Meta, we pay close attention to service-level indicators (SLIs) and service-level objectives (SLOs). Consider Instagram, for example. There, SLIs represent metrics from different product surfaces, such as the volume of error response codes from certain endpoints or the number of successful media uploads. Based…
SRE Weekly Issue #336
A message from our sponsor, Rootly: Manage incidents directly from Slack with Rootly 🚒. Automate manual admin tasks like creating an incident channel, Jira ticket, and Zoom call, paging and adding responders, building the postmortem timeline, setting up reminders, and more. Book a demo (+ get a snazzy Rootly LEGO set): https://rootly.com/demo/ Articles: What it’s like to…
SRE Weekly Issue #335
Articles: How an incident transformed…
SRE Weekly Issue #334
I’ll be on vacation starting next Sunday (yay!). That means the next two issues will be prepared in advance, so there won’t be an Outages section.
How Instagram suggests new content
A touring alien from a galaxy far, far away is an avid Instagram user. Her Instagram Feed is dominated by posts from friends and family, some space travel magazines, a few general news accounts, and lots of science fiction blogs. She logs in and scrolls gently through her feed, catching up with friends and family, keeping pace…
Architectural Principles for High Availability on Hyperforce
Infrastructure and software failures will happen. We idolize four 9s (99.99%) availability. We know we need to optimize and improve our Recovery Time Objective (RTO, the time it takes to restore service after a disruption) and Recovery Point Objective (RPO, the acceptable data loss measured in time). But how can we actually deliver high availability for our customers? One…
Scaling data ingestion for machine learning training at Meta
Many of Meta’s products, such as search and language translations, utilize AI models to continuously improve user experiences. As the performance of hardware we use to support training infrastructure increases, we need to scale our data ingestion infrastructure accordingly to handle workloads more efficiently. GPUs, which are used for training infrastructure, tend to double in…
SRE Weekly Issue #333
Articles: Is SRE Just Ops…