SRE Weekly Issue #338

A message from our sponsor, Rootly: Manage incidents directly from Slack with Rootly 🚒. Rootly automates manual tasks like creating an incident channel, Jira ticket and Zoom rooms, inviting responders, creating statuspage updates, postmortem timelines and more. Want to see why companies like Canva and Grammarly love us? https://rootly.com/demo/ Articles Intro to Themes…

Categorized as SRE

Open-sourcing TAOBench: An end-to-end social network benchmark

What the research is: The continued emergence of large social network applications has introduced a scale of data and query volume that challenges the limits of existing data stores. However, few benchmarks accurately simulate these request patterns, leaving researchers in short supply of tools to evaluate and improve upon these systems. To address this issue,…

Categorized as Technology

Network Entitlement: A contract-based network sharing solution

Meta’s overall network usage and traffic volume have increased as we’ve continued to add new services. Due to the scarcity of fiber resources, we’re developing an explicit resource reservation framework to effectively plan, manage, and operate the shared consumption of network bandwidth, which will help us keep up with demand and limit network disruptions during…

Categorized as Technology

Viewing the world as a computer: Global capacity management

Meta currently operates 14 data centers around the world. This rapidly expanding global data center footprint poses new challenges for service owners and for our infrastructure management systems. Systems like Twine, which we use to scale cluster management, and RAS, which handles perpetual region-wide resource allocation, have provided the abstractions and automation necessary for service…

Categorized as Technology

SRE Weekly Issue #337

Thanks for all the vacation well-wishes! It was really great and relaxing. Take vacations, it’s important for reliability! While I was out, I shipped the past two issues with content prepared in advance, and without the Outages section. This gave me a chance to really think hard about the value of the…

Categorized as SRE

Introducing Velox: An open source unified execution engine

Meta is introducing Velox, an open source unified execution engine aimed at accelerating data management systems and streamlining their development. Velox is under active development. Experimental results from our paper published at the International Conference on Very Large Data Bases (VLDB) 2022 show how Velox improves efficiency and consistency in data management systems. Velox helps…

Categorized as Technology

Hyperpacks: Using Buildpacks to Build Hyperforce

At Salesforce, we regularly use our products and services to scale our own business. One example is Buildpacks, which we created nearly a decade ago and which is now a part of Hyperforce. Hyperpacks are an innovative new way of using Cloud Native Buildpacks (CNB) to manage our public cloud infrastructure. Buildpacks were created to help…

Categorized as Technology

Improving Meta’s SLO workflows with data annotations

When we focus on minimizing errors and downtime here at Meta, we place a lot of attention on service-level indicators (SLIs) and service-level objectives (SLOs). Consider Instagram, for example. There, SLIs represent metrics from different product surfaces, like the volume of error response codes to certain endpoints, or the number of successful media uploads. Based…

Categorized as Technology

SRE Weekly Issue #336

A message from our sponsor, Rootly: Manage incidents directly from Slack with Rootly 🚒. Automate manual admin tasks like creating incident channel, Jira and Zoom, paging and adding responders, postmortem timeline, setting up reminders, and more. Book a demo (+ get a snazzy Rootly lego set): https://rootly.com/demo/ Articles What it’s like to…

Categorized as SRE

SRE Weekly Issue #335

A message from our sponsor, Rootly: Manage incidents directly from Slack with Rootly 🚒. Automate manual admin tasks like creating incident channel, Jira and Zoom, paging and adding responders, postmortem timeline, setting up reminders, and more. Book a demo (+ get a snazzy Rootly lego set): https://rootly.com/demo/ Articles How an incident transformed…

Categorized as SRE