ZippyDB is the largest strongly consistent, geographically distributed key-value store at Facebook. Since we first deployed ZippyDB in 2012, this key-value store has expanded rapidly, and today, ZippyDB serves a number of use cases, ranging from metadata for a distributed filesystem, counting events for both internal and external purposes, to product data that’s used for…
Author: fde
SRE Weekly Issue #282
A message from our sponsor, StackHawk: ICYMI, ZAP creator and project lead Simon Bennetts recently unveiled ZAP’s new automation framework. Watch the session and see how it works: https://sthwk.com/Automation-Framework
Articles
A thorough introduction to bpftrace
I really need to learn bpftrace, and this article is a great place to start. Brendan Gregg…
Risk-driven backbone management during COVID-19 and beyond
What the research is: A first-of-its-kind study detailing our backbone management strategy to ensure high service performance throughout the COVID-19 pandemic. The pandemic moved most social interactions online and caused an unprecedented stress test on our global network infrastructure with tens of data center regions. At this scale, failures such as fiber cuts, router misconfigurations,…
Real-time Einstein Insights Using Kafka Streams
Sales representatives deal with hundreds of emails every day. To help them prioritize, Salesforce offers critical insights on emails received. These insights are either generated by our deep learning models or defined by the customer by matching keywords using regular expressions. Insights are generated in real time in our microservice architecture, which is built using Kafka…
Open-sourcing a more precise time appliance
Facebook engineers have built and open-sourced an Open Compute Time Appliance, an important component of the modern timing infrastructure. To make this possible, we came up with the Time Card: a PCI Express (PCIe) card that can turn almost any commodity server into a time appliance. With the help of the OCP community, we…
SRE Weekly Issue #283
I’m on vacation enjoying the sunny beaches in Maine with my family, so I prepared this week’s issue in advance. No outages section, save for one big one I noticed due to direct personal experience. See you all next week! A message from our sponsor, StackHawk: StackHawk is now integrated with GitHub Code…
Apricot subsea cable will boost internet capacity, speeds in the Asia-Pacific region
We are excited to announce our participation in the Apricot subsea cable system, together with leading regional and global partners. When completed, the project (which is still subject to regulatory approvals) will deliver much-needed internet capacity, redundancy, and reliability to expand connections in the Asia-Pacific region. The 12,000-kilometer-long cable will connect Japan, Taiwan, Guam, the…
RAMP-TAO: Layering atomic transactions on Facebook’s online graph store
What the research is: RAMP-TAO is a new protocol that improves the developer experience on TAO, Facebook’s online social graph store, by providing stronger transactional guarantees. It is the first protocol to provide transactional semantics over an eventually consistent massive-scale data store while still preserving the system’s overall reliability and performance. RAMP-TAO enables an intuitive…
SRE Weekly Issue #284
Like last week, I prepared this week’s issue in advance, so no Outages section. Have a great week! A message from our sponsor, StackHawk: Trying to automate application and API security testing? See how StackHawk and Burp Suite Enterprise stack up: https://sthwk.com/burp-enterprise
Articles
Alerting on SLOs like Pros
SoundCloud is very clear…
How to Rename a Helm Release
Problem
The process for migrating from Helm v2 to v3, the latest stable major release, was pretty straightforward. However, while performing the migration, we encountered an anomaly with how one of the application charts had been deployed, thus introducing additional challenges. One of our application’s Helm v2 releases did not adhere to the standard naming…