{"id":629,"date":"2022-09-07T16:30:20","date_gmt":"2022-09-07T16:30:20","guid":{"rendered":"https:\/\/fde.cat\/index.php\/2022\/09\/07\/open-sourcing-taobench-an-end-to-end-social-network-benchmark\/"},"modified":"2022-09-07T16:30:20","modified_gmt":"2022-09-07T16:30:20","slug":"open-sourcing-taobench-an-end-to-end-social-network-benchmark","status":"publish","type":"post","link":"https:\/\/fde.cat\/index.php\/2022\/09\/07\/open-sourcing-taobench-an-end-to-end-social-network-benchmark\/","title":{"rendered":"Open-sourcing TAOBench: An end-to-end social network benchmark"},"content":{"rendered":"<h2><span>What the research is:<\/span><\/h2>\n<p><span>The continued emergence of large social network applications has introduced a scale of data and query volume that challenges the limits of existing data stores. However, few benchmarks accurately simulate these request patterns, leaving researchers short of tools for evaluating and improving these systems.<\/span><\/p>\n<p><span>To address this issue, we are open-sourcing <\/span><a href=\"http:\/\/taobench.org\/\" target=\"_blank\" rel=\"noopener\"><span>TAOBench<\/span><\/a><span>, a new benchmark that captures the social graph workload at Meta. We\u2019re making <\/span><a href=\"https:\/\/github.com\/audreyccheng\/taobench\" target=\"_blank\" rel=\"noopener\"><span>workload configurations<\/span><\/a><span> available, as well as a benchmarking framework that uses the request features captured in these configurations to accurately model production workloads and generate emergent application behavior. We ensure the integrity of TAOBench\u2019s workloads by validating them against their production counterparts. We also describe several benchmark use cases at Meta and report results for five popular distributed database systems, demonstrating how TAOBench can be used to evaluate system trade-offs and to identify and address performance issues. 
Our benchmark fills a gap in the tools and data available to researchers and developers for informing system design decisions.<\/span><\/p>\n<h2><span>How it works:<\/span><\/h2>\n<p><span>Because a benchmark is only as useful as the workload it is derived from, we identified five properties that its request patterns should capture. A comprehensive social network benchmark should:<\/span><\/p>\n<p><span>Accurately emulate social network requests<\/span><br \/>\n<span>Capture any transactional requirements<\/span><br \/>\n<span>Express data colocation preferences and constraints<\/span><br \/>\n<span>Model request distributions without prescriptive query types<\/span><br \/>\n<span>Exhibit multitenant behavior on shared data<\/span><\/p>\n<p><span>To satisfy these properties, we profile requests served by <\/span><a href=\"https:\/\/engineering.fb.com\/2013\/06\/25\/core-data\/tao-the-power-of-the-graph\/\" target=\"_blank\" rel=\"noopener\"><span>TAO<\/span><\/a><span>, an online graph data store at Meta.<\/span><\/p>\n<p><span>TAO is a read-optimized, geographically distributed data store that provides access to the social graph for diverse products and back-end systems. In aggregate, TAO serves over 10 billion requests per second on a changing dataset of many petabytes. Its workload has several notable attributes. For example, read and write skew often manifests on different keys: Over 99 percent of data items that are frequently written to are, on average, read less than once per day.<\/span><\/p>\n<p><span>To accurately generate TAO\u2019s workloads at a flexible scale, we characterize these request patterns and identify a small set of parameters (including transaction size, key-to-shard mapping, and frequency of operation types) that are sufficient to replicate production workloads. 
TAOBench then uses these parameters both to accurately downscale Meta\u2019s social network workload and to model emergent application behavior. Our parameterized framework is open source and extensible, so it can simulate a wide range of request patterns.<\/span><\/p>\n<p><span>To illustrate TAOBench\u2019s applicability, we describe how Meta uses this tool to test new features, optimizations, and reliability (e.g., hotspots and worst-case scenarios), as well as to experiment with speculative workloads that would otherwise be difficult or infeasible to assess in production.<\/span><\/p>\n<p><span>We provide four examples:<\/span><\/p>\n<p><span>Analyzing new transaction use cases<\/span><br \/>\n<span>Assessing contention under longer lock hold times<\/span><br \/>\n<span>Evaluating new APIs<\/span><br \/>\n<span>Quantifying the performance of high fan-out transactions<\/span><\/p>\n<p><span>We also report TAOBench results for five widely used distributed databases (Cloud Spanner, CockroachDB, PlanetScale, TiDB, YugabyteDB) to demonstrate how the benchmark can be used to study performance trade-offs and identify optimization opportunities.<\/span><\/p>\n<h2><span>Why it matters:<\/span><\/h2>\n<p><span>Despite the ubiquity of social networks, few realistic workloads are publicly available to guide research on their underlying database infrastructure. In academia, this scarcity makes it difficult to probe the limits of existing systems and develop novel mechanisms to overcome them. In industry, practitioners struggle to evaluate new features and resolve issues without a way to reproduce these request patterns. To address this gap, we present TAOBench, the first open-source benchmark that generates end-to-end, transactional request patterns derived from a large-scale social network. 
With our benchmark, we make Meta\u2019s social graph workload accessible to the database community and provide visibility into the real-world challenges of supporting such workloads.<\/span><\/p>\n<h2><span>Read the full paper:<\/span><\/h2>\n<p><a href=\"https:\/\/www.vldb.org\/pvldb\/vol15\/p1965-cheng.pdf\" target=\"_blank\" rel=\"noopener\">TAOBench: An end-to-end benchmark for social network workloads<\/a><\/p>\n<p>The post <a href=\"https:\/\/engineering.fb.com\/2022\/09\/07\/open-source\/taobench\/\">Open-sourcing TAOBench: An end-to-end social network benchmark<\/a> appeared first on <a href=\"https:\/\/engineering.fb.com\/\">Engineering at Meta<\/a>.<\/p>","protected":false},"excerpt":{"rendered":"<p>What the research is: The continued emergence of large social network applications has introduced a scale of data and query volume that challenges the limits of existing data stores. However, few benchmarks accurately simulate these request patterns, leaving researchers in short supply of tools to evaluate and improve upon these systems.\u00a0 To address this issue,&hellip; <a class=\"more-link\" href=\"https:\/\/fde.cat\/index.php\/2022\/09\/07\/open-sourcing-taobench-an-end-to-end-social-network-benchmark\/\">Continue reading <span class=\"screen-reader-text\">Open-sourcing TAOBench: An end-to-end social network benchmark<\/span><\/a><\/p>\n","protected":false},"author":0,"featured_media":0,"comment_status":"","ping_status":"open","sticky":false,"template":"","format":"standard","meta":{"spay_email":"","footnotes":""},"categories":[7],"tags":[],"class_list":["post-629","post","type-post","status-publish","format-standard","hentry","category-technology","entry"],"jetpack_featured_media_url":"","jetpack-related-posts":[],"_links":{"self":[{"href":"https:\/\/fde.cat\/index.php\/wp-json\/wp\/v2\/posts\/629","targetHints":{"allow":["GET"]}}],"collection":[{"href":"https:\/\/fde.cat\/index.php\/wp-json\/wp\/v2\/posts"}],"about":[{"href":"https:\/\/fde.cat\/index.php\/wp-json\/wp\/v2\/types\/post"}],"replies":[{"embeddable":true,"href":"https:\/\/fde.cat\/index.php\/wp-json\/wp\/v2\/comments?post=629"}],"version-history":[{"count":0,"href":"https:\/\/fde.cat\/index.php\/wp-json\/wp\/v2\/posts\/629\/revisions"}],"wp:attachment":[{"href":"https:\/\/fde.cat\/index.php\/wp-json\/wp\/v2\/media?parent=629"}],"wp:term":[{"taxonomy":"category","embeddable":true,"href":"https:\/\/fde.cat\/index.php\/wp-json\/wp\/v2\/categories?post=629"},{"taxonomy":"post_tag","embeddable":true,"href":"https:\/\/fde.cat\/index.php\/wp-json\/wp\/v2\/tags?post=629"}],"curies":[{"name":"wp","href":"https:\/\/api.w.org\/{rel}","templated":true}]}}