{"id":827,"date":"2024-02-20T17:00:29","date_gmt":"2024-02-20T17:00:29","guid":{"rendered":"https:\/\/fde.cat\/index.php\/2024\/02\/20\/aligning-velox-and-apache-arrow-towards-composable-data-management\/"},"modified":"2024-02-20T17:00:29","modified_gmt":"2024-02-20T17:00:29","slug":"aligning-velox-and-apache-arrow-towards-composable-data-management","status":"publish","type":"post","link":"https:\/\/fde.cat\/index.php\/2024\/02\/20\/aligning-velox-and-apache-arrow-towards-composable-data-management\/","title":{"rendered":"Aligning Velox and Apache Arrow: Towards composable data management"},"content":{"rendered":"<p><span>We\u2019ve partnered with<\/span> <a href=\"https:\/\/voltrondata.com\/\" target=\"_blank\" rel=\"noopener\"><span>Voltron Data<\/span><\/a><span> and the Arrow community to align and converge Apache Arrow with <\/span><a href=\"https:\/\/engineering.fb.com\/2023\/03\/09\/open-source\/velox-open-source-execution-engine\/\" target=\"_blank\" rel=\"noopener\"><span>Velox<\/span><\/a><span>, Meta\u2019s open source execution engine.<\/span><br \/>\n<span>Apache Arrow 15 includes three new format layouts developed through this partnership: StringView, ListView, and Run-End-Encoding (REE).<\/span><br \/>\n<span>This new convergence helps Meta and the larger community build data management systems that are unified, more efficient, and composable.<\/span><\/p>\n<p><span>Meta\u2019s Data Infrastructure teams have been<\/span> <a href=\"https:\/\/www.cidrdb.org\/cidr2023\/papers\/p77-chattopadhyay.pdf\" target=\"_blank\" rel=\"noopener\"><span>rethinking how data management systems are designed<\/span><\/a><span>. We want to<\/span> <a href=\"https:\/\/research.facebook.com\/publications\/the-composable-data-management-system-manifesto\/\" target=\"_blank\" rel=\"noopener\"><span>make our data management systems more composable<\/span><\/a><span> \u2013 meaning that instead of individually developing systems as monoliths we identify common components, factor them out as reusable libraries, and leverage common APIs and standards to increase the interoperability between them.\u00a0<\/span><\/p>\n<p><span>As we decompose our large, monolithic systems into a more modular stack of reusable components, open standards, such as<\/span> <a href=\"https:\/\/arrow.apache.org\/\"><span>Apache Arrow<\/span><\/a><span>, <\/span><span>play an important role for interoperability of these components. To further our efforts in creating a more unified data landscape for our systems as well as those in the larger community, we\u2019ve partnered with<\/span> <span>Voltron Data<\/span><span> and the Arrow community to converge Apache Arrow\u2019s open source columnar layouts with Velox, Meta\u2019s open source execution engine.<\/span><\/p>\n<p><span>The result combines the efficiency and agility offered by Velox with the widely-used Apache standard.\u00a0\u00a0<\/span><\/p>\n<h2><span>Why we need a composable data management system<\/span><\/h2>\n<p><span>Meta\u2019s data engines support large-scale workloads that include processing large datasets offline (ETL), interactive dashboard generation, ad hoc data exploration, and stream processing. More recently, a variety of feature engineering, data preprocessing, and training systems were built to support our rapidly expanding AI\/ML infrastructure. To ensure our engineering teams can efficiently maintain and enhance these engines as our products evolve, Meta has started a series of projects aimed at increasing our engineering efficiency by minimizing the duplication of work, improving the experience of internal data users through more consistent semantics across these engines, and, ultimately, accelerating the pace of innovation in data management.\u00a0<\/span><\/p>\n<h2><span>An introduction to Velox<\/span><\/h2>\n<p><a href=\"https:\/\/engineering.fb.com\/2023\/03\/09\/open-source\/velox-open-source-execution-engine\/\" target=\"_blank\" rel=\"noopener\"><span>Velox<\/span><\/a><span> is the first project in our composable data management system program. It\u2019s a unified execution engine, implemented as a C++ library, aimed at replacing the very processing core of many of these data management systems \u2013 their execution engine.<\/span><\/p>\n<p><span>Velox improves the efficiency of these systems by providing a unified, state-of-the-art implementation of features and optimizations that were previously only available in individual engines. It also improves the engineering efficiency of our organization since these features can now be written once, in a single library, and be (re-)used everywhere.<\/span><\/p>\n<p><span>Velox is currently in different stages of integration in more than 10 of Meta\u2019s data systems. We have observed<\/span> <a href=\"https:\/\/research.facebook.com\/publications\/velox-metas-unified-execution-engine\/\" target=\"_blank\" rel=\"noopener\"><span>3-10x efficiency improvements<\/span><\/a><span> in integrations with well-known systems in the industry like Apache Spark and Presto.\u00a0<\/span><\/p>\n<p><span>We <\/span><a href=\"https:\/\/engineering.fb.com\/2023\/03\/09\/open-source\/velox-open-source-execution-engine\/\" target=\"_blank\" rel=\"noopener\"><span>open-sourced Velox in 2022<\/span><\/a><span>. Today, it is developed in collaboration with more than 200 individual contributors around the world from more than 20 companies.\u00a0<\/span><\/p>\n<h2><span>Open standards and Apache Arrow<\/span><\/h2>\n<p><span>In order to enable interoperability with other components, a composable data management system has to understand common storage (file) formats, network serialization protocols, table APIs, and have a unified way of expressing computation. Oftentimes these components have to directly share in-memory datasets with each other, for example, when transferring data across language boundaries (C++ to Java or Python) for efficient UDF support.<\/span><\/p>\n<p><span>Our focus is to use open standards in these APIs as often as possible.<\/span> <a href=\"https:\/\/arrow.apache.org\/\" target=\"_blank\" rel=\"noopener\"><span>Apache Arrow<\/span><\/a><span> is an open source in-memory layout standard for columnar data that has been widely adopted in the industry. In a way, Arrow can be seen as the layer underneath Velox: Arrow describes how columnar data is represented in memory; Velox provides a series of execution and resource management primitives to process this data.<\/span><\/p>\n<p><span>Although the Arrow format predates Velox, we made a conscious design decision while creating Velox to extend and deviate from the Arrow format, creating a layout we call<\/span> <a href=\"https:\/\/facebookincubator.github.io\/velox\/develop\/vectors.html\" target=\"_blank\" rel=\"noopener\"><span>Velox Vectors<\/span><\/a><span>. The purpose was to accelerate the data processing operations commonly found in our workloads in ways that were not possible using Arrow. Velox Vectors provided the efficiency and agility we need to move fast, but in return created a fragmented space with limited component interoperability.\u00a0<\/span><\/p>\n<p><span>To bridge this gap and create a more unified data landscape for our systems and the community, we partnered with<\/span> <span>Voltron Data<\/span><span> and the Arrow community to align and converge these two formats. After a year of work, the new Apache Arrow release,<\/span> <a href=\"https:\/\/arrow.apache.org\/release\/15.0.0.html\" target=\"_blank\" rel=\"noopener\"><span>Apache Arrow 15.0.0<\/span><\/a><span>, includes three new format layouts inspired by Velox Vectors: StringView, ListView, and Run-End-Encoding (REE).<\/span><\/p>\n<p><span>Arrow 15 not only enables efficient (zero-copy) in-memory communication across components using Velox and Arrow, but also increases Arrow\u2019s applicability in modern execution engines, unlocking a variety of use cases across the industry.\u00a0<\/span><\/p>\n<h2><span>Details of the Arrow and Velox layout<\/span><\/h2>\n<p><span>Both Arrow and Velox Vectors are columnar layouts whose purpose is to represent batches of data in memory. A column is usually composed of a sequential buffer where row values are stored contiguously and an optional bitmask to represent the nullability\/validity of each value:\u00a0<\/span><\/p>\n<p>(a) Logical and (b) physical representation of an example dataset.<\/p>\n<p><span>The Arrow and Velox Vectors formats already had compatible layout representations for scalar fixed-size data types (such as integers, floats, and booleans) and dictionary-encoded data. However, there were incompatibilities in string representation and container types such as arrays and maps, and a lack of support for constant and run-length-encoded (RLE) data.<\/span><\/p>\n<h3><span>StringView \u2013 strings<\/span><\/h3>\n<p><span>Arrow\u2019s typical string representation uses the<\/span> <a href=\"https:\/\/arrow.apache.org\/docs\/format\/Columnar.html#variable-size-binary-layout\" target=\"_blank\" rel=\"noopener\"><span>variable-sized element layout<\/span><\/a><span>, which consists of one contiguous buffer containing the string contents (the data), and one buffer marking where each string starts (the offsets). The size of a string <\/span><span>i<\/span><span> can be obtained by subtracting <\/span><span>offsets[i+1]<\/span><span> by <\/span><span>offsets[i].<\/span><span> This is equivalent to representing strings as an array of characters:<\/span><span>\u00a0<\/span><\/p>\n<p>Arrow original string representation.<\/p>\n<p><span>While Arrow\u2019s representation stands out in simplicity, we found through a series of experiments that the following alternate string representation (which is now referred to as <\/span><span>StringView<\/span><span>) provides compelling properties that are important for efficient string processing:<\/span><span>\u00a0<\/span><\/p>\n<p>New StringView representation in Arrow 15.<\/p>\n<p><span>In the<\/span> <a href=\"https:\/\/arrow.apache.org\/docs\/format\/Columnar.html#variable-size-binary-view-layout\" target=\"_blank\" rel=\"noopener\"><span>new representation<\/span><\/a><span>, the first four bytes of the <\/span><span>view<\/span><span> object always contain the string size. If the string is short (up to 12 characters), the contents are stored inline in the view structure. Otherwise, a prefix of the string is stored in the next four bytes, followed by the buffer ID (StringViews can contain multiple data buffers) and the offset in that data buffer.<\/span><\/p>\n<p><span>The benefits of this layout are:<\/span><\/p>\n<p><span>Small strings of up to 12 bytes are fully inlined within the views buffer and can be read without dereferencing the data buffer. This increases memory locality as the typical cache miss of accessing the data buffer is avoided, increasing performance.<\/span><br \/>\n<span>Since StringViews store a small (four bytes) prefix with the view object, string comparisons can fail-fast and, in many cases, avoid accessing the data buffer. This property speeds up common operations such as highly selective filters and sorting.<\/span><br \/>\n<span>StringView gives developers more flexibility on how string data is laid out in memory. For example, it allows for certain common string operations, such as <\/span><span>\ud835\udc61\ud835\udc5f\ud835\udc56\ud835\udc5a<\/span><span>() and <\/span><span>\ud835\udc60\ud835\udc62\ud835\udc4f\ud835\udc60\ud835\udc61\ud835\udc5f<\/span><span>(), to be executed zero-copy by only updating the view object.<\/span><br \/>\n<span>Since StringView\u2019s view object has a fixed size (16 bytes), StringViews can be written out of order (e.g., first writing StringView at position 2, then 0 and 1).\u00a0<\/span><\/p>\n<p><span>Besides these properties, we have found that other modern processing engines and libraries like <\/span><a href=\"https:\/\/db.in.tum.de\/~freitag\/papers\/p29-neumann-cidr20.pdf\"><span>Umbra<\/span><\/a><span> and DuckDB follow a similar string representation approach, and, consequently, also used to deviate from Arrow. In Arrow 15, StringView has been added as a supported layout and can now be used to efficiently transfer string batches across these systems.<\/span><\/p>\n<h3><span>ListView \u2013 variable-sized containers<\/span><\/h3>\n<p><span>Variable-size containers like arrays and maps are<\/span> <a href=\"https:\/\/arrow.apache.org\/docs\/format\/Columnar.html#list-layout\" target=\"_blank\" rel=\"noopener\"><span>represented in Arrow<\/span><\/a><span> using one buffer containing the flattened elements from all rows, and one <\/span><span>offsets<\/span><span> buffer marking where the container on each row starts, similar to the original string representation. The number of elements a container on row <\/span><span>i<\/span><span> stores can be obtained by subtracting <\/span><span>offsets[i+1]<\/span><span> by <\/span><span>offsets[i]<\/span><span>:<\/span><span>\u00a0<\/span><\/p>\n<p>Arrow original list representation.<\/p>\n<p><span>To efficiently support execution of <\/span><a href=\"https:\/\/vldb.org\/pvldb\/vol15\/p3372-pedreira.pdf\" target=\"_blank\" rel=\"noopener\"><span>vectorized conditionals<\/span><\/a><span> (e.g., IF and SWITCH operations), the Velox Vectors layout has to allow developers to write columns out of order. This means that developers can, for example, first write all even row records then all odd row records without having to reorganize elements that have already been written.<\/span><\/p>\n<p><span>Primitive types can always be written out of order since the element size is constant and known beforehand. Likewise, strings can also be written out of order using StringView because the string metadata objects have a constant size (16 bytes), and string contents do not need to be written contiguously. To increase flexibility and support out-of-order writes for the remaining variable-sized types in Velox, we decided to keep both <\/span><span>lengths<\/span><span> and <\/span><span>offsets<\/span><span> buffers:<\/span><\/p>\n<p>New ListView representation in Arrow 15.<\/p>\n<p><span>To bridge the gap, a new format called ListView has been added to Arrow 15. It allows the representation of variable-sized elements that have both lengths and offsets buffers.<\/span><\/p>\n<p><span>Beyond allowing for efficient execution of conditionals, ListView gives developers more flexibility to slice and rearrange containers (e.g., operations like <\/span><span>slice()<\/span><span> and <\/span><span>trim_array()<\/span><span> can be implemented zero-copy), other than allowing for containers with overlapping ranges of elements.<\/span><\/p>\n<h3><span>REE \u2013 more encodings<\/span><\/h3>\n<p><span>We have also added two additional encoding formats commonly found in data warehouse workloads into Velox: constant encoding, to represent that all values in a column are the same, typically used to represent literals and partition keys; and RLE, to compactly represent consecutive runs of the same element.<\/span><\/p>\n<p><span>Upon discussion with the community, it was decided to add the REE format to Arrow. The REE format is a slight variation of RLE that, instead of storing the lengths of each run, stores the offset in which each run ends, providing better random-access support. With REEs it is also possible to represent constant encoded values by encoding them as a single run whose size is the entire batch.<\/span><\/p>\n<h2><span>Composability is the future of data management<\/span><\/h2>\n<p><span>Converging Arrow and Velox\u2019s memory layout is an important step towards making data management systems more composable. It enables systems to combine the power of Velox\u2019s state-of-the-art execution with the widespread industry adoption of Arrow\u2019s standard, resulting in a more efficient and seamless cooperation. The new extensions are already seeing adoption in libraries like <\/span><a href=\"https:\/\/github.com\/apache\/arrow\/issues\/39852\" target=\"_blank\" rel=\"noopener\"><span>PyArrow<\/span><\/a><span> and <\/span><a href=\"https:\/\/pola.rs\/posts\/polars-string-type\/\" target=\"_blank\" rel=\"noopener\"><span>Polars<\/span><\/a><span> and within Meta. In the future, it will allow more efficient interplay between projects like <\/span><a href=\"https:\/\/github.com\/oap-project\/gluten\"><span>Apache Gluten<\/span><\/a><span> (which uses Velox internally) and <\/span><a href=\"https:\/\/spark.apache.org\/docs\/latest\/api\/python\/user_guide\/sql\/arrow_pandas.html\" target=\"_blank\" rel=\"noopener\"><span>PySpark<\/span><\/a><span> (which consumes Arrow), for example.<\/span><\/p>\n<p><span>We envision that fragmentation and duplication of work can be reduced by decomposing data systems into reusable components which are open source and built based on open standards and APIs. Ultimately, we hope this work will help provide the foundation required to accelerate the pace of innovation in data management.<\/span><\/p>\n<h2><em><span>Acknowledgments<\/span><\/em><\/h2>\n<p><span>This format alignment was only possible due to a broad collaboration across different groups. A special thank you to Masha Basmanova, Orri Erling, Xiaoxuan Meng, Krishna Pai, Jimmy Lu, Kevin Wilfong, Laith Sakka, Wei He, Bikramjeet Vig, and Sridhar Anumandla from the Velox team at Meta; Felipe Carvalho, Ben Kietzman, Jacob Wujciak-Jens, Srikanth Nadukudy, Wes McKinney, and Keith Kraus from Voltron Data; and the entire Apache Arrow community for the insightful discussions, feedback, and receptivity to new ideas.<\/span><\/p>\n<p>The post <a href=\"https:\/\/engineering.fb.com\/2024\/02\/20\/developer-tools\/velox-apache-arrow-15-composable-data-management\/\">Aligning Velox and Apache Arrow: Towards composable data management<\/a> appeared first on <a href=\"https:\/\/engineering.fb.com\/\">Engineering at Meta<\/a>.<\/p>\n<p>Engineering at Meta<\/p>","protected":false},"excerpt":{"rendered":"<p>We\u2019ve partnered with Voltron Data and the Arrow community to align and converge Apache Arrow with Velox, Meta\u2019s open source execution engine. Apache Arrow 15 includes three new format layouts developed through this partnership: StringView, ListView, and Run-End-Encoding (REE). This new convergence helps Meta and the larger community build data management systems that are unified,&hellip; <a class=\"more-link\" href=\"https:\/\/fde.cat\/index.php\/2024\/02\/20\/aligning-velox-and-apache-arrow-towards-composable-data-management\/\">Continue reading <span class=\"screen-reader-text\">Aligning Velox and Apache Arrow: Towards composable data management<\/span><\/a><\/p>\n","protected":false},"author":0,"featured_media":0,"comment_status":"","ping_status":"open","sticky":false,"template":"","format":"standard","meta":{"spay_email":"","footnotes":""},"categories":[7],"tags":[],"class_list":["post-827","post","type-post","status-publish","format-standard","hentry","category-technology","entry"],"jetpack_featured_media_url":"","jetpack-related-posts":[{"id":869,"url":"https:\/\/fde.cat\/index.php\/2024\/05\/22\/composable-data-management-at-meta\/","url_meta":{"origin":827,"position":0},"title":"Composable data management at Meta","date":"May 22, 2024","format":false,"excerpt":"In recent years, Meta\u2019s data management systems have evolved into a composable architecture that creates interoperability, promotes reusability, and improves engineering efficiency.\u00a0 We\u2019re sharing how we\u2019ve achieved this, in part, by leveraging Velox, Meta\u2019s open source execution engine, as well as work ahead as we continue to rethink our data\u2026","rel":"","context":"In &quot;Technology&quot;","img":{"alt_text":"","src":"","width":0,"height":0},"classes":[]},{"id":626,"url":"https:\/\/fde.cat\/index.php\/2022\/08\/31\/introducing-velox-an-open-source-unified-execution-engine\/","url_meta":{"origin":827,"position":1},"title":"Introducing Velox: An open source unified execution engine","date":"August 31, 2022","format":false,"excerpt":"Meta is introducing Velox, an open source unified execution engine aimed at accelerating data management systems and streamlining their development. Velox is under active development. Experimental results from our paper published at the International Conference on Very Large Data Bases (VLDB) 2022 show how Velox improves efficiency and consistency in\u2026","rel":"","context":"In &quot;Technology&quot;","img":{"alt_text":"","src":"","width":0,"height":0},"classes":[]},{"id":786,"url":"https:\/\/fde.cat\/index.php\/2023\/11\/13\/sre-weekly-issue-398\/","url_meta":{"origin":827,"position":2},"title":"SRE Weekly Issue #398","date":"November 13, 2023","format":false,"excerpt":"View on sreweekly.com A message from our sponsor, FireHydrant: \u201cChange is the essential process of all existence.\u201d \u2013 Spock It\u2019s time for alerting to evolve. Get a first look at how incident management platform FireHydrant is architecting Signals, its native alerting tool, for resilience in the Signals Captain\u2019s Log. https:\/\/firehydrant.com\/blog\/captains-log-a-first-look-at-our-architecture-for-signals\/\u2026","rel":"","context":"In &quot;SRE&quot;","img":{"alt_text":"","src":"","width":0,"height":0},"classes":[]},{"id":753,"url":"https:\/\/fde.cat\/index.php\/2023\/08\/29\/scheduling-jupyter-notebooks-at-meta\/","url_meta":{"origin":827,"position":3},"title":"Scheduling Jupyter Notebooks at Meta","date":"August 29, 2023","format":false,"excerpt":"At Meta, Bento is our internal Jupyter notebooks platform that is leveraged by many internal users. Notebooks are also being used widely for creating reports and workflows (for example, performing data ETL) that need to be repeated at certain intervals. Users with such notebooks would have to remember to manually\u2026","rel":"","context":"In &quot;Technology&quot;","img":{"alt_text":"","src":"","width":0,"height":0},"classes":[]},{"id":779,"url":"https:\/\/fde.cat\/index.php\/2023\/10\/31\/automating-data-removal\/","url_meta":{"origin":827,"position":4},"title":"Automating data removal","date":"October 31, 2023","format":false,"excerpt":"Meta\u2019s Systematic Code and Asset Removal Framework (SCARF) has a subsystem for identifying and removing unused data types. SCARF scans production data systems to identify tables or assets that are unused and safely removes them. SCARF avoids tedious manual work and ensures that product data is correctly removed when a\u2026","rel":"","context":"In &quot;Technology&quot;","img":{"alt_text":"","src":"","width":0,"height":0},"classes":[]},{"id":787,"url":"https:\/\/fde.cat\/index.php\/2023\/11\/15\/watch-metas-engineers-on-building-network-infrastructure-for-ai\/","url_meta":{"origin":827,"position":5},"title":"Watch: Meta\u2019s engineers on building network infrastructure for AI","date":"November 15, 2023","format":false,"excerpt":"Meta is building for the future of AI at every level \u2013 from hardware like MTIA v1, Meta\u2019s first-generation AI inference accelerator to publicly released models like Llama 2, Meta\u2019s next-generation large language model, as well as new generative AI (GenAI) tools like Code Llama. Delivering next-generation AI products and\u2026","rel":"","context":"In &quot;Technology&quot;","img":{"alt_text":"","src":"","width":0,"height":0},"classes":[]}],"_links":{"self":[{"href":"https:\/\/fde.cat\/index.php\/wp-json\/wp\/v2\/posts\/827","targetHints":{"allow":["GET"]}}],"collection":[{"href":"https:\/\/fde.cat\/index.php\/wp-json\/wp\/v2\/posts"}],"about":[{"href":"https:\/\/fde.cat\/index.php\/wp-json\/wp\/v2\/types\/post"}],"replies":[{"embeddable":true,"href":"https:\/\/fde.cat\/index.php\/wp-json\/wp\/v2\/comments?post=827"}],"version-history":[{"count":0,"href":"https:\/\/fde.cat\/index.php\/wp-json\/wp\/v2\/posts\/827\/revisions"}],"wp:attachment":[{"href":"https:\/\/fde.cat\/index.php\/wp-json\/wp\/v2\/media?parent=827"}],"wp:term":[{"taxonomy":"category","embeddable":true,"href":"https:\/\/fde.cat\/index.php\/wp-json\/wp\/v2\/categories?post=827"},{"taxonomy":"post_tag","embeddable":true,"href":"https:\/\/fde.cat\/index.php\/wp-json\/wp\/v2\/tags?post=827"}],"curies":[{"name":"wp","href":"https:\/\/api.w.org\/{rel}","templated":true}]}}