{"id":875,"date":"2024-06-10T16:30:55","date_gmt":"2024-06-10T16:30:55","guid":{"rendered":"https:\/\/fde.cat\/index.php\/2024\/06\/10\/serverless-jupyter-notebooks-at-meta\/"},"modified":"2024-06-10T16:30:55","modified_gmt":"2024-06-10T16:30:55","slug":"serverless-jupyter-notebooks-at-meta","status":"publish","type":"post","link":"https:\/\/fde.cat\/index.php\/2024\/06\/10\/serverless-jupyter-notebooks-at-meta\/","title":{"rendered":"Serverless Jupyter Notebooks at Meta"},"content":{"rendered":"<p><span>At Meta, <\/span><a href=\"https:\/\/developers.facebook.com\/blog\/post\/2021\/09\/20\/eli5-bento-interactive-notebook-empowers-development-collaboration-best-practices\/\" target=\"_blank\" rel=\"noopener\"><span>Bento<\/span><\/a><span>, our internal <\/span><a href=\"https:\/\/jupyter.org\/\" target=\"_blank\" rel=\"noopener\"><span>Jupyter<\/span><\/a><span> notebooks platform, is a popular tool that allows our engineers to mix code, text, and multimedia in a single document. Use cases run the entire spectrum from what we call \u201clite\u201d workloads that involve simple prototyping to heavier and more complex machine learning workflows. However, even though the lite workflows require limited compute, users still have to go through the same process of reserving and provisioning remote compute \u2013 a process that takes time \u2013 before the notebook is ready for any code execution.<\/span><\/p>\n<p><span>To address this problem, we have invested in building infrastructure that allows for code execution directly in the browser, removing the need to provision remote compute for some lite workloads. 
This infrastructure leverages a library called <\/span><a href=\"https:\/\/pyodide.org\/en\/stable\/\" target=\"_blank\" rel=\"noopener\"><span>Pyodide<\/span><\/a><span> that sits on top of <\/span><a href=\"https:\/\/webassembly.org\/\" target=\"_blank\" rel=\"noopener\"><span>WebAssembly<\/span><\/a><span> (<\/span><span>Wasm<\/span><span>).<\/span><\/p>\n<p><span>Here\u2019s how we married Bento with this in-browser, serverless code execution technology to power our notebooks platform for these lite workloads.<\/span><\/p>\n<h2><span>The motivation for supporting lite workloads<\/span><\/h2>\n<p><span>We define lite workloads as workloads that only consume data from upstream systems, have no side effects on our underlying systems, and stay within the Chrome tab memory limit. We frequently get internal feedback from the owners of these lite workloads that the time and complexity of getting started are out of proportion to what they want to use Bento for.<\/span><\/p>\n<p><span>The requirements can be summarized as follows:<\/span><\/p>\n<ul>\n<li><span>An intuitive startup process that works right out of the box<\/span><\/li>\n<li><span>A startup process that is very quick and has the notebook immediately ready for execution<\/span><\/li>\n<li><span>A startup process that does not include the complex remote compute reservation process<\/span><\/li>\n<li><span>An execution environment that supports the majority of lite workloads<\/span><\/li>\n<\/ul>\n<h2><span>How we put the pieces together<\/span><\/h2>\n\n<h3><span>How this all works<\/span><\/h3>\n<p><a href=\"https:\/\/pyodide.org\/en\/stable\/\" target=\"_blank\" rel=\"noopener\"><span>Pyodide<\/span><\/a><span> (a Python distribution for the browser that runs on <\/span><a href=\"https:\/\/webassembly.org\/\" target=\"_blank\" rel=\"noopener\"><span>WebAssembly<\/span><\/a><span>) is an important ingredient for this work. 
We\u2019ve built a kernel abstraction around this which, when called from Bento, behaves like any of the classic kernels we have (with some limitations) and performs message passing using the <\/span><a href=\"https:\/\/jupyter-client.readthedocs.io\/en\/latest\/messaging.html\" target=\"_blank\" rel=\"noopener\"><span>Jupyter Protocol<\/span><\/a><span>.<\/span><\/p>\n<h4><span>Kernel bridge<\/span><\/h4>\n<p><span>This abstraction allows Bento to work with both traditional server-based kernels and this new browser-based kernel with no changes whatsoever to the rest of the system. Its only visible manifestation is a selector in the notebook that toggles between server-based kernels and serverless.<\/span><\/p>\n\n<h4><span>Magics<\/span><\/h4>\n<p><a href=\"https:\/\/ipython.readthedocs.io\/en\/stable\/interactive\/magics.html\" target=\"_blank\" rel=\"noopener\"><span>Cell magics<\/span><\/a><span> are an important component of the Bento extension platform. To allow existing custom cells to work with no changes, we built middleware to capture these cell magics, process them directly in JavaScript, and then inject the expected results back into the Python kernel. A good example of this pattern is <\/span><span>%%sql<\/span><span>, which we use to power our custom SQL <\/span><span>cell.<\/span><\/p>\n<p><span>We\u2019ll showcase a few more examples in the section below on \u201cMeta-specific\u201d integrations.<\/span><\/p>\n<h4><span>Why we need a webworker<\/span><\/h4>\n<p><span>Since JavaScript is single-threaded, in the absence of a webworker the entire browser would lock up during \u201cexpensive\u201d kernel operations. 
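As a rough, hypothetical sketch (not Bento's actual code), the worker-side kernel logic boils down to handling Jupyter-protocol-shaped messages and handing back only the result. All names below, and the toy evaluator standing in for the Pyodide runtime, are invented for illustration:

```javascript
// Hypothetical sketch -- not Bento's actual implementation.
// In the browser this handler would be registered inside a Web Worker
// (via `self.onmessage`) so heavy execution never blocks the main
// thread; here it is a plain function so the pattern is self-contained.

// Minimal Jupyter-protocol-shaped envelope (header/content only).
function makeMessage(msgType, content, parent = null) {
  return {
    header: { msg_id: String(Date.now()), msg_type: msgType },
    parent_header: parent ? parent.header : {},
    content,
  };
}

// Worker-side dispatch: run the request, return only the result.
function handleKernelMessage(msg) {
  switch (msg.header.msg_type) {
    case "execute_request": {
      // Stand-in for handing the cell's code to the Pyodide runtime.
      const result = evaluate(msg.content.code);
      return makeMessage("execute_reply", { status: "ok", result }, msg);
    }
    default:
      return makeMessage("error", { status: "error" }, msg);
  }
}

// Toy expression evaluator standing in for pyodide.runPython(code).
function evaluate(code) {
  return Function(`"use strict"; return (${code});`)();
}

const request = makeMessage("execute_request", { code: "6 * 7" });
const reply = handleKernelMessage(request);
console.log(reply.content); // { status: 'ok', result: 42 }
```

In a real deployment the reply would be passed to `postMessage` rather than returned, but the request/reply shape is the same.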
Running kernel operations in a webworker, with only the results passed back to the main thread, mitigates this.<\/span><\/p>\n\n<h2><span>Meta-specific integrations<\/span><\/h2>\n<p><span>To unlock additional utility and support a coherent extract, transform, and load (ETL) story, we built integrations with an initial set of existing extensions. These are among the more popular extensions that users leverage to perform data operations.<\/span><\/p>\n<h3><span>SQL Cell<\/span><\/h3>\n<p><span>This leverages the <\/span><span>%%sql<\/span><span> magic to fetch data from the warehouse and make it available for further processing in the <\/span><span>Pyodide<\/span><span> kernel.<\/span><\/p>\n\n<h3><span>Google Sheets<\/span><\/h3>\n<p><span>Here, we leverage the <\/span><span>%%googlesheet<\/span><span> magic to fetch data from a Google Sheet and make it available for further processing in the notebook.<\/span><\/p>\n\n<h3><span>GraphQL<\/span><\/h3>\n<p><span>Here, we leverage the <\/span><span>%%graphql<\/span><span> magic, which powers the GraphQL cell, to fetch data and then inject the result back into the kernel for further processing.<\/span><\/p>\n\n<h3><span>Dataframe uploads<\/span><\/h3>\n<p><span>Data uploads are a bit trickier to pull off than the data reads we showcased above. 
We achieve this functionality in two steps:<\/span><\/p>\n<ol>\n<li><span>Leveraging the <\/span><span>%%dataframe<\/span><span> magic that powers the upload custom cell to fetch the arguments in a structured way.<\/span><\/li>\n<li><span>Kicking off an async job using <\/span><a href=\"https:\/\/engineering.fb.com\/2019\/06\/06\/data-center-engineering\/twine\/attachment\/tupperware-002-2\/\" target=\"_blank\" rel=\"noopener\"><span>Tupperware<\/span><\/a><span> (Meta\u2019s async tier compute platform) and showing the status of the associated Tupperware job in the cell output.<\/span><\/li>\n<\/ol>\n<h2><span>What\u2019s next for serverless notebooks<\/span><\/h2>\n<p><span>While we\u2019ve addressed the initial set of challenges to bring this product online, there is still a lot of work to be done to improve the developer experience. First, we plan to improve the lite-workload heuristic. Once that is in place, the next step is to default all new workloads to start as serverless. We can then quickly autodetect (based on memory requirements, data volumes, or libraries in use) whether the workload is lite enough. If not, we can automatically switch that notebook to a server-based kernel with minimal interruption to the user flow.<\/span><\/p>\n<p><span>After this, we plan to integrate with more existing cell extensions built on top of the Bento platform and thus expand the scope of what\u2019s possible when running \u201cserverless.\u201d<\/span><\/p>\n<p><span>The biggest limitation with this approach at Meta is that homegrown libraries that have not been ported to WebAssembly will be unavailable. 
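As a toy illustration of the kind of autodetection described above: the memory threshold, the unavailable-library list, and every name below are hypothetical, since the actual heuristic is internal to Meta.

```javascript
// Hypothetical sketch of a lite-workload autodetection heuristic.
// The threshold and the not-ported-to-Wasm list are invented for
// illustration purposes only.
const TAB_MEMORY_LIMIT_BYTES = 4 * 1024 ** 3; // assumed per-tab budget
const NOT_PORTED_TO_WASM = new Set(["homegrown_ml_lib", "torch"]);

// Classify a notebook: lite workloads can stay serverless, anything
// else should be switched to a server-based kernel.
function isLiteWorkload({ imports, estimatedMemoryBytes }) {
  if (estimatedMemoryBytes > TAB_MEMORY_LIMIT_BYTES) return false;
  return imports.every((lib) => !NOT_PORTED_TO_WASM.has(lib));
}

console.log(isLiteWorkload({ imports: ["pandas"], estimatedMemoryBytes: 256e6 })); // true
console.log(isLiteWorkload({ imports: ["torch"], estimatedMemoryBytes: 256e6 })); // false
```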
Given this, we\u2019re also planning to explore whether we can farm out the execution of specific \u201cnon-lite\u201d cells to our remote execution infrastructure while making this work seamlessly with Pyodide.<\/span><\/p>\n<p><span>Once these have been addressed, \u201cserverless\u201d notebooks will become the de facto landing experience in Bento.<\/span><\/p>\n<h2><span>Acknowledgments<\/span><\/h2>\n<p><span>Some of the approaches we took were directly inspired by the work done on <\/span><a href=\"https:\/\/github.com\/jupyterlite\" target=\"_blank\" rel=\"noopener\"><span>JupyterLite<\/span><\/a><span>, and this project directly leverages the <\/span><a href=\"https:\/\/pyodide.org\/en\/stable\/\" target=\"_blank\" rel=\"noopener\"><span>Pyodide<\/span><\/a><span> library, without which it would not have been possible. I\u2019d also like to thank all the engineers at Meta I collaborated with to make this project a reality.<\/span><\/p>\n<p>The post <a href=\"https:\/\/engineering.fb.com\/2024\/06\/10\/data-infrastructure\/serverless-jupyter-notebooks-bento-meta\/\">Serverless Jupyter Notebooks at Meta<\/a> appeared first on <a href=\"https:\/\/engineering.fb.com\/\">Engineering at Meta<\/a>.<\/p>","protected":false},"excerpt":{"rendered":"<p>At Meta, Bento, our internal Jupyter notebooks platform, is a popular tool that allows our engineers to mix code, text, and multimedia in a single document. Use cases run the entire spectrum from what we call \u201clite\u201d workloads that involve simple prototyping to heavier and more complex machine learning workflows. 
However, even though the lite&hellip; <a class=\"more-link\" href=\"https:\/\/fde.cat\/index.php\/2024\/06\/10\/serverless-jupyter-notebooks-at-meta\/\">Continue reading <span class=\"screen-reader-text\">Serverless Jupyter Notebooks at Meta<\/span><\/a><\/p>\n","protected":false},"author":0,"featured_media":0,"comment_status":"","ping_status":"open","sticky":false,"template":"","format":"standard","meta":{"spay_email":"","footnotes":""},"categories":[7],"tags":[],"class_list":["post-875","post","type-post","status-publish","format-standard","hentry","category-technology","entry"],"jetpack_featured_media_url":"","jetpack-related-posts":[{"id":753,"url":"https:\/\/fde.cat\/index.php\/2023\/08\/29\/scheduling-jupyter-notebooks-at-meta\/","url_meta":{"origin":875,"position":0},"title":"Scheduling Jupyter Notebooks at Meta","date":"August 29, 2023","format":false,"excerpt":"At Meta, Bento is our internal Jupyter notebooks platform that is leveraged by many internal users. Notebooks are also being used widely for creating reports and workflows (for example, performing data ETL) that need to be repeated at certain intervals. Users with such notebooks would have to remember to manually\u2026","rel":"","context":"In &quot;Technology&quot;","img":{"alt_text":"","src":"","width":0,"height":0},"classes":[]},{"id":566,"url":"https:\/\/fde.cat\/index.php\/2022\/04\/26\/sql-notebooks-combining-the-power-of-jupyter-and-sql-editors-for-data-analytics\/","url_meta":{"origin":875,"position":1},"title":"SQL Notebooks: Combining the power of Jupyter and SQL editors for data analytics","date":"April 26, 2022","format":false,"excerpt":"At Meta, our internal data tools are the main channel from our data scientists to our production engineers. As such, it\u2019s important for us to empower our scientists and engineers not only to use data to make decisions, but also to do so in a secure and compliant way. 
We\u2019ve\u2026","rel":"","context":"In &quot;Technology&quot;","img":{"alt_text":"","src":"","width":0,"height":0},"classes":[]},{"id":758,"url":"https:\/\/fde.cat\/index.php\/2023\/09\/07\/using-chakra-execution-traces-for-benchmarking-and-network-performance-optimization\/","url_meta":{"origin":875,"position":2},"title":"Using Chakra execution traces for benchmarking and network performance optimization","date":"September 7, 2023","format":false,"excerpt":"Meta presents Chakra execution traces, an open graph-based representation of AI\/ML workload execution, laying the foundation for benchmarking and network performance optimization. Chakra execution traces represent key operations, such as compute, memory, and communication, data and control dependencies, timing, and resource constraints. In collaboration with MLCommons, we are seeking industry-wide\u2026","rel":"","context":"In &quot;Technology&quot;","img":{"alt_text":"","src":"","width":0,"height":0},"classes":[]},{"id":697,"url":"https:\/\/fde.cat\/index.php\/2023\/04\/06\/build-faster-with-buck2-our-open-source-build-system\/","url_meta":{"origin":875,"position":3},"title":"Build faster with Buck2: Our open source build system","date":"April 6, 2023","format":false,"excerpt":"Buck2, our new open source, large-scale build system, is now available on GitHub. 
Buck2 is an extensible and performant build system written in Rust and designed to make your build experience faster and more efficient.\u00a0 In our internal tests at Meta, we observed that Buck2 completed builds 2x as fast\u2026","rel":"","context":"In &quot;Technology&quot;","img":{"alt_text":"","src":"","width":0,"height":0},"classes":[]},{"id":759,"url":"https:\/\/fde.cat\/index.php\/2023\/09\/07\/arcadia-an-end-to-end-ai-system-performance-simulator\/","url_meta":{"origin":875,"position":4},"title":"Arcadia: An end-to-end AI system performance simulator","date":"September 7, 2023","format":false,"excerpt":"We\u2019re introducing Arcadia, Meta\u2019s unified system that simulates the compute, memory, and network performance of AI training clusters. Extracting maximum performance from an AI cluster and increasing overall efficiency warrants a multi-input system that accounts for various hardware and software parameters across compute, storage, and network collectively. Arcadia gives Meta\u2019s\u2026","rel":"","context":"In &quot;Technology&quot;","img":{"alt_text":"","src":"","width":0,"height":0},"classes":[]},{"id":694,"url":"https:\/\/fde.cat\/index.php\/2023\/03\/23\/big-data-processing-driving-data-migration-for-salesforce-data-cloud\/","url_meta":{"origin":875,"position":5},"title":"Big Data Processing: Driving Data Migration  for Salesforce Data Cloud","date":"March 23, 2023","format":false,"excerpt":"The tsunami of data \u2014 set to exceed 180 zettabytes by 2025 \u2014 places significant pressure on companies. Simply having access to customer information is not enough \u2014 companies must also analyze and refine the data to find actionable pieces that power new business. 
As businesses collect these volumes of\u2026","rel":"","context":"In &quot;Technology&quot;","img":{"alt_text":"","src":"","width":0,"height":0},"classes":[]}],"_links":{"self":[{"href":"https:\/\/fde.cat\/index.php\/wp-json\/wp\/v2\/posts\/875","targetHints":{"allow":["GET"]}}],"collection":[{"href":"https:\/\/fde.cat\/index.php\/wp-json\/wp\/v2\/posts"}],"about":[{"href":"https:\/\/fde.cat\/index.php\/wp-json\/wp\/v2\/types\/post"}],"replies":[{"embeddable":true,"href":"https:\/\/fde.cat\/index.php\/wp-json\/wp\/v2\/comments?post=875"}],"version-history":[{"count":0,"href":"https:\/\/fde.cat\/index.php\/wp-json\/wp\/v2\/posts\/875\/revisions"}],"wp:attachment":[{"href":"https:\/\/fde.cat\/index.php\/wp-json\/wp\/v2\/media?parent=875"}],"wp:term":[{"taxonomy":"category","embeddable":true,"href":"https:\/\/fde.cat\/index.php\/wp-json\/wp\/v2\/categories?post=875"},{"taxonomy":"post_tag","embeddable":true,"href":"https:\/\/fde.cat\/index.php\/wp-json\/wp\/v2\/tags?post=875"}],"curies":[{"name":"wp","href":"https:\/\/api.w.org\/{rel}","templated":true}]}}