{"id":651,"date":"2022-11-15T17:30:47","date_gmt":"2022-11-15T17:30:47","guid":{"rendered":"https:\/\/fde.cat\/index.php\/2022\/11\/15\/sapling-source-control-thats-user-friendly-and-scalable\/"},"modified":"2022-11-15T17:30:47","modified_gmt":"2022-11-15T17:30:47","slug":"sapling-source-control-thats-user-friendly-and-scalable","status":"publish","type":"post","link":"https:\/\/fde.cat\/index.php\/2022\/11\/15\/sapling-source-control-thats-user-friendly-and-scalable\/","title":{"rendered":"Sapling: Source control that\u2019s user-friendly and scalable"},"content":{"rendered":"<p><a href=\"https:\/\/www.sapling-scm.com\/docs\/introduction\/getting-started\" target=\"_blank\" rel=\"noopener\"><span>Sapling<\/span><\/a><span> is a new Git-compatible source control client.<\/span><br \/>\n<span>Sapling emphasizes usability while also scaling to the largest repositories in the world.<\/span><br \/>\n<span><a href=\"https:\/\/www.sapling-scm.com\/docs\/addons\/reviewstack\/\" target=\"_blank\" rel=\"noopener\">ReviewStack<\/a> is a demonstration code review UI for GitHub pull requests that integrates with Sapling to make reviewing stacks of commits easy.<\/span><br \/>\n<span>You can <\/span><a href=\"https:\/\/www.sapling-scm.com\/docs\/introduction\/getting-started\" target=\"_blank\" rel=\"noopener\"><span>get started using Sapling<\/span><\/a><span> today.\u00a0<\/span><\/p>\n<p><span>Source control is one of the most important tools for modern developers, and through tools such as Git and GitHub, it has become a foundation for the entire software industry. At Meta, source control is responsible for storing developers\u2019 in-progress code, storing the history of all code, and serving code to developer services such as build and test infrastructure. 
It is a critical part of our developer experience and our ability to move fast, and we\u2019ve invested heavily to build a world-class source control experience.<\/span><\/p>\n<p><span>We\u2019ve spent the past 10 years building Sapling, a scalable, user-friendly source control system, and today we\u2019re open-sourcing the <\/span><a href=\"https:\/\/www.sapling-scm.com\/\" target=\"_blank\" rel=\"noopener\"><span>Sapling client<\/span><\/a><span>. You can now try its <\/span><a href=\"https:\/\/www.sapling-scm.com\/docs\/overview\/intro\" target=\"_blank\" rel=\"noopener\"><span>various features<\/span><\/a><span> using Sapling\u2019s built-in Git support to clone any of your existing repositories. This is the first step in a longer process of making the entire Sapling system available to the world.\u00a0<\/span><\/p>\n<h2><span>What is Sapling?<\/span><\/h2>\n<p><span>Sapling is a source control system used at Meta that emphasizes usability and scalability. Git and Mercurial users will find that many of the basic concepts are familiar <\/span><span>\u2014 <\/span><span>and that workflows like understanding your repository, working with stacks of commits, and recovering from mistakes are substantially easier.<\/span><\/p>\n<p><span>When used with our Sapling-compatible server and virtual file system (we hope to open-source these in the future), Sapling can serve Meta\u2019s internal repository with tens of millions of files, tens of millions of commits, and tens of millions of branches. At Meta, Sapling is primarily used for our large monolithic repository (or monorepo, for short), but the Sapling client also supports cloning and interacting with Git repositories and can be used by individual developers to work with GitHub and other Git hosting services.<\/span><\/p>\n<h2><span>Why build a new source control system?<\/span><\/h2>\n<p><span>Sapling began 10 years ago as an initiative to make our monorepo scale in the face of tremendous growth. 
Public source control systems were not, and still are not, capable of handling repositories of this size. Breaking up the repository was also out of the question, as it would mean losing monorepo\u2019s benefits, such as simplified dependency management and the ability to make broad changes quickly. Instead, we decided to go all in and make our source control system scale.<\/span><\/p>\n<p><span>Starting as an extension to the Mercurial open source project, it rapidly grew into a system of its own with new storage formats, wire protocols, algorithms, and behaviors. Our ambitions grew along with it, and we began thinking about how we could improve not only the scale but also the actual experience of using source control.<\/span><\/p>\n<h2><span>Sapling\u2019s user experience<\/span><\/h2>\n<p><span>Historically, the usability of version control systems has left a lot to be desired; developers are expected to maintain a complex mental picture of the repository, and they are often forced to use esoteric commands to accomplish seemingly simple goals. We aimed to fix that with Sapling.<\/span><\/p>\n<p><span>A Git user who sits down with Sapling will initially find the basic commands familiar. Users clone a repository, make commits, amend, rebase, and push the commits back to the server. What will stand out, though, is how every command is designed for simplicity and ease of use. Each command does one thing. Local branch names are optional. There is no staging area. 
The list goes on.<\/span><\/p>\n<p><span>It\u2019s impossible to cover the entire user experience in a single blog post, so check out our <\/span><a href=\"https:\/\/www.sapling-scm.com\/docs\/overview\/basic-commands\" target=\"_blank\" rel=\"noopener\"><span>user experience documentation<\/span><\/a><span> to learn more.<\/span><\/p>\n\n<p><span>Below, we\u2019ll explore three particular areas of the user experience that have been so successful within Meta that we\u2019ve had requests for them outside of Meta as well.\u00a0<\/span><\/p>\n<h3><span>Smartlog: Your repo at a glance<\/span><\/h3>\n<p><span>The smartlog is one of the most important Sapling commands and the centerpiece of the entire user experience. By simply running the Sapling client with no arguments, <\/span><span>sl<\/span><span>, you can see all your local commits, where you are, where important remote branches are, what files have changed, and which commits are old and have new versions. Equally important, the smartlog hides all the information you don\u2019t care about. Remote branches you don\u2019t care about are not shown. Thousands of irrelevant commits in main are hidden behind a dashed line. The result is a clear, concise picture of your repository that\u2019s tailored to what matters to you, no matter how large your repo.<\/span><\/p>\n\n<p><span>Having this view at your fingertips changes how people approach source control. For new users, it gives them the right mental model from day one. It allows them to visually see the before-and-after effects of the commands they run. Overall, it makes people more confident in using source control. <\/span><\/p>\n<p><span>We\u2019ve even made an interactive smartlog web UI for people who are more comfortable with graphical interfaces. Simply run <\/span><span>sl web<\/span><span> to launch it in your browser. 
From there you can view your smartlog, commit, amend, checkout, and more.<\/span><\/p>\n\n<h3><span>Fixing mistakes with ease<\/span><\/h3>\n<p><span>The most frustrating aspect of many version control systems is trying to recover from mistakes. Understanding what you did is hard. Finding your old data is hard. Figuring out what command you should run to get the old data back is hard. The Sapling development team is small, and in order to support our tens of thousands of internal developers, we needed to make it as easy as possible to solve your own issues and get unblocked.<\/span><\/p>\n<p><span>To this end, Sapling provides a wide array of tools for understanding what you did and undoing it. Commands like <\/span><span>sl undo<\/span><span>, <\/span><span>sl redo<\/span><span>, <\/span><span>sl uncommit<\/span><span>, and <\/span><span>sl unamend<\/span><span> allow you to easily undo many operations. Commands like <\/span><span>sl hide<\/span><span> and <\/span><span>sl unhide<\/span><span> allow you to trivially and safely hide commits and bring them back to life. There is even an <\/span><span>sl undo -i<\/span><span> command for Mac and Linux that allows you to interactively scroll through old smartlog views to revert back to a specific point in time or just find the commit hash of an old commit you lost. Never again should you have to delete your repository and clone again to get things working.<\/span><\/p>\n\n<p><span>See our <\/span><a href=\"https:\/\/www.sapling-scm.com\/docs\/overview\/undo\"><span>UX doc<\/span><\/a><span> for a more extensive overview of our many recovery features.<\/span><\/p>\n<h3><span>First-class commit stacks<\/span><\/h3>\n<p><span>At Meta, working with stacks of commits is a common part of our workflow. First, an engineer building a feature will send out the small first step of that feature as a commit for code review. 
While it\u2019s being reviewed, they will start on the next step as a second commit that will later be sent for code review as well. A full feature will consist of many of these small, incremental, individually reviewed commits on top of one another.<\/span><\/p>\n<p><span>Working with stacks of commits is particularly difficult in many source control systems. It requires complex stateful commands like <\/span><span>git rebase -i<\/span><span> to add a single line to a commit earlier in the stack. Sapling makes this easy by providing explicit commands and workflows for making even the newest engineer able to edit, rearrange, and understand the commits in the stack.<\/span><\/p>\n<p><span>At its most basic, when you want to edit a commit in a stack, you simply check out that commit, via <\/span><span>sl goto COMMIT<\/span><span>, make your change, and amend it via <\/span><span>sl amend<\/span><span>. Sapling automatically moves, or rebases, the top of your stack onto the newly amended commit, allowing you to resolve any conflicts immediately. If you choose not to fix the conflicts now, you can continue working on that commit, and later run <\/span><span>sl restack<\/span><span> to bring your stack back together once again. 
Inspired by Mercurial\u2019s Evolve extension, Sapling keeps track of the mutation history of each commit under the hood, allowing it to algorithmically rebuild the stack later, no matter how many times you edit the stack.<\/span><\/p>\n\n<p><span>Beyond simply amending and restacking commits, Sapling offers a variety of commands for navigating your stack (<\/span><span>sl next<\/span><span>, <\/span><span>sl prev<\/span><span>, <\/span><span>sl goto top\/bottom<\/span><span>), adjusting your stack (<\/span><span>sl fold<\/span><span>, <\/span><span>sl split<\/span><span>), and even automatically pulling uncommitted changes from your working copy down into the appropriate commit in the middle of your stack (<\/span><span>sl absorb<\/span><span>, <\/span><span>sl amend --to COMMIT<\/span><span>).<\/span><\/p>\n<h2><span>ReviewStack: Stack-oriented code review<\/span><\/h2>\n<p><span>Making it easy to work with stacks has many benefits: Commits become smaller, easier to reason about, and easier to review. But effectively reviewing stacks requires a code review tool that is tailored to them. Unfortunately, many external code review tools are optimized for reviewing the entire pull request at once instead of individual commits within the pull request. This makes it hard to have a conversation about individual commits and negates many of the benefits of having a stack of small, incremental, easy-to-understand commits.<\/span><\/p>\n<p><span>Therefore, we put together a demonstration website that shows just how intuitive and powerful stacked commit review flows could be. Check out our <\/span><a href=\"https:\/\/reviewstack.dev\/bolinfest\/monaco-tm\/pull\/39\" target=\"_blank\" rel=\"noopener\"><span>example stacked GitHub pull request<\/span><\/a><span>, or try it on your own pull request by visiting<\/span><a href=\"https:\/\/reviewstack.dev\/\" target=\"_blank\" rel=\"noopener\"> <span>ReviewStack<\/span><\/a><span>. 
You\u2019ll see how you can view the conversation and signal pertaining to a specific commit on a single page, and you can easily move between different parts of the stack with the drop-down and navigation buttons at the top.<\/span><\/p>\n\n<h2><span>Scaling Sapling<\/span><\/h2>\n<p><span>Note: Many of our scale features require using a Sapling-specific server and are therefore unavailable in our initial client release. We describe them here as a preview of things to come. When using Sapling with a Git repository, some of these optimizations will not apply.<\/span><\/p>\n<p><span>Source control has numerous axes of growth, and making it scale requires addressing all of them: number of commits, files, branches, merges, length of file histories, size of files, and more. At its core, though, it breaks down into two parts: the history and the working copy.<\/span><\/p>\n<h3><span>Scaling history: Segmented Changelog and the art of being lazy<\/span><\/h3>\n<p><span>For large repositories, the history can be much larger than the size of the working copy you actually use. For instance, three-quarters of the 5.5 GB Linux kernel repo is the history. In Sapling, cloning the repository downloads almost no history. Instead, as you use the repository, we download just the commits, trees, and files you actually need, which allows you to work with a repository that may be terabytes in size without having to actually download all of it. Although this requires being online, through efficient caching and indexes, we maintain a configurable ability to work offline in many common flows, like making a commit.<\/span><\/p>\n<p><span>Beyond just lazily downloading data, we need to be able to efficiently query history. We cannot afford to download millions of commits just to find the common ancestor of two commits or to draw the Smartlog graph. 
To solve this, we developed the Segmented Changelog, which downloads the high-level shape of the commit graph from the server, taking just a few megabytes, and lazily fills in individual commit data later as necessary. This enables querying the graph relationship between any two commits in O(number-of-merges) time, with nothing but the segments and the position of the two commits in the segments. The result is that commands like smartlog take less than a second, regardless of how big the repository is.<\/span><\/p>\n\n<p><span>Segmented Changelog speeds up other algorithms as well. When running <\/span><span>log<\/span><span> or <\/span><span>blame<\/span><span> on a file, we\u2019re able to bisect the segment graph to find the history in O(log n) time, instead of O(n), even in Git repositories. When used with our Sapling-specific server, we go even further, maintaining per-file history graphs that allow answering <\/span><span>sl log FILE<\/span><span> in less than a second, regardless of how old the file is.<\/span><\/p>\n<h3><span>Scaling the working copy: Virtual or Sparse<\/span><\/h3>\n<p><span>To scale the working copy, we\u2019ve developed a virtual file system (not yet publicly available) that makes it look and act as if you have the entire repository. 
Clones and checkouts become very fast, and while accessing a file for the first time requires a network request, subsequent accesses are fast and prefetching mechanisms can warm the cache for your project.<\/span><\/p>\n<p><span>Even without the virtual file system, we speed up <\/span><span>sl status<\/span><span> by utilizing Meta\u2019s <\/span><a href=\"https:\/\/facebook.github.io\/watchman\/\" target=\"_blank\" rel=\"noopener\"><span>Watchman file system monitor<\/span><\/a><span> to query which files have changed without scanning the entire working copy, and we have special support for sparse checkouts to allow checking out only part of the repository.<\/span><\/p>\n<p><span>Sparse checkouts are particularly designed for easy use within large organizations. Instead of each developer configuring and maintaining their own list of which files should be included, organizations can commit \u201csparse profiles\u201d into the repository. When a developer clones the repository, they can choose to enable the sparse profile for their particular product. As the product\u2019s dependencies change over time, the sparse profile can be updated by the person changing the dependencies, and every other engineer will automatically receive the new sparse configuration when they checkout or rebase forward. This allows thousands of engineers to work on a constantly shifting subset of the repository without ever having to think about it.<\/span><\/p>\n<p><span>To handle large files, Sapling even supports using a Git LFS server.<\/span><\/p>\n<h2><span>More to Come<\/span><\/h2>\n<p><span>The Sapling client is just the first chapter of this story. 
In the future, we aim to open-source the Sapling-compatible virtual file system, which enables working with arbitrarily large working copies and making checkouts fast, no matter how many files have changed.<\/span><\/p>\n<p><span>Beyond that, we hope to open-source the Sapling-compatible server: the scalable, distributed source control Rust service we use at Meta to serve Sapling and (soon) Git repositories. The server enables a multitude of new source control experiences. With the server, you can incrementally migrate repositories into (or out of) the monorepo, allowing you to experiment with monorepos before committing to them. It also enables Commit Cloud, where all commits in your organization are uploaded as soon as they are made, and sharing code is as simple as sending your colleague a commit hash and having them run <\/span><span>sl goto HASH<\/span><span>.<\/span><\/p>\n<p><span>The release of this post marks my 10th year of working on Sapling at Meta, almost to the day. It\u2019s been a crazy journey, and a single blog post cannot cover all the amazing work the team has done over the last decade. I highly encourage you to check out our <\/span><a href=\"https:\/\/www.sapling-scm.com\/docs\/overview\/intro\" target=\"_blank\" rel=\"noopener\"><span>armchair walkthrough<\/span><\/a><span> of Sapling\u2019s cool features. 
I\u2019d also like to thank the Mercurial open source community for all their collaboration and inspiration in the early days of Sapling, which started the journey to what it is today.<\/span><\/p>\n<p><span>I hope you find Sapling as pleasant to use as we do, and that Sapling might start a conversation about the current state of source control and how we can all hold the bar higher for the source control of tomorrow. <\/span><span>See the <\/span><a href=\"https:\/\/www.sapling-scm.com\/docs\/introduction\/getting-started\"><span>Getting Started<\/span><\/a><span> page to try Sapling today.<\/span><\/p>\n<p>The post <a href=\"https:\/\/engineering.fb.com\/2022\/11\/15\/open-source\/sapling-source-control-scalable\/\">Sapling: Source control that\u2019s user-friendly and scalable<\/a> appeared first on <a href=\"https:\/\/engineering.fb.com\/\">Engineering at Meta<\/a>.<\/p>\n<p>Engineering at Meta<\/p>","protected":false},"excerpt":{"rendered":"<p>Sapling is a new Git-compatible source control client. Sapling emphasizes usability while also scaling to the largest repositories in the world. ReviewStack is a demonstration code review UI for GitHub pull requests that integrates with Sapling to make reviewing stacks of commits easy. 
You can get started using Sapling today.\u00a0 Source control is one of&hellip; <a class=\"more-link\" href=\"https:\/\/fde.cat\/index.php\/2022\/11\/15\/sapling-source-control-thats-user-friendly-and-scalable\/\">Continue reading <span class=\"screen-reader-text\">Sapling: Source control that\u2019s user-friendly and scalable<\/span><\/a><\/p>\n","protected":false},"author":0,"featured_media":0,"comment_status":"","ping_status":"open","sticky":false,"template":"","format":"standard","meta":{"spay_email":"","footnotes":""},"categories":[7],"tags":[],"class_list":["post-651","post","type-post","status-publish","format-standard","hentry","category-technology","entry"],"jetpack_featured_media_url":"","jetpack-related-posts":[{"id":728,"url":"https:\/\/fde.cat\/index.php\/2023\/06\/27\/meta-developer-tools-working-at-scale\/","url_meta":{"origin":651,"position":0},"title":"Meta developer tools: Working at scale","date":"June 27, 2023","format":false,"excerpt":"Every day, thousands of developers at Meta are working in repositories with millions of files. Those developers need tools that help them at every stage of the workflow while working at extreme scale. In this article we\u2019ll go through a few of the tools in the development process. And, as\u2026","rel":"","context":"In &quot;Technology&quot;","img":{"alt_text":"","src":"","width":0,"height":0},"classes":[]},{"id":820,"url":"https:\/\/fde.cat\/index.php\/2024\/02\/06\/dotslash-simplified-executable-deployment\/","url_meta":{"origin":651,"position":1},"title":"DotSlash: Simplified executable deployment","date":"February 6, 2024","format":false,"excerpt":"We\u2019ve open sourced DotSlash, a tool that makes large executables available in source control with a negligible impact on repository size, thus avoiding I\/O-heavy clone operations. With DotSlash, a set of platform-specific executables is replaced with a single script containing descriptors for the supported platforms. 
DotSlash handles transparently fetching, decompressing,\u2026","rel":"","context":"In &quot;Technology&quot;","img":{"alt_text":"","src":"","width":0,"height":0},"classes":[]},{"id":227,"url":"https:\/\/fde.cat\/index.php\/2021\/02\/02\/heroku-ci-and-github-checks-integration\/","url_meta":{"origin":651,"position":2},"title":"Heroku CI and Github Checks Integration","date":"February 2, 2021","format":false,"excerpt":"Heroku CI and GitHub Checks IntegrationBefore we dive into the crux of this article, let\u2019s first get an understanding of what GitHub Checks is and how it will be useful for you when you use any external Continuous Integration (CI) tool like CircleCI, Heroku CI, or any local tool. The\u2026","rel":"","context":"In &quot;Technology&quot;","img":{"alt_text":"","src":"","width":0,"height":0},"classes":[]},{"id":697,"url":"https:\/\/fde.cat\/index.php\/2023\/04\/06\/build-faster-with-buck2-our-open-source-build-system\/","url_meta":{"origin":651,"position":3},"title":"Build faster with Buck2: Our open source build system","date":"April 6, 2023","format":false,"excerpt":"Buck2, our new open source, large-scale build system, is now available on GitHub. Buck2 is an extensible and performant build system written in Rust and designed to make your build experience faster and more efficient.\u00a0 In our internal tests at Meta, we observed that Buck2 completed builds 2x as fast\u2026","rel":"","context":"In &quot;Technology&quot;","img":{"alt_text":"","src":"","width":0,"height":0},"classes":[]},{"id":683,"url":"https:\/\/fde.cat\/index.php\/2023\/02\/21\/what-is-the-secret-behind-increasing-salesforces-developer-velocity\/","url_meta":{"origin":651,"position":4},"title":"What is the Secret Behind Increasing Salesforce\u2019s Developer Velocity?","date":"February 21, 2023","format":false,"excerpt":"From retail to healthcare to IT and beyond, countless industries rely on software development to enhance business performance. 
However, to optimize software innovation and performance, companies must create enhanced environments that remove productivity blockers and deliver great experiences for developers. By empowering engineers to focus more on building new features\u2026","rel":"","context":"In &quot;Technology&quot;","img":{"alt_text":"","src":"","width":0,"height":0},"classes":[]},{"id":799,"url":"https:\/\/fde.cat\/index.php\/2023\/12\/05\/explaining-salesforces-large-scale-migration-to-git-how-we-enhanced-developer-productivity\/","url_meta":{"origin":651,"position":5},"title":"Explaining Salesforce\u2019s Large-Scale Migration to Git: How We Enhanced Developer Productivity","date":"December 5, 2023","format":false,"excerpt":"By Patrick Calahan and Scott Nyberg As new developer productivity technologies emerge, small and nimble enterprises with newer codebases swiftly embrace innovation. Conversely, larger organizations, rooted in larger and aging codebases, face obstacles replacing legacy technologies. Salesforce faced such a challenge with its primary Source Code Management (SCM) system. 
For\u2026","rel":"","context":"In &quot;Technology&quot;","img":{"alt_text":"","src":"","width":0,"height":0},"classes":[]}],"_links":{"self":[{"href":"https:\/\/fde.cat\/index.php\/wp-json\/wp\/v2\/posts\/651","targetHints":{"allow":["GET"]}}],"collection":[{"href":"https:\/\/fde.cat\/index.php\/wp-json\/wp\/v2\/posts"}],"about":[{"href":"https:\/\/fde.cat\/index.php\/wp-json\/wp\/v2\/types\/post"}],"replies":[{"embeddable":true,"href":"https:\/\/fde.cat\/index.php\/wp-json\/wp\/v2\/comments?post=651"}],"version-history":[{"count":0,"href":"https:\/\/fde.cat\/index.php\/wp-json\/wp\/v2\/posts\/651\/revisions"}],"wp:attachment":[{"href":"https:\/\/fde.cat\/index.php\/wp-json\/wp\/v2\/media?parent=651"}],"wp:term":[{"taxonomy":"category","embeddable":true,"href":"https:\/\/fde.cat\/index.php\/wp-json\/wp\/v2\/categories?post=651"},{"taxonomy":"post_tag","embeddable":true,"href":"https:\/\/fde.cat\/index.php\/wp-json\/wp\/v2\/tags?post=651"}],"curies":[{"name":"wp","href":"https:\/\/api.w.org\/{rel}","templated":true}]}}