{"id":777,"date":"2023-10-24T16:00:09","date_gmt":"2023-10-24T16:00:09","guid":{"rendered":"https:\/\/fde.cat\/index.php\/2023\/10\/24\/automating-dead-code-cleanup\/"},"modified":"2023-10-24T16:00:09","modified_gmt":"2023-10-24T16:00:09","slug":"automating-dead-code-cleanup","status":"publish","type":"post","link":"https:\/\/fde.cat\/index.php\/2023\/10\/24\/automating-dead-code-cleanup\/","title":{"rendered":"Automating dead code cleanup"},"content":{"rendered":"<p><span>Meta\u2019s Systematic Code and Asset Removal Framework (SCARF) has a subsystem for identifying and removing dead code.<\/span><br \/>\n<span>SCARF combines static and dynamic analysis of programs to detect dead code from both a business and programming language perspective.<\/span><br \/>\n<span>SCARF automatically creates change requests that delete the dead code identified from the program analysis, minimizing developer costs.<\/span><\/p>\n<p><span>In our last blog post on <\/span><a href=\"https:\/\/engineering.fb.com\/2023\/10\/17\/data-infrastructure\/automating-product-deprecation-meta\/\"><span>automatic product deprecation<\/span><\/a><span>, we talked about the complexities of product deprecations, and a solution Meta has built called the Systematic Code and Asset Removal Framework (SCARF). As an example, we looked at <\/span><a href=\"https:\/\/about.fb.com\/news\/2015\/06\/introducing-moments\/\"><span>Moments<\/span><\/a><span>, the photo sharing app Meta launched in 2015 and eventually shut down in 2019, and how SCARF can help with the deprecation process through its workflow management capabilities. We discussed how SCARF saves engineering time by identifying the correct order of tasks for cleaning up a product and how it can be blocked from automating the cleanup when there are intersystem dependencies. 
This naturally leads to the question: How do we automatically unblock SCARF when there is code that references an asset?<\/span><\/p>\n<h2><span>Dead code removal in SCARF<\/span><\/h2>\n<p><span>SCARF contains a subsystem that automatically identifies dead code through a combination of static, runtime, and application analysis. It leverages this analysis to submit change requests to remove this code from our systems. This automated dead code removal improves the quality of our systems and also unblocks unused data removal in SCARF when the dead code includes references to data assets that prevent automated data cleanup.\u00a0<\/span><\/p>\n<h2><span>Code analysis<\/span><\/h2>\n<p><span>SCARF\u2019s code analysis subsystem gathers information from a variety of sources. First, a code dependency graph for each language is extracted from our compilers via <\/span><a href=\"https:\/\/glean.software\/\"><span>Glean<\/span><\/a><span>. This is then augmented with further information, like the usage of API endpoints from operational logs that determine whether an endpoint is used at runtime. 
Additional examples of domain-specific usage that the graph encodes include:\u00a0<\/span><\/p>\n<p><span>Script invocations for internal developer tools and system management commands.<\/span><br \/>\n<span>Template hooks for dynamically rendering pages in the <\/span><a href=\"https:\/\/instagram-engineering.com\/web-service-efficiency-at-instagram-with-python-4976d078e366\"><span>Instagram Django backend and URI handler and routing<\/span><\/a><span>.<\/span><br \/>\n<span>Async\u2019s dynamically referenced dispatch methods (<\/span><a href=\"https:\/\/engineering.fb.com\/2020\/08\/17\/production-engineering\/async\/\"><span>Meta\u2019s deferred job execution service<\/span><\/a><span>).<\/span><\/p>\n<p><span>SCARF must be capable of introspecting any and all types of dynamic usage in addition to the static dependency graph to make accurate determinations of whether a piece of code is truly safe to remove. The static graph and these dynamic usage signals are combined to form an augmented dependency graph.<\/span><\/p>\n\n<p><span>SCARF supports multiple programming languages. This is very important, as products at Meta may have client code written in Java, Objective-C, and JavaScript, with server code written in <\/span><a href=\"https:\/\/hacklang.org\/\"><span>Hack<\/span><\/a><span>, and some backend infrastructure written in <\/span><a href=\"https:\/\/engineering.fb.com\/2023\/10\/05\/developer-tools\/python-312-meta-new-features\/\"><span>Python<\/span><\/a><span>. All of these pieces of code should be deleted together, because they belong to the same dependency graph: they are associated via APIs and other known forms of dynamic and language-spanning references.\u00a0<\/span><\/p>\n<p><span>SCARF operates at a<\/span> <span>symbol level as opposed to a file level, which allows for more granular analysis and cleanup. 
For example, an individual variable that is unused in a function will have its own fully qualified symbol, so it can be cleaned up individually, which is not possible at the file level.\u00a0<\/span><\/p>\n<h2><span>Garbage collection<\/span><\/h2>\n<p><span>SCARF analyzes the augmented dependency graph to identify unreachable nodes and subgraphs that can be deleted, and automatically generates code change requests to delete the corresponding code on a daily basis. A key benefit of analyzing the complete graph is that we can detect and delete cycles of dead code, where different parts of the codebase depend on each other but nothing live depends on them. Deleting entire subgraphs accelerates the deletion of dead code and provides a better experience for the engineers leveraging this automation in their deprecations.<\/span><\/p>\n<p><span>It\u2019s important that the graph contains the augmented information, as static analysis alone may not reveal links between components created through dynamic references or runtime language features. There is a trade-off, though, in that augmenting the graph with dynamic usage information requires the full processing of the indexed code and the subsequent data analysis pipelines that provide the metrics. This increases the end-to-end duration of the entire process, which can make prototyping new features or capabilities more difficult.\u00a0<\/span><\/p>\n<p><span>Earlier versions of SCARF avoided this upfront cost by taking a different approach. They analyzed each discoverable symbol individually and, at runtime, ran classifiers that queried for static and dynamic references in order to find dead root nodes (pieces of code with no inbound dependencies). This did not require the upfront construction of the complete dependency graph and simplified the process of running the system over small subsets of the codebase. 
As a result, it was trivial to prototype new classifiers that identified potential dynamic references without requiring time-consuming indexing or data analysis.\u00a0<\/span><\/p>\n<p><span>However, the graph-based approach\u2019s longer end-to-end development cycle paid off with a dramatic improvement in coverage. The transition from analyzing individual symbols to the entire graph led to a nearly 50% increase in dead code removed from one of Meta\u2019s largest codebases. The new approach improves visibility into the state of our codebases: how much is alive, how much is dead, and how much of that we are removing in any given pass of SCARF.<\/span><\/p>\n<h2><span>Fine-tuning the dependency graph<\/span><\/h2>\n<p><span>Many of the dependencies that we index using Glean are for patterns of code invocation which do not necessarily block the deletion of that code. For example, let\u2019s say we had a class <\/span><span>PhotoRenderer<\/span><span>, and the only dependency on it was in code like this:<\/span><\/p>\n<p>if isinstance(renderer, PhotoRenderer):<br \/>\n    return renderer.render_photo()<br \/>\nelse:<br \/>\n    return renderer.render_generic()<\/p>\n<p><span>In this case, the references to PhotoRenderer and <\/span><span>render_photo()<\/span><span> can be removed, and the code changed to this:<\/span><\/p>\n<p>return renderer.render_generic()<\/p>\n<p><span>In this example, the class, PhotoRenderer, was <\/span>inlined<span> based on a rule derived from the semantics of Python: if there are no places where the PhotoRenderer class (or any of its subclasses) is instantiated, we can be confident that this code cannot take the first branch, and that branch is therefore dead.<\/span><\/p>\n<p><span>In some cases, we derive these rules based on our application semantics as opposed to language semantics. 
Imagine this code:<\/span><\/p>\n<p>uri_dispatch = {<br \/>\n  &#39;\/home\/&#39;: HomeController,<br \/>\n  &#39;\/photos\/&#39;: PhotosController,<br \/>\n  &#8230;<br \/>\n}<\/p>\n<p><span>If we only analyzed a language-level dependency graph, it would be impossible to determine whether PhotosController is ever actually used, as it can be invoked via this URI dispatch mechanism. However, if we know from our application analysis that the \u2018\/photos\/\u2019 endpoint never receives any requests in production, then we could remove the corresponding entry from this dictionary.\u00a0<\/span><\/p>\n<p><span>There\u2019s no inherent way to infer this given Python\u2019s language semantics, but our domain-specific logging and graph augmentation allow us to inform SCARF that this operation is safe.<\/span><\/p>\n<h2><span>Automating code changes<\/span><\/h2>\n<p><span>At Meta, we heavily automate changes to code. We built an internal service, called CodemodService, which empowers engineers to deploy configurations to automate code changes at scale. SCARF was the first instance of company-wide, fully automated code changes at Meta, and was built hand in hand with CodemodService. Today, CodemodService also powers hundreds of other types of automated code changes at Meta, from formatting code and removing completed experiments to supporting large-scale API migrations and improving coverage of strong types in partially typed languages like Python and Hack.<\/span><\/p>\n<h2><span>Dead code removal at scale<\/span><\/h2>\n<p><span>SCARF uses CodemodService to create code change requests for engineers to review. 
These change requests incorporate human-readable descriptions informing engineers about the analysis that determined the targeted code is provably dead.\u00a0<\/span><\/p>\n\n<p><span>SCARF has grown to analyze hundreds of millions of lines of code, and five years on it has automatically deleted more than 100 million lines of code in over 370,000 change requests. False positives caught by engineers during code review typically reflect new sources of dynamic usage that our augmented graphs must account for; they are triaged and used to improve the analysis that SCARF performs. Sometimes these misunderstood dynamic references can lead to incorrect deletion of code, and these deletions can make it to production. Meta has <\/span><a href=\"https:\/\/engineering.fb.com\/2017\/08\/31\/web\/rapid-release-at-massive-scale\/\"><span>other mechanisms in place to catch these problems<\/span><\/a><span>, and we take such incidents very seriously.<\/span><\/p>\n<p><span>In some languages, we have such high confidence in our analysis that we can automatically accept and merge the change requests without human intervention to make better use of engineers\u2019 valuable time.\u00a0<\/span><\/p>\n<h2><span>Is dead code removal sufficient?<\/span><\/h2>\n<p><span>SCARF\u2019s automated dead code removal accelerates the process of shutting down and removing the code and data for deprecated products, but it does not solve the problem fully. Beyond the problems caused by interconnectivity, we are constantly improving our ability to integrate across all languages, systems, and frameworks at Meta. 
It is difficult to accurately cover every type of usage of code and data that enables our systems to determine what is truly dead.\u00a0<\/span><\/p>\n<p><span>Our systems also err on the side of caution by searching for textual references to code and data through our <\/span><a href=\"https:\/\/www.facebook.com\/watch\/?v=1911812842425144\"><span>BigGrep system<\/span><\/a><span> rather than relying solely on the curated graphs produced through Glean and our dynamic usage augmentations. This is a fallback safety mechanism that helps avoid accidentally deleting MySQL tables that are referenced by name in other languages, and helps prevent deletions of dynamically invoked code in languages like Hack, Python, and JavaScript that can call code through string references or use <\/span><span>eval<\/span><span>. This approach can cause false negatives (dead code that is left in place), but it avoids false positives (live code that is deleted), which are the more serious problem when automating the removal of dead code.<\/span><\/p>\n<p><span>As mentioned in <\/span><a href=\"https:\/\/engineering.fb.com\/2023\/10\/17\/data-infrastructure\/automating-product-deprecation-meta\/\"><span>our first post<\/span><\/a><span> of this series, SCARF provides workflow management features that work together with the dead code subsystem to provide a cohesive experience for fully deprecating products and features. Crucially, our engineers can iterate on code changes faster than our automation! If an engineer understands that a change has rendered a branch of code (and therefore an entire subgraph) unreachable, they can easily incorporate that deletion into their changes without waiting for our infrastructure to index the new code, analyze it, and eventually get around to submitting its automated changes. 
Engineers sometimes find it more productive to delete code manually rather than wait to see whether the automated systems will clean it up for them later.<\/span><\/p>\n<p><span>In the next and final blog post in this series, we will look at SCARF\u2019s unused data type subsystem, which, in conjunction with the dead code subsystem, amplifies Meta\u2019s data minimization capabilities by automating the removal of dead and unused assets.<\/span><\/p>\n<p>The post <a href=\"https:\/\/engineering.fb.com\/2023\/10\/24\/data-infrastructure\/automating-dead-code-cleanup\/\">Automating dead code cleanup<\/a> appeared first on <a href=\"https:\/\/engineering.fb.com\/\">Engineering at Meta<\/a>.<\/p>","protected":false},"excerpt":{"rendered":"<p>Meta\u2019s Systematic Code and Asset Removal Framework (SCARF) has a subsystem for identifying and removing dead code. SCARF combines static and dynamic analysis of programs to detect dead code from both a business and programming language perspective. 
SCARF automatically creates change requests that delete the dead code identified from the program analysis, minimizing developer costs.&hellip; <a class=\"more-link\" href=\"https:\/\/fde.cat\/index.php\/2023\/10\/24\/automating-dead-code-cleanup\/\">Continue reading <span class=\"screen-reader-text\">Automating dead code cleanup<\/span><\/a><\/p>\n","protected":false},"author":0,"featured_media":0,"comment_status":"","ping_status":"open","sticky":false,"template":"","format":"standard","meta":{"spay_email":"","footnotes":""},"categories":[7],"tags":[],"class_list":["post-777","post","type-post","status-publish","format-standard","hentry","category-technology","entry"],"jetpack_featured_media_url":"","jetpack-related-posts":[{"id":772,"url":"https:\/\/fde.cat\/index.php\/2023\/10\/17\/automating-product-deprecation\/","url_meta":{"origin":777,"position":0},"title":"Automating product deprecation","date":"October 17, 2023","format":false,"excerpt":"Systematic Code and Asset Removal Framework (SCARF) is Meta\u2019s unused code and data deletion framework. SCARF guides engineers through deprecating a product safely and efficiently via an internal tool. SCARF combines this tooling with automation to reduce load on engineers. At Meta, we are constantly innovating and experimenting by building\u2026","rel":"","context":"In &quot;Technology&quot;","img":{"alt_text":"","src":"","width":0,"height":0},"classes":[]},{"id":779,"url":"https:\/\/fde.cat\/index.php\/2023\/10\/31\/automating-data-removal\/","url_meta":{"origin":777,"position":1},"title":"Automating data removal","date":"October 31, 2023","format":false,"excerpt":"Meta\u2019s Systematic Code and Asset Removal Framework (SCARF) has a subsystem for identifying and removing unused data types. SCARF scans production data systems to identify tables or assets that are unused and safely removes them. 
SCARF avoids tedious manual work and ensures that product data is correctly removed when a\u2026","rel":"","context":"In &quot;Technology&quot;","img":{"alt_text":"","src":"","width":0,"height":0},"classes":[]},{"id":786,"url":"https:\/\/fde.cat\/index.php\/2023\/11\/13\/sre-weekly-issue-398\/","url_meta":{"origin":777,"position":2},"title":"SRE Weekly Issue #398","date":"November 13, 2023","format":false,"excerpt":"View on sreweekly.com A message from our sponsor, FireHydrant: \u201cChange is the essential process of all existence.\u201d \u2013 Spock It\u2019s time for alerting to evolve. Get a first look at how incident management platform FireHydrant is architecting Signals, its native alerting tool, for resilience in the Signals Captain\u2019s Log. https:\/\/firehydrant.com\/blog\/captains-log-a-first-look-at-our-architecture-for-signals\/\u2026","rel":"","context":"In &quot;SRE&quot;","img":{"alt_text":"","src":"","width":0,"height":0},"classes":[]},{"id":781,"url":"https:\/\/fde.cat\/index.php\/2023\/11\/06\/sre-weekly-issue-397\/","url_meta":{"origin":777,"position":3},"title":"SRE Weekly Issue #397","date":"November 6, 2023","format":false,"excerpt":"View on sreweekly.com A message from our sponsor, FireHydrant: Incident management platform FireHydrant is combining alerting and incident response in one ring-to-retro tool. Sign up for the early access waitlist and be the first to experience the power of alerting + incident response in one platform at last. 
https:\/\/firehydrant.com\/signals\/ Modern\u2026","rel":"","context":"In &quot;SRE&quot;","img":{"alt_text":"","src":"","width":0,"height":0},"classes":[]},{"id":548,"url":"https:\/\/fde.cat\/index.php\/2022\/03\/08\/an-open-source-compositional-deadlock-detector-for-android-java\/","url_meta":{"origin":777,"position":4},"title":"An open source compositional deadlock detector for Android Java","date":"March 8, 2022","format":false,"excerpt":"What the research is: We\u2019ve developed a new static analyzer that catches deadlocks in Java code for Android without ever running the code. What distinguishes our analyzer from past research is its ability to analyze revisions in codebases with hundreds of millions of lines of code. We have deployed our\u2026","rel":"","context":"In &quot;Technology&quot;","img":{"alt_text":"","src":"","width":0,"height":0},"classes":[]},{"id":656,"url":"https:\/\/fde.cat\/index.php\/2022\/11\/22\/retrofitting-null-safety-onto-java-at-meta\/","url_meta":{"origin":777,"position":5},"title":"Retrofitting null-safety onto Java at Meta","date":"November 22, 2022","format":false,"excerpt":"We developed a new static analysis tool called Nullsafe that is used at Meta to detect NullPointerException (NPE) errors in Java code. 
Interoperability with legacy code and gradual deployment model were key to Nullsafe\u2019s wide adoption and allowed us to recover some null-safety properties in the context of an otherwise\u2026","rel":"","context":"In &quot;Technology&quot;","img":{"alt_text":"","src":"","width":0,"height":0},"classes":[]}],"_links":{"self":[{"href":"https:\/\/fde.cat\/index.php\/wp-json\/wp\/v2\/posts\/777","targetHints":{"allow":["GET"]}}],"collection":[{"href":"https:\/\/fde.cat\/index.php\/wp-json\/wp\/v2\/posts"}],"about":[{"href":"https:\/\/fde.cat\/index.php\/wp-json\/wp\/v2\/types\/post"}],"replies":[{"embeddable":true,"href":"https:\/\/fde.cat\/index.php\/wp-json\/wp\/v2\/comments?post=777"}],"version-history":[{"count":0,"href":"https:\/\/fde.cat\/index.php\/wp-json\/wp\/v2\/posts\/777\/revisions"}],"wp:attachment":[{"href":"https:\/\/fde.cat\/index.php\/wp-json\/wp\/v2\/media?parent=777"}],"wp:term":[{"taxonomy":"category","embeddable":true,"href":"https:\/\/fde.cat\/index.php\/wp-json\/wp\/v2\/categories?post=777"},{"taxonomy":"post_tag","embeddable":true,"href":"https:\/\/fde.cat\/index.php\/wp-json\/wp\/v2\/tags?post=777"}],"curies":[{"name":"wp","href":"https:\/\/api.w.org\/{rel}","templated":true}]}}