Automating product deprecation

Systematic Code and Asset Removal Framework (SCARF) is Meta’s unused code and data deletion framework.
SCARF guides engineers through deprecating a product safely and efficiently via an internal tool.
SCARF combines this tooling with automation to reduce load on engineers.

At Meta, we are constantly innovating and experimenting by building and shipping many different products, and those products comprise thousands of individual features. As part of this healthy technology lifecycle, it is inevitable that certain products or features will be deprecated. For example, in 2015 we launched a photo-sharing app called Moments, which was later deprecated in 2019. So, how did we efficiently and safely remove all of the code and data related to Moments without adversely affecting Meta’s other products and services?

In this three-part blog series, we will discuss the complexities involved in removing a product from a complex portfolio of products and the framework Meta has built to drive the automation of this process, our Systematic Code and Asset Removal Framework (SCARF). SCARF has had an important impact at Meta. In the last year, it has removed petabytes of unused data across 12.8M different data types stored in 21 different data systems. Over the last five years it has deleted over 100M lines of code.

The first post will introduce the complexities faced when systematically deprecating products safely in a large organization and the internal workflow tools we have developed. The second post will explain how SCARF automates the removal of dead code and the infrastructure that powers it. The third post will discuss SCARF’s orchestration for safely identifying and deleting unused data types across various data systems. 

Failure modes

Without established guidance detailing the process for determining when and how to remove a product or feature, a few failure modes might emerge. Consider the example of launching a time-specific feature for a large event that happens once every few years. Does it make sense to keep all the code and data related to it until the next event? Most of the time, maintaining unused code for years is less desirable than building a new experience for the next event.

Engineers who do attempt the cleanup may find it a very time-consuming job. Correctly identifying all the pieces of code and data associated with the product, and acting only on those specific pieces, is a laborious process. A table that is still being used, or code that is still required for a shared use case, could mistakenly be included in the scope of a deletion effort. For example, some tables might be shared between products. Hypothetically, Moments may have started life as an extension to Facebook photos before it became a separate app.

Most importantly, it’s crucial that any cleanup efforts only remove things that are actively being deprecated or are already entirely unused. Deleting something that is actively being used in production could cause bad experiences for users. The interconnected nature of features within a large product like Facebook makes this a very real possibility. 

How did we solve this?

Meta has developed playbooks describing how to safely deprecate a product. These playbooks describe how to notify people and give them time to download their data, how to disable the product safely, and when to eventually delete the underlying code and data. They describe how and when to perform a product or feature deprecation, but actually removing the code and data for a product or feature is an engineering problem with an engineering solution. The engineering solution we have built does not replace these guides, but enables engineers to more safely and efficiently complete the product deprecation process.

In this post, we’ll describe SCARF and how it guides engineers through this cleanup process. In subsequent posts, we will discuss its limitations and how our automated dead code and data removal platforms mitigate them. To start, we introduce the suite of internal tools Meta has developed to orchestrate both of these systems as they guide engineers through removing large products with complex dependencies.

Introducing workflow management

To simplify the task of removing a product, Meta has built a product deprecation workflow management tool into SCARF to help engineers delete a product’s dead code and unused data safely and efficiently. This tool lets engineers understand and break down the steps they’ll go through during the deprecation and coordinates the actions of SCARF to bring automation to bear with an engineer’s guidance.

Engineers can import their product or feature into SCARF, which then determines the constituent pieces of code and data, and identifies internal and external dependencies on these assets. SCARF automatically processes this information to guide engineers on the correct order of operations to delete assets safely and shows their progress relative to the desired end-state.

Any assets that are safe to be deleted immediately will be handled by SCARF’s automated code and data cleanup systems (which will be covered in more depth in our subsequent blog posts). Engineers are able to track this automation and accelerate it by using their domain knowledge combined with SCARF’s scoping analysis, which determines the assets that are safe to remove. 

Scoping the deprecation

Knowing when a piece of code or data is used by other components of the same product (an internal dependency) is important, and detecting when an asset is referenced in the codebase of another product (an external dependency) is crucial for a safe deprecation. For example, we would not want to leave a dead web link between two different products, because that could lead to a bad user experience. An engineer deprecating a product must think carefully about each such dependency at the external boundary of their product.
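To make the internal/external distinction concrete, here is a minimal Python sketch. It assumes a hypothetical representation in which each asset maps to the set of assets it references; the function name `classify_dependencies` and the asset names are illustrative, not SCARF’s actual API. Following the definition above, an external dependency is a reference into the project from an asset outside it, while references from the project to shared, out-of-scope assets are ignored because deleting the project does not break them.

```python
from typing import Dict, List, Set, Tuple

# Hypothetical representation: each asset maps to the assets it references.
Graph = Dict[str, Set[str]]
Edge = Tuple[str, str]


def classify_dependencies(graph: Graph, project: Set[str]) -> Tuple[List[Edge], List[Edge]]:
    """Split references that point at in-scope assets into internal and external."""
    internal: List[Edge] = []
    external: List[Edge] = []
    for src, targets in graph.items():
        for dst in targets:
            if dst not in project:
                continue  # references to out-of-scope (shared) assets do not block deletion
            if src in project:
                internal.append((src, dst))  # both ends are being deleted
            else:
                external.append((src, dst))  # another product still references the asset
    return sorted(internal), sorted(external)


if __name__ == "__main__":
    graph = {
        "moments_app": {"moments_api"},
        "moments_api": {"moments_photos_table"},
        "fb_sharing": {"moments_api"},
    }
    project = {"moments_app", "moments_api", "moments_photos_table"}
    internal, external = classify_dependencies(graph, project)
    print(internal)  # [('moments_api', 'moments_photos_table'), ('moments_app', 'moments_api')]
    print(external)  # [('fb_sharing', 'moments_api')]
```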

To avoid these complications, engineers begin their deprecations by scoping the project, recursively adding any assets that should be deleted, and flagging dependencies that should be severed. For example, if Moments had an integration with the Facebook app’s Sharing feature, this dependency must be broken because the Facebook Sharing feature itself may not be in scope for removal. 

However, if the Sharing feature was unique to Moments, that would change the scope of removal as we would want to delete that component as well. Adding new assets into the scope of the project as it progresses requires further analysis to discover the new boundaries of the internal and external dependencies. These dependencies are expected to change over time as an engineer discovers extra components to delete. Alternatively, we may identify code that is actually a shared component and should not be deleted. Attempting to finalize this boundary from the start is very difficult, so we allow developers to redefine the boundary over time as the deprecation progresses to reflect this growing understanding. 

The graph of components, their internal and external dependencies, and related data assets, can grow extremely complex in large products. SCARF simplifies this problem by only requiring engineers to make “Flag Dependency” or “Add to Project” decisions at the boundaries. It then internally computes the correct deletion order for everything inside the project.
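The boundary-only decision model can be sketched as follows, again under the hypothetical asset-to-references representation. `open_boundary` and `decide` are illustrative names, not SCARF’s interface: "flag" records a reference that will be severed, while "add" pulls the referring asset into scope, which can expose a new boundary.

```python
from typing import Dict, Set, Tuple

Graph = Dict[str, Set[str]]  # hypothetical: asset -> assets it references
Edge = Tuple[str, str]


def open_boundary(graph: Graph, project: Set[str], flagged: Set[Edge]) -> Set[Edge]:
    """External references into the project that still await a decision."""
    return {
        (src, dst)
        for src, targets in graph.items()
        if src not in project
        for dst in targets
        if dst in project and (src, dst) not in flagged
    }


def decide(project: Set[str], flagged: Set[Edge], edge: Edge, action: str) -> None:
    """Record one boundary decision; mutates project/flagged in place."""
    src, _dst = edge
    if action == "flag":
        flagged.add(edge)  # "Flag Dependency": this reference will be severed
    elif action == "add":
        project.add(src)  # "Add to Project": the referring asset is deleted too
    else:
        raise ValueError(f"unknown action: {action!r}")


if __name__ == "__main__":
    graph = {
        "fb_sharing": {"moments_api"},
        "moments_share_button": {"moments_api"},
        "fb_feed": {"moments_share_button"},
    }
    project, flagged = {"moments_api"}, set()
    decide(project, flagged, ("fb_sharing", "moments_api"), "flag")
    decide(project, flagged, ("moments_share_button", "moments_api"), "add")
    print(sorted(open_boundary(graph, project, flagged)))
    # [('fb_feed', 'moments_share_button')]
```

Adding `moments_share_button` to the project exposes a new external reference from `fb_feed`, mirroring how the boundary shifts as engineers learn more about what they are deleting.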

Creating a deletion roadmap

Once the set of unused assets to be deleted has been determined, SCARF will analyze the internal and external dependencies and create a deletion roadmap that outlines the correct sequence of steps for deleting everything safely. The roadmap is refreshed each day as changes are made, either from targeted assets being deleted or from modifications to the component graph through the Flag and Add actions. This roadmap is one of the workflow management tool’s most crucial features. 

Without guidance, engineers may attempt to delete everything in one fell swoop by removing an entire code directory without accounting for dependencies in external systems, where changes cannot be committed atomically. Another mistake is deleting data before the code that reads and writes it, which may lead to new data being created that must then be cleaned up again. Deprecations must be staggered to account for these requirements, so every deprecation must be performed as a coordinated, multi-step process.

The key implementation detail is the encoding of business logic that allows products and apps to function in Meta’s systems. By encoding how users are able to engage with products and how those products communicate with other Meta services, we can ensure that upstream assets are always deleted before their dependencies. For example, a product typically comprises code in many different languages across multiple repositories. An engineer needs to delete their mobile code (Java, Objective-C) in order to free up and delete their server-side GraphQL definitions. Deleting those GraphQL definitions makes it possible to delete business logic; deleting business logic makes it possible to delete data schema definitions, which in turn allows unused data to be deleted.
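The ordering rule described above amounts to a topological sort in which an asset becomes deletable only once nothing in scope still references it. Below is a minimal Python sketch using Kahn’s algorithm, grouped into stages. The graph format, the asset names, and `deletion_stages` are hypothetical; this illustrates the ordering idea rather than how SCARF actually computes its roadmap.

```python
from typing import Dict, List, Set

# Hypothetical: each in-scope asset maps to the in-scope assets it references.
Graph = Dict[str, Set[str]]


def deletion_stages(graph: Graph) -> List[List[str]]:
    """Group assets into stages; an asset is deleted only after everything
    that references it has been deleted (Kahn's algorithm on the reverse graph)."""
    assets = set(graph) | {dst for targets in graph.values() for dst in targets}
    # referrers[a] = number of not-yet-deleted assets that still reference a
    referrers = {a: 0 for a in assets}
    for targets in graph.values():
        for dst in targets:
            referrers[dst] += 1
    stages: List[List[str]] = []
    ready = sorted(a for a in assets if referrers[a] == 0)
    while ready:
        stages.append(ready)
        nxt: List[str] = []
        for asset in ready:
            for dst in graph.get(asset, set()):
                referrers[dst] -= 1
                if referrers[dst] == 0:
                    nxt.append(dst)  # last referrer gone: deletable next stage
        ready = sorted(nxt)
    if sum(len(stage) for stage in stages) != len(assets):
        raise ValueError("cycle detected: these references must be broken manually")
    return stages


if __name__ == "__main__":
    graph = {
        "ios_client": {"graphql_moments"},
        "android_client": {"graphql_moments"},
        "graphql_moments": {"moments_logic"},
        "moments_logic": {"moments_schema"},
        "moments_schema": {"moments_table"},
    }
    for number, stage in enumerate(deletion_stages(graph), start=1):
        print(number, stage)
```

Because the stages are recomputed from the current graph, rerunning the function after deletions or scope changes yields a refreshed roadmap, in the spirit of the daily refresh described earlier.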

The following diagram is a hypothetical example of the type of information that is presented to an engineer in their deletion roadmap. 

This sequencing isn’t obvious at the get-go! The links between these system boundaries are often weaker than the link between, for example, two classes in the same language, and they are often discovered only when continuous testing runs on an engineer’s code change requests. SCARF’s encoding of these inter-system boundaries enables the deletion roadmap, which in turn enables engineers to stagger their removal actions safely.

Powering the analysis

SCARF’s workflow management tool is powered by detailed metrics from a combination of both static and dynamic analysis. Information about code is gathered from SCARF’s unified code dependency graph and information about data is gathered from SCARF’s asset usage analysis. These mechanisms will be discussed in more detail in our subsequent blog posts.

These metrics give the workflow management tool a very important property: If metrics can be found for an asset, it is in use and should be blocked from automated deletion. Correspondingly, any asset for which no metrics are available is ready for automatic deletion. With these metrics in hand, SCARF can show engineers a precise explanation of exactly what is blocking the automation and what steps they must manually perform in order for the automation to proceed. 
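The completeness property can be expressed as a simple partition. In this sketch, the `metrics` dictionary is a hypothetical stand-in for SCARF’s static and dynamic usage analysis: any asset with recorded metrics is blocked, and the metric names become the explanation shown to the engineer, while everything else is ready for automated deletion.

```python
from typing import Dict, List, Tuple

# Hypothetical: usage metrics recorded per asset, e.g. {"reads_per_day": 40}.
# An asset with no entry has no observed usage.
Metrics = Dict[str, Dict[str, float]]


def partition_by_usage(
    assets: List[str], metrics: Metrics
) -> Tuple[List[str], Dict[str, List[str]]]:
    """Return (ready for automated deletion, blocked asset -> blocking metric names)."""
    ready: List[str] = []
    blocked: Dict[str, List[str]] = {}
    for asset in assets:
        found = metrics.get(asset)
        if found:
            blocked[asset] = sorted(found)  # explanation surfaced to the engineer
        else:
            ready.append(asset)
    return ready, blocked


if __name__ == "__main__":
    metrics = {
        "moments_api": {"requests_per_day": 1},
        "moments_table": {"writes_per_day": 2, "reads_per_day": 40},
    }
    ready, blocked = partition_by_usage(["moments_table", "moments_logs", "moments_api"], metrics)
    print(ready)    # ['moments_logs']
    print(blocked)  # {'moments_table': ['reads_per_day', 'writes_per_day'], 'moments_api': ['requests_per_day']}
```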

This completeness property allows SCARF to automatically begin removing code and data that is unused, while simultaneously showing engineers which things must be handled manually. At every step of the process, both engineers and automation work together to complete the deprecation. 

Once an engineer has acted upon this information and removed more dead code and unused data in accordance with the deletion roadmap, the continuous indexing and analysis of SCARF will detect these changes and automatically trigger any code/data cleanup it can. Finally, the workflow management tool will update its deletion roadmap to identify the next set of items requiring manual intervention, and the process is repeated. 
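The repeated cycle of automated deletion followed by manual intervention can be simulated with a small loop. This hypothetical `refresh` function assumes an acyclic, in-scope reference graph and a set of assets that still show usage; it illustrates the iteration only and is not SCARF’s implementation.

```python
from typing import Dict, List, Set, Tuple

# Hypothetical: every in-scope asset is a key (leaves map to an empty set)
# and maps to the in-scope assets it references. The graph is assumed acyclic.
Graph = Dict[str, Set[str]]


def refresh(graph: Graph, in_use: Set[str]) -> Tuple[List[str], List[str]]:
    """Simulate one analysis pass: keep auto-deleting unreferenced, unused assets
    until none remain, then report the unreferenced assets that are still in use."""
    deleted: List[str] = []
    while True:
        referenced = {dst for targets in graph.values() for dst in targets}
        frontier = sorted(a for a in graph if a not in referenced)
        auto = [a for a in frontier if a not in in_use]
        if not auto:
            return deleted, frontier  # remaining frontier needs manual intervention
        for asset in auto:
            del graph[asset]  # deleting an asset may unblock the assets it referenced
        deleted.extend(auto)


if __name__ == "__main__":
    graph = {
        "moments_ios": {"moments_graphql"},
        "moments_graphql": {"moments_table"},
        "moments_table": set(),
    }
    in_use = {"moments_graphql"}  # the endpoint still receives traffic
    print(refresh(graph, in_use))  # (['moments_ios'], ['moments_graphql'])
    in_use.discard("moments_graphql")  # an engineer resolves the remaining usage
    print(refresh(graph, in_use))  # (['moments_graphql', 'moments_table'], [])
```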

Is automation sufficient?

Some of the usage signals SCARF highlights (such as API endpoints that still receive traffic) do not cause compilation errors if they are ignored, and whether they should stop automated deletion is more subjective. If the endpoint for a product receives a single request each day, should we delete it right away? We have already identified that the endpoint belongs to a product we are removing, so we know it should be removed eventually, but when it is correct to delete it is ultimately a business decision. Further, we understand that mistakes can happen and that the automation is not expected to be 100 percent accurate. Therefore, SCARF errs on the side of caution, as a mistaken deletion could lead to unrecoverable errors or data loss.

Correspondingly, SCARF offers the ability to override signals that prevent automation from taking action. In the example above, an engineer might decide to proceed with deleting the API endpoint despite small amounts of traffic. This is extremely valuable when cleaning up internal products, tools, and services. Internal tools are often built and deprecated at a much faster cadence than external products, and SCARF can help identify each individual internal user of a tool who needs to be asked whether they agree to its removal.

This feature creates a feedback loop between engineers and automation. Our automation highlights usage signals; engineers then triage those usage signals (either by modifying code to remove them or, if applicable, marking the signal as not sufficient to block deletion); then our automation can proceed to progress the SCARF project. 
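The override mechanism and the feedback loop can be sketched with a small triage record. `UsageSignal`, `Triage`, and the signal kinds are hypothetical names for illustration. The cautious default is that every signal blocks deletion until an engineer explicitly marks it as not sufficient to block.

```python
from dataclasses import dataclass, field
from typing import List, Set, Tuple


@dataclass(frozen=True)
class UsageSignal:
    asset: str
    kind: str     # hypothetical kinds, e.g. "api_traffic" or "table_reads"
    value: float  # observed usage, e.g. requests per day


@dataclass
class Triage:
    # (asset, kind) pairs an engineer has marked as not sufficient to block deletion
    overridden: Set[Tuple[str, str]] = field(default_factory=set)

    def override(self, signal: UsageSignal) -> None:
        self.overridden.add((signal.asset, signal.kind))

    def blocking(self, signals: List[UsageSignal]) -> List[UsageSignal]:
        # Err on the side of caution: any signal not explicitly overridden blocks.
        return [s for s in signals if (s.asset, s.kind) not in self.overridden]

    def cleared(self, asset: str, signals: List[UsageSignal]) -> bool:
        """True when no remaining signal blocks automation for this asset."""
        return not any(s.asset == asset for s in self.blocking(signals))


if __name__ == "__main__":
    signals = [
        UsageSignal("moments_api", "api_traffic", 1.0),
        UsageSignal("moments_table", "table_reads", 500.0),
    ]
    triage = Triage()
    triage.override(signals[0])  # one request a day: the engineer accepts the risk
    print(triage.cleared("moments_api", signals))    # True
    print(triage.cleared("moments_table", signals))  # False
```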

To further accelerate manual changes performed by engineers, SCARF also provides an instant code change feature. SCARF suggests targeted code changes for breaking a dependency, and an engineer can select one of a predetermined set of ways to break it. The change is immediately previewable, and a change request can be generated right away for review by other engineers.

After an engineer has progressed through a majority of the deletion roadmap and removed the entry points, client code, server code, and any other in-scope assets, they will be left with unused tables of data that can now be automatically cleaned up by SCARF. This automation will be explained in more detail in our future blog posts.

Integration with other tools at Meta

SCARF is useful for more than just product deprecation. We surface deprecation as an option when notifying engineers about routine maintenance or upgrade work on individual assets. For example, when asking developers to migrate away from a legacy internal API, we can include an option for them to deprecate the callsite or affected product and import their code into SCARF. Deprecating something can, in many cases, be more work than performing a simple upgrade, but deprecation means that no future maintenance work is necessary. Once something has been removed, future maintenance costs go to zero.

We hope you’ll look forward to our subsequent posts in the series, which will cover SCARF’s automated code and data deletion, respectively. 

The post Automating product deprecation appeared first on Engineering at Meta.

Categorized as Technology