{"id":338,"date":"2021-08-31T14:39:28","date_gmt":"2021-08-31T14:39:28","guid":{"rendered":"https:\/\/fde.cat\/?p=338"},"modified":"2021-08-31T14:39:28","modified_gmt":"2021-08-31T14:39:28","slug":"a-linear-programming-approach-for-optimizing-features-in-ml-models","status":"publish","type":"post","link":"https:\/\/fde.cat\/index.php\/2021\/08\/31\/a-linear-programming-approach-for-optimizing-features-in-ml-models\/","title":{"rendered":"A linear programming approach for optimizing features in ML models"},"content":{"rendered":"<p><span>Whether it\u2019s iterating on <\/span><a href=\"https:\/\/engineering.fb.com\/2021\/01\/26\/ml-applications\/news-feed-ranking\/\"><span>Facebook\u2019s News Feed ranking algorithm<\/span><\/a><span> or delivering the most relevant ads to users, we are constantly exploring new features to help improve our machine learning (ML) models. Every time we add new features, we create a challenging data engineering problem that requires us to think strategically about the choices we make. More complex features and sophisticated techniques require additional storage space. Even at a company the size of Facebook, capacity isn\u2019t infinite. If left unchecked, accepting all features would quickly overwhelm our capacity and slow down our iteration speed, decreasing the efficiency of running the models.<\/span><\/p>\n<p><span>To better understand the relationship between these features and the capacity of the infrastructure that needs to support them, we can frame the system as a linear programming problem. By doing this, we can maximize a model\u2019s performance, probe the sensitivity of its performance to different infrastructure constraints, and study the relationships between different services. 
<\/span><span>This work was done by data scientists embedded in our engineering team and demonstrates the value of analytics and data science in ML.<\/span><\/p>\n<h2><span>Supporting feature development<\/span><\/h2>\n<p><span>It\u2019s important to continually introduce features that best leverage new data to maintain performant models. New features are responsible for the majority of incremental model improvements. Consider, for example, the ML models that power our ad delivery system. They work together to predict a person\u2019s likelihood of taking a specific action on an ad. We work to continuously improve our models so our systems deliver only those ads that are relevant to a user.<\/span><\/p>\n<p><span>As our techniques become more sophisticated, we develop more complex features that demand more of our infrastructure. A feature can leverage different services depending on its purpose. Some features have a higher memory cost, while others require extra CPU or take up more storage. It\u2019s important to use our infrastructure wisely to maximize the performance of our models. We must be able to allocate resources smartly and quantify the trade-offs of different scenarios.<\/span><\/p>\n<p><span>To address these problems, we frame our system as a linear programming problem that maximizes our model\u2019s metrics. We use this framework to better understand the interaction between our features and services. With this knowledge, we can automatically select the best features, identify infrastructure services to invest in, and maintain the health of both our models and services.<\/span><\/p>\n<h2><span>Framing our problem<\/span><\/h2>\n<p><span>To get a handle on our framework, we\u2019ll first introduce a model problem. 
Say we have multiple features that all take up some amount of space (the height of the rectangles) and contribute some amount of gain to our models (the teal squares), and we are unable to accommodate them all in our limited capacity.<\/span><\/p>\n<\/p>\n<p>\u00a0<\/p>\n<p><span>A naive solution would be to just start picking the features with the most gain (teal squares) until you are out of capacity. However, you might not be making the best use of your resources if you just prioritize the gain. For example, by taking in a big feature with a large gain, you could be taking up room that two smaller features with less gain could use instead. Together, those two smaller features would give you more bang for your buck than the big feature.<\/span><\/p>\n<\/p>\n<p><span>If we get a little less naive, we could instead look to pick features that give us the most bang per buck \u2014 features that have the most gain per storage. However, if we pick features only from that perspective, we could end up leaving out some less efficient features that we would still have room to accommodate.<\/span><\/p>\n<\/p>\n<p><span>We\u2019ve been looking at a very simplified view of infrastructure, but the reality is a bit more complex. For example, features often don\u2019t take up just one resource but need many \u2014 such as memory, CPU, or storage in other services. We can make our example slightly more sophisticated by adding in Service B, and saying that orange features take up space in both Service A and Service B.<\/span><\/p>\n<\/p>\n<p><span>Picking which features we use is not the only way to control our infrastructure usage. We can also employ various techniques to increase the efficiency of our feature storage. This sometimes comes with a cost, either through the feature itself or capacity from a service. 
In this case, let\u2019s say that we can halve the storage cost of some features (bordered in pink) but only at the cost of reducing the gain of the feature, and using some of the limited capacity in Service B.<\/span><\/p>\n<\/p>\n<p><span>We\u2019ll stop the example at this point, but this is enough for the general message to be clear \u2014 infrastructure can be a complicated interconnected system of different constraints. In reality, our capacity is not set in stone. We can move resources around if it is warranted. Features are also not the only thing we\u2019re working on. There are plenty of other projects and workflows that compete for the same resources. We need to not only choose the features that maximize our gain but also be able to answer questions about how our system responds to changes:<\/span><\/p>\n<p><span>Which features do we select to optimize the gain?<\/span><br \/>\n<span>Is feature compression worth it? More important, is it worth an engineer\u2019s time to implement it?<\/span><br \/>\n<span>How does the gain change if we add more capacity to Service A?<\/span><br \/>\n<span>How do service dependencies interact? 
If we increase the capacity of Service B, can we use less of Service A?<\/span><\/p>\n<h2><span>Scaling the problem<\/span><\/h2>\n<p><span>Let\u2019s step back and review the conditions of our model problem:<\/span><\/p>\n<p><span>We want to maximize our gain.<\/span><br \/>\n<span>We are limited by the capacity of Service A.<\/span><br \/>\n<span>We are also limited by the capacity of Service B, which only some features contribute to.<\/span><br \/>\n<span>Some features may be compressed, but:<\/span><\/p>\n<p><span>They suffer a loss to their gain.<\/span><br \/>\n<span>Some of Service B\u2019s capacity must be used.<\/span><\/p>\n<p><span>We can express all these constraints as a system of linear equations.<\/span><\/p>\n<p><span>Let \ud835\udc65 be a binary vector (each entry is 0 or 1) that signifies whether we select each feature, and let \ud835\udc54 be a vector that stores the gain of each feature. The subscripts \ud835\udc53 and \ud835\udc50 denote whether we are referring to the full or the compressed version of a feature. For example, \ud835\udc65<\/span><span>\ud835\udc53<\/span><span> denotes full, uncompressed features that we have selected to include, and \ud835\udc54<\/span><span>\ud835\udc50<\/span><span> represents the gain of compressed features.<\/span><\/p>\n<p><span>Given these definitions, our objective is to maximize:<\/span><\/p>\n<p><span>\ud835\udc54\ud835\udc53 \u00b7 \ud835\udc65\ud835\udc53 + \ud835\udc54\ud835\udc50 \u00b7 \ud835\udc65\ud835\udc50<\/span><\/p>\n<p><span>We can now add our constraints that model the limitations of our infrastructure:<\/span><\/p>\n<p><span>Features will either be selected and compressed, selected but not compressed, or not selected. We should not select the compressed and uncompressed versions of the same feature:<\/span><\/p>\n<p><span>\ud835\udc65\ud835\udc53 + \ud835\udc65\ud835\udc50 \u2264 1<\/span><\/p>\n<p><span>Let \ud835\udc60 be the storage cost of a feature and the subscripts \ud835\udc34 and \ud835\udc35 represent Service A and B, respectively. For example, \ud835\udc60<\/span><span>\ud835\udc34\ud835\udc50<\/span><span> represents the storage cost of compressed features in Service A. 
We are constrained by the capacity of the two services:<\/span><\/p>\n<p><span>\ud835\udc60\ud835\udc34\ud835\udc53 \u00b7 \ud835\udc65\ud835\udc53 + \ud835\udc60\ud835\udc34\ud835\udc50 \u00b7 \ud835\udc65\ud835\udc50 \u2264 capacity of Service A<br \/>\n\ud835\udc60\ud835\udc35\ud835\udc53 \u00b7 \ud835\udc65\ud835\udc53 + \ud835\udc60\ud835\udc35\ud835\udc50 \u00b7 \ud835\udc65\ud835\udc50 \u2264 capacity of Service B<\/span><\/p>\n<p><span>Some of Service B must be utilized to enable compression. Let\u2019s represent that as a few features that must be selected:<\/span><\/p>\n<p><span>\ud835\udc65\ud835\udc53 = 1 for each feature required to enable compression<\/span><\/p>\n<p><span>With this, we have now completely specified our problem in a few equations and can solve them using linear programming techniques. Of course, as we are interested in automating and productionalizing this, it can easily be specified in code. For this example, we accomplish this in Python using the excellent <\/span><a href=\"https:\/\/numpy.org\/\"><span>NumPy<\/span><\/a><span> and <\/span><a href=\"https:\/\/www.cvxpy.org\/\"><span>CVXPY<\/span><\/a><span> packages.<\/span><\/p>\n<p>import cvxpy as cp<br \/>\nimport numpy as np<br \/>\nimport pandas as pd<\/p>\n<p># Assuming data is a Pandas DataFrame that contains relevant feature data<br \/>\ndata = pd.DataFrame(&#8230;)<br \/>\n# These variables contain the maximum capacity of various services<br \/>\nservice_a = &#8230;<br \/>\nservice_b = &#8230;<\/p>\n<p>selected_full_features = cp.Variable(data.shape[0], boolean=True)<br \/>\nselected_compressed_features = cp.Variable(data.shape[0], boolean=True)<\/p>\n<p># Maximize the feature gain<br \/>\nfeature_gain = (<br \/>\n   data.uncompressed_feature_gain.to_numpy() @ selected_full_features<br \/>\n   + data.compressed_feature_gain.to_numpy() @ selected_compressed_features<br \/>\n)<\/p>\n<p>constraints = [<br \/>\n   # 1. We should not select the compressed and uncompressed version<br \/>\n   #    of the same feature<br \/>\n   selected_full_features + selected_compressed_features &lt;= np.ones(data.shape[0]),<br \/>\n   # 2. 
Features are restricted by the maximum capacity of the services<br \/>\n   data.full_storage_cost.to_numpy() @ selected_full_features<br \/>\n   + data.compressed_storage_cost.to_numpy() @ selected_compressed_features<br \/>\n   &lt;= service_a,<br \/>\n   data.full_memory_cost.to_numpy() @ selected_full_features<br \/>\n   + data.compressed_memory_cost.to_numpy() @ selected_compressed_features<br \/>\n   &lt;= service_b,<br \/>\n   # 3. Some features must be selected to enable compression<br \/>\n   selected_full_features &gt;= data.special_features.to_numpy(),<br \/>\n]<\/p>\n<p># Solve the resulting mixed-integer program for the optimal feature selection<br \/>\nproblem = cp.Problem(cp.Maximize(feature_gain), constraints)<br \/>\nproblem.solve()<\/p>\n<h2><span>Leveraging the framework<\/span><\/h2>\n<p><span>Now we have a framework that we can use to express our questions and hypotheticals. If we want to find out how an increase in Service A translates to a feature gain, we can run the optimization problem above at different values for Service A capacity and plot the gain. This way, we can directly quantify the return for each incremental increase in capacity. We can use this as a strong signal for what services we should invest in for the future and directly compare the return on investment on more feature memory, computing, or storage.<\/span><\/p>\n<\/p>\n<p><span>Similarly, we can look at the relationships between services. We simply vary the capacity of Services A and B while keeping the gain constant. We can see that as Service B\u2019s capacity increases, less of Service A is needed to achieve the same gain. This can be leveraged if one service is overly stressed compared with another.<\/span><\/p>\n<\/p>\n<h2><span>Linear programming as a framework for automating decisions<\/span><\/h2>\n<p><span>Previously, feature approval was a manual process where teams would spend valuable time calculating how many features we could support and analyzing what the return on investment was for increasing the capacity of our services. 
In a company like Facebook \u2014 where we have multiple models being continuously iterated on \u2014 this approach does not scale. By framing our services as a system of linear equations, we take a complex interconnected system and simplify it into basic relationships that are easily communicated. By doing this, we can make smarter decisions about the features we deploy and the infrastructure we invest in.<\/span><\/p>\n<p>The post <a href=\"https:\/\/engineering.fb.com\/2021\/07\/29\/data-infrastructure\/linear-programming\/\">A linear programming approach for optimizing features in ML models<\/a> appeared first on <a href=\"https:\/\/engineering.fb.com\/\">Facebook Engineering<\/a>.<\/p>\n<p><a href=\"https:\/\/engineering.fb.com\/2021\/07\/29\/data-infrastructure\/linear-programming\/\">Read More<\/a><\/p>\n","protected":false},"excerpt":{"rendered":"<p>Whether it\u2019s iterating on Facebook\u2019s News Feed ranking algorithm or delivering the most relevant ads to users, we are constantly exploring new features to help improve our machine learning (ML) models. 
Every time we add new features, we create a challenging data engineering problem that requires us to think strategically about the choices we make.&hellip; <a class=\"more-link\" href=\"https:\/\/fde.cat\/index.php\/2021\/08\/31\/a-linear-programming-approach-for-optimizing-features-in-ml-models\/\">Continue reading <span class=\"screen-reader-text\">A linear programming approach for optimizing features in ML models<\/span><\/a><\/p>\n","protected":false},"author":1,"featured_media":0,"comment_status":"closed","ping_status":"open","sticky":false,"template":"","format":"standard","meta":{"spay_email":"","footnotes":""},"categories":[7],"tags":[],"class_list":["post-338","post","type-post","status-publish","format-standard","hentry","category-technology","entry"],"jetpack_featured_media_url":"","jetpack-related-posts":[{"id":321,"url":"https:\/\/fde.cat\/index.php\/2021\/08\/31\/meet-kats-a-one-stop-shop-for-time-series-analysis\/","url_meta":{"origin":338,"position":0},"title":"Meet Kats \u2014 a one-stop shop for time series analysis","date":"August 31, 2021","format":false,"excerpt":"What it is:\u00a0 A new library to analyze time series data. Kats is a lightweight, easy-to-use, and generalizable framework for generic time series analysis, including forecasting, anomaly detection, multivariate analysis, and feature extraction\/embedding. 
To the best of our knowledge, Kats is the first comprehensive Python library for generic time series\u2026","rel":"","context":"In &quot;Technology&quot;","img":{"alt_text":"","src":"","width":0,"height":0},"classes":[]},{"id":893,"url":"https:\/\/fde.cat\/index.php\/2024\/07\/10\/metas-approach-to-machine-learning-prediction-robustness\/","url_meta":{"origin":338,"position":1},"title":"Meta\u2019s approach to machine learning prediction robustness","date":"July 10, 2024","format":false,"excerpt":"Meta\u2019s advertising business leverages large-scale machine learning (ML) recommendation models that power millions of ads recommendations per second across Meta\u2019s family of apps. Maintaining reliability of these ML systems helps ensure the highest level of service and uninterrupted benefit delivery to our users and advertisers. To minimize disruptions and ensure\u2026","rel":"","context":"In &quot;Technology&quot;","img":{"alt_text":"","src":"","width":0,"height":0},"classes":[]},{"id":618,"url":"https:\/\/fde.cat\/index.php\/2022\/08\/10\/scaling-data-ingestion-for-machine-learning-training-at-meta\/","url_meta":{"origin":338,"position":2},"title":"Scaling data ingestion for machine learning training at Meta","date":"August 10, 2022","format":false,"excerpt":"Many of Meta\u2019s products, such as search and language translations, utilize AI models to continuously improve user experiences. As the performance of hardware we use to support training infrastructure increases, we need to scale our data ingestion infrastructure accordingly to handle workloads more efficiently. 
GPUs, which are used for training\u2026","rel":"","context":"In &quot;Technology&quot;","img":{"alt_text":"","src":"","width":0,"height":0},"classes":[]},{"id":806,"url":"https:\/\/fde.cat\/index.php\/2023\/12\/19\/ai-debugging-at-meta-with-hawkeye\/","url_meta":{"origin":338,"position":3},"title":"AI debugging at Meta with HawkEye","date":"December 19, 2023","format":false,"excerpt":"HawkEye is the powerful toolkit used internally at Meta for monitoring, observability, and debuggability of the end-to-end machine learning (ML) workflow that powers ML-based products. HawkEye supports recommendation and ranking models across several products at Meta. Over the past two years, it has facilitated order of magnitude improvements in the\u2026","rel":"","context":"In &quot;Technology&quot;","img":{"alt_text":"","src":"","width":0,"height":0},"classes":[]},{"id":848,"url":"https:\/\/fde.cat\/index.php\/2024\/04\/01\/unveiling-the-cutting-edge-features-of-ml-console-for-ai-model-lifecycle-management\/","url_meta":{"origin":338,"position":4},"title":"Unveiling the Cutting-Edge Features of ML Console for AI Model Lifecycle Management","date":"April 1, 2024","format":false,"excerpt":"In our \u201cEngineering Energizers\u201d Q&A series, we explore the journeys of engineering leaders who have made remarkable contributions in their fields. Today, we meet Venkat Krishnamani, a Lead Member of the Technical Staff for Salesforce Engineering and the lead engineer for Salesforce Einstein\u2019s Machine Learning (ML) Console. 
This vital tool\u2026","rel":"","context":"In &quot;Technology&quot;","img":{"alt_text":"","src":"","width":0,"height":0},"classes":[]},{"id":774,"url":"https:\/\/fde.cat\/index.php\/2023\/10\/17\/how-the-einstein-team-operationalizes-ai-models-at-lightning-speed-and-massive-scale\/","url_meta":{"origin":338,"position":5},"title":"How the Einstein Team Operationalizes AI Models at Lightning Speed and Massive Scale","date":"October 17, 2023","format":false,"excerpt":"By Yuliya Feldman and Scott Nyberg In our \u201cEngineering Energizers\u201d Q&A series, we examine the professional life experiences that have shaped Salesforce Engineering leaders. Meet Yuliya Feldman, a Software Engineering Architect at Salesforce. Yuliya works on Salesforce Einstein\u2019s Machine Learning Services team, responsible for operationalizing AI models, which serves as\u2026","rel":"","context":"In &quot;Technology&quot;","img":{"alt_text":"","src":"","width":0,"height":0},"classes":[]}],"_links":{"self":[{"href":"https:\/\/fde.cat\/index.php\/wp-json\/wp\/v2\/posts\/338","targetHints":{"allow":["GET"]}}],"collection":[{"href":"https:\/\/fde.cat\/index.php\/wp-json\/wp\/v2\/posts"}],"about":[{"href":"https:\/\/fde.cat\/index.php\/wp-json\/wp\/v2\/types\/post"}],"author":[{"embeddable":true,"href":"https:\/\/fde.cat\/index.php\/wp-json\/wp\/v2\/users\/1"}],"replies":[{"embeddable":true,"href":"https:\/\/fde.cat\/index.php\/wp-json\/wp\/v2\/comments?post=338"}],"version-history":[{"count":1,"href":"https:\/\/fde.cat\/index.php\/wp-json\/wp\/v2\/posts\/338\/revisions"}],"predecessor-version":[{"id":372,"href":"https:\/\/fde.cat\/index.php\/wp-json\/wp\/v2\/posts\/338\/revisions\/372"}],"wp:attachment":[{"href":"https:\/\/fde.cat\/index.php\/wp-json\/wp\/v2\/media?parent=338"}],"wp:term":[{"taxonomy":"category","embeddable":true,"href":"https:\/\/fde.cat\/index.php\/wp-json\/wp\/v2\/categories?post=338"},{"taxonomy":"post_tag","embeddable":true,"href":"https:\/\/fde.cat\/index.php\/wp-json\/wp
\/v2\/tags?post=338"}],"curies":[{"name":"wp","href":"https:\/\/api.w.org\/{rel}","templated":true}]}}