{"id":744,"date":"2023-08-09T16:00:52","date_gmt":"2023-08-09T16:00:52","guid":{"rendered":"https:\/\/fde.cat\/index.php\/2023\/08\/09\/scaling-the-instagram-explore-recommendations-system\/"},"modified":"2023-08-09T16:00:52","modified_gmt":"2023-08-09T16:00:52","slug":"scaling-the-instagram-explore-recommendations-system","status":"publish","type":"post","link":"https:\/\/fde.cat\/index.php\/2023\/08\/09\/scaling-the-instagram-explore-recommendations-system\/","title":{"rendered":"Scaling the Instagram Explore recommendations system"},"content":{"rendered":"<p><span><a href=\"https:\/\/ai.facebook.com\/blog\/powered-by-ai-instagrams-explore-recommender-system\/\" target=\"_blank\" rel=\"noopener\">Explore<\/a> is one of the largest recommendation systems on Instagram.<\/span><br \/>\n<span>We leverage machine learning to make sure people are always seeing content that is the most interesting and relevant to them.<\/span><br \/>\n<span>Using more advanced machine learning models, like Two Towers neural networks, we\u2019ve been able to make the Explore recommendation system even more scalable and flexible.<\/span><\/p>\n<p><span>AI plays an important role in <\/span><a href=\"https:\/\/about.fb.com\/news\/2023\/06\/how-ai-ranks-content-on-facebook-and-instagram\/\" target=\"_blank\" rel=\"noopener\"><span>what people see on Meta\u2019s platforms<\/span><\/a><span>. 
Every day, hundreds of millions of people visit Explore on Instagram to discover something new, making it one of the largest recommendation surfaces on Instagram.

To build a large-scale system capable of recommending the most relevant content to people in real time out of billions of available options, we've leveraged machine learning (ML) to introduce [a task-specific domain-specific language (DSL) and a multi-stage approach to ranking](https://ai.facebook.com/blog/powered-by-ai-instagrams-explore-recommender-system/).

As the system has continued to evolve, we've expanded our multi-stage ranking approach with several well-defined stages, each focusing on different objectives and algorithms:

1. Retrieval
2. First-stage ranking
3. Second-stage ranking
4. Final reranking

By leveraging caching and pre-computation with highly customizable modeling techniques, like a [Two Towers neural network (NN)](https://research.google/pubs/pub48840/), we've built a ranking system for Explore that is more flexible and scalable than ever before.

*The stages funnel for Explore on Instagram.*

Readers might notice that the leitmotif of this post is the clever use of caching and pre-computation in different ranking stages.
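The four stages above can be sketched as a simple pipeline. This is only an illustrative outline: the stage functions, scoring formulas, and candidate counts here are hypothetical placeholders, not the production system.

```python
# Illustrative multi-stage recommendation funnel.
# All scoring functions and counts are toy stand-ins for real models.

def retrieve(user_id: int) -> list[int]:
    # Combine candidates from several sources (thousands of items).
    return list(range(5000))

def cheap_score(user_id: int, item: int) -> float:
    # Toy deterministic score standing in for a lightweight model.
    return ((user_id + item) * 2654435761) % 1000 / 1000.0

def heavy_score(user_id: int, item: int) -> float:
    # Toy deterministic score standing in for a heavy model.
    return ((user_id * 31 + item) * 40503) % 1000 / 1000.0

def first_stage_rank(user_id: int, candidates: list[int], k: int = 500) -> list[int]:
    # Lightweight ranker: cheap score over thousands of candidates, keep top-k.
    return sorted(candidates, key=lambda item: -cheap_score(user_id, item))[:k]

def second_stage_rank(user_id: int, candidates: list[int], k: int = 100) -> list[int]:
    # Heavy ranker: expensive score, applied only to first-stage survivors.
    return sorted(candidates, key=lambda item: -heavy_score(user_id, item))[:k]

def rerank(candidates: list[int], n: int = 25) -> list[int]:
    # Final reranking: business rules (diversity, integrity) would go here.
    return candidates[:n]

def recommend(user_id: int) -> list[int]:
    funnel = retrieve(user_id)                   # ~thousands
    funnel = first_stage_rank(user_id, funnel)   # ~hundreds
    funnel = second_stage_rank(user_id, funnel)  # ~100
    return rerank(funnel)                        # final results
```

The point of the structure is that each successive stage runs a costlier model over fewer candidates.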
This allows us to use heavier models in every stage of ranking, learn behavior from data, and rely less on heuristics.

## Retrieval

The basic idea behind retrieval is to get an approximation of what content (candidates) will be ranked high at later stages in the process if all of the content were drawn from a general media distribution.

In a world with infinite computational power and no latency requirements, we could rank all possible content. But given real-world requirements and constraints, most large-scale recommender systems employ a multi-stage funnel approach, starting with thousands of candidates and narrowing them down to hundreds as we go down the funnel.

In most large-scale recommender systems, the retrieval stage consists of multiple candidate retrieval sources ("sources" for short). The main purpose of a source is to select hundreds of relevant items from a media pool of billions of items. Once we fetch candidates from different sources, we combine them and pass them to ranking models.

Candidate sources can be based on heuristics (e.g., trending posts) as well as more sophisticated ML approaches.
Additionally, retrieval sources can be real-time (capturing the most recent interactions) or pre-generated (capturing long-term interests).

*The four types of retrieval sources.*

To model media retrieval for different user groups with various interests, we utilize all of these source types together and mix them with tunable weights.

Candidates from pre-generated sources can be generated offline during off-peak hours (e.g., locally popular media), which further contributes to system scalability.

Let's take a closer look at a couple of techniques that can be used in retrieval.

### Two Tower NN

[Two Tower NNs](https://research.google/pubs/pub48840/) deserve special attention in the context of retrieval.

Our ML-based approach to retrieval used the [Word2Vec algorithm](https://ai.facebook.com/blog/powered-by-ai-instagrams-explore-recommender-system/) to generate user and media/author embeddings based on their IDs.

The Two Towers model extends the Word2Vec algorithm, allowing us to use arbitrary user or media/author features and learn from multiple tasks at the same time for multi-objective retrieval.
This new model retains the maintainability and real-time nature of Word2Vec, which makes it a great choice for a candidate sourcing algorithm.

Here's how Two Tower retrieval works in general:

- The Two Tower model consists of two separate neural networks, one for the user and one for the item.
- Each neural network consumes only the features related to its entity and outputs an embedding.
- The learning objective is to predict engagement events (e.g., someone liking a post) as a similarity measure between the user and item embeddings.
- After training, user embeddings should be close to the embeddings of relevant items for a given user. Therefore, item embeddings close to the user's embedding can be used as candidates for ranking.

*How we train our Two Tower neural network for Explore.*

Given that the user and item networks (towers) are independent after training, we can use the item tower to generate embeddings for items that can be used as candidates during retrieval.
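The training setup described above can be sketched in a few lines of NumPy. This is a deliberately minimal stand-in, not the production architecture: each tower is a single linear layer, the similarity measure is a dot product, and the engagement labels and features are synthetic.

```python
import numpy as np

rng = np.random.default_rng(0)

# Toy data: user-side features, item-side features, binary engagement labels.
n, d_user, d_item, d_emb = 256, 8, 12, 4
U = rng.normal(size=(n, d_user))                # user features
V = rng.normal(size=(n, d_item))                # item features
y = rng.integers(0, 2, size=n).astype(float)    # e.g., "liked" labels

# Each "tower" is one linear layer here; real towers are deep networks.
Wu = rng.normal(scale=0.1, size=(d_user, d_emb))
Wi = rng.normal(scale=0.1, size=(d_item, d_emb))

def sigmoid(x):
    return 1.0 / (1.0 + np.exp(-x))

def log_loss(p, y):
    eps = 1e-9
    return -np.mean(y * np.log(p + eps) + (1.0 - y) * np.log(1.0 - p + eps))

loss_start = log_loss(sigmoid(np.sum((U @ Wu) * (V @ Wi), axis=1)), y)

lr = 0.1
for _ in range(200):
    eu, ei = U @ Wu, V @ Wi                     # user / item embeddings
    p = sigmoid(np.sum(eu * ei, axis=1))        # similarity -> engagement prob.
    g = (p - y)[:, None]                        # d(loss)/d(logit)
    Wu -= lr * U.T @ (g * ei) / n               # gradient step, user tower
    Wi -= lr * V.T @ (g * eu) / n               # gradient step, item tower

loss_end = log_loss(sigmoid(np.sum((U @ Wu) * (V @ Wi), axis=1)), y)
```

The key property the sketch preserves is that each tower sees only its own entity's features, so the two embedding functions stay independent after training.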
We can do this on a daily basis using an offline pipeline.

We can also put the generated item embeddings into a service that supports online approximate nearest neighbor (ANN) search (e.g., [FAISS](https://engineering.fb.com/2017/03/29/data-infrastructure/faiss-a-library-for-efficient-similarity-search/), HNSW, etc.) so that we don't have to scan through the entire set of items to find similar items for a given user.

During online retrieval, we use the user tower to generate the user embedding on the fly by fetching the freshest user-side features, and use it to find the most similar items in the ANN service.

It's important to keep in mind that the model can't consume user-item interaction features (which are usually the most powerful), because consuming them would mean losing the ability to cache user and item embeddings.

The main advantage of the Two Tower approach is that user and item embeddings can be cached, making inference for the Two Tower model extremely efficient.

*How the Two Towers model handles retrieval.*

### User interactions history

We can also use item embeddings directly to retrieve items similar to those in a user's interactions history.

Let's say that a user liked, saved, or shared some items.
Given that we have embeddings of those items, we can find a list of similar items for each of them and combine them into a single list.

This list will contain items reflective of the user's previous and current interests.

*User interaction history for Explore.*

Compared with retrieving candidates using the user embedding, directly using a user's interactions history gives us better control over the online tradeoff between different engagement types.

In order for this approach to produce high-quality candidates, it's important to select good items from the user's interactions history. (If we try to find items similar to some randomly clicked item, we risk flooding someone's recommendations with irrelevant content.)

To select good candidates, we apply a rule-based approach to filter out poor-quality items (e.g., sexual or objectionable images, posts with a high number of "reports," etc.) from the interactions history.
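The combine-per-seed-lists idea above can be sketched as follows. Brute-force cosine similarity stands in for the ANN service here (a real system would query an index such as FAISS or HNSW), and the item embeddings are random toy values rather than item-tower outputs.

```python
import numpy as np

rng = np.random.default_rng(1)

# Precomputed item embeddings, e.g., from the item tower (toy values here).
n_items, d = 1000, 16
item_emb = rng.normal(size=(n_items, d))
item_emb /= np.linalg.norm(item_emb, axis=1, keepdims=True)

def similar_items(seed_id: int, k: int = 5) -> list[int]:
    # Brute-force cosine similarity; production would use an ANN index.
    sims = item_emb @ item_emb[seed_id]
    top = np.argsort(-sims)[: k + 1]  # +1 because the seed itself ranks first
    return [int(i) for i in top if i != seed_id][:k]

def candidates_from_history(liked_items: list[int], k: int = 5) -> list[int]:
    # Union of per-seed similar-item lists, deduplicated, seeds excluded.
    seen, out = set(liked_items), []
    for seed in liked_items:
        for item in similar_items(seed, k):
            if item not in seen:
                seen.add(item)
                out.append(item)
    return out
```

Because candidates stay grouped by the interaction that produced them, weighting likes, saves, and shares differently at this point is straightforward, which is the control over engagement types mentioned above.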
This allows us to retrieve much better candidates for further ranking stages.

## Ranking

After candidates are retrieved, the system needs to rank them by value to the user.

Ranking in a high-load system is usually divided into multiple stages that gradually reduce the number of candidates from a few thousand to the few hundred that are finally presented to the user.

In Explore, because it's infeasible to rank all candidates using heavy models, we use two stages:

- A first-stage ranker (i.e., a lightweight model), which is less precise and less computationally intensive and can recall thousands of candidates.
- A second-stage ranker (i.e., a heavy model), which is more precise and compute-intensive and operates on the 100 best candidates from the first stage.

Using a two-stage approach allows us to rank more candidates while maintaining a high quality of final recommendations.

For both stages we choose to use neural networks because, in our use case, it's important to be able to adapt to changing trends in user behavior very quickly. Neural networks allow us to do this through continual online training, meaning we can re-train (fine-tune) our models every hour as soon as we have new data.
Also, a lot of important features are categorical in nature, and neural networks provide a natural way of handling categorical data by learning embeddings.

### First-stage ranking

In first-stage ranking, our old friend the Two Tower NN comes into play again because of its cacheability.

Even though the model architecture is similar to the one used in retrieval, the learning objective differs quite a bit: we train the first-stage ranker to predict the output of the second stage with the label

PSelect = {media in top K results ranked by the second stage}

We can view this approach as a way of distilling knowledge from the bigger second-stage model into a smaller (more lightweight) first-stage model.

*Two Tower inference with caching on both the user and item side.*

### Second-stage ranking

After the first stage, we apply the second-stage ranker, which predicts the probabilities of different engagement events (click, like, etc.) using a multi-task multi-label (MTML) neural network model.

The MTML model is much heavier than the Two Towers model, but it can also consume the most powerful user-item interaction features.

Applying a much heavier MTML model during peak hours can be tricky. That's why we precompute recommendations for some users during off-peak hours. This helps ensure the availability of our recommendations for every Explore user.

In order to produce a final score that we can use for ordering ranked items, the predicted probabilities P(click), P(like), P(see less), etc.,
could be combined with weights W_click, W_like, and W_see_less using a formula that we call the value model (VM).

The VM is our approximation of the value that each media item brings to a user:

Expected Value = W_click * P(click) + W_like * P(like) - W_see_less * P(see less) + ...

Tuning the weights of the VM allows us to explore different tradeoffs between online engagement metrics.

For example, with a higher W_like weight, the final ranking will pay more attention to the probability of a user liking a post. Because different people interact with recommendations in different ways, it's very important that different signals are taken into account. The end goal of tuning the weights is to find a tradeoff that maximizes our goals without hurting other important metrics.

## Final reranking

Simply returning results sorted by the final VM score is not always a good idea. For example, we might want to filter out or downrank some items based on integrity-related scores (e.g., [removing potentially harmful content](https://ai.facebook.com/blog/harmful-content-can-evolve-quickly-our-new-ai-system-adapts-to-tackle-it/)).

Also, if we would like to increase the diversity of results, we might shuffle items based on some business rules (e.g., "Do not show items from the same author in a sequence").

Applying these sorts of rules gives us much better control over the final recommendations, which helps achieve better online engagement.

## Parameters tuning

As you can imagine, there are literally hundreds of tunable parameters that control the behavior of the system (e.g.,
the weights of the VM, the number of items to fetch from a particular source, the number of items to rank, etc.).

To achieve good online results, it's important to identify the most important parameters and figure out how to tune them.

There are two popular approaches to parameter tuning: Bayesian optimization and offline tuning.

### Bayesian optimization

Bayesian optimization (BO) allows us to run parameter tuning online.

The main advantage of this approach is that it only requires us to specify the set of parameters to tune, the optimization objective (i.e., the goal metric), and regression thresholds for some other metrics, leaving the rest to the BO process.

The main disadvantage is that it usually takes a long time for the optimization process to converge (sometimes more than a month), especially when dealing with many parameters or with low-sensitivity online metrics.

We can make things faster with the following approach.

### Offline tuning

If we have access to enough historical data in the form of offline and online metrics, we can learn functions that map changes in offline metrics to changes in online metrics.

Once we have such learned functions, we can try different parameter values offline and see how offline metrics translate into potential changes in online metrics.

To make this offline process more efficient, we can use BO techniques.

The main advantage of offline tuning compared with online BO is that it requires a lot less time to set up an experiment (hours instead of weeks).
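As an illustration of the offline-tuning idea, the offline-to-online mapping can be learned from past experiments. The sketch below assumes a simple linear relationship and uses synthetic "historical experiment" data; a real system would fit richer functions on real experiment logs.

```python
import numpy as np

rng = np.random.default_rng(2)

# Historical experiments: offline metric deltas and the online deltas
# they produced. Synthetic data with a hidden linear relationship.
n_exp, n_offline = 40, 3
true_w = np.array([0.8, -0.3, 0.5])                 # unknown in practice
X_off = rng.normal(size=(n_exp, n_offline))          # offline metric changes
y_on = X_off @ true_w + rng.normal(scale=0.05, size=n_exp)  # online changes

# Learn the offline -> online mapping from history (least squares fit).
w, *_ = np.linalg.lstsq(X_off, y_on, rcond=None)

def predicted_online_delta(offline_delta: np.ndarray) -> float:
    # Estimate the online impact of a candidate parameter setting
    # from its offline metric deltas alone.
    return float(offline_delta @ w)

# Screen candidate settings offline and keep the most promising one.
candidates = rng.normal(size=(10, n_offline))
best = max(candidates, key=predicted_online_delta)
```

Each candidate row stands for the offline metric deltas produced by one parameter setting; only the winner would then be validated in an online experiment.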
However, it requires a strong correlation between offline and online metrics.

## The growing complexity of ranking for Explore

The work we've described here is far from done. Our system's growing complexity will pose new challenges in terms of maintainability and feedback loops. To address these challenges, we plan to continue improving our current models and adopting new ranking models and retrieval sources. We're also investigating how to consolidate our retrieval strategies into a smaller number of highly customizable ML algorithms.

The post [Scaling the Instagram Explore recommendations system](https://engineering.fb.com/2023/08/09/ml-applications/scaling-instagram-explore-recommendations-system/) appeared first on [Engineering at Meta](https://engineering.fb.com/).