{"id":774,"date":"2023-10-17T19:13:14","date_gmt":"2023-10-17T19:13:14","guid":{"rendered":"https:\/\/fde.cat\/index.php\/2023\/10\/17\/how-the-einstein-team-operationalizes-ai-models-at-lightning-speed-and-massive-scale\/"},"modified":"2023-10-17T19:13:14","modified_gmt":"2023-10-17T19:13:14","slug":"how-the-einstein-team-operationalizes-ai-models-at-lightning-speed-and-massive-scale","status":"publish","type":"post","link":"https:\/\/fde.cat\/index.php\/2023\/10\/17\/how-the-einstein-team-operationalizes-ai-models-at-lightning-speed-and-massive-scale\/","title":{"rendered":"How the Einstein Team Operationalizes AI Models at Lightning Speed and Massive Scale"},"content":{"rendered":"<p><em>By Yuliya Feldman and Scott Nyberg<\/em><\/p>\n<p>In our \u201cEngineering Energizers\u201d Q&amp;A series, we examine the professional life experiences that have shaped Salesforce Engineering leaders. Meet Yuliya Feldman, a Software Engineering Architect at Salesforce. Yuliya works on <a href=\"https:\/\/www.salesforce.com\/products\/einstein-ai-solutions\/\">Salesforce Einstein\u2019s<\/a> Machine Learning Services team, which is responsible for operationalizing AI models and serves as the engine behind Salesforce\u2019s generative AI products.<\/p>\n<p>Read on to learn how Yuliya\u2019s team overcomes critical engineering challenges to help create the future of generative AI.<\/p>\n<h4 class=\"wp-block-heading\"><strong>What is your team\u2019s AI mission?<\/strong><\/h4>\n<p>The team\u2019s mission is to make AI models operational \u2014 enabling them to support real-world scenarios. 
After research scientists create their generative AI models, our team provides a feature-rich infrastructure framework that ensures customers have a clear path to the right model and can rapidly receive answers to their queries.<\/p>\n<p><em>A look at the team\u2019s AI platform in action.<\/em><\/p>\n<h4 class=\"wp-block-heading\"><strong>How do you define AI model operationalization?<\/strong><\/h4>\n<p>Model operationalization focuses on transforming trained machine learning models into useful tools for our customers. This transformation involves several phases:<\/p>\n<p><strong>Model storage:<\/strong> The trained model\u2019s data archive \u2014 composed of the weights and metadata needed at inference time, which constitute the knowledge gained during training \u2014 is stored for accessibility.<\/p>\n<p><strong>Code integration:<\/strong> The model\u2019s data archive is combined with additional code, which translates the model\u2019s data, determines the actions to take based on input, and delivers results to customers.<\/p>\n<p><strong>Access and registration:<\/strong> The model and its related code must be registered \u2014 a process that specifies the attributes and locations for model access. This enables customers to use the model\u2019s services.<\/p>\n<p><strong>Model execution and scaling:<\/strong> Running the model requires the right hardware and software. Tools such as <a href=\"https:\/\/aws.amazon.com\/sagemaker\/\">AWS SageMaker<\/a>, Triton, and custom containers play a key role in loading, executing, and efficiently scaling multiple models or large models. 
Optimizing memory usage and incorporating intelligent routing also help drive the scaling process.<\/p>\n<h4 class=\"wp-block-heading\"><strong>How does your team contribute to the AI model operationalization process?<\/strong><\/h4>\n<p>Our talented team streamlines the complex operationalization process, ensuring that models are accessible, scalable, and feature-rich \u2014 meeting the specific needs of various customer use cases. Here\u2019s a look at what we do:<\/p>\n<p><strong>Model upload:<\/strong> We provide pipelines and guidelines for ensuring that models are smoothly uploaded into our serving infrastructure and are ready for future use.<\/p>\n<p><strong>Operationalization customization:<\/strong> After models are uploaded, our team\u2019s problem-solving skills kick into high gear: we optimize latency, throughput, and scalability, customizing each operationalization to satisfy our customers\u2019 specific use case requirements.<\/p>\n<p><strong>Feature enhancement:<\/strong> Typically, different use cases require distinct features. Consequently, the team may enhance platform capabilities to support a new set of features.<\/p>\n<p><strong>Intelligent routing:<\/strong> To support instances of multiple models or complicated use cases that require data fetching and processing prior to performing predictions, our team develops intelligent routing strategies \u2014 ensuring seamless routing and the execution of complex inferencing pipelines.<\/p>\n<p><strong>Production:<\/strong> Once operationalized, the model moves into production, where our team leverages alerts and monitoring systems to detect issues and provide quick triage, collaborating with other teams if needed.<\/p>\n<h4 class=\"wp-block-heading\"><strong>What are a couple of big AI modeling challenges your team has recently tackled?<\/strong><\/h4>\n<p>One key challenge we faced was running various versions of the same AI model for different Salesforce tenant organizations. 
This required us to run the versions concurrently while ensuring that each tenant\u2019s requests were routed to the right model version. To address the challenge, we created additional logic and closely collaborated with other AI teams to manage routing based on tenant information and model metadata.<\/p>\n<p>Another challenge was managing thousands of AI models. Providing each model with its own container endpoint is infeasible due to the tremendous amount of hardware that would be required. Nor could we load thousands of models in one container due to memory limitations. Consequently, our team pivoted, distributing models across multiple shared containers. This ultimately supported the efficient routing of customer data queries to the correct container.<\/p>\n<h4 class=\"wp-block-heading\"><strong>What risks does your team face in implementing your solution for customers?<\/strong><\/h4>\n<p>The team needed to design our framework to be horizontally scalable, balancing throughput and latency. Throughput is the number of requests the framework processes per unit of time. Maintaining this balance is challenging when the framework\u2019s capacity becomes strained. 
Ultimately, with each use case, we must support a variety of SLA requirements.<\/p>\n<div class=\"wp-block-group is-layout-constrained wp-container-1 wp-block-group-is-layout-constrained\">\n<p>Additionally, to mitigate performance risks, we regularly conduct performance testing, asking questions such as:<\/p>\n<p>Is our solution working as intended?<\/p>\n<p>How can we improve performance, especially when incorporating new features?<\/p>\n<\/div>\n<p>By focusing on these concerns, our team constantly analyzes and adapts our framework to ensure we meet our customers\u2019 ever-evolving needs.<\/p>\n<h4 class=\"wp-block-heading\"><strong>How does your framework help improve the generative AI experience for customers?<\/strong><\/h4>\n<p>We\u2019re focused on delivering a smoother and more satisfying customer experience in the generative AI space, where low latency is now a key requirement.<\/p>\n<p>For example, in the field of generative AI code generation, the customer query process can be quite lengthy. Some requests take more than 20 seconds to deliver a response, leading to a less satisfying user experience. This challenge sparked a new feature request: the ability to stream real-time responses. This would enable customers to watch their response being delivered line by line while it\u2019s generated.<\/p>\n<p>This led us to enhance our framework to leverage the response-streaming capabilities of model-serving services (e.g., <a href=\"https:\/\/developer.nvidia.com\/triton-inference-server\">Triton Server<\/a>, SageMaker), which will enable us to offer a highly fluid, real-time experience for Salesforce\u2019s Gen AI customers.<\/p>\n<p>The ability to stream stands in stark contrast to our other APIs, which used a synchronous request-response model that provided responses only after the request was fully processed.<\/p>\n<h4 class=\"wp-block-heading\"><strong>Learn more<\/strong><\/h4>\n<p>Hungry for more AI stories? 
Read this <a href=\"https:\/\/engineering.salesforce.com\/revealing-the-newest-data-science-tool-speeding-ai-development-and-securing-customer-data\/\">blog post<\/a> to explore how Salesforce accelerates AI development while keeping customer data secure.<\/p>\n<p>Stay connected \u2014 join our <a href=\"https:\/\/careers.mail.salesforce.com\/w2?cid=7017y00000CRDS7AAP\">Talent Community<\/a>!<\/p>\n<p><a href=\"https:\/\/www.salesforce.com\/company\/careers\/teams\/tech-and-product\/?d=cta-tms-tp-2\">Check out our Technology and Product teams<\/a> to learn how you can get involved.<\/p>\n<p>The post <a href=\"https:\/\/engineering.salesforce.com\/how-the-einstein-team-operationalizes-ai-models-at-lightning-speed-and-massive-scale\/\">How the Einstein Team Operationalizes AI Models at Lightning Speed and Massive Scale<\/a> appeared first on <a href=\"https:\/\/engineering.salesforce.com\/\">Salesforce Engineering Blog<\/a>.<\/p>","protected":false},"excerpt":{"rendered":"<p>By Yuliya Feldman and Scott Nyberg In our \u201cEngineering Energizers\u201d Q&amp;A series, we examine the professional life experiences that have shaped Salesforce Engineering leaders. Meet Yuliya Feldman, a Software Engineering Architect at Salesforce. 
Yuliya works on Salesforce Einstein\u2019s Machine Learning Services team, responsible for operationalizing AI models, which serves as the engine behind Salesforce\u2019s generative&hellip; <a class=\"more-link\" href=\"https:\/\/fde.cat\/index.php\/2023\/10\/17\/how-the-einstein-team-operationalizes-ai-models-at-lightning-speed-and-massive-scale\/\">Continue reading <span class=\"screen-reader-text\">How the Einstein Team Operationalizes AI Models at Lightning Speed and Massive Scale<\/span><\/a><\/p>\n","protected":false},"author":0,"featured_media":0,"comment_status":"","ping_status":"open","sticky":false,"template":"","format":"standard","meta":{"spay_email":"","footnotes":""},"categories":[7],"tags":[],"class_list":["post-774","post","type-post","status-publish","format-standard","hentry","category-technology","entry"],"jetpack_featured_media_url":"","jetpack-related-posts":[{"id":789,"url":"https:\/\/fde.cat\/index.php\/2023\/10\/17\/how-the-einstein-team-operationalizes-ai-models-at-lightning-speed-and-massive-scale-2\/","url_meta":{"origin":774,"position":0},"title":"How the Einstein Team Operationalizes AI Models at Lightning Speed and Massive Scale","date":"October 17, 2023","format":false,"excerpt":"By Yuliya Feldman and Scott Nyberg In our \u201cEngineering Energizers\u201d Q&A series, we examine the professional life experiences that have shaped Salesforce Engineering leaders. Meet Yuliya Feldman, a Software Engineering Architect at Salesforce. 
Yuliya works on Salesforce Einstein\u2019s Machine Learning Services team, responsible for operationalizing AI models, which serve as\u2026","rel":"","context":"In &quot;Technology&quot;","img":{"alt_text":"","src":"","width":0,"height":0},"classes":[]},{"id":785,"url":"https:\/\/fde.cat\/index.php\/2023\/11\/07\/einstein-for-flow-bringing-ai-innovation-to-the-next-generation-of-automation\/","url_meta":{"origin":774,"position":1},"title":"Einstein for Flow: Bringing AI Innovation to the Next Generation of Automation","date":"November 7, 2023","format":false,"excerpt":"By Vera Vetter, Zeyuan Chen, Ran Xu, and Scott Nyberg In our \u201cEngineering Energizers\u201d Q&A series, we examine the professional journeys that have shaped Salesforce Engineering leaders. Meet Vera Vetter, Product Management Director for Salesforce AI Research and a co-Product Manager for Einstein for Flow, a game-changing AI product that\u2026","rel":"","context":"In &quot;Technology&quot;","img":{"alt_text":"","src":"","width":0,"height":0},"classes":[]},{"id":866,"url":"https:\/\/fde.cat\/index.php\/2024\/05\/15\/revealing-einsteins-blueprint-for-creating-the-new-unified-ai-platform-from-siloed-legacy-stacks\/","url_meta":{"origin":774,"position":2},"title":"Revealing Einstein\u2019s Blueprint for Creating the New, Unified AI Platform from Siloed Legacy Stacks","date":"May 15, 2024","format":false,"excerpt":"In our insightful \u201cEngineering Energizers\u201d Q&A series, we delve into the inspiring journeys of engineering leaders who have achieved remarkable success in their specific domains. Today, we meet Indira Iyer, Senior Vice President of Salesforce Engineering, leading Salesforce Einstein development. 
Her team\u2019s mission is to build Salesforce\u2019s next-gen AI Platform,\u2026","rel":"","context":"In &quot;Technology&quot;","img":{"alt_text":"","src":"","width":0,"height":0},"classes":[]},{"id":804,"url":"https:\/\/fde.cat\/index.php\/2023\/12\/19\/unveiling-salesforces-blueprint-for-sustainable-ai-where-responsibility-meets-innovation\/","url_meta":{"origin":774,"position":3},"title":"Unveiling Salesforce\u2019s Blueprint for Sustainable AI: Where Responsibility Meets Innovation","date":"December 19, 2023","format":false,"excerpt":"Salesforce is guided by its core values of trust, customer success, innovation, equality, and sustainability. These values are reflected in its commitment to responsibly develop and deploy new technologies like generative AI on behalf of stakeholders \u2014 from shareholders to customers to the planet. The Large Language Models (LLMs) that\u2026","rel":"","context":"In &quot;Technology&quot;","img":{"alt_text":"","src":"","width":0,"height":0},"classes":[]},{"id":733,"url":"https:\/\/fde.cat\/index.php\/2023\/07\/11\/how-is-salesforce-einstein-optimizing-ai-classification-model-accuracy\/","url_meta":{"origin":774,"position":4},"title":"How is Salesforce Einstein Optimizing AI Classification Model Accuracy?","date":"July 11, 2023","format":false,"excerpt":"In our \u201cEngineering Energizers\u201d Q&A series, we examine the professional journeys that have shaped Salesforce Engineering leaders. Meet Matan Rabi, Senior Software Engineer on Salesforce Einstein\u2019s Machine Learning Observability Platform (MLOP) team. 
Matan and his team strive to optimize the accuracy of Einstein\u2019s AI classification models, empowering customers across industries\u2026","rel":"","context":"In &quot;Technology&quot;","img":{"alt_text":"","src":"","width":0,"height":0},"classes":[]},{"id":854,"url":"https:\/\/fde.cat\/index.php\/2024\/04\/12\/innovating-tableau-pulse-hurdling-ai-integration-and-scalability-obstacles-for-next-gen-customer-insights\/","url_meta":{"origin":774,"position":5},"title":"Innovating Tableau Pulse: Hurdling AI Integration and Scalability Obstacles for Next-Gen Customer Insights","date":"April 12, 2024","format":false,"excerpt":"In our \u201cEngineering Energizers\u201d Q&A series, we examine the inspiring paths of engineering leaders who have made remarkable strides in their respective fields. Today, we meet Harini Nallan Chakrawarthy, Vice President of Software Engineering, who leads the development of Tableau Pulse. This new Salesforce feature that uses generative AI to\u2026","rel":"","context":"In 
&quot;Technology&quot;","img":{"alt_text":"","src":"","width":0,"height":0},"classes":[]}],"_links":{"self":[{"href":"https:\/\/fde.cat\/index.php\/wp-json\/wp\/v2\/posts\/774","targetHints":{"allow":["GET"]}}],"collection":[{"href":"https:\/\/fde.cat\/index.php\/wp-json\/wp\/v2\/posts"}],"about":[{"href":"https:\/\/fde.cat\/index.php\/wp-json\/wp\/v2\/types\/post"}],"replies":[{"embeddable":true,"href":"https:\/\/fde.cat\/index.php\/wp-json\/wp\/v2\/comments?post=774"}],"version-history":[{"count":0,"href":"https:\/\/fde.cat\/index.php\/wp-json\/wp\/v2\/posts\/774\/revisions"}],"wp:attachment":[{"href":"https:\/\/fde.cat\/index.php\/wp-json\/wp\/v2\/media?parent=774"}],"wp:term":[{"taxonomy":"category","embeddable":true,"href":"https:\/\/fde.cat\/index.php\/wp-json\/wp\/v2\/categories?post=774"},{"taxonomy":"post_tag","embeddable":true,"href":"https:\/\/fde.cat\/index.php\/wp-json\/wp\/v2\/tags?post=774"}],"curies":[{"name":"wp","href":"https:\/\/api.w.org\/{rel}","templated":true}]}}