{"id":802,"date":"2023-12-12T22:13:00","date_gmt":"2023-12-12T22:13:00","guid":{"rendered":"https:\/\/fde.cat\/index.php\/2023\/12\/12\/developing-the-new-xgen-salesforces-foundational-large-language-models\/"},"modified":"2023-12-12T22:13:00","modified_gmt":"2023-12-12T22:13:00","slug":"developing-the-new-xgen-salesforces-foundational-large-language-models","status":"publish","type":"post","link":"https:\/\/fde.cat\/index.php\/2023\/12\/12\/developing-the-new-xgen-salesforces-foundational-large-language-models\/","title":{"rendered":"Developing the New XGen: Salesforce\u2019s Foundational Large Language Models"},"content":{"rendered":"<p><em>By Shafiq Rayhan Joty and Scott Nyberg<\/em><\/p>\n<p>In our \u201cEngineering Energizers\u201d Q&amp;A series, we examine the professional journeys that have shaped Salesforce Engineering leaders. Meet Shafiq Rayhan Joty, a Director at Salesforce AI Research. Shafiq co-leads the development of <a href=\"https:\/\/blog.salesforceairesearch.com\/xgen\/\">XGen<\/a>, a series of groundbreaking large language models (LLMs) of different sizes.<\/p>\n<p>Delivering critical general knowledge, XGen serves as the foundational model that Salesforce AI teams adapt through fine-tuning or continued pre-training to create safe, trusted, and customized models for distinct domains and use cases, supporting sales, service, and more.<\/p>\n<p><em>Shafiq dives deeper into XGen\u2019s role as a foundational model.<\/em><\/p>\n<p>Read on to discover how Shafiq\u2019s XGen team pushes the limits of LLMs to drive AI innovation and meet Salesforce customers\u2019 evolving needs.<\/p>\n<p><strong>What technical challenges did the XGen development team encounter?<\/strong><\/p>\n<div class=\"wp-block-group is-layout-constrained wp-container-1 wp-block-group-is-layout-constrained\">\n<p>From performing massive data collection to training the colossal model and fine-tuning it for unpredictable user needs, XGen\u2019s 
development journey posed multiple challenges.<\/p>\n<p><strong>Collecting the data:<\/strong> To train the model effectively and at a large scale, the team required a vast volume of high-quality data. Leveraging their extensive experience in data mixing, the team assembled a massive and diverse dataset, drawing from public sources such as Common Crawl and code domains like GitHub. This enabled them to scale the training data to more than 2 trillion tokens while curating a safe, unbiased, well-rounded, and legally compliant dataset derived from diverse knowledge domains.<\/p>\n<p><strong>Cleaning pre-training data:<\/strong> Cleaning the data at this scale presented a substantial hurdle, as the team needed to eliminate toxicity, manage copyright issues, and ensure data quality. To address this, the team collaborated closely with Salesforce legal and ethics experts to establish a robust data cleaning pipeline, integrating model-based and keyword-based methods for high accuracy.<\/p>\n<p><strong>Modeling:<\/strong> The team navigated the complexities of training its large-scale LLM on Google\u2019s TPUs by integrating technologies such as <a href=\"https:\/\/arxiv.org\/abs\/2205.14135\">Flash Attention<\/a> and <a href=\"https:\/\/arxiv.org\/pdf\/2004.05150v2.pdf\">Sliding Window Attention<\/a>. This kept the modeling process fast and efficient, ensuring the model could manage the intricacies of various tasks.<\/p>\n<p><strong>Fine-tuning the model:<\/strong> The team was challenged to fine-tune its model to support user needs that were not anticipated during initial training. This involved interpreting diverse tasks and instructions, complicated by the unpredictable ways users interact with the model. Fine-tuning was a key phase for aligning the model with real-world user requirements and values. 
The team used both standard supervised fine-tuning and learning from human and AI feedback.<\/p>\n<\/div>\n<p><strong>How was the XGen model trained?<\/strong><\/p>\n<p>XGen\u2019s training process unfolded over multiple stages. In the <strong>pre-training stage<\/strong>, the model established its foundational knowledge about the world and language to support various applications and domains. This stage did not involve any human annotation; the model was simply trained to predict the next token given a context of previous tokens, drawing from raw text data on a colossal scale, usually trillions of tokens.<\/p>\n<p>Next, during the <strong>fine-tuning stage<\/strong>, the model was trained to interact with users just as a human would. Humans supervised this stage, which incorporated techniques like supervised fine-tuning and <a href=\"https:\/\/medium.com\/generative-ai-insights-for-business-leaders-and\/what-is-rlhf-reinforcement-learning-from-human-feedback-876da930bf16\">reinforcement learning from human feedback<\/a> to help the model understand and deliver what users need \u2013 learning their intent via numerous task instructions and the corresponding outputs or feedback on model-generated outputs. Also at this stage, the team ensured that the model upheld ethical considerations \u2013 maintaining safety and legal compliance.<\/p>\n<p>Lastly, the <strong>evaluation stage<\/strong> measured the model\u2019s ability to perform tasks unseen during training, confirming its robust generalization across a diverse set of tasks. 
The team also benchmarked the model\u2019s performance against open-source and closed-source counterparts \u2013 verifying its accuracy.<\/p>\n<p><em>Shafiq shares more about the training process.<\/em><\/p>\n<div class=\"wp-block-group is-layout-constrained wp-block-group-is-layout-constrained\">\n<p><strong>How does XGen distinguish itself from general-purpose, external LLMs?<\/strong><\/p>\n<div class=\"wp-block-group is-layout-constrained wp-container-2 wp-block-group-is-layout-constrained\">\n<p>The model distinguishes itself from general-purpose LLMs in three major ways:<\/p>\n<p><strong>Supports unique customer needs<\/strong>. While general-purpose LLMs support a wide range of public needs, XGen was specifically tailored to power private Salesforce-based use cases and meet our customers\u2019 unique requirements.<\/p>\n<p><strong>Ensures data privacy<\/strong>. Unlike external API-based sources, XGen enforces stringent data privacy controls. This complies with Salesforce\u2019s requirement that customer data remain inside its own secured platform. Consequently, the model remains the sole viable solution for highly sensitive industries like banking, where it is impractical to share customers\u2019 information with third-party LLMs and data privacy remains paramount.<\/p>\n<p><strong>Reduces cost to serve<\/strong>. By laser-focusing on Salesforce customer use cases, XGen is a smaller-scale model that is customized for particular domains (use cases). 
As a result, its reduced size decreases its inference cost due to lower computational needs.<\/p>\n<\/div>\n<\/div>\n<p><em>A look at validation set perplexity for XGen pre-trained models with different context sizes.<\/em><\/p>\n<p><strong>What are the significant new and upcoming developments for XGen?<\/strong><\/p>\n<p>XGen\u2019s 7 billion parameter model will soon evolve into much larger models that surpass other open-source models at the same scales and subsequently leverage a <a href=\"https:\/\/arxiv.org\/abs\/1701.06538\">Mixture of Experts<\/a> (MoE) architecture. Building this more powerful model involves leveraging experience from the 7 billion parameter model, which kept development costs reasonable because the team was not starting from scratch.<\/p>\n<p>This new model can serve as a teacher, driving knowledge distillation into smaller, cost-effective models which, in turn, support unique domains.<\/p>\n<p>Looking ahead, the team is developing XGen Mobile, a 4 billion parameter version of XGen. This innovation allows customers to install XGen directly on their phones, removing the need for an Internet connection. As a result, users are empowered with on-the-go access to XGen, no matter the setting. For example, using XGen Mobile, field service agents could extract information from their offline documents, enabling them to efficiently complete jobs such as fixing appliances while in the field.<\/p>\n<div class=\"wp-block-group is-layout-constrained wp-container-4 wp-block-group-is-layout-constrained\">\n<p><strong>Learn more<\/strong><\/p>\n<p>Interested in more AI stories? 
<a href=\"https:\/\/engineering.salesforce.com\/einstein-for-flow-bringing-ai-innovation-to-the-next-generation-of-automation\/\">Read this blog<\/a> to learn how Salesforce\u2019s cross-cloud Scrum team built Einstein for Flow, a game-changing AI product that revolutionizes Salesforce workflow automation.<\/p>\n<p>Stay connected \u2014 join our <a href=\"https:\/\/careers.mail.salesforce.com\/w2?cid=7017y00000CRDS7AAP\">Talent Community<\/a>!<\/p>\n<p><a href=\"https:\/\/www.salesforce.com\/company\/careers\/teams\/tech-and-product\/?d=cta-tms-tp-2\">Check out our Technology and Product teams<\/a> to learn how you can get involved.<\/p>\n<\/div>\n<p>The post <a href=\"https:\/\/engineering.salesforce.com\/developing-the-new-xgen-salesforces-foundational-large-language-models\/\">Developing the New XGen: Salesforce\u2019s Foundational Large Language Models<\/a> appeared first on <a href=\"https:\/\/engineering.salesforce.com\/\">Salesforce Engineering Blog<\/a>.<\/p>","protected":false},"excerpt":{"rendered":"<p>By Shafiq Rayhan Joty and Scott Nyberg In our \u201cEngineering Energizers\u201d Q&amp;A series, we examine the professional journeys that have shaped Salesforce Engineering leaders. Meet Shafiq Rayhan Joty, a Director at Salesforce AI Research. Shafiq co-leads the development of XGen, a series of groundbreaking large language models (LLMs) of different sizes. 
Delivering critical general knowledge,&hellip; <a class=\"more-link\" href=\"https:\/\/fde.cat\/index.php\/2023\/12\/12\/developing-the-new-xgen-salesforces-foundational-large-language-models\/\">Continue reading <span class=\"screen-reader-text\">Developing the New XGen: Salesforce\u2019s Foundational Large Language Models<\/span><\/a><\/p>\n","protected":false},"author":0,"featured_media":0,"comment_status":"","ping_status":"open","sticky":false,"template":"","format":"standard","meta":{"spay_email":"","footnotes":""},"categories":[7],"tags":[],"class_list":["post-802","post","type-post","status-publish","format-standard","hentry","category-technology","entry"],"jetpack_featured_media_url":"","jetpack-related-posts":[{"id":873,"url":"https:\/\/fde.cat\/index.php\/2024\/06\/04\/revolutionizing-ai-how-sagemaker-enhances-salesforce-einsteins-large-language-model-latency-and-throughput\/","url_meta":{"origin":802,"position":0},"title":"Revolutionizing AI: How SageMaker Enhances Salesforce Einstein\u2019s Large Language Model Latency and Throughput","date":"June 4, 2024","format":false,"excerpt":"Written by Pawan Agarwal and Peiheng Hu In our \u201cEngineering Energizers\u201d Q&A series, we explore the transformative journeys of Salesforce engineering leaders who are spearheading significant advancements in their fields. 
Today, we meet Pawan Agarwal, Senior Director of Software Engineering, who leads the Einstein AI Platform team \u2014 a team\u2026","rel":"","context":"In &quot;Technology&quot;","img":{"alt_text":"","src":"","width":0,"height":0},"classes":[]},{"id":785,"url":"https:\/\/fde.cat\/index.php\/2023\/11\/07\/einstein-for-flow-bringing-ai-innovation-to-the-next-generation-of-automation\/","url_meta":{"origin":802,"position":1},"title":"Einstein for Flow: Bringing AI Innovation to the Next Generation of Automation","date":"November 7, 2023","format":false,"excerpt":"By Vera Vetter, Zeyuan Chen, Ran Xu, and Scott Nyberg In our \u201cEngineering Energizers\u201d Q&A series, we examine the professional journeys that have shaped Salesforce Engineering leaders. Meet Vera Vetter, Product Management Director for Salesforce AI Research and a co-Product Manager for Einstein for Flow, a game-changing AI product that\u2026","rel":"","context":"In &quot;Technology&quot;","img":{"alt_text":"","src":"","width":0,"height":0},"classes":[]},{"id":887,"url":"https:\/\/fde.cat\/index.php\/2024\/06\/25\/the-future-of-ai-testing-salesforces-next-gen-framework-for-ai-model-performance\/","url_meta":{"origin":802,"position":2},"title":"The Future of AI Testing: Salesforce\u2019s Next Gen Framework for AI Model Performance","date":"June 25, 2024","format":false,"excerpt":"In our \u201cEngineering Energizers\u201d Q&A series, we explore the innovative minds shaping the future of Salesforce engineering. 
Today, we meet Erwin Karbasi, who leads the development of the Salesforce Central Evaluation Framework (SF Eval), a revolutionary internal tool used by Salesforce engineers to assess the performance of generative AI models.\u2026","rel":"","context":"In &quot;Technology&quot;","img":{"alt_text":"","src":"","width":0,"height":0},"classes":[]},{"id":813,"url":"https:\/\/fde.cat\/index.php\/2024\/01\/16\/inside-ai-research-conquering-complex-challenges-to-power-next-generation-innovations\/","url_meta":{"origin":802,"position":3},"title":"Inside AI Research: Conquering Complex Challenges to Power Next Generation Innovations","date":"January 16, 2024","format":false,"excerpt":"By Yingbo Zhou and Scott Nyberg In our \u201cEngineering Energizers\u201d Q&A series, we examine the professional life experiences that have shaped Salesforce Engineering leaders. Meet Yingbo Zhou, a Senior Director of Research for Salesforce AI Research, where he leads his team to advance AI \u2014 focusing on the fields of\u2026","rel":"","context":"In &quot;Technology&quot;","img":{"alt_text":"","src":"","width":0,"height":0},"classes":[]},{"id":804,"url":"https:\/\/fde.cat\/index.php\/2023\/12\/19\/unveiling-salesforces-blueprint-for-sustainable-ai-where-responsibility-meets-innovation\/","url_meta":{"origin":802,"position":4},"title":"Unveiling Salesforce\u2019s Blueprint for Sustainable AI: Where Responsibility Meets Innovation","date":"December 19, 2023","format":false,"excerpt":"Salesforce is guided by its core values of trust, customer success, innovation, equality, and sustainability. These values are reflected in its commitment to responsibly develop and deploy new technologies like generative AI on behalf of stakeholders \u2014 from shareholders to customers to the planet. 
The Large Language Models (LLMs) that\u2026","rel":"","context":"In &quot;Technology&quot;","img":{"alt_text":"","src":"","width":0,"height":0},"classes":[]},{"id":848,"url":"https:\/\/fde.cat\/index.php\/2024\/04\/01\/unveiling-the-cutting-edge-features-of-ml-console-for-ai-model-lifecycle-management\/","url_meta":{"origin":802,"position":5},"title":"Unveiling the Cutting-Edge Features of ML Console for AI Model Lifecycle Management","date":"April 1, 2024","format":false,"excerpt":"In our \u201cEngineering Energizers\u201d Q&A series, we explore the journeys of engineering leaders who have made remarkable contributions in their fields. Today, we meet Venkat Krishnamani, a Lead Member of the Technical Staff for Salesforce Engineering and the lead engineer for Salesforce Einstein\u2019s Machine Learning (ML) Console. This vital tool\u2026","rel":"","context":"In &quot;Technology&quot;","img":{"alt_text":"","src":"","width":0,"height":0},"classes":[]}],"_links":{"self":[{"href":"https:\/\/fde.cat\/index.php\/wp-json\/wp\/v2\/posts\/802","targetHints":{"allow":["GET"]}}],"collection":[{"href":"https:\/\/fde.cat\/index.php\/wp-json\/wp\/v2\/posts"}],"about":[{"href":"https:\/\/fde.cat\/index.php\/wp-json\/wp\/v2\/types\/post"}],"replies":[{"embeddable":true,"href":"https:\/\/fde.cat\/index.php\/wp-json\/wp\/v2\/comments?post=802"}],"version-history":[{"count":0,"href":"https:\/\/fde.cat\/index.php\/wp-json\/wp\/v2\/posts\/802\/revisions"}],"wp:attachment":[{"href":"https:\/\/fde.cat\/index.php\/wp-json\/wp\/v2\/media?parent=802"}],"wp:term":[{"taxonomy":"category","embeddable":true,"href":"https:\/\/fde.cat\/index.php\/wp-json\/wp\/v2\/categories?post=802"},{"taxonomy":"post_tag","embeddable":true,"href":"https:\/\/fde.cat\/index.php\/wp-json\/wp\/v2\/tags?post=802"}],"curies":[{"name":"wp","href":"https:\/\/api.w.org\/{rel}","templated":true}]}}