{"id":705,"date":"2023-04-18T02:12:00","date_gmt":"2023-04-18T02:12:00","guid":{"rendered":"https:\/\/fde.cat\/index.php\/2023\/04\/18\/ai-based-identity-resolution-the-key-for-linking-diverse-customer-data\/"},"modified":"2023-04-18T02:12:00","modified_gmt":"2023-04-18T02:12:00","slug":"ai-based-identity-resolution-the-key-for-linking-diverse-customer-data","status":"publish","type":"post","link":"https:\/\/fde.cat\/index.php\/2023\/04\/18\/ai-based-identity-resolution-the-key-for-linking-diverse-customer-data\/","title":{"rendered":"AI-based Identity Resolution: The Key for Linking Diverse Customer Data"},"content":{"rendered":"<p>Companies want a comprehensive view of their customers, enabling them to solve business and marketing challenges, such as personalization, segmentation, and targeting \u2014 but they face an uphill battle as they are drowning in data. For example, many companies cannot match the identity of a customer who visits their website with the same customer who visits their store. <a href=\"https:\/\/www.wsj.com\/articles\/data-deluge-businesses-struggle-with-tmi-5e41cca1\" target=\"_blank\" rel=\"noopener\">In fact, 33% of companies cannot glean actionable insights from their data and 30% simply cannot handle the volume<\/a>.<\/p>\n<p>How can a comprehensive customer view be achieved efficiently? It starts by leveraging multiple data sets. For example, companies may possess email interaction data, purchase records, and customer service records. These data sets are useful for understanding a customer\u2019s needs.<\/p>\n<p>However, a customer may be represented slightly differently across these data sets. For example, \u201cMr. John Doe\u201d may be listed in one data set, while in another, that same person may be called \u201cJ. Doe\u201d. 
Identity resolution (recognizing the same customer across data sets and merging their data) can be a puzzling, time-consuming data cleaning problem that sometimes requires a company to manually search for and merge customer records before it can support downstream analytics.<\/p>\n<p>Could there be a better way? Enter <a href=\"https:\/\/www.salesforceairesearch.com\/about\" target=\"_blank\" rel=\"noopener\">Salesforce AI<\/a>. Together with <a href=\"https:\/\/www.salesforce.com\/products\/genie\/overview\/\" target=\"_blank\" rel=\"noopener\">Salesforce Data Cloud<\/a> engineering teams, Salesforce AI kicked off AI-based identity resolution in 2021, developing a fuzzy first-name matching approach that uses a large language model (LLM) to match individuals.<\/p>\n<p>That approach has since evolved into a soft matching system that lets companies choose how rigorous matching should be for each business case.<\/p>\n<p>For a deep dive into this technical solution, read on\u2026<\/p>\n<p><strong>How does AI-based fuzzy matching help with identity resolution?<\/strong><\/p>\n<p>When examining customer data, companies must analyze multiple data sets in which a person\u2019s name may not be represented consistently. For example, someone may have the first name Robert in one data source and appear as Bob in another. Fuzzy first name matching recognizes that such real-world variants can refer to the same person, enabling companies to identify a unique individual across multiple data sets.<\/p>\n<p>Diving deeper, the <a href=\"https:\/\/arxiv.org\/pdf\/2111.10497.pdf\" target=\"_blank\" rel=\"noopener\">fuzzy first name matching model<\/a> developed by Salesforce AI and Salesforce Data Cloud combines a fine-tuned LLM with data-derived rules to classify a pair of first names as either a match or not a match. 
For example, one record for a customer may show only the initial \u201cJ\u201d, while another shows \u201cJessica\u201d. In this case, the AI model would classify the pair as a match. By contrast, a pair such as \u201cMarisa\u201d and \u201cMario\u201d would not be a match.<\/p>\n<p>However, whether two names should match can vary with the context, domain, and data sets. In highly regulated fields such as the medical industry, a mismatched name record could have serious consequences, so the initial \u201cJ\u201d and \u201cJessica\u201d would not constitute a match. In less regulated domains, however, \u201cJ\u201d and \u201cJessica\u201d may be a suitable match.<\/p>\n<p>Recognizing that there is no \u201cone-size-fits-all\u201d approach to matching and that companies need a more sophisticated way to identify their customers, Salesforce AI and Salesforce Data Cloud engineers took fuzzy matching to the next level with soft matching.<\/p>\n<p><strong>How does AI-based soft matching improve identity resolution?<\/strong><\/p>\n<p>Soft matching extends identity resolution to diverse data sets and domains by replacing binary answers with match scores, giving companies finer control over how their data is merged. Companies select their desired precision level (low, medium, or high), and the AI model returns matching first names accordingly.<\/p>\n<p><em>Examples of soft matching scoring.<\/em><\/p>\n<p>What does this look like? High precision matches are stringent, nearly exact matches. They include nicknames, so \u201cWilliam\u201d and \u201cBill\u201d are considered a strong match, as well as punctuation variants, so \u201cMary-Joe\u201d and \u201cMary Joe\u201d match.<\/p>\n<p>Medium and low precision levels allow for more fuzziness, enabling companies to capture a wider range of potentially matching customers. 
For example, the initial \u201cS\u201d and the name \u201cSharon\u201d, or loosely similar names such as \u201cBob\u201d and \u201cRoberto\u201d, are medium precision matches. These lower levels can also help when the data contains misspelled names or typos.<\/p>\n<p>How should organizations select a level of rigor? It depends on the domain, because how much fuzziness is acceptable for first names varies with context. For example, a medical organization may select high precision to ensure the integrity of patient records. Alternatively, businesses might select medium or low precision to get more value from their data and significantly expand the reach of their target market.<\/p>\n<p><strong>How did the Salesforce AI team develop soft matching?<\/strong><\/p>\n<p>The identity resolution research began by training the AI model to decide whether two first names match, producing binary results. However, the team wanted more than zeroes and ones. They set out to extend their fine-tuned LLM into a soft matching model that would generate a smooth range of confidence scores, supporting high, medium, and low precision matches.<\/p>\n<p>To produce this range of scores, the team trained a regularized <a href=\"https:\/\/towardsdatascience.com\/multilayer-perceptron-explained-with-a-real-life-example-and-python-code-sentiment-analysis-cb408ee93141\" target=\"_blank\" rel=\"noopener\">multilayer perceptron<\/a> (MLP), a type of neural network, to align name similarity scores with the name embeddings produced by their fine-tuned LLM and determine whether two name strings are semantically similar. 
As a result, instead of producing many values clustered near zero or one, the model produced a smoother distribution of scores ranging from high to low.<\/p>\n<p>Next, the AI team iterated with Salesforce Data Cloud engineers, experimenting with various training approaches to refine the MLP\u2019s scores, and implemented the MLP model in Java.<\/p>\n<p>Lastly, the Data Cloud team integrated the MLP into the Data Cloud platform, making the model available to customers.<\/p>\n<p><strong>How does AI-based soft matching overcome the global language barrier?<\/strong><\/p>\n<p>The soft matching model must support Data Cloud customers around the world, which is challenging because international first names may involve different conventions, accents, alphabets, and other variables. How did the AI team overcome this hurdle?<\/p>\n<p>First, the team started with a multilingual DistilBERT model pre-trained on over 100 languages.<\/p>\n<p>Next, the team fine-tuned the multilingual DistilBERT model on first name data across several languages. 
This further improved the model\u2019s performance on first names across languages.<\/p>\n<p>Finally, the team used multilingual nickname dictionaries to ensure that known nicknames are recognized consistently.<\/p>\n<h4 class=\"wp-block-heading\">Learn More<\/h4>\n<p>Stay connected \u2013 join our <a href=\"https:\/\/careers.mail.salesforce.com\/w2?cid=7017y00000CRDS7AAP\" target=\"_blank\" rel=\"noopener\">Talent Community<\/a>!<\/p>\n<p><a href=\"https:\/\/www.salesforce.com\/company\/careers\/teams\/tech-and-product\/?d=cta-tms-tp-2\" target=\"_blank\" rel=\"noopener\">Check out our Technology and Product teams<\/a> to learn how you can get involved.<\/p>\n<p>For an inside look at Salesforce\u2019s AI team, check out this <a href=\"https:\/\/engineering.salesforce.com\/3-ways-salesforce-takes-ai-research-to-the-next-level\/\" target=\"_blank\" rel=\"noopener\">blog<\/a>.<\/p>\n<p>The post <a href=\"https:\/\/engineering.salesforce.com\/ai-based-identity-resolution-the-key-for-linking-diverse-customer-data\/\">AI-based Identity Resolution: The Key for Linking Diverse Customer Data<\/a> appeared first on <a href=\"https:\/\/engineering.salesforce.com\/\">Salesforce Engineering Blog<\/a>.<\/p>","protected":false},"excerpt":{"rendered":"<p>Companies want a comprehensive view of their customers, enabling them to solve business and marketing challenges, such as personalization, segmentation, and targeting \u2014 but they face an uphill battle as they are drowning in data. 
For example, many companies cannot match the identity of a customer who visits their website with the same customer who&hellip; <a class=\"more-link\" href=\"https:\/\/fde.cat\/index.php\/2023\/04\/18\/ai-based-identity-resolution-the-key-for-linking-diverse-customer-data\/\">Continue reading <span class=\"screen-reader-text\">AI-based Identity Resolution: The Key for Linking Diverse Customer Data<\/span><\/a><\/p>\n","protected":false},"author":0,"featured_media":0,"comment_status":"","ping_status":"open","sticky":false,"template":"","format":"standard","meta":{"spay_email":"","footnotes":""},"categories":[7],"tags":[],"class_list":["post-705","post","type-post","status-publish","format-standard","hentry","category-technology","entry"],"jetpack_featured_media_url":"","jetpack-related-posts":[{"id":860,"url":"https:\/\/fde.cat\/index.php\/2024\/04\/24\/inside-data-clouds-secret-formula-for-processing-one-quadrillion-records-monthly\/","url_meta":{"origin":705,"position":0},"title":"Inside Data Cloud\u2019s Secret Formula for Processing One Quadrillion Records Monthly","date":"April 24, 2024","format":false,"excerpt":"In our \u201cEngineering Energizers\u201d Q&A series, we explore the inspiring journeys of engineering leaders who have significantly advanced their fields. Today, we meet Soumya KV, who spearheads the development of the Data Cloud\u2019s internal apps layer at Salesforce. 
Her India-based team specializes in advanced data segmentation and activation, enabling tailored\u2026","rel":"","context":"In &quot;Technology&quot;","img":{"alt_text":"","src":"","width":0,"height":0},"classes":[]},{"id":770,"url":"https:\/\/fde.cat\/index.php\/2023\/10\/10\/revealing-the-newest-data-science-tool-speeding-ai-development-and-securing-customer-data\/","url_meta":{"origin":705,"position":1},"title":"Revealing the Newest Data Science Tool: Speeding AI Development and Securing Customer Data","date":"October 10, 2023","format":false,"excerpt":"by Chi Wang and Scott Nyberg In today\u2019s data-powered world, leveraging customer data to improve AI capabilities remains key for providing highly personalized consumer experiences. In fact, 43% of customers believe AI has improved their lives, with 54% willing to provide their anonymized data to improve AI-related products. However, more\u2026","rel":"","context":"In &quot;Technology&quot;","img":{"alt_text":"","src":"","width":0,"height":0},"classes":[]},{"id":834,"url":"https:\/\/fde.cat\/index.php\/2024\/03\/06\/how-the-new-einstein-1-platform-manages-massive-data-and-ai-workloads-at-scale\/","url_meta":{"origin":705,"position":2},"title":"How the New Einstein 1 Platform Manages Massive Data and AI Workloads at Scale","date":"March 6, 2024","format":false,"excerpt":"In our \u201cEngineering Energizers\u201d Q&A series, we feature Leo Tran, Chief Architect of Platform Engineering at Salesforce. With over 15 years of engineering leadership experience, Leo is instrumental in developing the Einstein 1 Platform. 
This platform integrates generative AI, data management, CRM capabilities, and trusted systems to provide businesses with\u2026","rel":"","context":"In &quot;Technology&quot;","img":{"alt_text":"","src":"","width":0,"height":0},"classes":[]},{"id":712,"url":"https:\/\/fde.cat\/index.php\/2023\/05\/09\/automation-engineering-secrets-revealed-slashing-customer-processing-time-from-hours-to-seconds\/","url_meta":{"origin":705,"position":3},"title":"Automation Engineering Secrets Revealed: Slashing Customer Processing Time from Hours to Seconds","date":"May 9, 2023","format":false,"excerpt":"In our \u201cEngineering Energizers\u201d Q&A series, we examine the professional life experiences that have shaped Salesforce Engineering leaders. In this special edition, we meet Pratima Shukla, a software engineering manager based in Bangalore, India. In her role, Pratima leads Salesforce India\u2019s Industries Cloud Public Sector Solution (PSS) team, where she\u2026","rel":"","context":"In &quot;Technology&quot;","img":{"alt_text":"","src":"","width":0,"height":0},"classes":[]},{"id":709,"url":"https:\/\/fde.cat\/index.php\/2023\/05\/03\/inside-marketing-clouds-new-automation-systems-updating-500m-marketing-leads-daily\/","url_meta":{"origin":705,"position":4},"title":"Inside Marketing Cloud\u2019s New Automation Systems: Updating 500M+ Marketing Leads Daily","date":"May 3, 2023","format":false,"excerpt":"To remain competitive in today\u2019s marketplace, companies\u2019 marketing leads must convert into sales; however; 40% of business leaders believe their existing marketing efforts are outdated. 
Pivoting from archaic marketing tools to automated software like Salesforce\u2019s Marketing Cloud Account Engagement (MCAE) significantly enhances companies\u2019 lead generation process and helps boost revenue.\u2026","rel":"","context":"In &quot;Technology&quot;","img":{"alt_text":"","src":"","width":0,"height":0},"classes":[]},{"id":694,"url":"https:\/\/fde.cat\/index.php\/2023\/03\/23\/big-data-processing-driving-data-migration-for-salesforce-data-cloud\/","url_meta":{"origin":705,"position":5},"title":"Big Data Processing: Driving Data Migration  for Salesforce Data Cloud","date":"March 23, 2023","format":false,"excerpt":"The tsunami of data \u2014 set to exceed 180 zettabytes by 2025 \u2014 places significant pressure on companies. Simply having access to customer information is not enough \u2014 companies must also analyze and refine the data to find actionable pieces that power new business. As businesses collect these volumes of\u2026","rel":"","context":"In &quot;Technology&quot;","img":{"alt_text":"","src":"","width":0,"height":0},"classes":[]}],"_links":{"self":[{"href":"https:\/\/fde.cat\/index.php\/wp-json\/wp\/v2\/posts\/705","targetHints":{"allow":["GET"]}}],"collection":[{"href":"https:\/\/fde.cat\/index.php\/wp-json\/wp\/v2\/posts"}],"about":[{"href":"https:\/\/fde.cat\/index.php\/wp-json\/wp\/v2\/types\/post"}],"replies":[{"embeddable":true,"href":"https:\/\/fde.cat\/index.php\/wp-json\/wp\/v2\/comments?post=705"}],"version-history":[{"count":0,"href":"https:\/\/fde.cat\/index.php\/wp-json\/wp\/v2\/posts\/705\/revisions"}],"wp:attachment":[{"href":"https:\/\/fde.cat\/index.php\/wp-json\/wp\/v2\/media?parent=705"}],"wp:term":[{"taxonomy":"category","embeddable":true,"href":"https:\/\/fde.cat\/index.php\/wp-json\/wp\/v2\/categories?post=705"},{"taxonomy":"post_tag","embeddable":true,"href":"https:\/\/fde.cat\/index.php\/wp-json\/wp\/v2\/tags?post=705"}],"curies":[{"name":"wp","href":"https:\/\/api.w.org\/{rel}","templated":true}]}}