{"id":860,"date":"2024-04-24T00:11:54","date_gmt":"2024-04-24T00:11:54","guid":{"rendered":"https:\/\/fde.cat\/index.php\/2024\/04\/24\/inside-data-clouds-secret-formula-for-processing-one-quadrillion-records-monthly\/"},"modified":"2024-04-24T00:11:54","modified_gmt":"2024-04-24T00:11:54","slug":"inside-data-clouds-secret-formula-for-processing-one-quadrillion-records-monthly","status":"publish","type":"post","link":"https:\/\/fde.cat\/index.php\/2024\/04\/24\/inside-data-clouds-secret-formula-for-processing-one-quadrillion-records-monthly\/","title":{"rendered":"Inside Data Cloud\u2019s Secret Formula for Processing One Quadrillion Records Monthly"},"content":{"rendered":"<p>In our \u201cEngineering Energizers\u201d Q&amp;A series, we explore the inspiring journeys of engineering leaders who have significantly advanced their fields. Today, we meet Soumya KV, who spearheads the development of the <a href=\"https:\/\/www.salesforce.com\/data\/\">Data Cloud\u2019s<\/a> internal apps layer at Salesforce. Her India-based team specializes in advanced data segmentation and activation, enabling tailored marketing strategies and enhanced decision-making for Salesforce customers.<\/p>\n<p>Join Soumya and her team as they tackle significant scalability challenges to unlock deeper insights for Salesforce customers.<\/p>\n<p><strong>What is your team\u2019s mission?<\/strong><\/p>\n<p><strong>Our mission is to design, develop, test, and continuously improve Data Cloud internal applications<\/strong> that optimize customer targeting and engagement through segmentation and activation.<\/p>\n<p>Segmentation divides customer data into specific groups based on criteria like age, location, and interests. This helps companies target their marketing efforts, increase conversion rates, or generate better business insights. For instance, a sports shoe company might target individuals between the ages of 20 and 35 who have a passion for sports. 
By utilizing the segmented data, the company can tailor marketing campaigns to resonate with this particular audience, potentially leading to higher engagement and conversion rates.<\/p>\n<p>Activation then enriches the segmented data and sends it to the appropriate destination within and outside of Salesforce. This may include <a href=\"https:\/\/www.salesforce.com\/products\/marketing\/\">Marketing Cloud<\/a> for email and SMS campaigns, <a href=\"https:\/\/www.salesforce.com\/products\/commerce\/\">Commerce Cloud<\/a> for commerce use cases, and direct activation to customer and ecosystem partner target locations.<\/p>\n<p>Our team\u2019s work is crucial because, <strong>without segmentation and activation, the data collected in the Data Cloud would remain untapped and unusable for customers.<\/strong> Ultimately, we enable customers to make informed decisions, target specific audiences, and derive business value from their data.<\/p>\n<p>The team is dedicated to developing the latest solutions, such as segmentation on <a href=\"https:\/\/help.salesforce.com\/s\/articleView?id=sf.c360_a_byol_data_federation.htm&amp;type=5\">BYOL<\/a> and real-time segmentation computation, while continuously refining segmentation capabilities, scalability, and usability. On the activation front, they are enhancing features for ecosystem activation, facilitating ISV-driven integrations with partners. Additionally, the team prioritizes the development of an egress platform that gives customers the flexibility to configure and utilize egressed data across various destinations such as GCS, Azure, and SFTP. This includes support for diverse file types, sizes, encryption methods, and compression techniques.<\/p>\n<p>We also take <a href=\"https:\/\/engineering.salesforce.com\/onboarding-slos-for-salesforce-services-299b6cf2d8e8\/\">ownership of the service<\/a>, maintaining the production systems and ensuring their health and stability. 
This involves monitoring, supporting, and meeting the availability, reliability, performance, and data security requirements of the systems.<\/p>\n<p><em>Soumya describes the culture of her Tableau engineering team.<\/em><\/p>\n<p><strong>What challenges does your team face while working on Data Cloud\u2019s internal apps layer?<\/strong><\/p>\n<p>The biggest challenge we face is managing scale. We handle a massive amount of data, with <strong>over 4,500 tenants and one quadrillion records processed monthly.<\/strong> On a daily basis, we process <strong>42+ trillion records and manage 36,000 segment and 12,500 activation job runs<\/strong>. This represents more than 100% growth compared to the previous year.<\/p>\n\n<div class=\"wp-block-group is-layout-constrained wp-container-1 wp-block-group-is-layout-constrained\">\n<p>To counter this challenge, we employ many strategies:<\/p>\n<p><strong>Continuous Monitoring and Analysis<\/strong>: The team continuously monitors the production system, conducting performance assessments and analyzing system behavior, latency, memory utilization, CPU, and cost. They closely monitor scale, usability patterns, and resource usage to optimize performance.<\/p>\n<p><strong>Optimization and Fine-tuning<\/strong>: The team focuses on optimizing segmentation and activation jobs, database operations, and platform architecture to improve scalability and performance and to handle larger datasets efficiently. They continuously evaluate and refine their processes to improve overall system stability and performance.<\/p>\n<p><strong>Exploration of Optimization Techniques<\/strong>: Techniques like data batching and optimized scheduling are explored to group related jobs together and reduce processing time.<\/p>\n<p><strong>Adoption of New Technologies<\/strong>: The team stays updated on the latest trends and uses new technologies where applicable. 
This includes leveraging options like <a href=\"https:\/\/docs.aws.amazon.com\/emr\/latest\/EMR-on-EKS-DevelopmentGuide\/emr-eks.html\">AWS EMR on EKS<\/a> to enhance scalability and Spark DistCp for faster parallel data transfer capabilities.<\/p>\n<p><strong>Implementation of Guardrails<\/strong>: The team implements guardrails to ensure proper usage of the capabilities and prevent misuse. This includes setting limits, providing guidelines for optimal use, offering self-help tools, and educating customers.<\/p>\n<\/div>\n<p><strong>Which technology does your team rely on the most to manage the scaling challenge?<\/strong><\/p>\n<p><a href=\"https:\/\/spark.apache.org\/\">Apache Spark<\/a> is a crucial technology for processing the vast amount of data we handle daily. With trillions of records to process on a daily basis, Spark\u2019s distributed processing capabilities empower us to distribute workloads across a cluster of machines, enabling parallel execution and scalability. This means we can process large datasets, running complex join and query operations in an efficient and timely manner.<\/p>\n<p>Spark excels at handling complex computations and join operations, making it ideal for processing intricate queries that extract specific audiences based on the filter criteria for a given segment. Spark jobs execute these queries on distributed datasets, leveraging parallel processing to ensure fast processing despite massive data volumes.<\/p>\n<p>Spark is essential in the activation scenario as it allows us to enrich segmented data by adding necessary attributes through joins with other tables for improved customer engagement. Spark is also applied efficiently to consent\/opt-out filtering requirements. Spark DistCp is used for distributed copy, enabling large dataset transfers at low latency. It is also used to encrypt large data payloads before egress for data security. 
This ensures that the activated data is comprehensive and customized to meet our clients\u2019 specific needs.<\/p>\n<p>Under the hood, Spark\u2019s core computational engine manages various aspects of job execution. It handles job scheduling across the cluster, memory management and fault recovery, ensuring efficient utilization of resources and maintaining the stability of the processing environment. This robust engine allows us to run complex computations on large datasets with enhanced performance.<\/p>\n<p><strong>Can you provide insights into your team\u2019s testing and quality assurance processes for ensuring the reliability and stability of the Data Cloud apps layer?<\/strong><\/p>\n<div class=\"wp-block-group is-layout-constrained wp-container-2 wp-block-group-is-layout-constrained\">\n<p>Our team follows a comprehensive testing and quality assurance process that contains multiple layers:<\/p>\n<p><strong>Unit Testing<\/strong>: Developers perform thorough unit testing on the code they write, ensuring sufficient coverage and effectiveness.<\/p>\n<p><strong>Integration Testing<\/strong>: We test the seamless functionality between modules, including UI integration, database actions, and integration with upstream\/downstream systems.<\/p>\n<p><strong>Functionality Testing<\/strong>: We write comprehensive test cases covering all requirements, review them with peers and senior members, and validate functionality through various scenarios, including negative and happy path testing.<\/p>\n<p><strong>Automation<\/strong>: We automate UI and backend testing, using varied inputs to identify issues or bugs. This includes automating backend functional integration tests to safeguard existing functionality during future enhancements.<\/p>\n<p><strong>Performance Testing<\/strong>: We conduct performance testing to evaluate scalability and performance. 
This includes testing with large data sets and running hundreds of parallel segment\/activation jobs. We benchmark system performance to determine achievable goals and SLAs.<\/p>\n<\/div>\n<p><em>Soumya shares why engineers should join Salesforce.<\/em><\/p>\n<p><strong>How does customer feedback shape your work on the Data Cloud apps layer?<\/strong><\/p>\n<p>As a customer-centric organization, <strong>we actively seek feedback from customers and stakeholders to guide the direction of our team\u2019s work.<\/strong> We collect insights through various channels, such as customer interactions and customer support team engagements. These feedback channels allow us to understand customer needs, usability preferences, and opportunities for improvement.<\/p>\n<p>One recent example of customer feedback that influenced our work was from a metrics-driven organization. They wanted to measure processing time for each step in our solution, including segmentation, activation, and data delivery, to optimize their processes. In response, we are developing a traceability metrics dashboard that provides insights into our processing stages.<\/p>\n<p>In addition to metrics, we also receive feedback on latency requirements, usability enhancements, support for varied connector frameworks, intelligent and optimal segment creation capabilities, and generating data output payloads that align with the processing capabilities of customers\u2019 systems. We also receive inputs on enhanced data security, encryption, and data masking capabilities.<\/p>\n<div class=\"wp-block-group is-layout-constrained wp-container-3 wp-block-group-is-layout-constrained\">\n<h4 class=\"wp-block-heading\">Learn More<\/h4>\n<p>Hungry for more Data Cloud stories? 
<a href=\"https:\/\/engineering.salesforce.com\/how-is-indias-brilliant-big-data-processing-team-engineering-salesforce-data-cloud\/\">Read this blog<\/a> to learn how India\u2019s Data Cloud big data processing compute layer team supports millions of Data Cloud-related tasks per month.<\/p>\n<p>Stay connected \u2014 join our <a href=\"https:\/\/flows.beamery.com\/salesforce\/eng-social-2023\">Talent Community<\/a>!<\/p>\n<p>Check out our <a href=\"https:\/\/www.salesforce.com\/company\/careers\/teams\/tech-and-product\/?d=cta-tms-tp-2\">Technology and Product<\/a> teams to learn how you can get involved.<\/p>\n<\/div>\n<p>The post <a href=\"https:\/\/engineering.salesforce.com\/inside-data-clouds-secret-formula-for-processing-one-quadrillion-records-monthly\/\">Inside Data Cloud\u2019s Secret Formula for Processing One Quadrillion Records Monthly<\/a> appeared first on <a href=\"https:\/\/engineering.salesforce.com\/\">Salesforce Engineering Blog<\/a>.<\/p>\n<p><a href=\"https:\/\/engineering.salesforce.com\/inside-data-clouds-secret-formula-for-processing-one-quadrillion-records-monthly\/\" target=\"_blank\" class=\"feedzy-rss-link-icon\" rel=\"noopener\">Read More<\/a><\/p>","protected":false},"excerpt":{"rendered":"<p>In our \u201cEngineering Energizers\u201d Q&amp;A series, we explore the inspiring journeys of engineering leaders who have significantly advanced their fields. Today, we meet Soumya KV, who spearheads the development of the Data Cloud\u2019s internal apps layer at Salesforce. 
Her India-based team specializes in advanced data segmentation and activation, enabling tailored marketing strategies and enhanced decision-making&hellip; <a class=\"more-link\" href=\"https:\/\/fde.cat\/index.php\/2024\/04\/24\/inside-data-clouds-secret-formula-for-processing-one-quadrillion-records-monthly\/\">Continue reading <span class=\"screen-reader-text\">Inside Data Cloud\u2019s Secret Formula for Processing One Quadrillion Records Monthly<\/span><\/a><\/p>\n","protected":false},"author":0,"featured_media":0,"comment_status":"","ping_status":"open","sticky":false,"template":"","format":"standard","meta":{"spay_email":"","footnotes":""},"categories":[7],"tags":[],"class_list":["post-860","post","type-post","status-publish","format-standard","hentry","category-technology","entry"],"jetpack_featured_media_url":"","jetpack-related-posts":[{"id":694,"url":"https:\/\/fde.cat\/index.php\/2023\/03\/23\/big-data-processing-driving-data-migration-for-salesforce-data-cloud\/","url_meta":{"origin":860,"position":0},"title":"Big Data Processing: Driving Data Migration  for Salesforce Data Cloud","date":"March 23, 2023","format":false,"excerpt":"The tsunami of data \u2014 set to exceed 180 zettabytes by 2025 \u2014 places significant pressure on companies. Simply having access to customer information is not enough \u2014 companies must also analyze and refine the data to find actionable pieces that power new business. 
As businesses collect these volumes of\u2026","rel":"","context":"In &quot;Technology&quot;","img":{"alt_text":"","src":"","width":0,"height":0},"classes":[]},{"id":692,"url":"https:\/\/fde.cat\/index.php\/2023\/03\/22\/how-is-indias-brilliant-big-data-processing-team-engineering-salesforce-data-cloud\/","url_meta":{"origin":860,"position":1},"title":"How is India\u2019s Brilliant Big Data Processing Team Engineering Salesforce Data Cloud?","date":"March 22, 2023","format":false,"excerpt":"In our \u201cEngineering Energizers\u201d Q&A series, we examine the life experiences and career paths that have shaped Salesforce engineering leaders. Meet Archana Kumari, one of Salesforce\u2019s first India-based woman engineering leaders. In her role, Archana leads Salesforce India\u2019s Data Cloud big data processing compute layer team \u2014 charged with providing\u2026","rel":"","context":"In &quot;Technology&quot;","img":{"alt_text":"","src":"","width":0,"height":0},"classes":[]},{"id":709,"url":"https:\/\/fde.cat\/index.php\/2023\/05\/03\/inside-marketing-clouds-new-automation-systems-updating-500m-marketing-leads-daily\/","url_meta":{"origin":860,"position":2},"title":"Inside Marketing Cloud\u2019s New Automation Systems: Updating 500M+ Marketing Leads Daily","date":"May 3, 2023","format":false,"excerpt":"To remain competitive in today\u2019s marketplace, companies\u2019 marketing leads must convert into sales; however; 40% of business leaders believe their existing marketing efforts are outdated. 
Pivoting from archaic marketing tools to automated software like Salesforce\u2019s Marketing Cloud Account Engagement (MCAE) significantly enhances companies\u2019 lead generation process and helps boost revenue.\u2026","rel":"","context":"In &quot;Technology&quot;","img":{"alt_text":"","src":"","width":0,"height":0},"classes":[]},{"id":896,"url":"https:\/\/fde.cat\/index.php\/2024\/07\/16\/the-unstructured-data-dilemma-how-data-cloud-handles-250-trillion-transactions-weekly\/","url_meta":{"origin":860,"position":3},"title":"The Unstructured Data Dilemma: How Data Cloud Handles 250 Trillion Transactions Weekly","date":"July 16, 2024","format":false,"excerpt":"In our \u201cEngineering Energizers\u201d Q&A series, we delve into the journeys of engineering leaders who have made notable strides in their areas of expertise. This edition features Adithya Vishwanath, Vice President of Software Engineering at Salesforce. He leads the Data Cloud team, a pivotal platform that integrates diverse data sources,\u2026","rel":"","context":"In &quot;Technology&quot;","img":{"alt_text":"","src":"","width":0,"height":0},"classes":[]},{"id":892,"url":"https:\/\/fde.cat\/index.php\/2024\/07\/08\/unlocking-data-clouds-secret-for-scaling-massive-data-volumes-and-slashing-processing-bottlenecks\/","url_meta":{"origin":860,"position":4},"title":"Unlocking Data Cloud\u2019s Secret for Scaling Massive Data Volumes and Slashing Processing Bottlenecks","date":"July 8, 2024","format":false,"excerpt":"In our Engineering Energizers Q&A series, we explore engineers who have pioneered advancements in their fields. Today, we meet Rahul Singh, Vice President of Software Engineering, leading the India-based Data Cloud team. 
His team is focused on delivering a robust, scalable, and efficient Data Cloud platform that consolidates customer data\u2026","rel":"","context":"In &quot;Technology&quot;","img":{"alt_text":"","src":"","width":0,"height":0},"classes":[]},{"id":705,"url":"https:\/\/fde.cat\/index.php\/2023\/04\/18\/ai-based-identity-resolution-the-key-for-linking-diverse-customer-data\/","url_meta":{"origin":860,"position":5},"title":"AI-based Identity Resolution: The Key for Linking Diverse Customer Data","date":"April 18, 2023","format":false,"excerpt":"Companies want a comprehensive view of their customers, enabling them to solve business and marketing challenges, such as personalization, segmentation, and targeting \u2014 but they face an uphill battle as they are drowning in data. For example, many companies cannot match the identity of a customer who visits their website\u2026","rel":"","context":"In &quot;Technology&quot;","img":{"alt_text":"","src":"","width":0,"height":0},"classes":[]}],"_links":{"self":[{"href":"https:\/\/fde.cat\/index.php\/wp-json\/wp\/v2\/posts\/860","targetHints":{"allow":["GET"]}}],"collection":[{"href":"https:\/\/fde.cat\/index.php\/wp-json\/wp\/v2\/posts"}],"about":[{"href":"https:\/\/fde.cat\/index.php\/wp-json\/wp\/v2\/types\/post"}],"replies":[{"embeddable":true,"href":"https:\/\/fde.cat\/index.php\/wp-json\/wp\/v2\/comments?post=860"}],"version-history":[{"count":0,"href":"https:\/\/fde.cat\/index.php\/wp-json\/wp\/v2\/posts\/860\/revisions"}],"wp:attachment":[{"href":"https:\/\/fde.cat\/index.php\/wp-json\/wp\/v2\/media?parent=860"}],"wp:term":[{"taxonomy":"category","embeddable":true,"href":"https:\/\/fde.cat\/index.php\/wp-json\/wp\/v2\/categories?post=860"},{"taxonomy":"post_tag","embeddable":true,"href":"https:\/\/fde.cat\/index.php\/wp-json\/wp\/v2\/tags?post=860"}],"curies":[{"name":"wp","href":"https:\/\/api.w.org\/{rel}","templated":true}]}}