{"id":857,"date":"2024-04-19T21:00:15","date_gmt":"2024-04-19T21:00:15","guid":{"rendered":"https:\/\/fde.cat\/index.php\/2024\/04\/19\/how-we-scaled-salesforce-edge-up-to-5-million-orgs\/"},"modified":"2024-04-19T21:00:15","modified_gmt":"2024-04-19T21:00:15","slug":"how-we-scaled-salesforce-edge-up-to-5-million-orgs","status":"publish","type":"post","link":"https:\/\/fde.cat\/index.php\/2024\/04\/19\/how-we-scaled-salesforce-edge-up-to-5-million-orgs\/","title":{"rendered":"How We Scaled Salesforce Edge up to 5 Million Orgs"},"content":{"rendered":"<p><strong>What was the business trigger for your project?<\/strong><\/p>\n<p>Since 2018, <a href=\"https:\/\/help.salesforce.com\/s\/articleView?id=000380380&amp;type=1\">Salesforce Edge<\/a> has been providing internal content delivery network (CDN) services \u2014 onboarding approximately 130,000 web domain names, including some of the largest internal web properties.<\/p>\n<p>Over the years, we\u2019ve worked on improving the stability of the service, but it has struggled to keep up with our rapid business growth. We realized that our control plane architecture was nearing its scaling limits in terms of memory utilization and visible latency. <strong>This realization gave us the opportunity to design a new architecture.<\/strong> We can now reflect on the lessons learned from the initial project, identifying what worked well, what was overlooked, what is no longer needed, and what has not scaled up effectively.<\/p>\n<p><em>Growth in the number of onboarded customers over time.<\/em><\/p>\n<p><strong>Have you considered rewriting your software from scratch?<\/strong><\/p>\n<p>Yes, we considered rewriting our software from scratch for several reasons, including a significant change in underlying technologies and the belief that starting fresh would be faster than paying down existing technical debt. 
In the case of our Edge software, <strong>we approached the re-architecture of the control plane and data plane components differently.<\/strong><\/p>\n<p><strong>For the control plane server<\/strong>, we decided to switch from Python to Java, etcd to Aurora, Docker Compose to Kubernetes, and private cloud to public cloud. These drastic changes warranted starting from scratch.<\/p>\n<p><strong>For the data plane<\/strong>, which handles internet traffic, we took a different approach. We value Trust as our top priority and were concerned about losing the interoperability, compliance, and security hardening that we had gained over years of operation. Therefore, we opted for an intensive refactoring exercise on the existing code.<\/p>\n<p>By choosing the refactoring approach for the data plane, we were able to leverage over 300 functional tests running in our existing CI, reducing the risk of introducing functional regressions. Our rollout strategy involved feature-flagging the refactored code, allowing for a slow and staggered rollout of each feature. This approach also provided the flexibility to quickly roll back to the legacy code if needed.<\/p>\n<p><strong>How did you kick off the refactoring project?<\/strong><\/p>\n<p>We followed a systematic approach. First, we identified the measurable metrics we wanted to improve and agreed on their target values. Then, we collected profiling data on the current code to pinpoint the areas that needed improvement. Next, we generated ideas to address the identified pain points, prototyped them, and measured their impact on each metric.<\/p>\n<p>Additionally, we estimated the effort required to implement each idea and combined it with the measured impact to calculate a \u201ccost-effectiveness score\u201d for each idea. 
<strong>By prioritizing the highest-scored ideas, we focused on the \u201clow-hanging fruit\u201d that would deliver the largest impact for the least effort.<\/strong> This approach ensured a structured and data-driven kickoff for the refactoring project.<\/p>\n<p><em>Feature prioritization matrix based on cost-effectiveness.<\/em><\/p>\n<p><strong>What was the biggest pain point you needed to address?<\/strong><\/p>\n<p>Our biggest pain point was that the configuration size, driven by the number of onboarded customers, limited our scale and frequently triggered the kernel\u2019s out-of-memory (OOM) killer. It became clear that keeping all the configuration in memory was not a viable solution.<\/p>\n<p>To address this, we decided to change our configuration processing model. We adopted a streaming approach, where we loaded one customer\u2019s configuration at a time, processed it, and then discarded it. This shift to streaming mode allowed us to optimize our memory footprint, so that the total configuration size no longer limited how far we could scale.<\/p>\n<p><em>Memory footprint growth before and after the refactoring.<\/em><\/p>\n<p><strong>Have you also considered scaling vertically?<\/strong><\/p>\n<p>Yes. To maximize core utilization, we implemented multi-threading for our configuration processing. Instead of using locks to serialize access to shared resources, which can be inefficient, we adopted a map-reduce-like approach. Each worker thread operates on its own local resource, which is later combined into the global resource by the main thread. 
This approach minimizes contention and lets configuration processing speed up almost linearly with the number of worker cores.<\/p>\n<p><em>Time to load the configuration by the number of config processing workers.<\/em><\/p>\n<p><strong>Can you share one learning from the original Edge architecture?<\/strong><\/p>\n<p>One key learning from the original Edge architecture was <strong>the need to prioritize scalability as a nonfunctional requirement<\/strong>. Initially, the focus was on developing new features to cater to a wide range of customer use cases. However, introducing new features without considering their impact on scalability could lead to a complex and inefficient architecture.<\/p>\n<p>We realized that, even if a feature was not heavily utilized, it could still have a negative impact on performance due to the architectural changes it introduced. This raised the question of whether to clean up these features or keep them as part of the refactoring process.<\/p>\n<p>To address this, we decided to plan the new architecture as if these features did not exist. <strong>The highest strategic priority became ensuring that the predominant features, which required scalability, were well-supported.<\/strong> By taking this approach, we aimed to create an architecture that prioritized scalability and avoided unnecessary complexity caused by less-utilized features.<\/p>\n<p><strong>How drastic was the refactoring impact?<\/strong><\/p>\n<p>After several months of iterating on the new code, we successfully addressed all the initial pain points, optimized our performance metrics, and resumed the massive migration of domains into Edge. This was achieved through continuous integration, with biweekly releases, health-mediated rollout, and no downtime. 
<strong>Notably, we were able to onboard 5 million customers in our test lab and increase our production customer base from 130,000 to 2 million.<\/strong><\/p>\n<div class=\"wp-block-group is-layout-constrained wp-container-1 wp-block-group-is-layout-constrained\">\n<h4 class=\"wp-block-heading\">Learn More<\/h4>\n<p>Stay connected \u2014 join our <a href=\"https:\/\/flows.beamery.com\/salesforce\/eng-social-2023\">Talent Community<\/a>!<\/p>\n<p>Check out our <a href=\"https:\/\/www.salesforce.com\/company\/careers\/teams\/tech-and-product\/?d=cta-tms-tp-2\">Technology and Product<\/a> teams to learn how you can get involved.<\/p>\n<\/div>\n<p>The post <a href=\"https:\/\/engineering.salesforce.com\/how-we-scaled-salesforce-edge-up-to-5-million-orgs\/\">How We Scaled Salesforce Edge up to 5 Million Orgs<\/a> appeared first on <a href=\"https:\/\/engineering.salesforce.com\/\">Salesforce Engineering Blog<\/a>.<\/p>","protected":false},"excerpt":{"rendered":"<p>What was the business trigger for your project? Since 2018, Salesforce Edge has been providing internal content delivery network (CDN) services \u2014 onboarding approximately 130,000 web domain names, including some of the largest internal web properties. 
Over the years, we\u2019ve worked on improving the stability of the service, but it has struggled to keep up&hellip; <a class=\"more-link\" href=\"https:\/\/fde.cat\/index.php\/2024\/04\/19\/how-we-scaled-salesforce-edge-up-to-5-million-orgs\/\">Continue reading <span class=\"screen-reader-text\">How We Scaled Salesforce Edge up to 5 Million Orgs<\/span><\/a><\/p>\n","protected":false},"author":0,"featured_media":0,"comment_status":"","ping_status":"open","sticky":false,"template":"","format":"standard","meta":{"spay_email":"","footnotes":""},"categories":[7],"tags":[],"class_list":["post-857","post","type-post","status-publish","format-standard","hentry","category-technology","entry"],"jetpack_featured_media_url":"","jetpack-related-posts":[{"id":229,"url":"https:\/\/fde.cat\/index.php\/2021\/02\/02\/ml-lake-building-salesforces-data-platform-for-machine-learning\/","url_meta":{"origin":857,"position":0},"title":"ML Lake: Building Salesforce\u2019s Data Platform for Machine Learning","date":"February 2, 2021","format":false,"excerpt":"Salesforce uses machine learning to improve every aspect of its product suite. With the help of Salesforce Einstein, companies are improving productivity and accelerating key decision-making. Data is a critical component of all machine learning applications and Salesforce is no exception. In this post I will share some unique challenges\u2026","rel":"","context":"In &quot;Technology&quot;","img":{"alt_text":"","src":"","width":0,"height":0},"classes":[]},{"id":538,"url":"https:\/\/fde.cat\/index.php\/2022\/02\/01\/behind-the-scenes-of-hyperforce-salesforces-infrastructure-for-the-public-cloud\/","url_meta":{"origin":857,"position":1},"title":"Behind the Scenes of Hyperforce: Salesforce\u2019s Infrastructure for the Public Cloud","date":"February 1, 2022","format":false,"excerpt":"Salesforce has been running cloud infrastructure for over two decades, bringing companies and their customers together. 
When Salesforce first started out in 1999, the world was very different; back then, the only practical way to provide our brand of Software-As-A-Service was to run everything yourself\u200a\u2014\u200anot just the software, but the\u2026","rel":"","context":"In &quot;Technology&quot;","img":{"alt_text":"","src":"","width":0,"height":0},"classes":[]},{"id":536,"url":"https:\/\/fde.cat\/index.php\/2022\/01\/25\/scaling-cross-team-contributions-to-a-native-mobile-app\/","url_meta":{"origin":857,"position":2},"title":"Scaling cross-team contributions to a native mobile app","date":"January 25, 2022","format":false,"excerpt":"By Stephen Goldberg, Alex Sikora, and Jean\u00a0Bovet Flagship applications are home to myriad functionalities that serve different parts of your userbase. Often, adding a new feature unintentionally causes reduced velocity, single points of failure, and monoliths that are hard to navigate. Such flagship apps are built from contributions from multiple\u2026","rel":"","context":"In &quot;Technology&quot;","img":{"alt_text":"","src":"","width":0,"height":0},"classes":[]},{"id":845,"url":"https:\/\/fde.cat\/index.php\/2024\/03\/25\/mission-impossible-inside-the-unprecedented-integration-of-100-products-within-salesforces-new-marketing-cloud-growth-edition\/","url_meta":{"origin":857,"position":3},"title":"Mission Impossible: Inside the Unprecedented Integration of 100 Products Within Salesforce\u2019s New Marketing Cloud Growth Edition","date":"March 25, 2024","format":false,"excerpt":"In this edition of our \u201cEngineering Energizers\u201d Q&A series, we spotlight Jeanine Walters, Principal Architect and lead architect behind Marketing Cloud Growth Edition at Salesforce. 
With over 20 years of architecting innovative solutions at Salesforce, Jeanine has played a pivotal role in creating a game-changing marketing application that empowers small\u2026","rel":"","context":"In &quot;Technology&quot;","img":{"alt_text":"","src":"","width":0,"height":0},"classes":[]},{"id":840,"url":"https:\/\/fde.cat\/index.php\/2024\/03\/20\/aiops-engineering-secrets-revealed-how-ai-and-automation-slash-thousands-of-manual-hours-annually\/","url_meta":{"origin":857,"position":4},"title":"AIOps Engineering Secrets Revealed: How AI and Automation Slash Thousands of Manual Hours Annually","date":"March 20, 2024","format":false,"excerpt":"In our \u201cEngineering Energizers\u201d Q&A series, we explore the remarkable journeys of engineering leaders who have made significant contributions in their respective fields. Today, we meet Sravanthi Konduru, a Lead Member of the Technical Staff for Salesforce Engineering, who helps drive the development of the Warden AIOps platform. Explore how\u2026","rel":"","context":"In &quot;Technology&quot;","img":{"alt_text":"","src":"","width":0,"height":0},"classes":[]},{"id":729,"url":"https:\/\/fde.cat\/index.php\/2023\/06\/27\/simplifying-oauth-2-0-how-slacks-new-external-authentication-feature-boosts-developer-productivity\/","url_meta":{"origin":857,"position":5},"title":"Simplifying OAuth 2.0: How Slack\u2019s New External Authentication Feature Boosts Developer Productivity","date":"June 27, 2023","format":false,"excerpt":"In our \u201cEngineering Energizers\u201d Q&A series, we examine the professional journeys that have shaped Salesforce Engineering leaders. Say hello to Nupur Goyal, Staff Software Engineer at Slack. 
Nupur\u2019s core platform team at Slack helps developers increase their productivity and efficiency \u2014 empowering them to create cutting-edge applications that integrate with\u2026","rel":"","context":"In &quot;Technology&quot;","img":{"alt_text":"","src":"","width":0,"height":0},"classes":[]}],"_links":{"self":[{"href":"https:\/\/fde.cat\/index.php\/wp-json\/wp\/v2\/posts\/857","targetHints":{"allow":["GET"]}}],"collection":[{"href":"https:\/\/fde.cat\/index.php\/wp-json\/wp\/v2\/posts"}],"about":[{"href":"https:\/\/fde.cat\/index.php\/wp-json\/wp\/v2\/types\/post"}],"replies":[{"embeddable":true,"href":"https:\/\/fde.cat\/index.php\/wp-json\/wp\/v2\/comments?post=857"}],"version-history":[{"count":0,"href":"https:\/\/fde.cat\/index.php\/wp-json\/wp\/v2\/posts\/857\/revisions"}],"wp:attachment":[{"href":"https:\/\/fde.cat\/index.php\/wp-json\/wp\/v2\/media?parent=857"}],"wp:term":[{"taxonomy":"category","embeddable":true,"href":"https:\/\/fde.cat\/index.php\/wp-json\/wp\/v2\/categories?post=857"},{"taxonomy":"post_tag","embeddable":true,"href":"https:\/\/fde.cat\/index.php\/wp-json\/wp\/v2\/tags?post=857"}],"curies":[{"name":"wp","href":"https:\/\/api.w.org\/{rel}","templated":true}]}}