{"id":644,"date":"2022-10-24T17:45:07","date_gmt":"2022-10-24T17:45:07","guid":{"rendered":"https:\/\/fde.cat\/index.php\/2022\/10\/24\/how-salesforce-built-a-cloud-native-task-execution-service\/"},"modified":"2022-10-24T17:45:07","modified_gmt":"2022-10-24T17:45:07","slug":"how-salesforce-built-a-cloud-native-task-execution-service","status":"publish","type":"post","link":"https:\/\/fde.cat\/index.php\/2022\/10\/24\/how-salesforce-built-a-cloud-native-task-execution-service\/","title":{"rendered":"How Salesforce Built a Cloud-Native Task Execution Service"},"content":{"rendered":"<p>If you\u2019re paying attention to Salesforce technology, you\u2019ve no doubt heard about\u00a0<a target=\"_blank\" href=\"https:\/\/www.salesforce.com\/products\/platform\/hyperforce\/\" rel=\"noopener\">Hyperforce<\/a>, our new approach to deploying Salesforce on public cloud providers. Start with\u00a0<a target=\"_blank\" href=\"https:\/\/engineering.salesforce.com\/behind-the-scenes-of-hyperforce-salesforces-infrastructure-for-the-public-cloud-429309542d8e?source=friends_link&amp;sk=0a22b253fffe0c7265ed602bd7e4e7fb\" rel=\"noopener\">a look at Hyperforce\u2019s architecture<\/a>. There are many compelling reasons to move to Hyperforce, both for us and our customers. We\u2019re excited to do it in the way that only Salesforce would \u2014 with\u00a0<strong>trust<\/strong>,\u00a0<strong>availability,<\/strong>\u00a0and\u00a0<strong>security<\/strong>\u00a0at the forefront from day one. Building a\u00a0<a target=\"_blank\" href=\"https:\/\/engineering.salesforce.com\/the-unified-infrastructure-platform-behind-salesforce-hyperforce-ad8f4c2cf789\" rel=\"noopener\">unified infrastructure platform<\/a>\u00a0for Hyperforce meant relooking at our automation tools to scale our operations with a fresh lens.\u00a0<\/p>\n<p>Salesforce has been around for over two decades. 
In 1999, when the company was founded, if you wanted to run a public internet software service (Software as a Service, or SaaS), you first had to get some servers and hook them up to the internet. So we built a few tools to perform our releases and database maintenance operations using SSH. Fast forward to 2015, when Salesforce\u00a0<a target=\"_blank\" href=\"https:\/\/engineering.salesforce.com\/adopting-kubernetes-46b6c13b204b\" rel=\"noopener\">took a very early bet on Kubernetes<\/a>\u00a0(K8s) to help manage an extensive suite of microservices. We\u2019re proudly using it\u00a0<a target=\"_blank\" href=\"https:\/\/engineering.salesforce.com\/7-ways-we-put-kubernetes-to-work-at-salesforce-e658de98ef2d\" rel=\"noopener\">today<\/a>\u00a0across product lines and business units. And with our transformation to Hyperforce, adopting cloud-native tooling, security, and processes made the most sense.\u00a0<\/p>\n<p>To leverage the scale and agility of the world\u2019s leading public cloud platforms, our Technology and Products team has worked together over the past few years to build a cloud-native task execution system that executes remote operational tasks at scale. Because we believe you may need to walk down this path, too, we\u2019d like to share some challenges we faced and the solutions we identified.<\/p>\n<h2><strong>Transitioning away from SSH<\/strong><\/h2>\n<p>By default, many companies take a \u201clift and shift\u201d approach to running in the public cloud; they make the minimum set of changes needed for their software to run on public cloud infrastructure. As Salesforce has grown over the past two decades, the volume of secure shell (SSH) keys and their use has grown exponentially. As a result, SSH-based attacks are becoming a popular choice for attackers targeting business networks. 
Over the past few years,\u00a0<a target=\"_blank\" href=\"https:\/\/threatpost.com\/botnet-mac-android\/159714\/\" rel=\"noopener\">Interplanetary Storm<\/a>\u00a0and crypto-miner campaigns like\u00a0<a target=\"_blank\" href=\"https:\/\/threatpost.com\/blackrota-golang-backdoor-obfuscation\/161544\/\" rel=\"noopener\">Golang<\/a>\u00a0and\u00a0<a target=\"_blank\" href=\"https:\/\/threatpost.com\/lemon-duck-cryptocurrency-botnet\/160046\/\" rel=\"noopener\">Lemon_Duck<\/a>\u00a0have been used by attackers for backdoor creation. These campaigns exploit SSH access vulnerabilities and abuse SSH keys to gain and expand network access. So Hyperforce was our chance to completely re-envision those practices in a cloud-native way, with uncompromising security and availability as part of our approach.<\/p>\n<h2><strong>Build-Your-Own vs. Open-Source<\/strong><\/h2>\n<p>Our prior experience was with static infrastructure, using Puppet to roll out automation scripts across our fleet of servers. However, as we started our research and development on a remote task execution service, we were clear about our fundamental design principles:<\/p>\n<p><strong>Secure-by-default<\/strong>\u00a0\u2013 Security was baked into the Hyperforce architecture from the start through its universal authentication architecture \u2013 principles, pathways, and processes that create security by default. Our task execution service had to meet the high bar set by our security team.<\/p>\n<p><strong>Simple and Easy to Use<\/strong>\u00a0\u2013 We want the architecture to be simple so that it is not costly to maintain and operate. We are solving a single problem: an automated workflow execution system that helps automate commonly run operational tasks. 
The service must also be easy to use, providing an excellent Developer Experience (DX).<\/p>\n<p><strong>Immutability<\/strong>\u00a0\u2013 With Hyperforce, we adopted a modern approach to infrastructure: all infrastructure is\u00a0<strong>immutable<\/strong>. This approach required us to rethink how we colocated our operations automation scripts with our applications.<\/p>\n<p><strong>Multi-substrate<\/strong>\u00a0\u2013 We wanted a flexible solution that supports operating Salesforce on top of any substrate, aka cloud provider infrastructure (Amazon, Google, Microsoft, etc.).\u00a0<\/p>\n<p>As we were transforming our services to adopt microservices design patterns and run as containers on Kubernetes, we required a\u00a0<strong>hybrid solution<\/strong>\u00a0supporting task execution on servers, virtual machines (VMs), and Kubernetes-based deployments. Unfortunately, this ruled out container-native workflow engine solutions.<\/p>\n<p>We also evaluated several other open-source workflow orchestration engines, and we ultimately decided that, to stay close to our design principles, we would develop this task execution service in-house.<\/p>\n<h2><strong>Decoupling automation scripts from application source code\u00a0<\/strong><\/h2>\n<p>In keeping with our immutability design goal in Hyperforce, we needed to decouple automation scripts from our application deployments to reduce the operational cost of performing new releases via our CI\/CD platform.\u00a0<\/p>\n<p>To promote a standard model for task execution, we devised a Task Recipe Execution Framework. A recipe file is a declarative interface for an operator to define the main business logic of a task. We quickly iterated on the framework and adopted object-oriented principles, which let us provide boilerplate code for declaring new tasks through a Base Recipe class. 
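The Base Recipe idea described here can be sketched in Python. Note that all names below (BaseRecipe, RecipeContext, RestartServiceRecipe) are hypothetical illustrations of the pattern, not Salesforce's actual framework:

```python
from abc import ABC, abstractmethod


class RecipeContext:
    """Carries input parameters and environment metadata to a recipe."""

    def __init__(self, params, environment):
        self.params = params
        self.environment = environment


class BaseRecipe(ABC):
    """Boilerplate base class; operators subclass it and implement run()."""

    def execute(self, ctx: RecipeContext):
        # Common pre-flight hook, then the recipe's business logic.
        self.validate(ctx)
        return self.run(ctx)

    def validate(self, ctx: RecipeContext):
        # Default no-op validation; subclasses may override.
        pass

    @abstractmethod
    def run(self, ctx: RecipeContext):
        """The main business logic declared by the operator."""


class RestartServiceRecipe(BaseRecipe):
    """Example recipe: only run() needs to be implemented."""

    def run(self, ctx: RecipeContext):
        return f"restarting {ctx.params['service']} in {ctx.environment['region']}"
```

A concrete recipe declares only its `run()` body; the base class supplies the validation and execution boilerplate, which mirrors the framework's goal of a simple declarative interface.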
The task execution workers pass a recipe context containing input parameters and environment metadata that the recipe can leverage.\u00a0<\/p>\n<p>We created a\u00a0<a target=\"_blank\" href=\"https:\/\/www.atlassian.com\/git\/tutorials\/monorepos\" rel=\"noopener\">mono repository<\/a>\u00a0in our source control system and centralized the delivery of these recipe files via our CI\/CD pipeline to regional storage buckets (such as Amazon S3, Google Cloud Storage, Azure Storage Services, etc.).\u00a0<\/p>\n<h2><strong>Architecture<\/strong><\/h2>\n<p>The task execution control plane consists of three key components: an API server, a coordinator, and a status reporter. Workers are deployed as RPM packages on servers and virtual machines (baked as part of the image). For Kubernetes workloads, we built a mutating webhook to inject workers (using our\u00a0<a target=\"_blank\" href=\"https:\/\/engineering.salesforce.com\/a-generic-sidecar-injector-for-kubernetes-c05eede1f6bb\" rel=\"noopener\">open-source tool<\/a>). Below is a detailed description of each component.\u00a0<\/p>\n<p>During the design phase, choosing between a service mesh and message queues for communication between the control plane and the workers was a critical decision. Given the sporadic nature of task requests, a message queue pattern made the most sense. Decoupling the control plane and workers helped eliminate many complexities, such as endpoint discovery, health checks, routing, and load balancing. Workers could execute tasks at their own pace using a\u00a0<a target=\"_blank\" href=\"https:\/\/en.wikipedia.org\/wiki\/Work_stealing\" rel=\"noopener\">work stealing pattern<\/a>. 
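A minimal sketch of this decoupled pattern, using Python's in-memory queue as a stand-in for a real message broker (all names are illustrative, not the production service):

```python
import queue
import threading

# The coordinator publishes tasks to a queue; workers pull them at
# their own pace, so no endpoint discovery, routing, or load
# balancing is needed between control plane and workers.
task_queue = queue.Queue()
results = []  # CPython list.append is thread-safe for this sketch


def worker(worker_id):
    while True:
        task = task_queue.get()
        if task is None:  # sentinel: shut this worker down
            task_queue.task_done()
            break
        results.append((worker_id, task))  # "execute" the task
        task_queue.task_done()


# Coordinator side: start workers, publish tasks, then send one
# shutdown sentinel per worker.
threads = [threading.Thread(target=worker, args=(i,)) for i in range(2)]
for t in threads:
    t.start()
for task in ["rotate-logs", "collect-diagnostics", "restart-service"]:
    task_queue.put(task)
for _ in threads:
    task_queue.put(None)
for t in threads:
    t.join()
```

Because producers and consumers only share the queue, either side can scale or fail independently, which is the property that made the queue pattern attractive over a service mesh for sporadic task traffic.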
Queues were created based on the infrastructure topology and fault partitions, and the coordinator routed each task by publishing messages to the proper queue based on input parameters.\u00a0<\/p>\n<p><strong>API Server<\/strong>\u00a0\u2013 The API Server is an always-on RESTful interface that receives requests from operators and other trusted services. After completing the AuthN\/AuthZ check, the API server delegates request processing to the coordinator.<\/p>\n<p><strong>Coordinator<\/strong>\u00a0\u2013 The coordinator is a stateless daemon deployed in each Hyperforce region and striped across multiple availability zones. It subscribes to messages from the API server and routes each message to the right workers based on the request criteria.<\/p>\n<p><strong>Status Reporter<\/strong>\u00a0\u2013 Workers communicate their heartbeat and task execution progress to status reporters. The status reporter centralizes updates to our backend storage and eliminates the need for each worker to hold a persistent connection to our storage system.<\/p>\n<p><strong>Workers<\/strong>\u00a0\u2013 Workers are stateless daemons running either as a sidecar for Kubernetes applications or as a system daemon on servers and virtual machines. At runtime, workers pull the latest copy of the recipe file, perform file integrity checks, and then execute the task.<\/p>\n<h2><strong>Conclusion<\/strong><\/h2>\n<p>This article only scratches the surface. Our next challenge is to make the task execution system feature-rich, with near real-time monitoring, support for cron and schedule-based task executions, and integration with substrate-specific queuing technologies, resulting in a truly multi-cloud-compatible service. We are setting the security bar high to uphold our company commitment to Trust. 
Finally, based on our experiments and testing, we\u2019ve documented some best practices for utilizing queues and asynchronous processing that we\u2019ll publish soon, so stay tuned!<\/p>\n<p>The post <a href=\"https:\/\/engineering.salesforce.com\/how-salesforce-built-a-cloud-native-task-execution-service\/\">How Salesforce Built a Cloud-Native Task Execution Service<\/a> appeared first on <a href=\"https:\/\/engineering.salesforce.com\/\">Salesforce Engineering Blog<\/a>.<\/p>","protected":false},"excerpt":{"rendered":"<p>If you\u2019re paying attention to Salesforce technology, you\u2019ve no doubt heard about\u00a0Hyperforce, our new approach to deploying Salesforce on public cloud providers. Start with\u00a0a look at Hyperforce\u2019s architecture. There are many compelling reasons to move to Hyperforce, both for us and our customers. 
We\u2019re excited to do it in the way that only Salesforce would&hellip; <a class=\"more-link\" href=\"https:\/\/fde.cat\/index.php\/2022\/10\/24\/how-salesforce-built-a-cloud-native-task-execution-service\/\">Continue reading <span class=\"screen-reader-text\">How Salesforce Built a Cloud-Native Task Execution Service<\/span><\/a><\/p>\n","protected":false},"author":0,"featured_media":0,"comment_status":"","ping_status":"open","sticky":false,"template":"","format":"standard","meta":{"spay_email":"","footnotes":""},"categories":[7],"tags":[],"class_list":["post-644","post","type-post","status-publish","format-standard","hentry","category-technology","entry"],"jetpack_featured_media_url":"","jetpack-related-posts":[{"id":870,"url":"https:\/\/fde.cat\/index.php\/2024\/05\/23\/hyperforce-behind-the-scenes-ushering-in-a-new-age-of-ai-driven-cloud-scalability\/","url_meta":{"origin":644,"position":0},"title":"Hyperforce Behind the Scenes: Ushering in a New Age of AI-Driven Cloud Scalability","date":"May 23, 2024","format":false,"excerpt":"In our latest edition of our \u201cEngineering Energizers\u201d Q&A series, we meet Paul Constantinides, Executive Vice President of Engineering. 
With an extensive history in the technology industry, and a 20-year career at Salesforce, Paul leads the Hyperforce Platform Services team and is responsible for developing Hyperforce, Salesforce\u2019s public cloud-native architecture\u2026","rel":"","context":"In &quot;Technology&quot;","img":{"alt_text":"","src":"","width":0,"height":0},"classes":[]},{"id":538,"url":"https:\/\/fde.cat\/index.php\/2022\/02\/01\/behind-the-scenes-of-hyperforce-salesforces-infrastructure-for-the-public-cloud\/","url_meta":{"origin":644,"position":1},"title":"Behind the Scenes of Hyperforce: Salesforce\u2019s Infrastructure for the Public Cloud","date":"February 1, 2022","format":false,"excerpt":"Salesforce has been running cloud infrastructure for over two decades, bringing companies and their customers together. When Salesforce first started out in 1999, the world was very different; back then, the only practical way to provide our brand of Software-As-A-Service was to run everything yourself\u200a\u2014\u200anot just the software, but the\u2026","rel":"","context":"In &quot;Technology&quot;","img":{"alt_text":"","src":"","width":0,"height":0},"classes":[]},{"id":625,"url":"https:\/\/fde.cat\/index.php\/2022\/08\/30\/hyperpacks-using-buildpacks-to-build-hyperforce\/","url_meta":{"origin":644,"position":2},"title":"Hyperpacks: Using Buildpacks to Build Hyperforce","date":"August 30, 2022","format":false,"excerpt":"At Salesforce we regularly use our products and services to scale our own business. One example is Buildpacks, which we created nearly a decade ago and is now a part of Hyperforce. 
Hyperpacks are an innovative new way of using Cloud Native Buildpacks (CNB) to manage our public cloud infrastructure.\u00a0\u2026","rel":"","context":"In &quot;Technology&quot;","img":{"alt_text":"","src":"","width":0,"height":0},"classes":[]},{"id":688,"url":"https:\/\/fde.cat\/index.php\/2023\/03\/07\/automated-environment-build-salesforces-secret-sauce-for-rapid-cloud-expansion\/","url_meta":{"origin":644,"position":3},"title":"Automated Environment Build: Salesforce\u2019s Secret Sauce for Rapid Cloud Expansion","date":"March 7, 2023","format":false,"excerpt":"Around the world, companies must satisfy global compliance regulations or face pricey fines, where failure to comply results in 2.71 higher costs than the cost to comply. For example, Fortune 500 companies are projected to lose $8 billion per year as a result of GDPR non-compliance. In response, Salesforce created\u2026","rel":"","context":"In &quot;Technology&quot;","img":{"alt_text":"","src":"","width":0,"height":0},"classes":[]},{"id":679,"url":"https:\/\/fde.cat\/index.php\/2023\/02\/14\/3-ways-salesforce-boosts-developer-productivity-on-hyperforce\/","url_meta":{"origin":644,"position":4},"title":"3 Ways Salesforce Boosts Developer Productivity on Hyperforce","date":"February 14, 2023","format":false,"excerpt":"In 2018, Salesforce began development of Hyperforce \u2014 a next-gen infrastructure platform that leverages public cloud to securely and swiftly deliver Salesforce software to customers worldwide. 
The platform development\u2019s team priorities were focused: build Hyperforce, get it sentient, and provide cloud-native tools that drive internal product developers\u2019 innovations, empowering them\u2026","rel":"","context":"In &quot;Technology&quot;","img":{"alt_text":"","src":"","width":0,"height":0},"classes":[]},{"id":544,"url":"https:\/\/fde.cat\/index.php\/2022\/02\/22\/the-unified-infrastructure-platform-behind-salesforce-hyperforce\/","url_meta":{"origin":644,"position":5},"title":"The Unified Infrastructure Platform Behind Salesforce Hyperforce","date":"February 22, 2022","format":false,"excerpt":"If you\u2019re paying attention to Salesforce technology at all, you\u2019ve no doubt heard about Hyperforce, our new approach to deploying Salesforce on public cloud providers. As with any big announcement, it can be a little hard to cut through the hyperbolic language and understand what\u2019s going\u00a0on. In this blog series,\u2026","rel":"","context":"In &quot;Technology&quot;","img":{"alt_text":"","src":"","width":0,"height":0},"classes":[]}],"_links":{"self":[{"href":"https:\/\/fde.cat\/index.php\/wp-json\/wp\/v2\/posts\/644","targetHints":{"allow":["GET"]}}],"collection":[{"href":"https:\/\/fde.cat\/index.php\/wp-json\/wp\/v2\/posts"}],"about":[{"href":"https:\/\/fde.cat\/index.php\/wp-json\/wp\/v2\/types\/post"}],"replies":[{"embeddable":true,"href":"https:\/\/fde.cat\/index.php\/wp-json\/wp\/v2\/comments?post=644"}],"version-history":[{"count":0,"href":"https:\/\/fde.cat\/index.php\/wp-json\/wp\/v2\/posts\/644\/revisions"}],"wp:attachment":[{"href":"https:\/\/fde.cat\/index.php\/wp-json\/wp\/v2\/media?parent=644"}],"wp:term":[{"taxonomy":"category","embeddable":true,"href":"https:\/\/fde.cat\/index.php\/wp-json\/wp\/v2\/categories?post=644"},{"taxonomy":"post_tag","embeddable":true,"href":"https:\/\/fde.cat\/index.php\/wp-json\/wp\/v2\/tags?post=644"}],"curies":[{"name":"wp","href":"https:\/\/api.w.org\/{rel}","templated":true}]}}