{"id":499,"date":"2021-11-01T19:05:14","date_gmt":"2021-11-01T19:05:14","guid":{"rendered":"https:\/\/fde.cat\/index.php\/2021\/11\/01\/the-journey-of-building-a-scalable-api\/"},"modified":"2021-11-01T19:05:14","modified_gmt":"2021-11-01T19:05:14","slug":"the-journey-of-building-a-scalable-api","status":"publish","type":"post","link":"https:\/\/fde.cat\/index.php\/2021\/11\/01\/the-journey-of-building-a-scalable-api\/","title":{"rendered":"The Journey of Building a Scalable API"},"content":{"rendered":"<p>APIs are an essential tool that lets partners, developers, and applications consume, communicate with, or build on top of the capabilities your microservices provide.<\/p>\n<p>Building a high-quality API that can scale and perform with the business ecosystem is not easy. It requires thought and planning at every step, from choosing an execution environment to deciding which API technology you will\u00a0use.<\/p>\n<p>So how did <em>we<\/em> do it? In this blog post, I will share my experience of building the API for Activity Platform at Salesforce as a guide to writing an API for your own needs. 
Activity Platform is a big data event processing engine that ingests and analyzes over 100 million customer interactions every day to <a href=\"https:\/\/help.salesforce.com\/articleView?id=sf.einstein_sales_aac.htm&amp;type=5\">automatically capture data<\/a> and to generate <a href=\"https:\/\/help.salesforce.com\/articleView?id=einstein_sales_setup_enable_email_insights.htm&amp;type=0\">insights<\/a>, <a href=\"https:\/\/help.salesforce.com\/articleView?id=sf.einstein_sales_setup_recommended_connections.htm&amp;type=5\">recommendations<\/a>, and <a href=\"https:\/\/help.salesforce.com\/s\/articleView?id=sf.hvs_engagement.htm&amp;type=5\">feeds<\/a>. Activity Platform provides APIs to serve these to our\u00a0clients.<\/p>\n<h3>Choosing an Execution Environment<\/h3>\n<p>Depending on your requirements, an execution environment could be bare metal, a virtual machine (VM), or an application container. We chose application containers, as these can run on a physical machine or in a VM, and a single operating system instance can support multiple containers, each running within its own, separate execution environment. In a nutshell, containers are lightweight, portable, fast, and easy to deploy and scale, so they are a natural fit for microservices.<\/p>\n<h3>A note on container orchestration<\/h3>\n<p>If you decide to go with containers, like we did, container orchestration will help you automate the deployment, management, scaling, and networking of containers. There are many container orchestration tools to consider: Kubernetes, Apache Mesos or DC\/OS (with Marathon), Amazon EKS, Google Kubernetes Engine (GKE), and\u00a0others.<\/p>\n<p>We use <a href=\"https:\/\/www.nomadproject.io\/\">Nomad clusters from HashiCorp<\/a>. Nomad is simple, lightweight, and can orchestrate applications of any type, not just containers. It integrates seamlessly with Consul and Vault for service discovery and secrets management. 
You can easily describe the resources a task needs to execute, such as memory, network, and CPU, and specify the number of instances you need to scale your service\u00a0horizontally.<\/p>\n<h3>Choosing an API Technology<\/h3>\n<p>To build our API, we chose <a href=\"https:\/\/graphql.org\/\">GraphQL<\/a>. If you haven\u2019t heard of it, it is a popular alternative to other available options like REST, SOAP, Apache Thrift, OpenAPI\/Swagger, or\u00a0gRPC.<\/p>\n<p><strong>Why did we choose\u00a0GraphQL?<\/strong><\/p>\n<p>We wanted to build an API that can serve various clients, ranging from web to mobile apps. It needed to be efficient, powerful, and flexible.<\/p>\n<p>GraphQL was the best fit for our needs for a few\u00a0reasons:<\/p>\n<p>1). GraphQL is database agnostic and can serve data from anywhere you want for your defined business domain. This means that underneath you can use Cassandra, Elasticsearch, or an existing API from other modules for a single\u00a0query.<\/p>\n<p>2). It allows clients to request exactly what they need, avoiding overfetching and underfetching. If an API returns more than a client needs, there is a performance hit, and if it returns less, the extra network calls slow down rendering. GraphQL avoids both of these outcomes.<\/p>\n<p>3). While most APIs rely on versioning, GraphQL can serve a versionless API. Because it only returns the data that\u2019s explicitly requested, new capabilities can be added via new types and new fields on those types without creating a breaking\u00a0change.<\/p>\n<p>4). GraphQL uses a strong type system in which all types are written in a schema using the GraphQL schema definition language (SDL). The schema serves as the contract between the client and the server, with no confusion about the request\/response structure.<\/p>\n<p>5). 
GraphQL supports introspection, so the schema definition can easily be shared or downloaded using tools like GraphiQL, GraphQL Playground, or CLI\u00a0tools.<\/p>\n<h3>GraphQL in\u00a0Action<\/h3>\n<p>We used GraphQL in our Classification Insight API. Classification Insight offers information about a user and helps meeting participants know the titles and roles of other people present at the meeting. For this API, we used <a href=\"https:\/\/kotlinlang.org\/\">Kotlin<\/a> and <a href=\"https:\/\/www.graphql-java.com\/\">graphql-java<\/a>, a Java implementation of\u00a0GraphQL.<\/p>\n<p><strong>Step 1<\/strong>: Define your schema (e.g. schema.graphqls). Every GraphQL service defines a set of types. The most basic components of a GraphQL schema are object types, which represent a kind of object you can fetch from your service. The Query type defines the <em>entry point<\/em> of every GraphQL\u00a0query.<\/p>\n<p>In the schema below, I have defined a query \u201cgetClassificationInsightsByUser\u201d, which can be called later by posting this payload to your running API (e.g. 
localhost:8080\/api):<br \/>{ getClassificationInsightsByUser(<strong>emailAddresses<\/strong>: <strong>[&quot;test1@gmail.com&quot;, &quot;test2@gmail.com&quot;]<\/strong>) { userId, title } }<\/p>\n<p>schema.graphqls<\/p>\n<p><em># object type to describe what you can fetch<br \/><\/em>type ClassificationInsightByUser {<br \/>  organizationId: ID!<br \/>  userId: String!<br \/>  emailAddress: String!<br \/>  title: String!<br \/>}<br \/><em># Query type to define all your queries<br \/><\/em>type Query {<br \/>  getClassificationInsightsByUser(<br \/>    emailAddresses: [String!]!<br \/>  ): [ClassificationInsightByUser]<br \/>}<\/p>\n<p>schema {<br \/>  query: Query<br \/>}<\/p>\n<p><strong>Step 2<\/strong>: Implement a <a href=\"https:\/\/www.graphql-java.com\/documentation\/v16\/data-fetching\/\">DataFetcher<\/a> (also known as a resolver) to resolve the field getClassificationInsightsByUser. A resolver is a function provided by the developer that resolves a field of a type defined in the schema and returns its value from a configured source such as a database, another API, or a cache.<\/p>\n<p>In this example, our Query type provides a field called getClassificationInsightsByUser, which accepts the argument emailAddresses. The resolver function for this field would typically access a database and then construct and return a list of ClassificationInsightByUser objects.<\/p>\n<p>\/\/ Assuming you already have your data class<br \/>\/\/ (e.g. ClassificationInsightByUser) defined to hold the data<\/p>\n<p>\/\/ Write your datafetcher class<br \/>class ClassificationInsightByUserDataFetcher:<br \/>  DataFetcher&lt;List&lt;ClassificationInsightByUser&gt;?&gt; {<\/p>\n<p>  \/\/ override DataFetcher&#8217;s get function.<br \/>  override fun get(env: DataFetchingEnvironment):<br \/>    List&lt;ClassificationInsightByUser&gt;? 
{<br \/>    \/\/ get the argument passed in the submitted query<br \/>    \/\/ (EMAIL_ADDRESSES is assumed to be a constant holding the argument name)<br \/>    val emailAddresses = env.getArgument&lt;List&lt;String&gt;&gt;(EMAIL_ADDRESSES)<br \/>    \/\/ write logic to get data from another API, or<br \/>    \/\/ from your business layer by calling your controller\/service.<br \/>    \/\/ Here, just returning the static data to keep it simple.<br \/>    return EntityData.getClassificationInsightByUser(emailAddresses)<br \/>  }<br \/>}<\/p>\n<p><strong>Step 3<\/strong>: Initialize the GraphQLSchema and GraphQL objects (using <a href=\"https:\/\/www.graphql-java.com\/\">graphql-java<\/a>) to help execute the\u00a0query.<\/p>\n<p>\/\/ load all your schema files as strings using your own utility function<br \/>val schema = getResourceFileAsString(&quot;schema.graphqls&quot;)<\/p>\n<p>\/\/ create the type registry from all your schema files<br \/>val schemaParser = SchemaParser()<br \/>val typeDefinitionRegistry = TypeDefinitionRegistry()<br \/>typeDefinitionRegistry.merge(schemaParser.parse(schema))<\/p>\n<p>\/\/ runtime wiring where you wire your query type to its resolver<br \/>val runtimeWiring = RuntimeWiring.newRuntimeWiring()<br \/>  .type(&quot;Query&quot;) { builder -&gt;<br \/>    builder.dataFetcher(<br \/>      &quot;getClassificationInsightsByUser&quot;, ClassificationInsightByUserDataFetcher()<br \/>    )<br \/>  }<br \/>  .build()<br \/>\/\/ create the GraphQL schema<br \/>val schemaGenerator = SchemaGenerator()<br \/>val graphQLSchema = schemaGenerator<br \/>  .makeExecutableSchema(typeDefinitionRegistry, runtimeWiring)<br \/>\/\/ create the GraphQL object<br \/>val graphQL = GraphQL.newGraphQL(graphQLSchema).build()<\/p>\n<p><strong>Step 4<\/strong>: Write a servlet (MyAppServlet) to handle incoming requests.<\/p>\n<p>override fun doPost(req: HttpServletRequest, resp: HttpServletResponse) {<br \/>  \/\/ read the JSON request body, which carries the GraphQL query under the &quot;query&quot; key<br \/>  val payloadString = req.reader.readText()<br \/>  val jsonRequest = JSONObject(payloadString)<br \/>  val executionInput = ExecutionInput.newExecutionInput()<br \/>  .query(jsonRequest.getString(&quot;query&quot;))<br \/>  
.build()<br \/>  \/\/ execute your query using graphQL.<br \/>  \/\/ It will call your resolvers to get the data<br \/>  \/\/ and only return the data that was requested.<br \/>  val executionResult = graphQL.execute(executionInput)<\/p>\n<p>  \/\/ send the response (mapper is assumed to be a Jackson ObjectMapper)<br \/>  resp.contentType = &quot;application\/json&quot;<br \/>  resp.characterEncoding = &quot;UTF-8&quot;<br \/>  resp.writer.println(mapper.writeValueAsString(executionResult.toSpecification()))<br \/>  resp.writer.close()<\/p>\n<p>}<\/p>\n<p><strong>Step 5<\/strong>: Embed the web server (<a href=\"https:\/\/wiki.eclipse.org\/Jetty\/Tutorial\/Embedding_Jetty\">Jetty<\/a> in this case) in your application.<\/p>\n<p>\/\/ The Server<br \/>val server = Server()<\/p>\n<p>\/\/ HTTP connector, use HTTPS in production<br \/>val http = ServerConnector(server)<br \/>http.host = &quot;localhost&quot;<br \/>http.port = 8080<br \/>http.idleTimeout = 30000<br \/>\/\/ register the connector so the server actually listens on it<br \/>server.addConnector(http)<\/p>\n<p>\/\/ Set up the handler<br \/>val servletContextHandler = ServletContextHandler()<br \/>servletContextHandler.contextPath = &quot;\/&quot;<br \/>servletContextHandler.addServlet(ServletHolder(MyAppServlet()), &quot;\/api&quot;)<br \/>server.handler = servletContextHandler<\/p>\n<p>\/\/ start the Jetty server and wait for incoming requests<br \/>server.start()<br \/>server.join()<\/p>\n<p><strong>Step 6<\/strong>: Build and start your application. Use your CI\/CD tool to create, publish, and deploy your Docker images to your\u00a0cluster.<\/p>\n<h3>Ensuring Your APIs are\u00a0Secure<\/h3>\n<p>At Salesforce, security is our top priority. Our APIs are accessible only to registered users, who can access only the data they have permission to see. 
You may want to explore <a href=\"https:\/\/oauth.net\/2\/\">OAuth 2.0<\/a> (JWT grant type and role-based access control) and <a href=\"https:\/\/www.openpolicyagent.org\/\">Open Policy Agent<\/a> (OPA) for your access control\u00a0needs.<\/p>\n<p>As a best practice, place your authentication middleware in front of GraphQL and keep a single source of truth for authorization in the business logic layer, so that you don\u2019t need to check permissions in multiple places. In addition to authentication and authorization, consider rate limiting, data masking, and payload scanning while designing your\u00a0API.<\/p>\n<h3>Conclusion<\/h3>\n<p>We have demonstrated how to build a scalable, efficient, secure API. We used application containers to scale, GraphQL and embedded Jetty to keep it efficient and lightweight, and prioritized the security aspects of our API. We will discuss other aspects of API development, such as security and deployment, in more detail in upcoming\u00a0posts.<\/p>\n<h3>Acknowledgement<\/h3>\n<p>Thanks to Alex Oscherov for keeping me honest about our systems and architecture and to <a href=\"https:\/\/www.linkedin.com\/in\/lauralindeman\/\">Laura Lindeman<\/a> for her review and feedback on improving this blog post. Also, I\u2019d like to take the opportunity to mention it has been a wonderful learning experience working with the talented folks on the Activity Platform and Infra\u00a0teams.<\/p>\n<p>Please reach out to <a href=\"https:\/\/www.linkedin.com\/in\/nkumarsingh\">me<\/a> with any questions. I would love to hear your thoughts on the topic. 
If you\u2019re interested in solving challenges in the framework of software components built to ingest and process large volumes of streaming data from multiple sources, <a href=\"https:\/\/www.salesforce.com\/company\/careers\/teams\/tech-and-product\/\">we\u2019re\u00a0hiring<\/a>.<\/p>\n<p><a href=\"https:\/\/engineering.salesforce.com\/the-journey-of-building-a-scalable-api-df7ecf2f233e\">The Journey of Building a Scalable API<\/a> was originally published in <a href=\"https:\/\/engineering.salesforce.com\/\">Salesforce Engineering<\/a> on Medium.<\/p>","protected":false},"excerpt":{"rendered":"<p>APIs are an essential tool to allow partners, developers, and applications to consume, communicate, or build on top of the various capabilities your microservices provide. 
Building a high quality API that can scale and perform with the business ecosystem is not easy and requires putting thought and planning into everything, from choosing an execution environment to&hellip; <a class=\"more-link\" href=\"https:\/\/fde.cat\/index.php\/2021\/11\/01\/the-journey-of-building-a-scalable-api\/\">Continue reading <span class=\"screen-reader-text\">The Journey of Building a Scalable API<\/span><\/a><\/p>\n","protected":false},"author":0,"featured_media":0,"comment_status":"","ping_status":"open","sticky":false,"template":"","format":"standard","meta":{"spay_email":"","footnotes":""},"categories":[7],"tags":[],"class_list":["post-499","post","type-post","status-publish","format-standard","hentry","category-technology","entry"],"jetpack_featured_media_url":"","jetpack-related-posts":[{"id":315,"url":"https:\/\/fde.cat\/index.php\/2021\/08\/31\/api-federation-growing-scalable-api-landscapes\/","url_meta":{"origin":499,"position":0},"title":"API Federation: growing scalable API landscapes","date":"August 31, 2021","format":false,"excerpt":"Organizations embrace micro-services and event-driven APIs in their technology platforms to try to achieve the promise of greater agility, increased innovation, and more autonomy for their development teams. 
However, after the initial success, it is not unusual for organizations to face difficulties when they try to scale their distributed platforms.\u2026","rel":"","context":"In &quot;Technology&quot;","img":{"alt_text":"","src":"","width":0,"height":0},"classes":[]},{"id":837,"url":"https:\/\/fde.cat\/index.php\/2024\/03\/12\/from-concept-to-reality-developing-mulesofts-new-flex-gateway-api-management-solution\/","url_meta":{"origin":499,"position":1},"title":"From Concept to Reality: Developing MuleSoft\u2019s New Flex Gateway API Management Solution","date":"March 12, 2024","format":false,"excerpt":"In our \u201cEngineering Energizers\u201d Q&A series, we explore the remarkable journeys of engineering leaders who have made significant contributions in their respective fields. Today, we dive into the technical journey of Evangelina Martinez Ruiz Moreno, a Senior Director at Salesforce, who spearheaded the development of MuleSoft\u2019s new Anypoint Flex Gateway.\u2026","rel":"","context":"In &quot;Technology&quot;","img":{"alt_text":"","src":"","width":0,"height":0},"classes":[]},{"id":221,"url":"https:\/\/fde.cat\/index.php\/2021\/02\/02\/asyncapi-and-openapi-an-api-modeling-approach\/","url_meta":{"origin":499,"position":2},"title":"AsyncAPI and OpenAPI: an API Modeling Approach","date":"February 2, 2021","format":false,"excerpt":"AsyncAPI is gaining traction in the ecosystem of API tools. It solves an important problem: it provides a convenient way of describing the interface of event-driven systems independently of the underlying technology. 
With AsyncAPI, evented systems can be treated as any other API product: a productizable and reusable, self-describing building\u2026","rel":"","context":"In &quot;Technology&quot;","img":{"alt_text":"","src":"","width":0,"height":0},"classes":[]},{"id":288,"url":"https:\/\/fde.cat\/index.php\/2021\/08\/31\/building-a-successful-enterprise-ai-platform\/","url_meta":{"origin":499,"position":3},"title":"Building a Successful Enterprise AI Platform","date":"August 31, 2021","format":false,"excerpt":"IntroductionIn 2016, I started as a fresh grad software engineer at a small startup called MetaMind, which was acquired by Salesforce. Since then, it has been quite a journey to achieve a lot with a small team. I\u2019m part of Einstein Vision and Language Platform team. Our platform provides customers\u2026","rel":"","context":"In &quot;Technology&quot;","img":{"alt_text":"","src":"","width":0,"height":0},"classes":[]},{"id":558,"url":"https:\/\/fde.cat\/index.php\/2022\/03\/29\/investigate-issues-with-ease-by-adding-a-correlation-id-to-your-api\/","url_meta":{"origin":499,"position":4},"title":"Investigate Issues with Ease by Adding a Correlation ID to your API","date":"March 29, 2022","format":false,"excerpt":"With APIs becoming more complex and distributed, developers sometimes struggle to find the relevant logs when they need to investigate a specific issue. In the new Salesforce Commerce APIs (SCAPI), we created such an architecture of distributed systems and recognized this problem early. 
Our approach to mitigate it was the\u2026","rel":"","context":"In &quot;Technology&quot;","img":{"alt_text":"","src":"","width":0,"height":0},"classes":[]},{"id":593,"url":"https:\/\/fde.cat\/index.php\/2022\/03\/29\/investigate-issues-with-ease-by-adding-a-correlation-id-to-your-api-2\/","url_meta":{"origin":499,"position":5},"title":"Investigate Issues with Ease by Adding a Correlation ID to your API","date":"March 29, 2022","format":false,"excerpt":"With APIs becoming more complex and distributed, developers sometimes struggle to find the relevant logs when they need to investigate a specific issue. In the new\u00a0Salesforce Commerce APIs\u00a0(SCAPI), we created such an architecture of distributed systems and recognized this problem early. Our approach to mitigate it was the introduction of\u2026","rel":"","context":"In &quot;Technology&quot;","img":{"alt_text":"","src":"","width":0,"height":0},"classes":[]}],"_links":{"self":[{"href":"https:\/\/fde.cat\/index.php\/wp-json\/wp\/v2\/posts\/499","targetHints":{"allow":["GET"]}}],"collection":[{"href":"https:\/\/fde.cat\/index.php\/wp-json\/wp\/v2\/posts"}],"about":[{"href":"https:\/\/fde.cat\/index.php\/wp-json\/wp\/v2\/types\/post"}],"replies":[{"embeddable":true,"href":"https:\/\/fde.cat\/index.php\/wp-json\/wp\/v2\/comments?post=499"}],"version-history":[{"count":0,"href":"https:\/\/fde.cat\/index.php\/wp-json\/wp\/v2\/posts\/499\/revisions"}],"wp:attachment":[{"href":"https:\/\/fde.cat\/index.php\/wp-json\/wp\/v2\/media?parent=499"}],"wp:term":[{"taxonomy":"category","embeddable":true,"href":"https:\/\/fde.cat\/index.php\/wp-json\/wp\/v2\/categories?post=499"},{"taxonomy":"post_tag","embeddable":true,"href":"https:\/\/fde.cat\/index.php\/wp-json\/wp\/v2\/tags?post=499"}],"curies":[{"name":"wp","href":"https:\/\/api.w.org\/{rel}","templated":true}]}}