{"id":334,"date":"2021-08-31T14:39:28","date_gmt":"2021-08-31T14:39:28","guid":{"rendered":"https:\/\/fde.cat\/?p=334"},"modified":"2021-08-31T14:39:28","modified_gmt":"2021-08-31T14:39:28","slug":"building-data-pipelines-using-kotlin","status":"publish","type":"post","link":"https:\/\/fde.cat\/index.php\/2021\/08\/31\/building-data-pipelines-using-kotlin\/","title":{"rendered":"Building Data Pipelines Using Kotlin"},"content":{"rendered":"<p><em>Co-written by Alex\u00a0Oscherov<\/em><\/p>\n<p>Up until recently, we, like many companies, built our data pipelines in any one of a handful of technologies using Java or Scala, including Apache Spark, Storm, and Kafka. But Java is a very verbose language, so writing these pipelines in Java involves a lot of boilerplate code. For example, simple bean classes require writing multiple trivial getters and setters and multiple constructors and\/or builders. Oftentimes, hashCode and equals methods have to be overridden in a trivial but verbose manner. Furthermore, all function parameters need to be checked for \u201cnull,\u201d polluting code with multiple branching operators. It\u2019s time-consuming (and not trivial!) to analyze which function parameters can and cannot be\u00a0\u201cnull.\u201d<\/p>\n<p>Processing data from the pipelines written in Java often involves branching based on the types or values of data from the pipeline, but limitations of the Java \u201cswitch\u201d statement cause extensive use of sprawling \u201cif-then-elseif-\u2026\u201d constructs. 
Finally, most data pipelines work with immutable data\/collections, but Java has almost no built-in support for separating mutable and immutable constructs, which forces us to write additional boilerplate code.<\/p>\n<p>In deciding how to address these shortcomings of Java for data pipelines, we selected <a href=\"https:\/\/kotlinlang.org\/\">Kotlin<\/a> as an alternative for our backend development.<\/p>\n<h3>Why Kotlin?<\/h3>\n<p>Our choice of Kotlin was driven mostly by the following factors:<\/p>\n<p>Rich support in Kotlin for data bean classes enables us to stop writing explicit getters and\u00a0setters.<br \/>Optional parameters and simplified constructor syntax let us avoid writing multiple constructors and builders.<br \/>The \u201cdata class\u201d construct spares us from writing explicit hashCode\/equals overrides full of trivial boilerplate code.<br \/>Null safety baked into the type system guarantees that no necessary null checks are skipped, and we get warnings about unnecessary checks, which greatly reduces boilerplate code. Since switching to Kotlin, we have pretty much forgotten about the dreaded runtime NullPointerException (NPE).<br \/>A robust mechanism for separating mutable and immutable data allows much simpler reasoning about parallel data processing.<br \/>A versatile \u201cwhen\u201d expression allows for writing flexible and concise branching expressions based on data types and\u00a0values.<br \/>Seamless integration with Java allows us to use any and all Java APIs without any mental overhead. 
The use of Kotlin interfaces from Java is also almost frictionless, and we have seen our APIs implemented in Kotlin consumed by other teams using\u00a0Java.<\/p>\n<p>Here is a trivial example of Kotlin code that demonstrates some of the points enumerated above:<\/p>\n<p>enum class RequestType {CREATE, DELETE}<br \/>data class RuleChange(val organizationId: String, val userIds: List&lt;String&gt;, val request: RequestType)<\/p>\n<p>The same implementation in Java would look like\u00a0this:<\/p>\n<p>import java.util.Collections;<br \/>import java.util.List;<br \/>import java.util.Objects;<br \/><br \/>enum RequestType {CREATE, DELETE}<br \/><br \/>public final class RuleChange {<br \/>    private final String organizationId;<br \/>    private final List&lt;String&gt; userIds;<br \/>    private final RequestType request;<br \/><br \/>    RuleChange(String organizationId, List&lt;String&gt; userIds, RequestType request) {<br \/>        this.organizationId = organizationId;<br \/>        this.userIds = userIds;<br \/>        this.request = request;<br \/>    }<br \/><br \/>    public String getOrganizationId() {<br \/>        return organizationId;<br \/>    }<br \/><br \/>    \/\/ Expose a read-only view so callers cannot mutate the list.<br \/>    public List&lt;String&gt; getUserIds() {<br \/>        return Collections.unmodifiableList(userIds);<br \/>    }<br \/><br \/>    public RequestType getRequest() {<br \/>        return request;<br \/>    }<br \/><br \/>    @Override<br \/>    public boolean equals(Object o) {<br \/>        if (this == o) return true;<br \/>        if (o == null || getClass() != o.getClass()) return false;<br \/>        RuleChange that = (RuleChange) o;<br \/>        return Objects.equals(getOrganizationId(), that.getOrganizationId()) &amp;&amp; Objects.equals(getUserIds(), that.getUserIds()) &amp;&amp; Objects.equals(getRequest(), that.getRequest());<br \/>    }<br \/><br \/>    @Override<br \/>    public int hashCode() {<br \/>        return Objects.hash(getOrganizationId(), getUserIds(), getRequest());<br \/>    }<br \/>}<\/p>\n<p>These two pieces of code do almost exactly the same thing. 
We left out some Kotlin goodies that would require additional boilerplate code to implement in Java, but the gist of this example is probably obvious by now: Kotlin code is much more concise and packs a lot of freebies for developers.<\/p>\n<h3>A Clear Code Example in\u00a0Kotlin<\/h3>\n<p>One good example of Kotlin\u2019s succinct and understandable code is our rule change processor, a Kafka Streams job that validates input data for null safety, deserializes the data using an extension function, and then uses exhaustive pattern matching to perform operations on the\u00a0data.<\/p>\n<p>Here you can clearly see several of the benefits Kotlin offers\u00a0us.<\/p>\n<p><strong>Null Safety:<\/strong> No more ugly if\/else null checks. We used Kotlin\u2019s built-in null safety checks, which prevent NPEs and make the code more readable.<br \/><strong>Extension Function:<\/strong> Kotlin provides the ability to add new functions to an existing class without having to inherit from that class. Doesn\u2019t it.deserialize() on line 4 look more readable than using some helper class to deserialize the\u00a0data?<br \/><strong>First-Class Support for Properties:<\/strong> We don\u2019t need to write get\/set methods because Kotlin offers first-class support for properties, as seen on lines 5 and\u00a06.<br \/><strong>Exhaustive Pattern Matching Using the <\/strong>when<strong> Construct:<\/strong> Kotlin\u2019s when expression starting on line 8 does exhaustive pattern matching with enum values and sealed classes. 
No more no-op default case like the one we have to write when using Java\u2019s switch construct.<\/p>\n<h3>Kotlin for Activity Platform in Salesforce<\/h3>\n<p>Activity Platform is a big data event processing engine that ingests and analyzes 100+ million customer interactions every day to <a href=\"https:\/\/help.salesforce.com\/articleView?id=sf.einstein_sales_aac.htm&amp;type=5\">automatically capture data<\/a> and generate <a href=\"https:\/\/help.salesforce.com\/articleView?id=einstein_sales_setup_enable_email_insights.htm&amp;type=0\">insights<\/a> and <a href=\"https:\/\/help.salesforce.com\/articleView?id=sf.einstein_sales_setup_recommended_connections.htm&amp;type=5\">recommendations<\/a>.<\/p>\n<p>We\u2019ve widely adopted Kotlin in place of Java for backend development across Activity Platform as you can see in the diagram above. Here\u2019s what the flow looks\u00a0like:<\/p>\n<p>We process activity data in a streaming fashion and generate intelligent insights using AI and machine learning that power multiple products across Salesforce.<br \/>To process this data and generate insights, we run big data systems (like Kafka Streams, Spark, and Storm) and expose an HTTPS GraphQL API for other teams to consume\u00a0data.<br \/>We write all our business logic libraries in\u00a0Kotlin.<br \/>Kafka Streams jobs are written in Kotlin. We use Kafka Streams jobs for simple map, filter, and write operations.<br \/>Apache Storm topologies are written in Kotlin. Storm topologies perform General Data Protection Regulation (GDPR) operations on our\u00a0data.<br \/>Spark jobs are written in Scala, but they consume libraries written in Kotlin. 
We run complex SparkML models using these Spark\u00a0jobs.<br \/>GraphQL APIs are also written in Kotlin and are powered by a Jetty\u00a0server.<\/p>\n<p>So, essentially, we have used Kotlin in all the places we could have used Java or another JVM language.<\/p>\n<h3>Benefits We\u2019ve Seen from Moving to\u00a0Kotlin<\/h3>\n<p>Kotlin\u2019s data classes and immutability have ensured consistency (preventing accidental data corruption) when other teams use our libraries. Its functional syntax and immutability have provided an elegant way for us to write the processing streams we need for our data pipelines. Having multiple classes in one file and being able to use top-level functions has made it simple for us to organize code, greatly reducing the number of files we need to navigate. And there\u2019s even more we love in Kotlin than we could cover in this blog post, like extension functions, type aliasing, string templating, and concurrent code execution using coroutines and async-await.<\/p>\n<p>There is a lot to gain by using Kotlin for building data pipelines, not least of which is developer productivity. It has been extremely easy to onboard engineers from different programming backgrounds such as Java, Scala, and Python, and they have all liked the programming constructs Kotlin has to offer. This is why it\u2019s one of the <a href=\"https:\/\/insights.stackoverflow.com\/survey\/2020#technology-most-loved-dreaded-and-wanted-languages-loved\">most loved programming languages<\/a> of 2020. We will continue to expand its usage while building new pipelines and switching over old pipelines to use Kotlin. We are also interested in using Kotlin for building Spark jobs as and when <a href=\"https:\/\/blog.jetbrains.com\/kotlin\/2021\/02\/kotlin-for-apache-spark-one-step-closer-to-your-production-cluster\/\">more stable support for Spark becomes available<\/a>. 
For all those folks who are interested in building data pipelines, we recommend giving Kotlin a try to see its advantages over other programming languages.<\/p>\n<p><a href=\"https:\/\/engineering.salesforce.com\/building-data-pipelines-using-kotlin-2d70edc0297c\">Building Data Pipelines Using Kotlin<\/a> was originally published in <a href=\"https:\/\/engineering.salesforce.com\/\">Salesforce Engineering<\/a> on Medium, where people are continuing the conversation by highlighting and responding to this story.<\/p>\n<p><a href=\"https:\/\/engineering.salesforce.com\/building-data-pipelines-using-kotlin-2d70edc0297c?source=rss----cfe1120185d3---4\">Read More<\/a><\/p>\n","protected":false},"excerpt":{"rendered":"<p>Co-written by Alex\u00a0Oscherov Up until recently, we, like many companies, built our data pipelines in any one of a handful of technologies using Java or Scala, including Apache Spark, Storm, and Kafka. But Java is a very verbose language, so writing these pipelines in Java involves a lot of boilerplate code. 
For example, simple bean&hellip; <a class=\"more-link\" href=\"https:\/\/fde.cat\/index.php\/2021\/08\/31\/building-data-pipelines-using-kotlin\/\">Continue reading <span class=\"screen-reader-text\">Building Data Pipelines Using Kotlin<\/span><\/a><\/p>\n","protected":false},"author":1,"featured_media":0,"comment_status":"closed","ping_status":"open","sticky":false,"template":"","format":"standard","meta":{"spay_email":"","footnotes":""},"categories":[7],"tags":[],"class_list":["post-334","post","type-post","status-publish","format-standard","hentry","category-technology","entry"],"jetpack_featured_media_url":"","jetpack-related-posts":[{"id":656,"url":"https:\/\/fde.cat\/index.php\/2022\/11\/22\/retrofitting-null-safety-onto-java-at-meta\/","url_meta":{"origin":334,"position":0},"title":"Retrofitting null-safety onto Java at Meta","date":"November 22, 2022","format":false,"excerpt":"We developed a new static analysis tool called Nullsafe that is used at Meta to detect NullPointerException (NPE) errors in Java code. Interoperability with legacy code and gradual deployment model were key to Nullsafe\u2019s wide adoption and allowed us to recover some null-safety properties in the context of an otherwise\u2026","rel":"","context":"In &quot;Technology&quot;","img":{"alt_text":"","src":"","width":0,"height":0},"classes":[]},{"id":643,"url":"https:\/\/fde.cat\/index.php\/2022\/10\/24\/from-zero-to-10-million-lines-of-kotlin\/","url_meta":{"origin":334,"position":1},"title":"From zero to 10 million lines of Kotlin","date":"October 24, 2022","format":false,"excerpt":"We\u2019re sharing lessons learned from shifting our Android development from Java to Kotlin. Kotlin is a popular language for Android development and offers some key advantages over Java.\u00a0 As of today, our Android codebase contains over 10 million lines of Kotlin code. 
We\u2019re open sourcing various examples and utilities we\u2026","rel":"","context":"In &quot;Technology&quot;","img":{"alt_text":"","src":"","width":0,"height":0},"classes":[]},{"id":188,"url":"https:\/\/fde.cat\/index.php\/2021\/02\/02\/pegasus-data-language-evolving-schema-definitions-for-data-modeling\/","url_meta":{"origin":334,"position":2},"title":"Pegasus Data Language: Evolving schema definitions for data modeling","date":"February 2, 2021","format":false,"excerpt":"Pegasus Data Schema (PDSC) is a Pegasus schema definition language that has been used for data modeling with Rest.li services for years. It's the underlying language that helps define data models, describe the data returned by REST endpoints, and generate derivative schemas for other uses, such as XML schemas and\u2026","rel":"","context":"In &quot;External&quot;","img":{"alt_text":"","src":"","width":0,"height":0},"classes":[]},{"id":306,"url":"https:\/\/fde.cat\/index.php\/2021\/08\/31\/blazing-the-trail-one-year-with-openjdk-11\/","url_meta":{"origin":334,"position":3},"title":"Blazing the Trail: One Year with OpenJDK 11","date":"August 31, 2021","format":false,"excerpt":"Early Adoption of Java Runtime Innovations in Production at\u00a0ScaleCo-written by Donna\u00a0ThomasIntroductionSalesforce was one of the first major enterprises to adopt OpenJDK 11 at scale in production, starting our adoption journey shortly after its release in late 2018. Cutting edge? Sure. Safe? Absolutely. 
You might not know this, but Salesforce has\u2026","rel":"","context":"In &quot;Technology&quot;","img":{"alt_text":"","src":"","width":0,"height":0},"classes":[]},{"id":548,"url":"https:\/\/fde.cat\/index.php\/2022\/03\/08\/an-open-source-compositional-deadlock-detector-for-android-java\/","url_meta":{"origin":334,"position":4},"title":"An open source compositional deadlock detector for Android Java","date":"March 8, 2022","format":false,"excerpt":"What the research is: We\u2019ve developed a new static analyzer that catches deadlocks in Java code for Android without ever running the code. What distinguishes our analyzer from past research is its ability to analyze revisions in codebases with hundreds of millions of lines of code. We have deployed our\u2026","rel":"","context":"In &quot;Technology&quot;","img":{"alt_text":"","src":"","width":0,"height":0},"classes":[]},{"id":480,"url":"https:\/\/fde.cat\/index.php\/2021\/09\/29\/open-sourcing-mariana-trench-analyzing-android-and-java-app-security-in-depth\/","url_meta":{"origin":334,"position":5},"title":"Open-sourcing Mariana Trench: Analyzing Android and Java app security in depth","date":"September 29, 2021","format":false,"excerpt":"We\u2019re sharing details about Mariana Trench (MT), a tool we use to spot and prevent security and privacy bugs in Android and Java applications. 
As part of our effort to help scale security through building automation, we recently open-sourced MT to support security engineers at Facebook and across the industry.\u00a0\u2026","rel":"","context":"In &quot;Technology&quot;","img":{"alt_text":"","src":"","width":0,"height":0},"classes":[]}],"_links":{"self":[{"href":"https:\/\/fde.cat\/index.php\/wp-json\/wp\/v2\/posts\/334","targetHints":{"allow":["GET"]}}],"collection":[{"href":"https:\/\/fde.cat\/index.php\/wp-json\/wp\/v2\/posts"}],"about":[{"href":"https:\/\/fde.cat\/index.php\/wp-json\/wp\/v2\/types\/post"}],"author":[{"embeddable":true,"href":"https:\/\/fde.cat\/index.php\/wp-json\/wp\/v2\/users\/1"}],"replies":[{"embeddable":true,"href":"https:\/\/fde.cat\/index.php\/wp-json\/wp\/v2\/comments?post=334"}],"version-history":[{"count":1,"href":"https:\/\/fde.cat\/index.php\/wp-json\/wp\/v2\/posts\/334\/revisions"}],"predecessor-version":[{"id":375,"href":"https:\/\/fde.cat\/index.php\/wp-json\/wp\/v2\/posts\/334\/revisions\/375"}],"wp:attachment":[{"href":"https:\/\/fde.cat\/index.php\/wp-json\/wp\/v2\/media?parent=334"}],"wp:term":[{"taxonomy":"category","embeddable":true,"href":"https:\/\/fde.cat\/index.php\/wp-json\/wp\/v2\/categories?post=334"},{"taxonomy":"post_tag","embeddable":true,"href":"https:\/\/fde.cat\/index.php\/wp-json\/wp\/v2\/tags?post=334"}],"curies":[{"name":"wp","href":"https:\/\/api.w.org\/{rel}","templated":true}]}}