{"id":176,"date":"2021-01-25T18:32:00","date_gmt":"2021-01-25T18:32:00","guid":{"rendered":"https:\/\/fde.cat\/index.php\/2021\/01\/25\/smart-argument-suite-seamlessly-connecting-python-jobs\/"},"modified":"2021-02-02T13:46:55","modified_gmt":"2021-02-02T13:46:55","slug":"smart-argument-suite-seamlessly-connecting-python-jobs","status":"publish","type":"post","link":"https:\/\/fde.cat\/index.php\/2021\/01\/25\/smart-argument-suite-seamlessly-connecting-python-jobs\/","title":{"rendered":"Smart Argument Suite: Seamlessly connecting Python jobs"},"content":{"rendered":"<div class=\"resourceParagraph section\">\n<div class=\"component-anchor-container\">\n  <a class=\"component-anchor\" name=\"post_par_resourceparagraph\"><\/a>\n <\/div>\n<div class=\"resource-text-section\">\n<div class=\"resource-paragraph rich-text\">\n<p><i>Co-authors: <a href=\"https:\/\/www.linkedin.com\/in\/jun-jia-a2441a89\/\" target=\"_blank\" rel=\"noopener\">Jun Jia<\/a> and <a href=\"https:\/\/www.linkedin.com\/in\/yi-alice-wu-59570239\/\" target=\"_blank\" rel=\"noopener\">Alice Wu<\/a><\/i><\/p>\n<h2>Introduction<\/h2>\n<p>It\u2019s a very common scenario that an AI solution involves composing different jobs, such as data processing and model training or evaluation, into workflows and then submitting them to an orchestration engine for execution. At large companies such as LinkedIn, there may be hundreds of thousands of such executions per day, submitted and executed by multiple teams and engineers. Any improvements in the tools used by machine learning engineers lead to significant improvements in productivity, which highlights the need for robust productivity infrastructure to support machine learning engineers.<\/p>\n<p>In most cases, these jobs are launched via the command line interface (CLI). 
Passing arguments through the CLI is a producer and consumer problem: on the workflow generation side, you need to produce a set of arguments that are passed to the CLI to launch the jobs; on the other side, the launched jobs consume the arguments passed from the CLI. We built Smart Argument Suite (<a href=\"https:\/\/pypi.org\/project\/smart-arg\/\" target=\"_blank\" rel=\"noopener\">smart-arg<\/a>) to make this process standard, smooth, and safe while also being human-friendly.<\/p>\n<\/p><\/div>\n<\/p><\/div>\n<\/div>\n<div class=\"resourceImageBlock section\">\n<div class=\"component-anchor-container\">\n  <a class=\"component-anchor\" name=\"post_par_resourceimageblock\"><\/a>\n <\/div>\n<ul class=\"resource-image-block single\">\n<li class=\"resource-image\"> <img decoding=\"async\" src=\"https:\/\/i0.wp.com\/content.linkedin.com\/content\/dam\/engineering\/site-assets\/images\/blog\/posts\/2021\/01\/smart-arg-demo.gif?w=750&#038;ssl=1\" alt=\"gif-showing-smart-arg-coding-in-action\" data-recalc-dims=\"1\"> <\/li>\n<\/ul>\n<\/div>\n<div class=\"resourceParagraph section\">\n<div class=\"component-anchor-container\">\n  <a class=\"component-anchor\" name=\"post_par_resourceparagraph_1981592605\"><\/a>\n <\/div>\n<div class=\"resource-text-section\">\n<div class=\"resource-paragraph rich-text\">\n<h2>Designing the Smart Argument Suite<\/h2>\n<p>Most of the popular open source AI packages, e.g., TensorFlow or PyTorch, are available in Python nowadays, as are orchestration engine SDKs such as Airflow, Kubeflow, and Cloudflow (a.k.a. Azkaban-ng). There are many Python packages available for CLI argument parsing, and the Python standard library even provides one, argparse; all of them help on the consumer side. However, none of them offers any functionality on the producer side, to the best of our knowledge. 
Engineers at LinkedIn developed this slim Python library (smart-arg) to help with both sides of the problem: producing a human-friendly CLI representation of the arguments, and consuming them consistently.<\/p>\n<\/p><\/div>\n<\/p><\/div>\n<\/div>\n<div class=\"resourceImageBlock section\">\n<div class=\"component-anchor-container\">\n  <a class=\"component-anchor\" name=\"post_par_resourceimageblock_1384020146\"><\/a>\n <\/div>\n<ul class=\"resource-image-block single\">\n<li class=\"resource-image\"> <img loading=\"lazy\" decoding=\"async\" src=\"https:\/\/i0.wp.com\/content.linkedin.com\/content\/dam\/engineering\/site-assets\/images\/blog\/posts\/2021\/01\/smartarg3.png?resize=750%2C775&#038;ssl=1\" alt=\"smart-arg-code-example\" height=\"775\" width=\"750\"  data-recalc-dims=\"1\"> <\/li>\n<\/ul>\n<\/div>\n<div class=\"resourceParagraph section\">\n<div class=\"component-anchor-container\">\n  <a class=\"component-anchor\" name=\"post_par_resourceparagraph_874501398\"><\/a>\n <\/div>\n<div class=\"resource-text-section\">\n<div class=\"resource-paragraph rich-text\">\n<p>For ease of discussion, we will assume the argument container is defined as a class in Python and call the conversion of such a class instance to and from a CLI compatible form \u201cserialization and deserialization\u201d (SerDe).<\/p>\n<\/p><\/div>\n<\/p><\/div>\n<\/div>\n<div class=\"resourceImageBlock section\">\n<div class=\"component-anchor-container\">\n  <a class=\"component-anchor\" name=\"post_par_resourceimageblock_1963412808\"><\/a>\n <\/div>\n<ul class=\"resource-image-block single\">\n<li class=\"resource-image\"> <img loading=\"lazy\" decoding=\"async\" src=\"https:\/\/i0.wp.com\/content.linkedin.com\/content\/dam\/engineering\/site-assets\/images\/blog\/posts\/2021\/01\/smartarg4.png?resize=750%2C267&#038;ssl=1\" alt=\"flowchart-showing-serialization-and-deserialization\" height=\"267\" width=\"750\"  data-recalc-dims=\"1\"> <\/li>\n<\/ul>\n<\/div>\n<div 
class=\"resourceParagraph section\">\n<div class=\"component-anchor-container\">\n  <a class=\"component-anchor\" name=\"post_par_resourceparagraph_1787445360\"><\/a>\n <\/div>\n<div class=\"resource-text-section\">\n<div class=\"resource-paragraph rich-text\">\n<h2>Why smart-arg?<\/h2>\n<p>There are many excellent existing choices for parsing command lines, such as Click, docopt, TAP, or the bare-bones argparse\/optparse, so you may be wondering, \u201cWhy smart-arg?\u201d<\/p>\n<p>The answer is simple: smart-arg is not (just) a command line parser. It\u2019s also for creating and passing typed arguments through the CLI as seamlessly as passing arguments through function calls. Its design goal is to hide all the low-level parsing\/deserialization and the additional serialization work and let users directly work with typed Python objects.<\/p>\n<h2>Why not Click, docopt, or \u2026?<\/h2>\n<p>These options are perfectly fine for command line parsing and invocation of command line applications.<\/p>\n<p>They can parse the command line (deserialization), but none of them offers a way to create the command lines (serialization) programmatically, to the best of our knowledge. Their intended use case is for a human to manually type in those commands to run the utilities.<\/p>\n<p>If you work with orchestration engine SDKs to create workflow pipelines, prefer not to type command lines manually, or simply don\u2019t want to worry about parsing, smart-arg is here for you!<\/p>\n<h2>Principles<\/h2>\n<p>When designing our solution, we knew that we wanted to specifically address a few pain points. We formed these into principles that dictated how we approached the creation of smart-arg.<\/p>\n<p><b>It should be simple<br \/> <\/b>We wanted the usage of our tool to be as simple as defining an argument container object and passing it through a function call. We felt it should give the user peace of mind about passing arguments through the CLI. 
It should let the user simply focus on how to define an argument container class that makes sense, instead of how to create a CLI using a raw argument parsing tool, such as argparse, or how to compose the command line correctly.<\/p>\n<p>smart-arg allows you to simply define your argument container class \u201cArgClass\u201d as a <a href=\"https:\/\/docs.python.org\/3.7\/library\/typing.html?highlight=namedtuple#typing.NamedTuple\" target=\"_blank\" rel=\"noopener\">NamedTuple<\/a> or <a href=\"https:\/\/docs.python.org\/3.7\/library\/dataclasses.html#dataclasses.dataclass\" target=\"_blank\" rel=\"noopener\">dataclass<\/a>, annotate it with the decorator @arg_suite, and, voil\u00e0, \u201carg_class.__to_argv__()\u201d gives the serialized form for CLI, while \u201cArgClass.__from_argv__()\u201d deserializes the command line to the corresponding \u201cArgClass\u201d instance.<\/p>\n<p><b>It should be safe<br \/> <\/b>We wanted our tool to have a verifiable and testable systematic SerDe process with certain safety guarantees, including type-safety. We wanted it to help users minimize human errors around the argument handling. 
Given that Python is a dynamic language, our solution would need to make the most of the existing tools to improve type-safety.<\/p>\n<p>smart-arg uses the well-trusted Python standard library <a href=\"https:\/\/docs.python.org\/3\/library\/argparse.html\" target=\"_blank\" rel=\"noopener\">argparse<\/a> under the hood for deserialization and keeps the corresponding serialization process well-tested.<\/p>\n<p>smart-arg enables code autocompletion and type hints in IDEs by utilizing the commonly used, typed <a href=\"https:\/\/docs.python.org\/3.7\/library\/typing.html?highlight=namedtuple#typing.NamedTuple\" target=\"_blank\" rel=\"noopener\">NamedTuple<\/a> and <a href=\"https:\/\/docs.python.org\/3.7\/library\/dataclasses.html#dataclasses.dataclass\" target=\"_blank\" rel=\"noopener\">dataclass<\/a> from the standard Python library (a NamedTuple is always immutable; a dataclass is immutable when declared with frozen=True) to help users spot errors early. It also validates each field value against its declared type, in addition to the checks done by argparse during parsing or by the container class during instantiation.<\/p>\n<p><b>It should be human-friendly<br \/> <\/b>We have mentioned this phrase multiple times now. Why? Because it\u2019s important! There is always a need for human intervention with the workflows, whether by an AI engineer, a devops practitioner, or an SRE. 
We need to make it easy for people to inspect or debug the serialized form.<\/p>\n<p>smart-arg serializes an argument container class instance to a sequence of strings that is compatible with the standard Python library argparse and can be easily inspected by human eyes.<\/p>\n<p><b>It should be extensible<br \/> <\/b>A user should be able to extend the support to other argument container classes when desirable.<\/p>\n<p>They should also be able to extend the support to their own field types in any argument container class.<\/p>\n<p>smart-arg supports <a href=\"https:\/\/docs.python.org\/3.7\/library\/typing.html?highlight=namedtuple#typing.NamedTuple\" target=\"_blank\" rel=\"noopener\">NamedTuple<\/a> and <a href=\"https:\/\/docs.python.org\/3.7\/library\/dataclasses.html#dataclasses.dataclass\" target=\"_blank\" rel=\"noopener\">dataclass<\/a> out of the box, and other classes by implementing a simple interface. To extend the support to any additional field types, type handlers can be implemented for the SerDe process.<\/p>\n<h2>Implementation and usage<\/h2>\n<\/p><\/div>\n<\/p><\/div>\n<\/div>\n<div class=\"resourceImageBlock section\">\n<div class=\"component-anchor-container\">\n  <a class=\"component-anchor\" name=\"post_par_resourceimageblock_1785928622\"><\/a>\n <\/div>\n<ul class=\"resource-image-block single\">\n<li class=\"resource-image\"> <img loading=\"lazy\" decoding=\"async\" src=\"https:\/\/i0.wp.com\/content.linkedin.com\/content\/dam\/engineering\/site-assets\/images\/blog\/posts\/2021\/01\/smartarg5.png?resize=750%2C429&#038;ssl=1\" alt=\"chart-showing-components-of-smart-arg\" height=\"429\" width=\"750\"  data-recalc-dims=\"1\"> <\/li>\n<\/ul>\n<\/div>\n<div class=\"resourceParagraph section\">\n<div class=\"component-anchor-container\">\n  <a class=\"component-anchor\" name=\"post_par_resourceparagraph_1734212512\"><\/a>\n <\/div>\n<div class=\"resource-text-section\">\n<div class=\"resource-paragraph 
rich-text\">\n<p><i>The general working principle of smart-arg<\/i><\/p>\n<p>For each supported argument container class (NamedTuple or dataclass by default), there is a proxy class that defines how smart-arg communicates with the actual container class. For each supported field type, there is a corresponding TypeHandler that specifies the SerDe process for that type.<\/p>\n<p>Users only need to define a Python NamedTuple or dataclass that declares all the argument options, if the arguments are not modeled this way already, and then decorate the container class with @arg_suite. With this decorator, smart-arg can dynamically decompose every field (with experimental support for nested container classes) into corresponding argparse arguments; argparse is the standard library module that Python users commonly rely on to parse command line arguments. When parsing the command line, smart-arg constructs an instance of the decorated container class (either NamedTuple or dataclass). The result is mostly type-safe, because smart-arg converts each command line string to the type declared for the corresponding field in the decorated class. Referencing an option is also much easier than before, because an IDE can autocomplete and offer hints whenever users access an argument option. So, users can finally say \u201cgoodbye\u201d to the miserable experience of memorizing all the argument options. In addition, smart-arg provides several useful add-ons, such as systematic post-validation and user-defined post-initialization of the arguments. It\u2019s also extensible: users can define their own parsing behaviors by extending the base classes that smart-arg provides. 
In short, smart-arg is a simple, safe, user-friendly, extensible Python library that can benefit the day-to-day work of AI engineers and others.<\/p>\n<p><b>Caveats<\/b><br \/> The SerDe process is not as universally applicable to arbitrary argument container classes as a general-purpose standard such as JSON. However, we believe the provided default type support covers the majority of the use cases already.<\/p>\n<p>To preserve human-friendliness, the serialized form keeps all user inputs and actual field values intact and inspectable rather than encoding them to be fully CLI compatible, so there is a chance that the CLI might be confused by special characters, such as quotation marks.<\/p>\n<h2>Current status and future work<\/h2>\n<p>smart-arg has been released to <a href=\"https:\/\/pypi.org\/project\/smart-arg\" target=\"_blank\" rel=\"noopener\">PyPI<\/a> and the source code is on <a href=\"https:\/\/github.com\/linkedin\/smart-arg\" target=\"_blank\" rel=\"noopener\">GitHub<\/a>. It\u2019s already being battle-tested with LinkedIn open source AI solutions: the deep personalization framework <a href=\"https:\/\/github.com\/linkedin\/gdmix\" target=\"_blank\" rel=\"noopener\">GDMix<\/a> and the deep NLU ranking and classification framework <a href=\"https:\/\/github.com\/linkedin\/detext\" target=\"_blank\" rel=\"noopener\">DeText<\/a>.<\/p>\n<p>There is still work that we foresee in the future, such as:<\/p>\n<ul>\n<li>\n<p>Adding escaping to make the serialization safer with the CLI. 
Please reach out if you have a good solution to this problem, or better yet: create a PR!<\/p>\n<\/li>\n<li>\n<p>Expanding beyond the language boundaries; for example, there are many Scala Spark jobs in LinkedIn\u2019s AI ecosystem, and it is desirable that we be able to seamlessly integrate between the Python and JVM worlds.<\/p>\n<\/li>\n<\/ul>\n<p>We\u2019re looking forward to collaboration with the open source community to make smart-arg a useful tool.<\/p>\n<h2>Acknowledgements<\/h2>\n<p>Thanks to our open-source guru <a href=\"https:\/\/www.linkedin.com\/in\/chriseppstein\" target=\"_blank\" rel=\"noopener\">Christopher Eppstein<\/a> for the help all along the open source journey, and Python veteran <a href=\"https:\/\/www.linkedin.com\/in\/barry-warsaw\/\" target=\"_blank\" rel=\"noopener\">Barry Warsaw<\/a> for providing valuable feedback to improve the quality of the project all around.<\/p>\n<\/p><\/div>\n<\/p><\/div>\n<\/div>\n<p><a href=\"https:\/\/engineering.linkedin.com\/blog\/2021\/smart-argument-suite--seamlessly-connecting-python-jobs\" target=\"_blank\" rel=\"noopener\">Read More<\/a><\/p>\n","protected":false},"excerpt":{"rendered":"<p>Co-authors: Jun Jia and Alice Wu Introduction It\u2019s a very common scenario that an AI solution involves composing different jobs, such as data processing and model training or evaluation, into workflows and then submitting them to an orchestration engine for execution. 
At large companies such as LinkedIn, there may be hundreds of thousands of such&hellip; <a class=\"more-link\" href=\"https:\/\/fde.cat\/index.php\/2021\/01\/25\/smart-argument-suite-seamlessly-connecting-python-jobs\/\">Continue reading <span class=\"screen-reader-text\">Smart Argument Suite: Seamlessly connecting Python jobs<\/span><\/a><\/p>\n","protected":false},"author":1,"featured_media":0,"comment_status":"closed","ping_status":"open","sticky":false,"template":"","format":"standard","meta":{"spay_email":"","footnotes":""},"categories":[1,7],"tags":[],"class_list":["post-176","post","type-post","status-publish","format-standard","hentry","category-external","category-technology","entry"],"jetpack_featured_media_url":"","jetpack-related-posts":[{"id":722,"url":"https:\/\/fde.cat\/index.php\/2023\/06\/06\/9-software-engineering-productivity-secrets-to-ignite-innovation-every-day\/","url_meta":{"origin":176,"position":0},"title":"9 Software Engineering Productivity Secrets to Ignite Innovation Every Day","date":"June 6, 2023","format":false,"excerpt":"During the COVID-19 pandemic, Salesforce and many other software companies asked its employees to work from home to help safeguard their safety and their families. 
The Salesforce Industries team \u2014 innovators of industry-specific digital solutions for global companies across verticals \u2014 remained highly productive, developing and delivering a cutting-edge emergency\u2026","rel":"","context":"In &quot;Technology&quot;","img":{"alt_text":"","src":"","width":0,"height":0},"classes":[]},{"id":751,"url":"https:\/\/fde.cat\/index.php\/2023\/08\/22\/how-is-einstein-gpt-shaping-the-future-of-salesforce-development-and-unleashing-developer-productivity\/","url_meta":{"origin":176,"position":1},"title":"How is Einstein GPT Shaping the Future of Salesforce Development and Unleashing Developer Productivity?","date":"August 22, 2023","format":false,"excerpt":"By Yingbo Zhou and Scott Nyberg In our \u201cEngineering Energizers\u201d Q&A series, we examine the professional life experiences that have shaped Salesforce Engineering leaders. Meet Yingbo Zhou, a Senior Director of Research for Salesforce AI Research, where he leads the team to develop the model for Einstein GPT for Developers\u2026","rel":"","context":"In &quot;Technology&quot;","img":{"alt_text":"","src":"","width":0,"height":0},"classes":[]},{"id":791,"url":"https:\/\/fde.cat\/index.php\/2023\/08\/22\/how-is-einstein-shaping-the-future-of-salesforce-development-and-unleashing-developer-productivity\/","url_meta":{"origin":176,"position":2},"title":"How is Einstein Shaping the Future of Salesforce Development and Unleashing Developer Productivity?","date":"August 22, 2023","format":false,"excerpt":"By Yingbo Zhou and Scott Nyberg In our \u201cEngineering Energizers\u201d Q&A series, we examine the professional life experiences that have shaped Salesforce Engineering leaders. 
Meet Yingbo Zhou, a Senior Director of Research for Salesforce AI Research, where he leads the team to develop the model for Einstein for Developers, a\u2026","rel":"","context":"In &quot;Technology&quot;","img":{"alt_text":"","src":"","width":0,"height":0},"classes":[]},{"id":897,"url":"https:\/\/fde.cat\/index.php\/2024\/07\/16\/ai-lab-the-secrets-to-keeping-machine-learning-engineers-moving-fast\/","url_meta":{"origin":176,"position":3},"title":"AI Lab: The secrets to keeping machine learning engineers moving fast","date":"July 16, 2024","format":false,"excerpt":"The key to developer velocity across AI lies in minimizing time to first batch (TTFB) for machine learning (ML) engineers. AI Lab is a pre-production framework used internally at Meta. It allows us to continuously A\/B test common ML workflows \u2013 enabling proactive improvements and automatically preventing regressions on TTFB.\u00a0\u2026","rel":"","context":"In &quot;Technology&quot;","img":{"alt_text":"","src":"","width":0,"height":0},"classes":[]},{"id":729,"url":"https:\/\/fde.cat\/index.php\/2023\/06\/27\/simplifying-oauth-2-0-how-slacks-new-external-authentication-feature-boosts-developer-productivity\/","url_meta":{"origin":176,"position":4},"title":"Simplifying OAuth 2.0: How Slack\u2019s New External Authentication Feature Boosts Developer Productivity","date":"June 27, 2023","format":false,"excerpt":"In our \u201cEngineering Energizers\u201d Q&A series, we examine the professional journeys that have shaped Salesforce Engineering leaders. Say hello to Nupur Goyal, Staff Software Engineer at Slack. 
Nupur\u2019s core platform team at Slack helps developers increase their productivity and efficiency \u2014 empowering them to create cutting-edge applications that integrate with\u2026","rel":"","context":"In &quot;Technology&quot;","img":{"alt_text":"","src":"","width":0,"height":0},"classes":[]},{"id":818,"url":"https:\/\/fde.cat\/index.php\/2024\/01\/29\/improving-machine-learning-iteration-speed-with-faster-application-build-and-packaging\/","url_meta":{"origin":176,"position":5},"title":"Improving machine learning iteration speed with faster application build and packaging","date":"January 29, 2024","format":false,"excerpt":"Slow build times and inefficiencies in packaging and distributing execution files were costing our ML\/AI engineers a significant amount of time while working on our training stack. By addressing these issues head-on, we were able to reduce this overhead by double-digit percentages.\u00a0 In the fast-paced world of AI\/ML development, it\u2019s\u2026","rel":"","context":"In 
&quot;Technology&quot;","img":{"alt_text":"","src":"","width":0,"height":0},"classes":[]}],"_links":{"self":[{"href":"https:\/\/fde.cat\/index.php\/wp-json\/wp\/v2\/posts\/176","targetHints":{"allow":["GET"]}}],"collection":[{"href":"https:\/\/fde.cat\/index.php\/wp-json\/wp\/v2\/posts"}],"about":[{"href":"https:\/\/fde.cat\/index.php\/wp-json\/wp\/v2\/types\/post"}],"author":[{"embeddable":true,"href":"https:\/\/fde.cat\/index.php\/wp-json\/wp\/v2\/users\/1"}],"replies":[{"embeddable":true,"href":"https:\/\/fde.cat\/index.php\/wp-json\/wp\/v2\/comments?post=176"}],"version-history":[{"count":1,"href":"https:\/\/fde.cat\/index.php\/wp-json\/wp\/v2\/posts\/176\/revisions"}],"predecessor-version":[{"id":204,"href":"https:\/\/fde.cat\/index.php\/wp-json\/wp\/v2\/posts\/176\/revisions\/204"}],"wp:attachment":[{"href":"https:\/\/fde.cat\/index.php\/wp-json\/wp\/v2\/media?parent=176"}],"wp:term":[{"taxonomy":"category","embeddable":true,"href":"https:\/\/fde.cat\/index.php\/wp-json\/wp\/v2\/categories?post=176"},{"taxonomy":"post_tag","embeddable":true,"href":"https:\/\/fde.cat\/index.php\/wp-json\/wp\/v2\/tags?post=176"}],"curies":[{"name":"wp","href":"https:\/\/api.w.org\/{rel}","templated":true}]}}