{"id":652,"date":"2022-11-16T17:30:15","date_gmt":"2022-11-16T17:30:15","guid":{"rendered":"https:\/\/fde.cat\/index.php\/2022\/11\/16\/move-faster-wait-less-improving-code-review-time-at-meta\/"},"modified":"2022-11-16T17:30:15","modified_gmt":"2022-11-16T17:30:15","slug":"move-faster-wait-less-improving-code-review-time-at-meta","status":"publish","type":"post","link":"https:\/\/fde.cat\/index.php\/2022\/11\/16\/move-faster-wait-less-improving-code-review-time-at-meta\/","title":{"rendered":"Move faster, wait less: Improving code review time at Meta"},"content":{"rendered":"<p><span>Code reviews are one of the most important parts of the software development process<\/span><br \/>\n<span>At Meta we\u2019ve recognized the need to make code reviews as fast as possible without sacrificing quality<\/span><br \/>\n<span>We\u2019re sharing several tools and steps we\u2019ve taken at Meta to reduce the time waiting for code reviews<\/span><\/p>\n<p><span>When done well, code reviews can <a href=\"https:\/\/engineering.fb.com\/2022\/07\/20\/security\/how-meta-and-the-security-industry-collaborate-to-secure-the-internet\/\" target=\"_blank\" rel=\"noopener\">catch bugs<\/a>,<a href=\"https:\/\/engineering.fb.com\/2021\/02\/23\/data-infrastructure\/silent-data-corruption\/\"> teach best practices<\/a>, and <a href=\"https:\/\/engineering.fb.com\/2020\/12\/10\/developer-tools\/probabilistic-flakiness\/\" target=\"_blank\" rel=\"noopener\">ensure high code qualit<\/a>y. At Meta we call an individual set of changes made to the codebase a \u201cdiff.\u201d While we like to move fast at Meta, every diff must be reviewed, without exception. But, as the Code Review team, we also understand that when reviews take longer, people get less done.<\/span><\/p>\n<p><span>We\u2019ve studied several metrics to learn more about code review bottlenecks that lead to unhappy developers and used that knowledge to build features that help speed up the code review process without sacrificing review quality. We\u2019ve found a correlation between slow diff review times (P75) and engineer dissatisfaction. Our tools to surface diffs to the right reviewers at key moments in the code review lifecycle have significantly improved the diff review experience.<\/span><\/p>\n<h2><span>What makes a diff review feel slow?<\/span><\/h2>\n<p><span>To answer this question we started by looking at our data. We track a metric that we call \u201cTime In Review,\u201d which is a measure of how long a diff is waiting on review across all of its individual review cycles. We only account for the time when the diff is waiting on reviewer action.<\/span><\/p>\n<p>Time In Review is calculated as the sum of the time spent in blue sections.<\/p>\n<p><span>What we discovered surprised us. When we looked at the data in early 2021, our median (P50) hours in review for a diff was only a few hours, which we felt was pretty good. However, looking at P75 (i.e., the slowest 25 percent of reviews) we saw diff review time increase by as much as a day.\u00a0<\/span><\/p>\n<p><span>We analyzed the correlation between Time In Review and user satisfaction (as measured by a company-wide survey). The results were clear<\/span><span>: <\/span><span>The longer someone\u2019s slowest 25 percent of diffs take to review, the less satisfied they were by their code review process. We now had our north star metric: P75 Time In Review.\u00a0<\/span><\/p>\n<p><span>Driving down Time In Review would not only make people more satisfied with their code review process, it would also increase the productivity of every engineer at Meta. Driving down Time to Review for our diffs means our engineers are spending significantly less time on reviews \u2013 making them more productive and more satisfied with the overall review process.<\/span><\/p>\n<h2><span>Balancing speed with quality<\/span><\/h2>\n<p><span>However, simply optimizing for the speed of review could lead to negative side effects, like encouraging rubber-stamp reviewing. We needed a guardrail metric to protect against negative unintended consequences. We settled on \u201cEyeball Time\u201d \u2013 the total amount of time reviewers spent looking at a diff. An increase in rubber-stamping would lead to a decrease in Eyeball Time.<\/span><\/p>\n<p><span>Now we have established our goal metric, Time In Review, and our guardrail metric, Eyeball Time. What comes next?<\/span><\/p>\n<h2><span>Build, experiment, and iterate<\/span><\/h2>\n<p><span>Nearly every product team at Meta uses experimental and data-driven processes to release and iterate on features. However, this process is still very new to internal tools teams like ours. There are\u00a0 a number of challenges (sample size, randomization, network effect) that we\u2019ve had to overcome that product teams do not have. We address these challenges with new data foundations for running <\/span><a href=\"https:\/\/research.facebook.com\/blog\/2021\/08\/testing-product-changes-with-network-effects\/\" target=\"_blank\" rel=\"noopener\"><span>network experiments<\/span><\/a><span> and using techniques to reduce variance and increase sample size. This extra effort is worth it <\/span><span>\u2014 <\/span><span>by laying the foundation of an experiment, we can later prove the impact and the effectiveness of the features we\u2019re building.<\/span><\/p>\n<p>The experimental process: The selection of goal and guardrail metrics is driven by the hypothesis we hold for the feature. We built the foundations to easily choose different experiment units to randomize treatment, including randomization by user clusters.<\/p>\n<h2><span>Next reviewable diff<\/span><\/h2>\n<p><span>The inspiration for this feature came from an unlikely place \u2014 video streaming services. It\u2019s easy to binge watch shows on certain streaming services because of how seamless the transition is from one episode to another. What if we could do that for code reviews? By queueing up diffs we could encourage a diff review flow state, allowing reviewers to make the most of their time and mental energy.<\/span><\/p>\n\n<p><span>And so Next Reviewable Diff was born. We use machine learning to identify a diff that the current reviewer is highly likely to want\u00a0 to review. Then we surface that diff to the reviewer after they finish their current code review. We make it easy to cycle through possible next diffs and quickly remove themselves as a reviewer if a diff is not relevant to them.<\/span><\/p>\n<p><span>After its launch, we found that this feature resulted in a 17 percent overall increase in review actions per day (such as accepting a diff, commenting, etc.) and that engineers that use this flow perform 44 percent more review actions than the average reviewer!<\/span><\/p>\n<h2><span>Improving reviewer recommendations<\/span><\/h2>\n<p><span>The choice of reviewers that an author selects for a diff is very important. Diff authors want reviewers who are going to review their code well, quickly, and who are experts for the code their diff touches. Historically, Meta\u2019s reviewer recommender looked at a limited set of data to make recommendations, leading to problems with new files and staleness as engineers changed teams.<\/span><\/p>\n<p><span>We built a new reviewer recommendation system, incorporating work hours awareness and file ownership information. This allows reviewers that are available to review a diff and are more likely to be great reviewers to be prioritized. We rewrote the model that powers these recommendations to support backtesting and automatic retraining too.<\/span><\/p>\n\n<p><span>The result? A 1.5 percent increase in diffs reviewed within 24 hours and an increase in top three recommendation accuracy (how often the actual reviewer is one of the top three suggested) from below 60 percent to nearly 75 percent. As an added bonus, the new model was also 14 times faster (P90 latency)!<\/span><\/p>\n<h2><span>Stale Diff Nudgebot<\/span><\/h2>\n<p><span>We know that a small proportion of stale diffs can make engineers unhappy, even if their diffs are reviewed quickly otherwise.\u00a0 Slow reviews have other effects too <\/span><span>\u2014 <\/span><span>the code itself becomes stale, authors have to context switch, and overall productivity drops. To directly address this, we built Nudgebot, which was inspired by <\/span><a href=\"https:\/\/dl.acm.org\/doi\/abs\/10.1145\/3544791\" target=\"_blank\" rel=\"noopener\"><span>research done at Microsoft<\/span><\/a><span>.<\/span><\/p>\n<p><span>For diffs that were taking an extra long time to review, Nudgebot determines the subset of reviewers that are most likely to review the diff. Then it\u00a0 sends them a chat ping with the appropriate context for the diff along with a set of quick actions that allow recipients to jump right into reviewing.<\/span><\/p>\n<p><span>Our experiment with Nudgebot had great results. The average Time In Review for all diffs dropped 7 percent (adjusted to exclude weekends) and the proportion of diffs that waited longer than three days for review dropped 12 percent! The success of this feature was <\/span><a href=\"https:\/\/users.encs.concordia.ca\/~pcr\/paper\/NudgeBot2022FSE-preprint.pdf\" target=\"_blank\" rel=\"noopener\"><span>individually published<\/span><\/a><span> as well.<\/span><\/p>\n<p>This is what a chat notification about a set of stale diffs looks like to a reviewer, while showing one of the potential interactions of \u201cRemind Me Later.\u201d<\/p>\n<h2><span>What comes next?<\/span><\/h2>\n<p><span>Our current and future work is focused on questions like:<\/span><\/p>\n<p><span>What is the right set of people to be reviewing a given diff?<\/span><br \/>\n<span>How can we make it easier for reviewers to have the information they need to give a high quality review?<\/span><br \/>\n<span>How can we leverage AI and machine learning to improve the code review process?<\/span><\/p>\n<p><span>We\u2019re continually pursuing\u00a0 answers to these questions, and we\u2019re looking forward to finding more ways to streamline developer processes in the future!<\/span><\/p>\n<p><span>Are you interested in building the future of developer productivity? <\/span><a href=\"https:\/\/www.metacareers.com\/jobs\/\" target=\"_blank\" rel=\"noopener\"><span>Join us<\/span><\/a><span>!<\/span><\/p>\n<h2><span>Acknowledgements<\/span><\/h2>\n<p><span>We\u2019d like to thank the following people for their help and contributions to this post:\u00a0 Louise Huang, Seth Rogers, and James Saindon.<\/span><\/p>\n<p>The post <a href=\"https:\/\/engineering.fb.com\/2022\/11\/16\/culture\/meta-code-review-time-improving\/\">Move faster, wait less: Improving code review time at Meta<\/a> appeared first on <a href=\"https:\/\/engineering.fb.com\/\">Engineering at Meta<\/a>.<\/p>\n<p>Engineering at Meta<\/p>","protected":false},"excerpt":{"rendered":"<p>Code reviews are one of the most important parts of the software development process At Meta we\u2019ve recognized the need to make code reviews as fast as possible without sacrificing quality We\u2019re sharing several tools and steps we\u2019ve taken at Meta to reduce the time waiting for code reviews When done well, code reviews can&hellip; <a class=\"more-link\" href=\"https:\/\/fde.cat\/index.php\/2022\/11\/16\/move-faster-wait-less-improving-code-review-time-at-meta\/\">Continue reading <span class=\"screen-reader-text\">Move faster, wait less: Improving code review time at Meta<\/span><\/a><\/p>\n","protected":false},"author":0,"featured_media":0,"comment_status":"","ping_status":"open","sticky":false,"template":"","format":"standard","meta":{"spay_email":"","footnotes":""},"categories":[7],"tags":[],"class_list":["post-652","post","type-post","status-publish","format-standard","hentry","category-technology","entry"],"jetpack_featured_media_url":"","jetpack-related-posts":[{"id":753,"url":"https:\/\/fde.cat\/index.php\/2023\/08\/29\/scheduling-jupyter-notebooks-at-meta\/","url_meta":{"origin":652,"position":0},"title":"Scheduling Jupyter Notebooks at Meta","date":"August 29, 2023","format":false,"excerpt":"At Meta, Bento is our internal Jupyter notebooks platform that is leveraged by many internal users. Notebooks are also being used widely for creating reports and workflows (for example, performing data ETL) that need to be repeated at certain intervals. Users with such notebooks would have to remember to manually\u2026","rel":"","context":"In &quot;Technology&quot;","img":{"alt_text":"","src":"","width":0,"height":0},"classes":[]},{"id":756,"url":"https:\/\/fde.cat\/index.php\/2023\/09\/05\/whats-it-like-to-write-code-at-meta\/","url_meta":{"origin":652,"position":1},"title":"What\u2019s it like to write code at Meta?","date":"September 5, 2023","format":false,"excerpt":"Ever wonder what it\u2019s like to write code at Meta\u2019s scale? On the latest episode of the Meta Tech Podcast, Meta engineer Pascal Hartig (@passy) sits down with Dustin Shahidehpour\u00a0and\u00a0Katherine Zak,\u00a0 two software engineers at Meta, about their careers and what it\u2019s really like to ship code at Meta. Why\u2026","rel":"","context":"In &quot;Technology&quot;","img":{"alt_text":"","src":"","width":0,"height":0},"classes":[]},{"id":551,"url":"https:\/\/fde.cat\/index.php\/2022\/03\/10\/code-verify-an-open-source-browser-extension-for-verifying-code-authenticity-on-the-web\/","url_meta":{"origin":652,"position":2},"title":"Code Verify: An open source browser extension for verifying code authenticity on the web","date":"March 10, 2022","format":false,"excerpt":"Since WhatsApp introduced multi-device capability last year, we\u2019ve seen an increase in people accessing WhatsApp directly through their web browser via WhatsApp Web. With this shift in mind, we\u2019ve been looking at ways to add additional layers of security to the WhatsApp Web experience. Starting today, you can now use\u2026","rel":"","context":"In &quot;Technology&quot;","img":{"alt_text":"","src":"","width":0,"height":0},"classes":[]},{"id":271,"url":"https:\/\/fde.cat\/index.php\/2021\/08\/31\/faster-more-efficient-systems-for-finding-and-fixing-regressions\/","url_meta":{"origin":652,"position":3},"title":"Faster, more efficient systems for finding and fixing regressions","date":"August 31, 2021","format":false,"excerpt":"Every workday, Facebook engineers commit thousands of diffs (which is a change consisting of one or more files) into production. This code velocity allows us to rapidly ship new features, deliver bug fixes and optimizations, and run experiments. However, a natural downside to moving quickly in any industry is the\u2026","rel":"","context":"In &quot;Technology&quot;","img":{"alt_text":"","src":"","width":0,"height":0},"classes":[]},{"id":610,"url":"https:\/\/fde.cat\/index.php\/2022\/07\/20\/how-meta-and-the-security-industry-collaborate-to-secure-the-internet\/","url_meta":{"origin":652,"position":4},"title":"How Meta and the security industry collaborate to secure the internet","date":"July 20, 2022","format":false,"excerpt":"Bug hunting is hard and can sometimes go unnoticed across our industry. Building scalable bug detection methods across large codebases and open source libraries is an underappreciated yet critical effort every engineering company has to work through. Because the ideal outcome is that bugs are found and fixed before they\u2026","rel":"","context":"In &quot;Technology&quot;","img":{"alt_text":"","src":"","width":0,"height":0},"classes":[]},{"id":777,"url":"https:\/\/fde.cat\/index.php\/2023\/10\/24\/automating-dead-code-cleanup\/","url_meta":{"origin":652,"position":5},"title":"Automating dead code cleanup","date":"October 24, 2023","format":false,"excerpt":"Meta\u2019s Systematic Code and Asset Removal Framework (SCARF) has a subsystem for identifying and removing dead code. SCARF combines static and dynamic analysis of programs to detect dead code from both a business and programming language perspective. SCARF automatically creates change requests that delete the dead code identified from the\u2026","rel":"","context":"In &quot;Technology&quot;","img":{"alt_text":"","src":"","width":0,"height":0},"classes":[]}],"_links":{"self":[{"href":"https:\/\/fde.cat\/index.php\/wp-json\/wp\/v2\/posts\/652","targetHints":{"allow":["GET"]}}],"collection":[{"href":"https:\/\/fde.cat\/index.php\/wp-json\/wp\/v2\/posts"}],"about":[{"href":"https:\/\/fde.cat\/index.php\/wp-json\/wp\/v2\/types\/post"}],"replies":[{"embeddable":true,"href":"https:\/\/fde.cat\/index.php\/wp-json\/wp\/v2\/comments?post=652"}],"version-history":[{"count":0,"href":"https:\/\/fde.cat\/index.php\/wp-json\/wp\/v2\/posts\/652\/revisions"}],"wp:attachment":[{"href":"https:\/\/fde.cat\/index.php\/wp-json\/wp\/v2\/media?parent=652"}],"wp:term":[{"taxonomy":"category","embeddable":true,"href":"https:\/\/fde.cat\/index.php\/wp-json\/wp\/v2\/categories?post=652"},{"taxonomy":"post_tag","embeddable":true,"href":"https:\/\/fde.cat\/index.php\/wp-json\/wp\/v2\/tags?post=652"}],"curies":[{"name":"wp","href":"https:\/\/api.w.org\/{rel}","templated":true}]}}