Code Review Practices

Sep 23, 2018

Earlier this month, a post on the Hacker News forum brought to my attention a paper published through Google about their code review practices. It'd be wrong to assume that Google's code review practices constitute the best practices for every team, but its discussion yields a lot of interesting ideas to consider when establishing code review processes for your specific team.

For those that do not know what code reviews are: code review is the process by which one or more team members manually read through changes made to a codebase and provides feedback to the author of the code changes (often asynchronously). In typical setups, at least one reviewer who is not the author must approve of the changes before the updates are committed to the codebase. In other setups, teams may provide commit-first, review-later privileges to specific authors who demonstrate strong coding ability, or in situations where prototyping is more important than developing production-ready code. These days, some tooling is used to assist with the code review process. Open Source projects may use pull request systems through Github, Gitlab, Bitbucket, or some other code hosting service. The tech giants (as I understand it) pre-date many of these code review tools and so have built their own in-house versions, but the premise is generally the same.

Code review itself is an incredibly useful tool. When working on a team of 2 or more developers, the code review process, in my opinion, is one of the most important processes in your workflow. It can be the difference between having clean, maintainable code for the foreseeable future versus code that is incredibly silo'd, placing sections of your project at incredible hit-by-a-bus risk. Without maintaining standards for your codebase, you risk allowing the code to deteriorate into an awful spaghetti soup of unreadable crap.

By the way, just to make it absolutely clear: While it would be an absolute dream to work for Google, I do not work for them, nor am I affiliated with them in any way. All of the following discussion about their processes is entirely based on the above linked paper.

The Purpose of Code Reviews at Google

The paper interviewed one of Google's earliest employees who was around at the time when they first instituted their code review process. He described the impetus for code review as the following:

"...the introduction of code review was to force developers to write code that other developers could understand; this...signaled the transition from a research codebase, which is optimized towards quick prototyping, to a production codebase, where it is critical to think about future engineers reading source code."

"Future engineers reading source code" includes everyone on your team, including yourself! That is, unless you have eidetic memory and can remember the exact context for every code change you made months ago. For open source projects, this can be even more important because if you're hoping that other developers will collaborate with you on your project, the likelihood of people coming to your aid will tank if they can't understand what the code is doing in the first place. This should be obvious for anyone who has had their code reviewed before: if the reviewer can't understand what your code is trying to achieve without you walking them through it, then the code must be re-written. The code should be self-documenting with descriptive variable/function names, and an easy to follow logic that avoid making code spaghetti.

One interesting omission from the Google employee's statement is that code reviews are not meant to catch the introduction of bugs or regressions in the codebase. While it's nice to catch bugs as part of the code review process, teams should rely on other tooling for that -- an automated testing suite, CI, pre-commit test hooks, etc.

I once worked on a team where the code review process was such that both the reviewer and the author were jointly responsible for ensuring that code updates did not cause any bugs. This meant that as a reviewer, for each pull request, you had to:

Pull down the proposed changes locally, then
Manually test that the code changes completed what it set out to do, and
Check the code did not introduce any new bugs or issues.

It sounded good at the time to do it this way and ensure quality and all that, but this slowed development to an absolute crawl. My team was relatively small, and one person could end up reviewing multiple pull requests every week on top of their regular workload. In hindsight, the authors should have focused on minimizing bugs and regressions as part of their development process. Reviewers should solely focus on readability and maintainability.

Size and Velocity of Reviews

According to the paper, code reviews occur extremely fast at Google.

In terms of speed, we find that developers have to wait for initial feedback on their change a median time of under an hour for small changes and about 5 hours for very large changes. The overall (all code sizes) median latency for the entire review process is under 4 hours.

Code reviews within an hour... That is really incredible, and probably completely unrealistic for many teams big or small. Google's culture and expectation for timely reviews are to blame for such speediness, but another very important contributing factor is the size of each individual review:

At Google, over 35% of the changes under consideration modify only a single file and about 90% modify fewer than 10 files. Over 10% of changes modify only a single line of code, and the median number of lines modified is 24.

Whether or not this is achievable on other teams likely depends on the maturity of the codebase itself. Early on when developing new features, it's easy to wind up with very large code reviews that implement a fully completed feature. I don't necessarily find this to be a problem: I prefer to review changes that have a cohesive story to tell me in a way that can give me the big picture about why and what changes are being proposed. It would be a mistake to chop down a code review into tiny bite-sized parts simply for the sake of simply making them smaller.

That being said -- do not take this to the extreme. Usually when you set out to work on a new feature, I find that you typically have a sense from the outset whether it will yield a gigantic review or a little one. Authors should seriously consider whether a feature can be broken up into sub-features that can be committed in parts. If you work on a small team, you should be mindful about your eventual reviewer's time: if you know that the code you are submitting for review is going to be incredibly complex, either give them a heads up early on or find ways to simplify it before submitting it. I find that doing this grants you better, more helpful, and more timely feedback.

Selecting Reviewers

Google's in-house commit/code-review system automatically selects the most relevant user(s) for reviewing any bit of code. In many open source projects, the reviews typically fall upon the sole maintainer or group of maintainers for a project. In professional environments, teams of small to medium sizes typically "know" who the code reviewers are: they are often the project lead or some subject matter expert for a specific part of the codebase.

For the most part, this works just fine. On larger teams, more structure could be needed. At Google, there is a system of ownership where certain employees have responsibility over a tree of files in Google's monolithic file structure. In addition to that, Google apparently enforces that all code reviews must be approved by at least one person who has been "readability certified" for a particular language. I have never worked on a team so large that this would be necessary, but a set of "readability" rules or a coding style guide should be decided upon for all team members to adhere to.

I also think that it is helpful to assign as reviewers junior members or members on the team that would benefit from seeing more of the codebase. One of the biggest benefits of code review is that it educates the team on what is happening in and around the project. For junior members, this can be invaluable if they take the review process seriously. You don't necessarily need to wait for their particular approval before committing the changes, but including them as one of multiple reviewers can pay off dividends for their development as programmers.

Summary

Set up tooling outside of the code review process (like automated tests and CI) to guard against bugs and regressions. Focus on readability and maintainability when reviewing code.
Keep your code changes to a manageable size. Be mindful of the complexity of the work you will soon be placing on your reviewers.
Reviewers, review code that you are assigned in a timely fashion -- do not wait until the end of the sprint to review everything all at once.
Include junior and other team members in the code review discussions when it's appropriate. There is a ton of opportunity for them to learn from the code review process if they take it seriously.

Other policies/ideas I want to mention:

No pushing or merging to master without at least one code reviewer who has approved of the pull request! If possible, Have whatever hosting platform you're using block users from direct push access.
Major design decisions should be discussed among the team or the project lead long before any code gets written. I think it's a failure on the team's part if a pull request is outright sent back to the drawing board.
Don't waste your reviewer's time by asking them to review code that isn't already passing your automated test suite beforehand. If you need help with figuring out how to approach a problem, that discussion is best before the code review even takes place. Code reviews should focus on readability and maintainability -- not the approach your implementation takes.