All you need about code review
When you join a company as a software engineer or data engineer/ scientist part of daily work becomes code review and submission. There are multiple points to consider while reviewing code and it is better to spend additional time in order to deliver clean code. Important to have main points in your mind and follow and checkmark before pushing the code.
1. Know what you need to review
Because getting PRs merged quickly is so important you need to stay on top of the changes people want your input on. You can use tools like Trailer.app or even simply searches in GitHub using
"involves:<YOUR USERNAME>" to find PRs relevant to you.
Tolerate being interrupt driven. You need focussed time to your other work done but PRs are time sensitive because they block other people. You shouldn’t put off code review for more than a few hours. Never more than a day.
Should know which code review is highly important
It’s one of the most important things you can work on. The only things that should come ahead of it are time critical work such as fixing availability problems or dealing with high priority requests from customers. This means you should expect to spend some time every day doing reviews and that you will probably need to spend 2–3 sessions per day replying to PRs and reading other people’s code.
Reviewing code is hard and error prone. It is our last line of defense against downtime and tech debt. You must pay attention, ask questions and not +1 lightly. Sometimes you will slip and miss something or only notice late in the review process.
Don’t block progress
While code reviews need to be done thoughtfully and thoroughly we also need to avoid blocking progress. This means you should help people with solutions, not only identify problems. Often it might be faster to go and pair with someone for a bit to get their code tidied up instead of going back and forth on GitHub about it. Be aware that some test suites takes some time to run and it may make more sense for some fixes to be made in subsequent PRs instead of being added to this one.
Things to look for while reviewing the code
Things should be named well and should be easy to follow when reading. The code should attempt to be self documenting.
There must be unit tests. They should test the edge cases. The code should behave as the submitter described. The code should use other APIs correctly.
The design should not introduce any security problems such as potential denial of service attacks or unintended information disclosures. In particular we should be aware of potential CSRF and XSS attacks.
The code should perform within our targets for a particular area. It should not use obviously suboptimal algorithms. However optimisation is usually best left to later. Except when it can also improve other areas at the same time. Simpler code is often faster.
Questions to ask yourself when conducting a code review
- Code duplication: I go by the “three strikes” rule. If code is copied once, it’s usually okay although I don’t like it. If it’s copied again, it should be refactored so that the common functionality is split out.
- Squint-test offenses: If I squint my eyes, does the shape of this code look identical to other shapes? Are there patterns that might indicate other problems in the code’s structure?
- Code left in a better state than found: If I’m changing an area of the code that’s messy, it’s tempting to add in a few lines and leave. I recommend going one step further and leaving the code nicer than it was found.
- Potential bugs: Are there off-by-one errors? Will the loops terminate in the way we expect? Will they terminate at all?
- Error handling: Are errors handled gracefully and explicitly where necessary? Have custom errors been added? If so, are they useful?
- Efficiency: If there’s an algorithm in the code, is it using an efficient implementation? For example, iterating over a list of keys in a dictionary is an inefficient way to locate a desired value.
- Method names: Naming things is one of the hard problems in computer science. If a method is named
get_message_queue_nameand it is actually doing something completely different like sanitizing HTML from the input, then that’s an inaccurate method name. And probably a misleading function.
- Variable names:
barare probably not useful names for data structures.
eis similarly not useful when compared to
exception. Be as verbose as you need (depending on the language). Expressive variable names make it easier to understand code when we have to revisit it later.
- Function length: My rule of thumb is that a function should be less than 20 or so lines. If I see a method above 50, I feel it’s best that it be cut into smaller pieces.
- Class length: I think classes should be under about 300 lines total and ideally less than 100. It’s likely that large classes can be split into separate objects, which makes it easier to understand what the class is supposed to do.
- File length: For Python files, I think around 1000 lines of code is about the most we should have in one file. Anything above that is a good sign that it should be split into smaller, more focused files. As the size of a file goes up, discoverability goes down.
- Docstrings: For complex methods or those with longer lists of arguments, is there a docstring explaining what each of the arguments does, if it’s not obvious?
- Commented code: Good idea to remove any commented out lines.
- Number of method arguments: For the methods and functions, do they have 3 or fewer arguments? Greater than 3 is probably a sign that it could be grouped in a different way.
- Readability: Is the code easy to understand? Do I have to pause frequently during the review to decipher it?
- Test coverage: I like to see tests for new features. Are the tests thoughtful? Do they cover the failure conditions? Are they easy to read? How fragile are they? How big are the tests? Are they slow?
- Testing at the right level: When I review tests I’m also making sure that we’re testing them at the right level. In other words, are we as low a level as we need to be to check the expected functionality? Gary Bernhardt recommends a ratio of 95% unit tests, 5% integration tests. I find that particularly with Django projects, it’s easy to test at a high level by accident and create a slow test suite so it’s important to be vigilant.
- Number of Mocks: Mocking is great. Mocking everything is not great. I use a rule of thumb where if there’s more than 3 mocks in a test, it should be revisited. Either the test is testing too broadly or the function is too large. Maybe it doesn’t need to be tested at a unit test level and would suffice as an integration test. Either way, it’s something to discuss.
You should read over your own code, make sure you didn’t omit any files, check for style errors and so on. You should do this before you create the PR.
Keep your changes small
Small changes are much easier to review. They’re also much easier to explain to people. This means that mistakes are less likely to sneak through and the discussion of the change should be much more productive. It also means your PR should avoid ending up with tens of comments that go round and round in circles without ever getting your code merged. It is not always possible to break a change up into small parts but it very often is. A good guideline for this is that your change should not effect a combined total of more than 400 lines.
Write good descriptions
The title of the PR should include the reference for any tickets covered by this change. You may also want to copy the description from the ticket into your description. The description for your PR should tell the story of your change in a clear and concise way. It should explain both why you are making this change and what you changed. Not all reviewers will have a deep knowledge of that part of the system so you must make it possible for them to provide useful insight. This means explaining what the existing behaviour was, how you changed it, and why this was thought to be useful.
Write good commit messages
Your commits should be logically separate changes. This means that you should aim to separate things like lint cleanups from actual behaviour changes. They should have good commit messages that explain what was changed at each step.
If you’re working on a component or a service that others are not familiar with then offer to do an in-person pair review with someone. This guarantees you’ll get eyes on your changes and get your code through the door quicker. It’s also a great way for knowledge sharing and teaching people new techniques.
Hope you find this article helpful and please reach out @firstname.lastname@example.org.