Reverting Commits in Client Apps

Posted by Nick Entin on December 9, 2020

When faced with a critical bug in a client app that’s already shipped to customers, one of the key first steps in the debugging process is finding the earliest commit in which the bug is present. Tools like git bisect can make finding this commit much easier. But what do you do once you’ve found the offending change?
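For readers who have not used it, the idea behind bisection can be sketched in a few lines of Python. This is only an illustration of the search strategy, not how git implements it. The commit names and the `is_bad` check are hypothetical stand-ins for "check out this commit, build the app, and see whether the bug reproduces". The sketch assumes the history is linear, the oldest commit is known to be good, the newest is known to be bad, and the bug never disappears once introduced.

```python
from typing import Callable, List, Sequence


def first_bad_commit(commits: Sequence[str], is_bad: Callable[[str], bool]) -> str:
    """Return the earliest commit for which is_bad() is true.

    Assumes commits are ordered oldest to newest, commits[0] is good,
    commits[-1] is bad, and the bug persists once it appears.
    """
    lo, hi = 0, len(commits) - 1  # invariant: commits[lo] is good, commits[hi] is bad
    while hi - lo > 1:
        mid = (lo + hi) // 2
        if is_bad(commits[mid]):
            hi = mid  # bug already present: the culprit is mid or earlier
        else:
            lo = mid  # still good: the culprit is later than mid
    return commits[hi]


# Hypothetical history in which the bug first appears in "d4".
history = ["a1", "b2", "c3", "d4", "e5", "f6"]
tested: List[str] = []


def is_bad(commit: str) -> bool:
    tested.append(commit)  # record which commits we had to build and test
    return history.index(commit) >= history.index("d4")


culprit = first_bad_commit(history, is_bad)
```

Each test halves the remaining range, so even a long history needs only a handful of builds to narrow down to one commit.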

The obvious answer is to revert the commit, verify that the bug is resolved, and ship a new version of the app. Sometimes this is all it takes: you found the commit that introduced the bug, and removing it took the bug away. But sometimes reverting a commit can cause even bigger problems. Rolling back code without understanding why it fixes the problem can be dangerous.

Discovering that the reversion resolved the issue is only one piece of understanding the effects of reverting that commit. There are multiple potential results when you roll back a commit, such as:

It fixes the bug you were seeing and changes nothing else. This is the ideal result. In practice, I’ve found that this rarely happens, at least organically, although you may see it around things like refactors.
It fixes the bug you were seeing, but also reverts other changes that you would like to have kept. This is still an acceptable trade-off in many cases, since the urgency of fixing the bug takes precedence. For example, we recently found a crash that was introduced by a purely cosmetic change to a screen. Reverting this was an easy decision, since the negative effects of a crash greatly outweighed the benefits of those minor visual changes. When it involves more functional differences, the decision is a bit tougher, but the same trade-offs can be weighed.
It looks like it fixes the bug, but actually it moves the issue to somewhere else. Maybe that issue is still mostly mitigated and it’s an improvement. Or maybe it’s now somewhere where it’s not easily detectable, but will still impact your customers. Either way, you’ve now lost time on pushing a new release that hasn’t really resolved the underlying issue.
It fixes the bug you were seeing, but it causes a different (and maybe even significantly worse) bug somewhere else.

That last case is obviously quite bad. But how can this happen? After all, you’re reverting back to code that you’ve already shipped to your customers in the past, right?

Unfortunately, that’s not entirely true. Yes, the specific lines of code you’re reverting to have shipped before, but in many cases the other parts of the app that code interacts with will have changed since then. Identical code is only guaranteed to behave the same if all of the code around it and the data it processes are also the same.

There’s an important distinction here that comes up a lot when talking with our friends on the server side of things. Client rollbacks typically involve reverting a commit, while server rollbacks typically involve reverting to a commit. Reverting to a commit allows the project to be redeployed with all of the code in the same state it was when that (hopefully stable) commit was originally shipped. There’s still a danger of the data being different, but this makes the risk analysis much easier.
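The difference is easier to see with a toy model. The Python sketch below is a deliberate simplification: each commit is represented as a map from file name to new contents, whereas a real revert applies the inverse of a diff. The file names and contents are hypothetical.

```python
from typing import Dict, List

Commit = Dict[str, str]  # hypothetical model: file name -> contents after the commit


def build(commits: List[Commit]) -> Dict[str, str]:
    """Apply commits in order and return the resulting source tree."""
    tree: Dict[str, str] = {}
    for commit in commits:
        tree.update(commit)
    return tree


history: List[Commit] = [
    {"cache.py": "v1", "screen.py": "v1"},   # commit 0: last known-stable release
    {"cache.py": "v2"},                      # commit 1: introduces the bug
    {"screen.py": "v2 (expects cache v2)"},  # commit 2: later work built on commit 1
]

# Reverting TO commit 0: redeploy the tree exactly as it was at that commit.
reverted_to = build(history[:1])

# Reverting commit 1: add a new commit on top that undoes only its change.
reverted_commit = build(history + [{"cache.py": "v1"}])
```

In this model `reverted_to` is a combination that has shipped before, while `reverted_commit` pairs the old `cache.py` with a `screen.py` that was written against the new one. That second combination has never run anywhere.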

So what questions can you ask to determine whether it’s safe to revert a commit? Here are some to get started:

How recent is the commit? Is it the last commit in the project before shipping the app? Probably safe to revert. Is it a commit from a few months ago that’s had a lot of commits on top of it since then? Maybe not so safe - other code built on top of it may be depending on that new behavior.
What part of the app does the code affect? Is the code entirely in a top-level view controller or some other object that’s very self-contained? Probably safe to revert. Are there changes in a more infrastructural layer of the app that lots of other code calls into? Maybe not so safe - code in other parts of the app may be depending on that new behavior.
Does the code affect data stored on device? Does the change affect only the presentation of the data and not the data itself? Probably safe to revert. Does it affect the data stored on device (e.g. Core Data or keychain migrations)? Maybe not so safe - the customer’s data will now be in a new state that your old code may not handle properly.
Does the code interact with a server (even indirectly)? Was the change made in coordination with a change on the server side? That server change probably expects the new client behavior. Was the change previously gated behind a feature flag (or combination of flags)? The state of those flags may have changed since this code last shipped.

This is not meant to be a complete checklist, but it illustrates some of the ways that reverting a commit can have unintended consequences. When faced with a crash and a potential commit to revert, your job is to perform the risk analysis to determine how likely you are to hit any of those consequences. When I’m unsure of the risk of reverting, I find it best to think about a revert commit as if it were a newly written fix. Understanding the change you’re reverting is just as important as understanding new code you write.
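The question about data stored on device is perhaps the easiest one to picture in code. The Python sketch below is a made-up example, not taken from any real app: `save_v2`, `load_v1`, and the JSON layout are all hypothetical. It assumes the commit being reverted changed how a setting is stored, and that some customers have already run the new version.

```python
import json


def save_v2(theme_name: str) -> str:
    """New code (the commit being reverted): stores the theme as an object."""
    return json.dumps({"schema": 2, "theme": {"name": theme_name, "contrast": "high"}})


def load_v1(blob: str) -> str:
    """Old code (what the revert restores): expects the theme to be a plain string."""
    data = json.loads(blob)
    return data["theme"].upper()


# Data written to the device while the customer was running the new version.
on_device = save_v2("dark")

# After the revert ships, the old loader meets data it has never seen before.
try:
    load_v1(on_device)
    outcome = "ok"
except AttributeError:
    # The stored theme is now a dict, which has no .upper() method.
    outcome = "crash"
```

The reverted code is byte-for-byte what shipped before, but the data underneath it has moved on, so the revert trades one crash for another.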

With the time pressure of a crash in the wild, it can feel like a waste of time to do this extra analysis before shipping a potential fix. There’s a saying that I think applies really well to this situation though: slow is smooth, smooth is fast. Iterating on mobile releases can be a somewhat slow process given the delays caused by the app review process. Taking the time to increase confidence in your first fix can make the entire process faster. If your team has the resources available, it can be helpful to prepare a new version and submit it for review, but wait to start rolling out to customers until you’re confident in the fix.

I realize this probably paints a fairly bleak picture of the debugging process. Unfortunately, when you have a serious bug out in the wild, there is no quick and easy answer that works in every situation. Reverting is a great solution when it works, but whether to use it has to be decided based on the situation at hand. The good news is that there are steps you can take to increase the likelihood of being able to safely revert changes in the future. Things like:

Keep changes small and contained. In general, smaller, more targeted changes are easier to understand, and therefore make it easier to understand the impact of reverting them.
Document why changes are being made. It’s important to be able to quickly gather context around the change in question in order to understand what reverting it will do. My recommendation is to document this context in commit messages, since this is the first thing you’ll see while tracking down the bugs, but linking out to any form of documentation is great.
When working on a risky change, plan ahead for a potential rollback. Hope for the best, plan for the worst. Spending a bit of time earlier in the process to think through how your changes could be undone can save time and stress later when there’s pressure to get a bug fixed ASAP. For example, you might be able to break out a small commit at the end of landing a new feature that can quickly be reverted to disable that feature.
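One way to picture that last suggestion is a single switch that is turned on in its own small commit once the rest of the feature has landed. The Python sketch below is hypothetical: the names `NEW_CHECKOUT_ENABLED`, `new_checkout`, and `legacy_checkout` are invented for illustration, and it assumes the old code path is kept in the app and still works.

```python
# Hypothetical switch, landed as its own one-line commit at the end of the
# feature work. Reverting that commit restores the previous value, False.
NEW_CHECKOUT_ENABLED = True


def legacy_checkout(total_cents: int) -> str:
    """The existing, already-shipped code path."""
    return f"legacy:{total_cents}"


def new_checkout(total_cents: int) -> str:
    """The new feature, landed over several earlier commits."""
    return f"new:{total_cents}"


def checkout(total_cents: int, enabled: bool = NEW_CHECKOUT_ENABLED) -> str:
    """Route to the new feature only when the switch is on."""
    return new_checkout(total_cents) if enabled else legacy_checkout(total_cents)
```

Because the final commit touches only the switch, reverting it is easy to reason about: `checkout(500)` goes back to behaving like `checkout(500, enabled=False)`. The earlier questions still apply, though, particularly if the new path has already written data or talked to a server in a new way.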

Reverting an old commit is a great tool in the bug-fighting toolbox, but it isn’t always the right tool for the job. It’s important to understand when it’s safe to revert a commit, so you can keep the option available while avoiding a potential disaster.
