Keeping history clean with git rebase


Not long ago I attended the MariaDB Developer Meetup in Amsterdam, where I gave a presentation on code health. Afterwards, quite a few people asked me why and when we should prefer rebasing over merging, and how we should use it.

What followed was a hands-on demonstration. The feedback was quite positive, so I figured a more thorough article would be appropriate. Let's begin!

What is a linear history and why do we care?

When working in a distributed environment on a large project, you always have multiple developers working on the same codebase. A very common situation: you are happily hacking away at your feature, making some commits in your feature branch, and in the meantime somebody pushes some other commits to the master branch. Before you can push your changes, you have to merge them with the updated master branch.

Here's how things look before you do anything:

(master)]$ git log --oneline # This is our current master branch.
ffc9373 Add license message
f73b14d Initial commit

(master)]$ git checkout feature_branch # Going to our feature branch
Switched to branch 'feature_branch'

(feature_branch)]$ git log --oneline # This is our current feature branch
95a9188 Make sure to return 0 on success
e769f21 Print hello world!
f73b14d Initial commit
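If you want to follow along, the sketch below recreates a history with the same shape in a throwaway repository. It is only an approximation: the file contents are made up, the license commit is assumed to add a separate LICENSE file (so nothing conflicts yet), and your hashes will differ from the ones above.

```shell
# Sketch: recreate the diverged master/feature_branch history in a throwaway repo.
set -eu
repo=$(mktemp -d)
trap 'rm -rf "$repo"' EXIT
cd "$repo"
git init -q
git symbolic-ref HEAD refs/heads/master   # name the first branch "master"
git config user.name "Example User"       # hypothetical identity for this repo
git config user.email "user@example.com"

printf '%s\n' 'int main() {' '}' > test.cc
git add test.cc
git commit -q -m "Initial commit"

# feature_branch forks off at "Initial commit".
git checkout -q -b feature_branch
printf '%s\n' 'int main() {' '  puts("Hello world!");' '}' > test.cc
git commit -q -am 'Print hello world!'
printf '%s\n' 'int main() {' '  puts("Hello world!");' '  return 0;' '}' > test.cc
git commit -q -am "Make sure to return 0 on success"

# Meanwhile master gains a commit of its own (assumed here to add a LICENSE file).
git checkout -q master
echo "placeholder license text" > LICENSE
git add LICENSE
git commit -q -m "Add license message"

git log --oneline --graph --all
git rev-list --count master..feature_branch   # 2: commits only on feature_branch
git rev-list --count feature_branch..master   # 1: commits only on master
```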

And here's a more graphical representation of the history.

[Figure: master branch and feature branch commit graph]

So what happens when you merge your changes into the master branch using git merge? Git creates an extra commit that represents the merging of the two branches. Merge commits are special: they have two parents instead of one. A merge commit also records how any conflicts were resolved during the merge.
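A quick way to see those two parents, sketched in a throwaway repository with made-up files: git rev-list --parents -n 1 prints a commit followed by its parents. The first parent (HEAD^1, the same commit as HEAD~1) is the previous tip of the branch you merged into, which is why the git reset --hard HEAD~1 used later in this post undoes the merge.

```shell
# Sketch: inspect the two parents of a merge commit (throwaway repo, made-up files).
set -eu
repo=$(mktemp -d)
trap 'rm -rf "$repo"' EXIT
cd "$repo"
git init -q
git symbolic-ref HEAD refs/heads/master
git config user.name "Example User"
git config user.email "user@example.com"

echo "initial" > test.cc && git add test.cc && git commit -q -m "Initial commit"
git checkout -q -b feature_branch
echo "feature" > feature.txt && git add feature.txt && git commit -q -m 'Print hello world!'
git checkout -q master
echo "license" > LICENSE && git add LICENSE && git commit -q -m "Add license message"

# The branches have diverged, so Git records a merge commit
# (--no-ff only makes that explicit; --no-edit keeps the default message).
git merge -q --no-ff --no-edit feature_branch

git rev-list --parents -n 1 HEAD   # <merge> <first parent> <second parent>
git log --oneline -1 HEAD^1        # first parent: previous tip of master
git log --oneline -1 HEAD^2        # second parent: tip of feature_branch
```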

Let's do it the direct way first and merge our feature branch.

(feature_branch)]$ git checkout master
Switched to branch 'master'

(master)]$ git merge feature_branch
Auto-merging test.cc
CONFLICT (content): Merge conflict in test.cc
Automatic merge failed; fix conflicts and then commit the result.

All right, so the merge cannot be done automatically. We now have to resolve the merge conflicts. I will not go into how to fix conflicts in individual files in this post. What matters is that fixing conflicts here means resolving, all at once, every place where the feature does not apply cleanly to the master branch. In this case, that includes conflicts from both the e769f21 Print hello world! commit and the 95a9188 Make sure to return 0 on success commit.
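For experimenting, the same kind of conflict can be produced and resolved from a script. In this sketch the test.cc contents are invented and both branches rewrite the same line, so Git has to stop; git diff --name-only --diff-filter=U then lists the files that still have unresolved conflicts.

```shell
# Sketch: produce and resolve a merge conflict from a script (test.cc contents are invented).
set -eu
repo=$(mktemp -d)
trap 'rm -rf "$repo"' EXIT
cd "$repo"
git init -q
git symbolic-ref HEAD refs/heads/master
git config user.name "Example User"
git config user.email "user@example.com"

echo 'int main() { }' > test.cc
git add test.cc && git commit -q -m "Initial commit"
git checkout -q -b feature_branch
echo 'int main() { puts("Hello world!"); }' > test.cc
git commit -q -am 'Print hello world!'
git checkout -q master
echo 'int main() { puts("License: GPL"); }' > test.cc
git commit -q -am "Add license message"

# Both sides rewrote the same line, so the merge stops and exits non-zero.
if git merge feature_branch; then echo "expected a conflict" >&2; exit 1; fi

# List the files that still have unresolved conflicts.
conflicted=$(git diff --name-only --diff-filter=U)
echo "$conflicted"   # test.cc

# "Resolve" by writing the combined version, stage it and conclude the merge.
echo 'int main() { puts("License: GPL"); puts("Hello world!"); }' > test.cc
git add test.cc
git commit -q --no-edit
git log --oneline --graph
```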

(master *+|MERGING)]$ vim test.cc      # Here we have fixed the conflicts in the test.cc file.
(master *+|MERGING)]$ git add test.cc # Now add the final version of the test file.
(master +|MERGING)]$ git commit # And we commit the changes.
[master cd0dff4] Merge branch 'feature_branch'

(master)]$ git log --oneline --graph
* cd0dff4 Merge branch 'feature_branch'
|\
| * 95a9188 Make sure to return 0 on success
| * e769f21 Print hello world!
* | ffc9373 Add license message
|/
* f73b14d Initial commit

I don't know about you, but I find this commit history overly complicated. What's worse, the merge commit contains all the changes from the feature branch, and things only get more complicated as more commits are involved.

[Figure: commit graph after the merge]

This sort of commit history is often hard to analyze. This is especially true if, during development, the code on the master branch has diverged significantly from the code on the feature branch. In that case, the merge commit contains a bunch of changes made just to resolve conflicts, and those changes are needed because of changes in the early commits of the feature branch. Basically, the merge commit points to all sorts of places in the feature branch, which makes it incredibly hard to figure out and understand! Tools such as git bisect suffer too: when the problem comes from the conflict resolution itself, bisect can only point you at the merge commit, so instead of finding the exact change set that introduced a bug, you are stuck with a huge merge diff between the feature branch and the master branch. Not a pleasant situation to be in.
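For contrast, here is a sketch of git bisect run on a purely linear history of small commits, where it can name the single commit that introduced a problem. The "bug" is simulated: the hypothetical fourth commit appends the word BUG to app.txt, and the test command handed to bisect simply greps for it (exit status 0 marks a commit good, 1 marks it bad).

```shell
# Sketch: git bisect run on a linear history; the "bug" is simulated.
set -eu
repo=$(mktemp -d)
trap 'rm -rf "$repo"' EXIT
cd "$repo"
git init -q
git symbolic-ref HEAD refs/heads/master
git config user.name "Example User"
git config user.email "user@example.com"

# Five small commits; the (hypothetical) fourth one introduces the problem.
for i in 1 2 3 4 5; do
  echo "step $i" >> app.txt
  if [ "$i" -eq 4 ]; then echo "BUG" >> app.txt; fi
  git add app.txt
  git commit -q -m "Step $i"
done
culprit=$(git rev-parse HEAD~1)   # "Step 4"

# Bad: the current tip. Good: "Step 1". The test command exits 0 (good)
# when app.txt has no BUG line and 1 (bad) when it does.
git bisect start HEAD HEAD~4
git bisect run sh -c '! grep -q BUG app.txt'
first_bad=$(git rev-parse refs/bisect/bad)   # the commit bisect settled on
git bisect reset
git log --oneline -1 "$first_bad"
```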

There must be a better way!

Pretending you work in a vacuum, with git rebase

If we were all single developers, each working on a separate project at any one time, we would never have this problem. All our new feature commits would always start off from the latest commit on the master branch, and all our changes would apply cleanly each time. It turns out we can make git believe that's what we're doing by rebasing our changes. Rebasing might seem arcane at first, so let's demystify it.

Let's go back to our initial example, where we've committed things in our feature branch. So far we have two extra commits in our feature branch that the master branch does not have, and one extra commit in our master branch that the feature branch does not have. To avoid the extra merge commit, we need to bring the feature branch up to speed with the master branch. To do this, we rebase our feature branch on the current master branch. This means we make our feature branch start from the current tip of the master branch, instead of from some older commit.
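One way to see what "start from the current tip of the master branch" means is to compare the merge base of the two branches before and after rebasing. This is a sketch with invented files that never conflict, so the rebase finishes in one go.

```shell
# Sketch: compare the merge base before and after a rebase (made-up, non-conflicting files).
set -eu
repo=$(mktemp -d)
trap 'rm -rf "$repo"' EXIT
cd "$repo"
git init -q
git symbolic-ref HEAD refs/heads/master
git config user.name "Example User"
git config user.email "user@example.com"

echo "initial" > test.cc && git add test.cc && git commit -q -m "Initial commit"
git checkout -q -b feature_branch
echo "hello" > hello.txt && git add hello.txt && git commit -q -m 'Print hello world!'
echo "return 0" > status.txt && git add status.txt && git commit -q -m "Make sure to return 0 on success"
git checkout -q master
echo "license" > LICENSE && git add LICENSE && git commit -q -m "Add license message"

before=$(git merge-base master feature_branch)   # "Initial commit"
git checkout -q feature_branch
git rebase master
after=$(git merge-base master feature_branch)    # now the tip of master

git log --oneline -1 "$before"
git log --oneline -1 "$after"
git log --oneline --graph --all
```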

So let's have a look at what we need to do with git commands.

(master)]$ git reset --hard HEAD~1     # First, we will undo our previous merge.
(master)]$ git log --oneline # Double check to see what the master branch has now.
ffc9373 Add license message
f73b14d Initial commit

(master)]$ git checkout feature_branch # Let's go back to our feature branch to rebase it.
Switched to branch 'feature_branch'

(feature_branch)]$ git rebase master # Now let's rebase our branch on the current master.
First, rewinding head to replay your work on top of it...
Applying: Print hello world!
Using index info to reconstruct a base tree...
M test.cc
Falling back to patching base and 3-way merge...
Auto-merging test.cc
CONFLICT (content): Merge conflict in test.cc
error: Failed to merge in the changes.
Patch failed at 0001 Print hello world!
The copy of the patch that failed is found in: .git/rebase-apply/patch

When you have resolved this problem, run "git rebase --continue".
If you prefer to skip this patch, run "git rebase --skip" instead.
To check out the original branch and stop rebasing, run "git rebase --abort".

What git is actually doing now is taking each commit from the feature branch as a patch and trying to apply it on top of the new tip of the master branch. If the patch applies cleanly, it moves on to the next commit. If it doesn't, we get what we have now: a merge conflict. The difference is that we only have to fix the conflicts for the first commit from the feature branch, not for all of them at the same time. This is the key to how rebase works: the commits from your feature branch are replayed one after another, starting from the tip of the master branch.
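If you would rather back out at this point, git rebase --abort (suggested in Git's output above) puts the branch back exactly where it was before the rebase started. Here is a small sketch in a throwaway repository with a made-up one-line test.cc that is guaranteed to conflict.

```shell
# Sketch: abandon a conflicted rebase with git rebase --abort (test.cc contents are invented).
set -eu
repo=$(mktemp -d)
trap 'rm -rf "$repo"' EXIT
cd "$repo"
git init -q
git symbolic-ref HEAD refs/heads/master
git config user.name "Example User"
git config user.email "user@example.com"

echo 'int main() { }' > test.cc
git add test.cc && git commit -q -m "Initial commit"
git checkout -q -b feature_branch
echo 'int main() { puts("Hello world!"); }' > test.cc
git commit -q -am 'Print hello world!'
before=$(git rev-parse HEAD)   # where feature_branch points before the rebase
git checkout -q master
echo 'int main() { puts("License: GPL"); }' > test.cc
git commit -q -am "Add license message"

git checkout -q feature_branch
if git rebase master; then echo "expected a conflict" >&2; exit 1; fi
git status --short   # UU test.cc: both sides modified it

# Give up: restore the branch and working tree to their pre-rebase state.
git rebase --abort
git log --oneline -1
```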

(feature_branch *+|REBASE 1/2)]$ vim test.cc   # Now we solve the conflicts introduced by the
# first commit from the feature branch.

(feature_branch *+|REBASE 1/2)]$ git add test.cc # Add the changed file.
(feature_branch +|REBASE 1/2)]$ git rebase --continue # And finally continue
Applying: Print hello world!
Applying: Make sure to return 0 on success
Using index info to reconstruct a base tree...
M test.cc
Falling back to patching base and 3-way merge...
Auto-merging test.cc

(feature_branch)]$ git log --oneline
cf6d909 Make sure to return 0 on success
f3cc581 Print hello world!
ffc9373 Add license message
f73b14d Initial commit

Finally, the rebase is complete! Luckily for us, the second commit from the feature branch applied cleanly once we had resolved the conflicts for the first commit. Sometimes this doesn't happen and we have to resolve more conflicts. The good thing is that all this conflict resolution happens one commit at a time, so you no longer have to fix all the conflicts in one big commit.
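The whole sequence can be reproduced in a script. In this sketch the six-line test.cc is invented: the master commit and the first feature commit both rewrite line 1, so replaying that commit stops with a conflict, while the second feature commit only touches line 5 and applies cleanly once the first one is resolved. GIT_EDITOR=true simply accepts the existing commit message in case rebase --continue opens an editor.

```shell
# Sketch: a rebase that stops on the first replayed commit, resolved one commit at a time.
set -eu
repo=$(mktemp -d)
trap 'rm -rf "$repo"' EXIT
cd "$repo"
git init -q
git symbolic-ref HEAD refs/heads/master
git config user.name "Example User"
git config user.email "user@example.com"

# Hypothetical helper: overwrite test.cc with the given lines.
write_test_cc() { printf '%s\n' "$@" > test.cc; }

write_test_cc '// TODO: license' 'int main() {' '  // TODO: body' '' '  // TODO: return' '}'
git add test.cc && git commit -q -m "Initial commit"

git checkout -q -b feature_branch
write_test_cc '// Hello world example' 'int main() {' '  puts("Hello world!");' '' '  // TODO: return' '}'
git commit -q -am 'Print hello world!'
write_test_cc '// Hello world example' 'int main() {' '  puts("Hello world!");' '' '  return 0;' '}'
git commit -q -am "Make sure to return 0 on success"

git checkout -q master
write_test_cc '// License: GPL' 'int main() {' '  // TODO: body' '' '  // TODO: return' '}'
git commit -q -am "Add license message"

# Replaying "Print hello world!" conflicts on line 1, so the rebase stops there.
git checkout -q feature_branch
if git rebase master; then echo "expected a conflict" >&2; exit 1; fi

# Resolve just this commit (keep the license line), stage it and continue;
# the second feature commit then applies cleanly on its own.
write_test_cc '// License: GPL' 'int main() {' '  puts("Hello world!");' '' '  // TODO: return' '}'
git add test.cc
GIT_EDITOR=true git rebase --continue

git log --oneline --graph
```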

Now for the final step, merging with master.

(feature_branch)]$ git checkout master
Switched to branch 'master'

(master)]$ git merge feature_branch
Updating ffc9373..cf6d909
Fast-forward
test.cc | 5 +++++
1 file changed, 5 insertions(+)

(master)]$ git log --oneline --graph
* cf6d909 Make sure to return 0 on success
* f3cc581 Print hello world!
* ffc9373 Add license message
* f73b14d Initial commit

And we're done! Git has identified that our feature branch is just an extension of our master branch, so it fast-forwards the master branch instead of creating an extra merge commit. (Note that this behaviour can be changed: git merge --no-ff always creates a merge commit, but we're not interested in that right now.) Our commit history is now entirely linear. All our feature commits sit on top of the master branch and are nicely grouped together.
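A sketch of that final step, assuming feature_branch already starts at the tip of master, as it does right after the rebase. git merge --ff-only makes the intent explicit: it moves master forward and refuses to create a merge commit, failing instead if a fast-forward is not possible. The files are made up.

```shell
# Sketch: fast-forward master onto a branch that already starts at master's tip.
set -eu
repo=$(mktemp -d)
trap 'rm -rf "$repo"' EXIT
cd "$repo"
git init -q
git symbolic-ref HEAD refs/heads/master
git config user.name "Example User"
git config user.email "user@example.com"

echo "initial" > test.cc && git add test.cc && git commit -q -m "Initial commit"
echo "license" > LICENSE && git add LICENSE && git commit -q -m "Add license message"

# As after a rebase, feature_branch begins at the current tip of master.
git checkout -q -b feature_branch
echo "hello" > hello.txt && git add hello.txt && git commit -q -m 'Print hello world!'
echo "return 0" > status.txt && git add status.txt && git commit -q -m "Make sure to return 0 on success"

git checkout -q master
git merge --ff-only feature_branch   # moves master forward; never creates a merge commit
git log --oneline --graph
```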

[Figure: commit history after the rebase]

Pretending you're perfect

We all make mistakes sometimes. Perhaps you tried implementing a feature, only to find a better way to do it later. You already have some work-in-progress commits in your feature branch, but you don't want them visible in your history. Perhaps you'd also like to rearrange some commits and get rid of others completely. There are multiple ways to do this, but the most painless, by far, has to be git rebase -i (the -i stands for --interactive). Let's have a look at a practical example:

(feature_branch)]$ git log --oneline
c58745e Forgot to initialize variables, add some logic
575e472 Add comments to the main function
e749be3 More WIP
38c17ff WIP
cf6d909 Make sure to return 0 on success
f3cc581 Print hello world!
ffc9373 Add license message
f73b14d Initial commit

So here we've done more work on our feature branch. We have two work-in-progress commits. Afterwards we remembered to add some comments to an unrelated function. Later we realised that some variables were uninitialized, and finally we implemented the logic for our feature. We don't want to show everybody our work-in-progress code; perhaps it even contains some weird debugging statements. (I would often print WOLOLO statements everywhere :) )

git rebase -i helps by giving you an editor prompt similar to the commit one. There we can do all sorts of things with our commits, such as reorder them, specify which commit messages we'd like to change, squash multiple commits together, etc. This makes interactive rebasing very useful for cleaning up. Let's clean up our feature branch!

(feature_branch)]$ git rebase -i cf6d909

Here we specified the hash of the "Make sure to return 0 on success" commit, so we can now edit all the commits that come after it. When you run this command, an editor prompt pops up with the following information:

pick 38c17ff WIP
pick e749be3 More WIP
pick 575e472 Add comments to the main function
pick c58745e Forgot to initialize variables, add some logic

# Rebase cf6d909..c58745e onto cf6d909 (4 commands)
#
# Commands:
# p, pick = use commit
# r, reword = use commit, but edit the commit message
# e, edit = use commit, but stop for amending
# s, squash = use commit, but meld into previous commit
# f, fixup = like "squash", but discard this commit's log message
# x, exec = run command (the rest of the line) using shell
# d, drop = remove commit
#
# These lines can be re-ordered; they are executed from top to bottom.
#
# If you remove a line here THAT COMMIT WILL BE LOST.
#
# However, if you remove everything, the rebase will be aborted.
#
# Note that empty commits are commented out

The older commits are at the top and the newer ones are at the bottom. Below that, we have a guide showing us various things we can do with our commits. The things we want to do are:

  1. Merge all our WIP commits into the final logic.
  2. Have the comments commit come first, as it's unrelated to the other ones.
  3. Reword our final logic commit message to explain things clearly.

We can do all this if we rewrite the lines like so:

pick 575e472 Add comments to the main function
reword 38c17ff WIP
f e749be3 More WIP
f c58745e Forgot to initialize variables, add some logic

The first commit to be applied is 575e472 Add comments to the main function. Then we change the commit message of 38c17ff WIP, and finally the last two commits, e749be3 and c58745e, are folded into the WIP commit.

After we exit this prompt, git starts working. When it reaches the reword commit, we get a commit message prompt where we can reword the message.

WIP

# Please enter the commit message for your changes. Lines starting
# with '#' will be ignored, and an empty message aborts the commit.
#
# Date: Sun Oct 16 10:01:10 2016 +0200
#
# interactive rebase in progress; onto cf6d909
# Last commands done (2 commands done):
# pick 575e472 Add comments to the main function
# reword 38c17ff WIP
# Next commands to do (2 remaining commands):
# f e749be3 More WIP
# f c58745e Forgot to initialize variables, add some logic
# You are currently editing a commit while rebasing branch 'feature_branch' on 'cf6d909'.
#
# Changes to be committed:
# modified: test.cc

After we exit that one, git continues by folding in the final two commits. Let's see how it all looks once it's done.

(feature_branch)]$ git rebase -i cf6d9099f1
[detached HEAD fa8eb59] Implement final addition business logic
Date: Sun Oct 16 10:01:10 2016 +0200
1 file changed, 2 insertions(+)
[detached HEAD ecde2f7] Implement final addition business logic
Date: Sun Oct 16 10:01:10 2016 +0200
1 file changed, 6 insertions(+), 1 deletion(-)
Successfully rebased and updated refs/heads/feature_branch.

(feature_branch)]$ git log --oneline
ecde2f7 Implement final addition business logic
a0e1436 Add comments to the main function
cf6d909 Make sure to return 0 on success
f3cc581 Print hello world!
ffc9373 Add license message
f73b14d Initial commit

And we're done! We've cleaned up after ourselves, and as long as none of our earlier work was pushed anywhere public, no one would be the wiser. This way, we can work however we like and prepare everything for release at the end.
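If you want to try that clean-up without an editor, the todo list can be supplied from a script: GIT_SEQUENCE_EDITOR replaces the editor for the todo list and GIT_EDITOR the one for commit messages. This is a sketch under some assumptions: the file contents are invented, and the comments commit is made to touch a different file from the WIP commits so that moving it to the front cannot conflict. The short hashes are captured while the commits are created.

```shell
# Sketch: run the cleaned-up todo list without opening an editor (contents are invented).
set -eu
repo=$(mktemp -d)
tools=$(mktemp -d)   # holds the scripted todo list and message "editor"
trap 'rm -rf "$repo" "$tools"' EXIT
cd "$repo"
git init -q
git symbolic-ref HEAD refs/heads/master
git config user.name "Example User"
git config user.email "user@example.com"

echo 'int main() { return 0; }' > test.cc
git add test.cc && git commit -q -m "Make sure to return 0 on success"
base=$(git rev-parse HEAD)
git checkout -q -b feature_branch

# Hypothetical work: WIP commits touch feature.cc, the comments commit touches test.cc.
echo 'int x;' > feature.cc && git add feature.cc && git commit -q -m "WIP"
wip=$(git rev-parse --short HEAD)
echo 'int y;' >> feature.cc && git commit -q -am "More WIP"
more=$(git rev-parse --short HEAD)
echo '// main: returns 0 on success' >> test.cc && git commit -q -am "Add comments to the main function"
comments=$(git rev-parse --short HEAD)
echo 'x = 0; y = 0;' >> feature.cc && git commit -q -am "Forgot to initialize variables, add some logic"
logic=$(git rev-parse --short HEAD)

# The edited todo list: comments first, reword the WIP commit, fold the rest into it.
cat > "$tools/todo" <<EOF
pick $comments Add comments to the main function
reword $wip WIP
fixup $more More WIP
fixup $logic Forgot to initialize variables, add some logic
EOF

# Stand-in for the commit message editor: overwrite the message file Git passes as $1.
cat > "$tools/reword.sh" <<'EOF'
echo "Implement final addition business logic" > "$1"
EOF

GIT_SEQUENCE_EDITOR="cp $tools/todo" GIT_EDITOR="sh $tools/reword.sh" git rebase -i "$base"
git log --oneline
```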

Wrapping it up

So now we've seen how rebasing helps us keep our commits tidy. git rebase is a really powerful tool in this sense. However, as with any rewriting of history, never do it on public branches. Only use it on your own development branches.

This article has mostly focused on how and where it makes sense to use rebasing. There are, however, a couple of cases where it is not possible. At MariaDB we maintain a number of active public branches, one for each major release of the server. This means we cannot use rebasing when merging features between these branches. This is unfortunate, but when branches are public, rewriting history is a big no-no.

With that said, I hope you've found this practical intro to git rebasing useful. Let me know if you find anything that is wrong or could be improved!
