Cleaning Up Git History
A clean git commit history is often underrated and can be immensely useful to ease code reviews and understand changes in the future (potentially in the midst of an outage).
Of course we are talking about the final history here as committed to the shared repository, not the intermediate history while we are working on the code. Sometimes the intermediate history is good enough to be pushed directly, but this is actually fairly rare.
Cleaning up the history might seem tedious at first for marginal cosmetic benefits, but it gets much easier and faster with practice. Here I am collecting some tips for cleaning up a git commit history before publishing it to others, for example in the form of a pull request. Another good time to clean up the history is before merging, if we have added additional commits to address review comments. Don't merge commits that just exist to address comments, fix the original bad commits.
What Makes A Good Commit?
A clean commit history is one that is comprised only of good commits. But what makes a commit good? There are two parts to this, the content of the commit, and the commit description.
There are many standards for commit messages, but I personally use the OTP commit guidelines as the base. I then try to include any important technical and product consideration in the message, such as the motivation for the change, and maybe technical nuances such as alternative implementations considered and tradeoffs. See Dan Carley's amazing example.
The content of the commit is a bit more fuzzy, but here is what I am looking for:
Every commit should make sense by itself. There is a just right size for commits, where there is nothing to add and nothing to remove. We want to add/remove/change exactly one thing in each commit. If we are struggling to come up with a good commit summary, that is a hint that our commit is not well scoped. Ideally this means a reviewer can review our changes commit-by-commit. We try to tell a story in commits.
Every commit should be runnable, that is we should be able to git
checkout
any commit and get a functional code base. This means no
"WIP" commits or commit chains that only restore functionality at
the end. This is important so that we can revert or rollback to any
commit if things go sideways.
Techniques
Git is very flexible, and for many of the techniques described here there are faster ways of performing the same actions. I am describing multi-step methods which preserve several individual commits for longer, as it is much easier to combine two commits than to split a single commit. Specifically I tinker a lot with the history, combining several of these techniques here, before I do the final interactive rebase to resolve all changes. This way I can change my mind halfway through without any trouble undoing anything.
We are also using the long version of all flags for educational purposes. Refer to the man pages for the short versions.
We will be making heavy use of interactive rebasing, which can be triggered with
git rebase --interactive <ref>
Interactive rebase is incredibly powerful and can be used to perform many different operations. We can use it to avoid having to memorise many other commands that are specifically tailored to doing just one thing.
It is also usually used differently from regular rebasing, where we
move one or several commits onto a different base (hence rebasing).
When rebasing interactively we usually want to edit all commits
between HEAD
and <ref>
instead. Thus we need to chose <ref>
as
the other end of the range we want to edit.
When you start an interactive rebase, a special file will be opened in
$EDITOR
with all commits in the rebase range and actions for them.
Git will insert a guide below the list describing the available
actions. Once we are happy with the actions, we confirm the plan and
the actions resolve.
Think board games or card games.
Rewriting Commit Messages
While I encourage the writing of the full commit message directly at commit time, sometimes we can think of additions or changes to earlier commit messages. To change just the commit message of an existing commit, we can use:
git commit --fixup:reword=<ref>
These commands create additional "fixup commits" which are appended to the end of our history. Each of them is linked to another commit via its title. To actually combine the two (or more), we use:
git rebase --interactive --autosquash <ref>
This should setup the interactive rebase to do the right thing, so we just confirm the actions proposed by saving and closing the file.
Adding to Existing Commits
Let us assume we have written a commit, and then notice a small change necessary that falls within the scope of this commit. Instead of writing a new commit, we want to just fix up the existing commit. This is done simply by running:
git commit --fixup=HEAD
If we notice our error only after adding additional commits, and want to retroactively add a change to an earlier commit, this command becomes:
git commit --fixup=<ref>
# Could be a SHA, or also e.g. HEAD~2
Just as before, this creates fixup commits that can be combined with
an interactive rebase and the --autosquash
flag.
Combining Commits
If we have two already existing commits in our history that we would
like to combine into a single one, we can use an interactive rebase,
move the second commit after the first one if required, and select the
squash
option.
When resolving the rebase, git will stop at this point and open up a combined commit message for us. We definitely want to edit this one instead of sticking with the default concatenation of the two original messages.
Reordering Commits
If we want to change the order of commits, you have guessed it, we can do this with an interactive rebase and just changing the order of the commits in the list.
Splitting A Commit
If a commit turns out to contain several independent changes, we can
opt to split the commit to isolate those changes. Again we use an
interactive rebase, but this time we select the edit
action for the
commit we want to split. This will cause the rebase to pause when we
reach this commit. At this point we want to use
git reset HEAD
and then start creating the new commits. We can use
git add --patch
to incrementally add sections of our files. This works best when editing existing files, as the patch interface is a bit lacking, but it works. Emacs users out there will probably be familiar with the vastly superior Magit interface, which allows easy staging of individual lines. We can then use
git commit
to create the new commits. Once we are done creating new commits replacing the old one, we can use
git rebase --continue
to finish the rebase.
Removing A Commit
If we find ourselves with a commit that we decide we just do not need anymore, we can simply remove it from the history. This could be a change that we made but later realised we did not need after all, or something like the GitHub "update branch" merge commit noise.
The easiest way to do this is just using an interactive rebase and
selecting the drop
action for the commit in question.
Further reading
The git man pages are split by command, so for documentation on git
rebase
, check man git-rebase
. The man pages are good, but more of a
reference than a usage guide.
Oh Shit, Git!?! has a memorable domain and provides some information for how to perform self-rescue after a lot of git accidents.
As a parting note, when rewriting history and pushing frequently
--force-with-lease
is your friend.