Git Enchantment

History repeats itself, first as tragedy, second as farce.

Karl Marx

Several days ago, a few coworkers and I happened upon a perplexing problem. It can be stated as follows: "Given two Git repositories source and target, while preserving revision history, move the contents of a subdirectory a located in source to a subdirectory b located in target."

We ended up simply executing a mv to transfer the contents of a into b, which, of course, discarded all revision history. Out of curiosity, I researched whether there was a lossless way to do what we wanted.

The solution was the Git subcommand filter-branch. This tool rewrites revision history by iterating over revisions and applying filters. A filter is an option passed to filter-branch that applies a transformation to every revision on a given repository branch.

Consider the command git filter-branch --subdirectory-filter foo. This steps through all revisions and sets the subdirectory foo as the new root of the Git repository.
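To make that concrete, here is a minimal sketch on a throwaway repository. The file names, commit messages, and demo identity are made up for illustration, and FILTER_BRANCH_SQUELCH_WARNING is set only to silence the notice that newer Git versions print before running filter-branch.

```shell
#!/bin/sh
set -e
export FILTER_BRANCH_SQUELCH_WARNING=1   # skip the notice newer Git prints

# Hypothetical scratch repository: one commit inside foo/, one outside it.
demo=$(mktemp -d)
cd "$demo"
git init -q .
git config user.name "Demo"
git config user.email "demo@example.com"

mkdir foo
echo "inside" > foo/kept.txt
git add . && git commit -q -m "Add foo/kept.txt"
echo "outside" > dropped.txt
git add . && git commit -q -m "Add dropped.txt"

# Make foo the new repository root; commits that never touched foo drop out.
git filter-branch --subdirectory-filter foo HEAD >/dev/null 2>&1

subdir_files=$(git ls-tree -r --name-only HEAD)
subdir_commits=$(git rev-list --count HEAD)
echo "files: $subdir_files"
echo "commits: $subdir_commits"
```

After the rewrite, kept.txt sits at the top level and only the commit that touched foo remains.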

For us, the subdirectory-filter option wasn't going to cut it. Instead we used --tree-filter, an option which rewrites each revision by evaluating a user-specified shell command against that revision's checked-out tree.

Let's step through an example of utilizing --tree-filter to solve the problem we defined above. For our purposes, we assume the following:

  1. source and target are Git repositories in the same parent directory.
  2. source is on the master branch.

We begin by changing into source.

$ cd source

Next we run the filter-branch subcommand.

$ git filter-branch --tree-filter 'find . -mindepth 1 -maxdepth 1 ! -name a -exec rm -rf {} +; test -d a && mv a b || echo "Nothing to do"' --prune-empty HEAD

Take a deep breath. Let's break apart this command so we can better digest it.

  • find . -mindepth 1 -maxdepth 1 ! -name a -exec rm -rf {} +: Remove every top-level file and directory except a. The -mindepth 1 keeps find from matching . itself.
  • test -d a && mv a b || echo "Nothing to do": If the directory a exists, move a to b. This prevents filter-branch from failing on commits that don't contain the directory a.
  • --prune-empty: Drop commits that are left empty by the rewrite. In our case, this is any commit that did not touch the directory a.
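Before letting a filter loose on real history, it can help to try its body on a scratch directory. The sketch below does that with a made-up tree standing in for one checked-out revision. The find invocation here uses -mindepth 1 -maxdepth 1 with a single ! -name a test, so that stray top-level files are removed along with directories.

```shell
#!/bin/sh
set -e

# Hypothetical scratch tree standing in for one checked-out revision.
tree=$(mktemp -d)
cd "$tree"
mkdir a other
echo "keep" > a/file.txt
echo "drop" > other/file.txt
echo "drop" > README

# Delete every top-level entry except a (files and directories alike).
find . -mindepth 1 -maxdepth 1 ! -name a -exec rm -rf {} +

# Rename a to b if it exists; otherwise report and carry on.
test -d a && mv a b || echo "Nothing to do"

# List what survived.
remaining=$(find . -mindepth 1 | sort)
echo "$remaining"
```

Only b and b/file.txt should be left, which is the tree each rewritten commit will record.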

For other useful options refer to the appendix.

Now we change into the target Git repository:

$ cd ../target

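We then add ../source as a new remote called source, fetch from this new remote, and merge in source/master. On Git 2.9 or later, the merge also needs the --allow-unrelated-histories option, since the two repositories share no common ancestor.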

$ git remote add source ../source
$ git fetch source
$ git merge source/master

There you have it. We have successfully migrated the contents of a subdirectory from one Git repository to a subdirectory in another Git repository. All the while, our revision history has remained intact.
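If you'd like to rehearse the whole procedure before touching real repositories, the following sketch runs it end to end in a temporary directory. Everything in it is hypothetical scaffolding: the new_repo helper, the file names, and the commit messages. It also assumes Git 2.9 or later, where merging two repositories with no shared history requires --allow-unrelated-histories.

```shell
#!/bin/sh
set -e
export FILTER_BRANCH_SQUELCH_WARNING=1   # skip the notice newer Git prints

work=$(mktemp -d)
cd "$work"

# Hypothetical helper: create a repository on master with a demo identity.
new_repo() {
    git init -q "$1"
    git -C "$1" symbolic-ref HEAD refs/heads/master
    git -C "$1" config user.name "Demo"
    git -C "$1" config user.email "demo@example.com"
}
new_repo source
new_repo target

# source: two commits touch a/, one touches an unrelated file.
cd source
mkdir a
echo one > a/file.txt && git add . && git commit -q -m "a: first"
echo x > unrelated.txt && git add . && git commit -q -m "unrelated"
echo two >> a/file.txt && git add . && git commit -q -m "a: second"

# Keep only a, rename it to b, and drop commits that end up empty.
git filter-branch --tree-filter '
    find . -mindepth 1 -maxdepth 1 ! -name a -exec rm -rf {} +
    test -d a && mv a b || echo "Nothing to do"
' --prune-empty HEAD >/dev/null 2>&1

# target: one pre-existing commit, then pull in the rewritten history.
cd ../target
echo hello > README && git add . && git commit -q -m "target: initial"
git remote add source ../source
git fetch -q source
git merge -q --allow-unrelated-histories -m "Import a as b" source/master

# Show the commits that touched b in target.
git log --format=%s -- b
```

The final git log should list the two commits that touched a, now recorded against b, while unrelated.txt never shows up in target.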


Speeding up filter-branch

If you're dealing with a repository that has a large revision history, a significant speed-up can be seen by using the -d option. This sets the temporary directory that filter-branch uses for rewriting to a path specified by the user. Pointing it at a memory-backed filesystem avoids disk I/O; for example, on many Linux systems /dev/shm is a tmpfs mount, so you can pass something like -d /dev/shm/filter-branch.
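Here is a small sketch of the -d option in use. It assumes /dev/shm is a writable tmpfs mount and falls back to the regular temporary directory otherwise; the repository contents are made up. Note that filter-branch expects the directory named by -d not to exist yet, so the sketch points it at a fresh subdirectory.

```shell
#!/bin/sh
set -e
export FILTER_BRANCH_SQUELCH_WARNING=1   # skip the notice newer Git prints

# Prefer /dev/shm (tmpfs on many Linux systems); otherwise use the default temp dir.
if [ -d /dev/shm ] && [ -w /dev/shm ]; then
    scratch=$(mktemp -d /dev/shm/filter-branch.XXXXXX)
else
    scratch=$(mktemp -d)
fi

# Hypothetical repository with a single commit containing a/.
repo=$(mktemp -d)
cd "$repo"
git init -q .
git config user.name "Demo"
git config user.email "demo@example.com"
mkdir a
echo data > a/file.txt
git add . && git commit -q -m "Add a"

# The rewrite/ subdirectory does not exist yet; filter-branch creates it
# and removes it again when the rewrite finishes.
git filter-branch -d "$scratch/rewrite" \
    --tree-filter 'test -d a && mv a b || echo "Nothing to do"' HEAD >/dev/null 2>&1

moved=$(git ls-tree -r --name-only HEAD)
echo "$moved"
```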