26 Nov 2011

The Coolest Merge EVER - My take

Linus Torvalds wrote about "The coolest merge EVER!" back in June 2005 (http://www.gelato.unsw.edu.au/archives/git/0506/5511.html). A while back I needed to do something similar.

I had two git repositories that I use for database backups. Each contains backups from a different MySQL server in a master-slave setup, so the two hold mostly the same data. That became a problem once they had grown to a size that is a pain to sync to other servers, and since git stores identical content only once and delta-compresses similar content in its packs, one repository holding both should be much cheaper to sync than two.
So I decided to join the two repositories into one with full history. Unfortunately, simply doing what Torvalds had done wouldn't work, because the files in the two repositories have the same paths and the same file names. I therefore had to rewrite the commits so that the files were moved into an appropriate subdirectory.

For this kind of task git has a tool called filter-branch, which rewrites history according to the filters you give it.
This is what I did:

First, clone the repositories:

      git clone path-to-first-repository database_server1
      git clone path-to-second-repository database_server2

Now that you have a clone of both repositories, enter the first one and rewrite its history so that all the files end up in a subdirectory. If some files need changing as well, you can alter them with sed or other command-line tools in the same filter.

      cd database_server1
      git filter-branch -f --prune-empty --tree-filter 'mkdir -p <subdirname>; find . -mindepth 1 -maxdepth 1 -type f -exec mv {} <subdirname>/ \;'
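Before running the filter on a large backup repository, it can be worth trying it on a throwaway one. This is a minimal sketch assuming bash and a local git; `server1` stands in for `<subdirname>`, the file names and the Demo identity are made up, and `FILTER_BRANCH_SQUELCH_WARNING` only matters on newer git versions, which otherwise print a deprecation notice and pause.

```shell
#!/usr/bin/env bash
# Sketch: try the tree filter on a throwaway repository first.
# "server1" stands in for <subdirname>; all names here are made up.
set -euo pipefail
export GIT_AUTHOR_NAME=Demo GIT_AUTHOR_EMAIL=demo@example.com
export GIT_COMMITTER_NAME=Demo GIT_COMMITTER_EMAIL=demo@example.com
export FILTER_BRANCH_SQUELCH_WARNING=1   # newer git otherwise warns and pauses

repo=$(mktemp -d)
cd "$repo"
git init -q

# Two commits with dump files at the top level, like a backup repository.
echo "CREATE TABLE a (id INT);" > a.sql
git add a.sql
git commit -q -m "first backup"
echo "CREATE TABLE b (id INT);" > b.sql
git add b.sql
git commit -q -m "second backup"

# Same filter as above: move every top-level file into server1/ in every commit.
git filter-branch -f --prune-empty --tree-filter \
  'mkdir -p server1; find . -mindepth 1 -maxdepth 1 -type f -exec mv {} server1/ \;'

git ls-tree -r --name-only HEAD
```

Afterwards every rewritten commit, not just the latest, should contain its files under `server1/` only.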

Rewriting can take a very long time and needs a lot of CPU and memory. Once it has finished on the first repository, move on to the second one and repeat the process, this time moving the files into a different directory:

      cd ../database_server2
      git filter-branch -f --prune-empty --tree-filter 'mkdir -p <other-subdirname>; find . -mindepth 1 -maxdepth 1 -type f -exec mv {} <other-subdirname>/ \;'

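Since the second run differs from the first only in the target directory, the filter can be wrapped in a small shell function. This is a sketch: `rewrite_into_subdir` is a hypothetical helper name, `server1` and `server2` are placeholder directories, and two tiny throwaway repositories stand in for the real backup clones.

```shell
#!/usr/bin/env bash
set -euo pipefail
export GIT_AUTHOR_NAME=Demo GIT_AUTHOR_EMAIL=demo@example.com
export GIT_COMMITTER_NAME=Demo GIT_COMMITTER_EMAIL=demo@example.com
export FILTER_BRANCH_SQUELCH_WARNING=1   # skip the deprecation notice in newer git

# Hypothetical helper: move every top-level file of <repo> into <subdir>/,
# in every commit of the checked-out branch.
rewrite_into_subdir() {
  local repo=$1 subdir=$2
  # Double quotes expand $subdir here; \\; reaches filter-branch as \; for find.
  (cd "$repo" && git filter-branch -f --prune-empty --tree-filter \
    "mkdir -p '$subdir'; find . -mindepth 1 -maxdepth 1 -type f -exec mv {} '$subdir'/ \\;")
}

# Throwaway stand-ins for the two cloned backup repositories.
demo=$(mktemp -d)
cd "$demo"
for repo in database_server1 database_server2; do
  git init -q "$repo"
  echo "dump of $repo" > "$repo/backup.sql"
  git -C "$repo" add backup.sql
  git -C "$repo" commit -q -m "backup"
done

rewrite_into_subdir database_server1 server1
rewrite_into_subdir database_server2 server2
```

Keeping the filter in one place makes it harder to accidentally move both histories into the same directory, which would bring back the path collisions the rewrite is meant to avoid.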
Now that you have rewritten the history of both repositories, make a clone of the first one so you don't have to start from scratch if something goes wrong:

      cd ..
      git clone database_server1 database_server1_merge
      cd database_server1_merge

And now for "The Coolest Merge EVER". I've taken the liberty of fixing the commands so they work with a recent version of git:

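      git fetch ../database_server2                            # the second repository's HEAD ends up in FETCH_HEAD
      GIT_INDEX_FILE=.git/tmp-index git read-tree FETCH_HEAD   # read its tree into a temporary index
      GIT_INDEX_FILE=.git/tmp-index git checkout-index -a -u   # write those files into the working tree
      git update-index --add -- $(GIT_INDEX_FILE=.git/tmp-index git ls-files)  # add them to the real index
      git rev-parse FETCH_HEAD > .git/MERGE_HEAD               # the fetched commit becomes the second parent
      git commit
      rm .git/tmp-index                                        # the temporary index is no longer needed
      git gc --prune=now # Clean out old objects if needed.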
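The merge steps can be replayed end to end on two tiny repositories whose files already sit in separate subdirectories, which is a cheap way to check them before touching the real backups. This is a sketch: `make_repo`, `repo1`, `repo2`, `server1` and `server2` are made-up names, and the commit identity is set through environment variables only for the demo.

```shell
#!/usr/bin/env bash
# Sketch: replay the merge steps on two tiny, already-rewritten repositories.
# make_repo, repo1, repo2, server1 and server2 are made-up names.
set -euo pipefail
export GIT_AUTHOR_NAME=Demo GIT_AUTHOR_EMAIL=demo@example.com
export GIT_COMMITTER_NAME=Demo GIT_COMMITTER_EMAIL=demo@example.com

# make_repo <repo> <subdir>: one commit with a dump file inside <subdir>/
make_repo() {
  git init -q "$1"
  mkdir -p "$1/$2"
  echo "dump for $2" > "$1/$2/db.sql"
  git -C "$1" add .
  git -C "$1" commit -q -m "backup in $2"
}

demo=$(mktemp -d)
cd "$demo"
make_repo repo1 server1
make_repo repo2 server2

git clone -q repo1 merged
cd merged
git fetch -q ../repo2                                   # repo2's HEAD -> FETCH_HEAD
GIT_INDEX_FILE=.git/tmp-index git read-tree FETCH_HEAD  # its tree, in a scratch index
GIT_INDEX_FILE=.git/tmp-index git checkout-index -a -u  # its files, into the work tree
git update-index --add -- $(GIT_INDEX_FILE=.git/tmp-index git ls-files)
git rev-parse FETCH_HEAD > .git/MERGE_HEAD              # second parent, commit ID only
git commit -q -m "Merge repo2 history"
rm .git/tmp-index
```

Afterwards `git log --graph --oneline` should show a merge commit whose two parents lead back into the two separate histories.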

That's it: I now have a single git repository with the full history of both original repositories.
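With git 2.9 or later, the same join can also be done with an ordinary merge, because `git merge` accepts `--allow-unrelated-histories` for histories that share no common commit (without it, such merges are refused). This is a minimal sketch on throwaway repositories with made-up names; it assumes the histories were already rewritten into separate subdirectories as above, so the merge has no conflicting paths.

```shell
#!/usr/bin/env bash
set -euo pipefail
export GIT_AUTHOR_NAME=Demo GIT_AUTHOR_EMAIL=demo@example.com
export GIT_COMMITTER_NAME=Demo GIT_COMMITTER_EMAIL=demo@example.com

# Two throwaway repositories, each already keeping its files in its own subdirectory.
demo=$(mktemp -d)
cd "$demo"
for n in 1 2; do
  git init -q "repo$n"
  mkdir -p "repo$n/server$n"
  echo "dump for server$n" > "repo$n/server$n/db.sql"
  git -C "repo$n" add .
  git -C "repo$n" commit -q -m "backup in server$n"
done

git clone -q repo1 merged
cd merged
git fetch -q ../repo2
# The two histories share no commit, so git 2.9+ needs this flag to merge them.
git merge -q --allow-unrelated-histories -m "Merge repo2 history" FETCH_HEAD
```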