Git Logo by Jason Long is licensed under the CC BY 3.0 License

Migrate from Subversion to git

Categories: version control systems
Keywords: git svn version control systems

pre-init

Useful tools

Well, of course, git, git-svn, and access to the Subversion repository.

This type of operation is generally much slower on Windows (at least one order of magnitude, though it also depends on the environment), so while a conversion might take a whole day on GNU/Linux, it might take more than a week on Windows.

To convert from Subversion to git, one needs to check out every single commit. Thus having a fast connection to the Subversion server is a big plus.

Search relevant information

Before converting a Subversion repository to git, one needs to do a little homework to get the best result.

  • find all branches

  • find all tags

  • list all authors

As branches and tags are, in Subversion, simply folders, they can be located anywhere, so it might not be that easy to find them all.
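One way to hunt for them is to filter a recursive listing of the repository for folders with the conventional names. This is only a sketch: the helper name find_layout_dirs is made up for illustration, and it will miss branches kept in folders with unconventional names or deleted in the latest revision.

```shell
# Hypothetical helper: keep only directories named trunk, branches or tags.
# Expects one path per line on stdin; "svn ls" marks directories with a trailing "/".
find_layout_dirs() {
  grep -E '(^|/)(trunk|branches|tags)/$'
}

# Demo on a hand-written listing. Against a real server one would pipe in
#   svn ls --recursive https://example.com/project
# which can be slow on large repositories and only lists the latest revision.
printf '%s\n' \
  'base/' \
  'base/branches/' \
  'base/branches/my-project/' \
  'base/tags/' \
  'base/tags/my-project/readme.txt' \
  'trunk/' | find_layout_dirs
```

The candidates still need a manual look, since a folder called branches may itself contain further subfolders that group the real branches.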

Listing authors is luckily easier, as they can be extracted from the log

svn log --xml | grep -P "^<author" | sort --unique | sed 's|<author>\(.*\)</author>|\1 = |g' > authors.txt;

Notice that this will show only the authors that appear in the log of the current branch, thus some might be missing. As long as we have the majority of authors this won’t be a big issue, as git will complain during the conversion and name those that are not in authors.txt (albeit one by one).
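To catch authors from every branch in one go, the log can be taken from the repository root URL instead of the working copy. A rough sketch, assuming the usual header format of svn log --quiet ("r42 | jdoe | date"); the helper name authors_from_log is hypothetical.

```shell
# Hypothetical helper: extract the author column from "svn log --quiet" output
# and print one "name = " line per distinct author.
authors_from_log() {
  awk -F'|' '/^r[0-9]+ / { gsub(/^ +| +$/, "", $2); print $2 " = " }' | sort -u
}

# Demo on hand-written log headers. With a real repository one would run
#   svn log --quiet <repository root URL> | authors_from_log > authors.txt
printf '%s\n' \
  '------------------------------------------------------------------------' \
  'r3 | jdoe | 2020-01-03 10:00:00 +0000 (Fri, 03 Jan 2020)' \
  '------------------------------------------------------------------------' \
  'r2 | asmith | 2020-01-02 10:00:00 +0000 (Thu, 02 Jan 2020)' \
  '------------------------------------------------------------------------' \
  'r1 | jdoe | 2020-01-01 10:00:00 +0000 (Wed, 01 Jan 2020)' \
  '------------------------------------------------------------------------' | authors_from_log
```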

After collecting all names, one needs to edit authors.txt so that every line looks like

<svn name> = <git name>

Where <git name> normally is the name of the author, followed by their email address in angle brackets, like John Doe <john.doe@example.com>. While this step is completely optional, and one could do the transition without mapping the old names to the new names, it’s an operation that does not take much time and makes inspecting the history easier.
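If the list is long, the right-hand side can be pre-filled and then corrected by hand. A minimal sketch, assuming (purely for illustration) that every address follows the pattern <svn name>@<domain>; the helper name fill_authors and the domain are hypothetical.

```shell
# Hypothetical helper: turn "jdoe = " into "jdoe = jdoe <jdoe@example.com>".
# Lines that already have a right-hand side are passed through unchanged.
# The domain must not contain "/" or "&", as it is pasted into the sed expression.
fill_authors() {
  domain="$1"
  sed -E "s/^([^=]*[^= ]) *= *\$/\1 = \1 <\1@${domain}>/"
}

# Demo: one empty mapping and one that was already edited.
printf '%s\n' 'jdoe = ' 'asmith = Anna Smith <anna@example.org>' | fill_authors example.com
```

The real names still have to be typed in manually; the sketch only saves the repetitive part.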

git init and git clone

If branches are located at the conventional locations (trunk, branches, tags), then it is as easy as passing --stdlayout

git svn clone https://example.com/project <out-dir> --stdlayout --prefix=svn/ --authors-file <file with authors>;

But if the trunk, branches, or tags are located at different locations, then one can specify those with --trunk, --branches and --tags

git svn clone https://example.com/project <out-dir> --prefix=svn/ --trunk <trunk dir> --branches <branch dir> --tags <tag dir> --authors-file <file with authors>;

And if there are multiple locations where branches and tags are located, it is still not an issue

git svn clone https://example.com/project <out-dir> --prefix=svn/ --trunk <trunk dir> --branches <branch dir1> --branches <branch dir2> --tags <tag dir1> --tags <tag dir2> --authors-file <file with authors>;

unless there are multiple branches and/or tags with the same name.

In that case, it is still possible to use git-svn, but one has to adapt the config file manually.

git svn init https://example.com/project <out-dir> --prefix=svn/ --trunk <trunk dir> --branches <branch dir1> --branches <branch dir2> --tags <tag dir1> --tags <tag dir2>;
cd <out-dir>;
vim .git/config; # edit branches and tags to avoid collisions
git svn fetch;

After editing it, the .git/config file should look similar to

[core]
        repositoryformatversion = 0
        filemode = true
        bare = false
        logallrefupdates = true

[svn-remote "svn"]
        url = https://example.com/my-project
        fetch = base/branches/my-project:refs/remotes/svn/trunk

        branches = base/branches/my-project/my-branches1/*:refs/remotes/svn/my-branches1/*
        branches = base/branches/my-project/my-branches1.1/*:refs/remotes/svn/my-branches1.1/*
        branches = base/branches/my-project/my-branches2/*:refs/remotes/svn/my-branches2/*

        tags = base/tags/my-project/my-tags1/*:refs/remotes/svn/tags/my-tags1/*
        tags = base/tags/my-project/my-tags2/*:refs/remotes/svn/tags/my-tags2/*

[svn]
        authorsfile = authors.txt

Notice that git svn init does not accept --authors-file as a parameter, so you need to add it manually in the .git/config file.
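Instead of opening the file in an editor, the entry can also be written with git config, since svn.authorsfile is the configuration key behind that line. A small sketch on a throwaway repository; the paths are placeholders.

```shell
# Throwaway repository so the demo does not touch a real clone.
repo="$(mktemp -d)"
git init --quiet "$repo"

# Same effect as adding "authorsfile = ..." under [svn] by hand.
# An absolute path is used so the setting does not depend on the current directory.
git -C "$repo" config svn.authorsfile "$repo/authors.txt"

# Read the value back to confirm it was written.
git -C "$repo" config --get svn.authorsfile
```

In a real conversion, the command would be run inside <out-dir> right after git svn init and before git svn fetch.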

Note that there are multiple branches and tags values, but the "output" can be set at different locations. This way, there will be no collisions, just like there were none in Subversion.
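To find out beforehand whether any names collide, the listings of the individual branch (or tag) folders can be compared. A sketch under the assumption that each folder is listed with a plain svn ls; the helper name find_collisions is hypothetical.

```shell
# Hypothetical helper: print every name that occurs more than once on stdin.
# The trailing "/" that "svn ls" appends to directories is stripped first.
find_collisions() {
  sed 's|/$||' | sort | uniq -d
}

# Demo on hand-written listings. With a real repository one would run
#   { svn ls <url>/<branch dir1>; svn ls <url>/<branch dir2>; } | find_collisions
printf '%s\n' 'feature-a/' 'release-1/' 'feature-a/' 'hotfix/' | find_collisions
```

An empty output means that the plain --branches and --tags options are enough and no manual editing of .git/config is required.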

Notice: Avoid using --no-metadata, it causes more harm than good.

First, in case of an error, any error, even something like a network hiccup, with --no-metadata you might need to clone the whole repository from scratch.

Second, even if the transition to git is flawless, it might still be useful to search for something in the old repository. With --no-metadata, any reference to the Subversion repository is lost; without it, every commit message contains a line referencing the svn commit. Even if it might look ugly, the simple information that a specific commit has been made in svn and not in git might be relevant, as changing repository technology is a big step.

Third, it might be possible to remove all those lines from the commit messages afterward with a rebase, once the conversion is finished.
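As an illustration of such a cleanup, the filter below drops the git-svn-id line from a commit message. Note that this sketch swaps the rebase for a message filter, and the helper name strip_svn_id is hypothetical; it has not been part of the conversion described here.

```shell
# Hypothetical helper: remove the metadata line that git-svn appends to each message.
strip_svn_id() {
  sed -e '/^git-svn-id: /d'
}

# Demo on a hand-written commit message (URL and UUID are placeholders).
printf '%s\n' \
  'Fix typo in README' \
  '' \
  'git-svn-id: https://example.com/project/trunk@42 00000000-0000-0000-0000-000000000000' | strip_svn_id
```

On a finished conversion, the same sed expression could be passed to git filter-branch --msg-filter. This rewrites every commit hash, so it should only happen before the repository is shared, and git-svn will most likely not be able to update the copy afterward.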

The long clone/fetch

As mentioned before, cloning the repository can take a lot of time, and the time grows quickly with the number of branches and tags. You might therefore want to skip some dead and experimental tags and branches if those are not important.

The downside is that adding those branches afterward is more complicated, so I would prefer to take over as much data as possible.

From time to time, the fetch will fail. The main causes I experienced were that either there was not enough memory (and the OS killed the process), or there was a temporary connection issue.

In those cases, just running git svn fetch will resume the fetching.

To automate it, just run it in a loop

for i in $(seq 1 20) ; do :; git svn fetch; done;

Another source of error is an author missing from the authors.txt file. In that case, git will refuse to continue. Simply add the missing mapping, and then continue with git svn fetch.
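A sketch of a small retry helper that stops as soon as the command succeeds and pauses between attempts; the name retry and its arguments are made up for illustration. It cannot fix a missing author, so in that case it simply gives up after the last attempt.

```shell
# retry MAX DELAY CMD...: run CMD until it succeeds, at most MAX times,
# sleeping DELAY seconds between attempts.
retry() {
  max="$1"; delay="$2"; shift 2
  attempt=1
  until "$@"; do
    if [ "$attempt" -ge "$max" ]; then
      echo "giving up after $attempt attempts" >&2
      return 1
    fi
    echo "attempt $attempt of $max failed, retrying in ${delay}s" >&2
    attempt=$((attempt + 1))
    sleep "$delay"
  done
}

# Demo with a command that always succeeds; for the conversion one would run
#   retry 20 30 git svn fetch
retry 3 0 true
```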

Oops, I’ve started to clone, but forgot some branches and tags

First, stop cloning as soon as possible to avoid duplicated work. Adding those missing branches and/or tags to .git/config is correct, but not sufficient.

AFAIK currently the only possibility is removing .git/svn/.metadata. In this file, git tracks how many svn revisions it has already checked. As branches might start at an older revision, we need to tell git to check them all another time.

After removing .git/svn/.metadata, do git svn fetch.
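A slightly more cautious variant is to move the file aside instead of deleting it, so that it can be restored if something goes wrong. This is only a sketch; the helper name reset_svn_metadata is hypothetical, and the demo runs on a fake directory layout rather than a real clone.

```shell
# Hypothetical helper: move .git/svn/.metadata of the given clone out of the way.
reset_svn_metadata() {
  metadata="$1/.git/svn/.metadata"
  if [ ! -f "$metadata" ]; then
    echo "no $metadata found" >&2
    return 1
  fi
  mv "$metadata" "$metadata.bak"
}

# Demo on a fake layout; for a real clone one would call
#   reset_svn_metadata <out-dir> && git svn fetch
demo="$(mktemp -d)"
mkdir -p "$demo/.git/svn"
echo 'placeholder' > "$demo/.git/svn/.metadata"
reset_svn_metadata "$demo" && ls -A "$demo/.git/svn"
```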

Convert branches and tags

Once git svn fetch has finally finished downloading all changes, it’s time to create "real" git branches and tags.

git already created the master branch (which is mapped to the trunk in svn) and checked it out, but we need to take care of all the others.

# tags
for tag in $(git for-each-ref --format='%(refname:short)' refs/remotes/svn/tags); do :;
  git tag "${tag/svn\/tags\//}" "$tag";
done

# branches
for branch in $(git for-each-ref --format='%(refname:short)' refs/remotes/svn/branches); do :;
  git branch "${branch/svn\/branches\//}" "refs/remotes/$branch";
done
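Before creating anything, it can be reassuring to preview which names the loops will produce. A minimal sketch that only prints the mapping; the helper name preview_names is hypothetical, and it strips the prefix with ${ref#prefix} rather than the pattern substitution used above.

```shell
# Hypothetical helper: show "old ref -> new name" for every ref read from stdin.
preview_names() {
  prefix="$1"
  while IFS= read -r ref; do
    printf '%s -> %s\n' "$ref" "${ref#"$prefix"}"
  done
}

# Demo on hand-written ref names. In the converted repository one would run
#   git for-each-ref --format='%(refname:short)' refs/remotes/svn/tags | preview_names svn/tags/
printf '%s\n' 'svn/tags/v1.0' 'svn/tags/v1.1' | preview_names svn/tags/
```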

Contrary to the example given in the git book, I preferred not to delete the content of 'refs/remotes/svn', at least not immediately, in case I need to update my git-svn copy.

All branches except those created by git-svn can be pushed to a normal git repository:

git remote add origin git@my-git-server:myrepository.git;
git push origin --all;
git push origin --tags;

In case svn got updated in the meantime, it’s sufficient to download the latest changes (git svn fetch) and use --force to update all branches and tags (and remember to update the master branch too). As long as those branches have not been changed in git, this won’t be an issue; otherwise, it might be better to create a new branch with the changes.

# as before, download svn changes
for i in $(seq 1 20) ; do :; git svn fetch; done;

# tags
for tag in $(git for-each-ref --format='%(refname:short)' refs/remotes/svn/tags); do :;
  git tag --force "${tag/svn\/tags\//}" "$tag";
done

# branches
for branch in $(git for-each-ref --format='%(refname:short)' refs/remotes/svn/branches); do :;
  git branch --force "${branch/svn\/branches\//}" "refs/remotes/$branch";
done

# or use "git branch --force master remotes/svn/trunk;" if master is not checked out and you do not want to check it out
git checkout master; git reset --hard remotes/svn/trunk;
git push origin --all;
git push origin --tags;

Note: At this point, one could also use git checkout master && git svn rebase, which performs a git svn fetch and updates the master branch as well. All other branches need to be updated as shown above.

Once the transition is over, it is possible to delete the svn branches:

# tags
for tag in $(git for-each-ref --format='%(refname:short)' refs/remotes/svn/tags); do :;
  git branch --delete --force --remotes "$tag";
done

# branches
for branch in $(git for-each-ref --format='%(refname:short)' refs/remotes/svn/branches); do :;
  git branch --delete --force --remotes "$branch";
done
git branch --delete --force --remotes "svn/trunk";

Conclusion

At this point, you have a mirror of the Subversion repository that can be pushed to a git remote. It’s time to check whether everything, like branches and tags, looks as expected.
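One simple check is to look for svn tags that did not get a git tag of the same name. A sketch, assuming the tags live under refs/remotes/svn/tags as in the conversion loops; the helper name missing_tags is hypothetical.

```shell
# Hypothetical helper: list svn tags without a matching local git tag.
# Must be run inside the converted repository.
missing_tags() {
  git for-each-ref --format='%(refname)' refs/remotes/svn/tags |
  while IFS= read -r ref; do
    name="${ref#refs/remotes/svn/tags/}"
    # show-ref --verify succeeds only if exactly this ref exists.
    git show-ref --verify --quiet "refs/tags/$name" || printf '%s\n' "$name"
  done
}
```

Calling missing_tags in the converted repository prints nothing when every tag has been taken over. The same idea works for branches by swapping refs/remotes/svn/tags for refs/remotes/svn/branches and refs/tags for refs/heads.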

Git and Subversion workflows are, at least in my experience, quite different when handling feature branches and merges, so the history will look linear, with merge commits that contain a lot of changes at once. Unfortunately, AFAIK, there is no way to get merge commits like those I’m accustomed to when working with feature branches, where the merge commit has two parents.

In git, the history I’m accustomed to working with looks like

o---A---o---o---o---o---B---o---C--      (branch #1)
     \                   \
      o----o----o----o----M----X--       (branch #2)

Here M has two parents and, ideally, it is an "empty" commit that just establishes the relation and brings the changes between A and B into M.

In Subversion, and thus also in the svn-imported git repository, the history looks more like

o---A---o---o---o---o---B---o---C--       (branch #1)
     \
      o----o----o----o----M----X--        (branch #2)

And M is a single gigantic commit (a squash, in git terms) with all changes between A and B put together. The information about where the changes came from is stored in svn:mergeinfo, and while git-svn also takes advantage of it, the way it is implemented and used in practice (no central location, not always used, and, last but not least, different workflows) makes it hard to create the desired relation between branches.

