Migrate from Subversion to git
pre-init
Before converting a svn repository to git there are some useful tasks to do.
Useful tools
Well, of course, git, git-svn, svn, and access to the Subversion repository.
This type of operation is generally much slower on Windows, by at least an order of magnitude, although the exact figure depends on the environment. A conversion that takes a whole day on GNU/Linux might take more than a week on Windows.
To convert from Subversion to git, one needs to check out every single commit. Thus, having a fast connection to the Subversion server is a big plus.
Search for relevant information
Before converting a Subversion repository to git, one needs to do a little homework to get the best result.
- find all branches
- find all tags
- list all authors
As branches and tags are, in Subversion, simply folders, they can be located anywhere, so it might not be that easy to find them all.
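One way to get a first overview is to filter a recursive listing of the repository for the conventional folder names. The following is only a sketch: the function name find_layout_dirs is made up for this example, it only sees folders that still exist in the latest revision, and it will miss branch or tag folders that use other names.

```shell
# Filter a recursive Subversion listing for conventional layout folders.
# Reads `svn ls -R` output on stdin; directories end with a trailing slash.
find_layout_dirs() {
    grep -E '(^|/)(trunk|branches|tags)/$'
}

# Hypothetical usage (can be slow on a large repository):
#   svn ls -R https://example.com/project | find_layout_dirs
```

Folders that were deleted in the past do not show up in such a listing, so the result should be treated as a starting point rather than a complete inventory.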
Authors can be extracted from the log
svn log --xml | grep -P "^<author" | sort --unique | sed 's|<author>\(.*\)</author>|\1 = |g' > authors.txt
This will show only the authors that appear in the history of the current branch, so some might be missing. As long as authors.txt contains the majority of authors, this won’t be a big issue: git will complain during the conversion and name the missing authors, unfortunately one by one.
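To reduce the number of missing authors, the log can be requested for the repository root instead of a single branch. The sketch below parses the plain `svn log --quiet` output rather than XML; the function name authors_from_log is made up for this example, and it assumes that the URL passed to svn is the root of the repository.

```shell
# Turn `svn log --quiet` output (read from stdin) into authors.txt stub lines.
# Each revision line looks like: "r42 | jdoe | 2020-01-01 12:00:00 +0000 (...)".
authors_from_log() {
    # Split on " | ", keep the second field of revision lines, append " = ".
    awk -F ' [|] ' '/^r[0-9]+ [|] / { print $2 " = " }' | sort -u
}

# Hypothetical usage against the repository root:
#   svn log --quiet https://example.com/project | authors_from_log > authors.txt
```

The right-hand side of every line still has to be filled in by hand.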
The authors.txt file should look like
svn-name = git-name
Where git-name is normally the name of the author, followed by their email address in angle brackets, like John Doe <john.doe@example.com>. While mapping the authors is completely optional, and one could do the transition without mapping the old names to the new names, it’s an operation that does not take much more time and makes inspecting the history a lot easier.
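Before starting a long conversion, it can be worth checking a list of known Subversion user names against the mapping file. The following sketch does that; the function name missing_authors is made up for this example, and it assumes one user name per line on standard input and the "svn-name = git-name" format described above.

```shell
# Print every SVN user name read from stdin that has no
# "svn-name = ..." entry in the authors file given as first argument.
missing_authors() {
    awk -F ' = ' '
        FILENAME == ARGV[1] { known[$1] = 1; next }   # first file: the mapping
        $0 != "" && !($0 in known)                    # stdin: print unmapped names
    ' "$1" -
}

# Hypothetical usage:
#   printf 'jdoe\nasmith\n' | missing_authors ./authors.txt
```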
git init and git clone
If branches are located at conventional locations (trunk, branches, tags), then it is as easy as
git svn clone https://example.com/project "$OUT_DIR" --stdlayout --prefix=svn/ --authors-file ./authors.txt
But if branches or tags are located at different locations, then one can specify those with --branches and --tags
git svn clone https://example.com/project "$OUT_DIR" --prefix=svn/ --trunk "$TRUNK_DIR" --branches "$BRANCH_DIR" --tags "$TAG_DIR" --authors-file ./authors.txt
And if there are multiple locations where branches and tags are located, it can still be done
git svn clone https://example.com/project "$OUT_DIR" --prefix=svn/ --trunk "$TRUNK_DIR" --branches "$BRANCH_DIR1" --branches "$BRANCH_DIR2" --tags "$TAG_DIR1" --tags "$TAG_DIR2" --authors-file ./authors.txt
unless there are multiple branches and/or tags with the same name.
In that case, it is still possible to use git-svn, but one has to adapt the config file manually.
git svn init https://example.com/project "$OUT_DIR" --prefix=svn/ --trunk "$TRUNK_DIR" --branches "$BRANCH_DIR1" --branches "$BRANCH_DIR2" --tags "$TAG_DIR1" --tags "$TAG_DIR2"
cd "$OUT_DIR";
vim .git/config; # edit branches and tags to avoid collisions
git svn fetch
git svn init does not accept --authors-file as a parameter, so you need to add it manually in the .git/config file.
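As an alternative to opening an editor, the same entries can be written with git config. This is a sketch under a few assumptions: the function name configure_git_svn is made up for this example, the remote is called "svn" as in the configuration file of this article, and the function is run inside the repository created by git svn init.

```shell
# Register the authors file and additional branch mappings without an editor.
# $1: path to the authors file (an absolute path is the safer choice)
# remaining arguments: "svn-path/*:refs/remotes/svn/name/*" mappings
configure_git_svn() {
    authors_file=$1
    shift
    git config svn.authorsfile "$authors_file"
    for mapping in "$@"; do
        # --add appends a value instead of replacing the existing ones
        git config --add svn-remote.svn.branches "$mapping"
    done
}

# Hypothetical usage:
#   configure_git_svn "$PWD/authors.txt" 'other/branches/*:refs/remotes/svn/other/*'
```

The result can be inspected with git config --get-all svn-remote.svn.branches before starting the fetch.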
After editing it, the .git/config file should look similar to
[core]
repositoryformatversion = 0
filemode = true
bare = false
logallrefupdates = true
[svn-remote "svn"]
url = https://example.com/my-project
fetch = base/branches/my-project:refs/remotes/svn/trunk
branches = base/branches/my-project/my-branches1/*:refs/remotes/svn/my-branches1/*
branches = base/branches/my-project/my-branches1-alt/*:refs/remotes/svn/my-branches1-alt/*
branches = base/branches/my-project/my-branches2/*:refs/remotes/svn/my-branches2/*
tags = base/tags/my-project/my-tags1/*:refs/remotes/svn/tags/my-tags1/*
tags = base/tags/my-project/my-tags2/*:refs/remotes/svn/tags/my-tags2/*
[svn]
authorsfile = authors.txt
Note that there are multiple branches and tags values, and the "output" of each can be set to a different location. This way, there will be no collisions, just like there were none in Subversion.
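Whether names actually collide can be checked before the fetch. The sketch below takes a list of branch or tag folders and prints the names that occur under more than one parent folder; the function name colliding_names is made up for this example.

```shell
# Read branch or tag folders (one path per line, trailing slash optional)
# and print every leaf name that occurs more than once.
colliding_names() {
    # Drop trailing slashes, keep only the last path component, find duplicates.
    sed 's|/*$||; s|.*/||' | sort | uniq -d
}

# Hypothetical usage, with the folders used in the clone command:
#   for d in "$BRANCH_DIR1" "$BRANCH_DIR2"; do
#       svn ls "https://example.com/project/$d" | sed "s|^|$d/|"
#   done | colliding_names
```

An empty output means that the folders can be mapped to the same location without overwriting each other.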
Warning ⚠️: Avoid using --no-metadata; it causes more harm than good.
First of all, in case of an error, even something like a network hiccup, you will need to clone the whole repository from the beginning. Also, even if the transition to git is flawless, it might still be useful to search for something in the old repository. With --no-metadata, any reference to the Subversion repository is lost. If you do not set that flag, every commit message will carry an additional line referencing the corresponding svn commit. Even if it might look ugly, the simple information that a specific commit was made in svn and not git might be relevant, as changing repository technology is a big step.
Last, but not least, you could remove all those comments from the commit messages afterward with rebases, after the conversion is finished.
The long clone/fetch
As mentioned before, cloning the repository can take a lot of time, and it takes considerably longer if there are a lot of branches and tags. You might therefore want to skip some dead and experimental tags and branches if those are not important.
The downside is that adding those branches afterward is more complicated… so I would prefer to take over as much data as possible.
From time to time, the fetch will fail. The main causes I experienced were that either there was not enough memory (and the OS killed the process), or there was a temporary connection issue.
In those cases, just running git svn fetch will resume the fetching.
To automate it, just run it in a loop that stops at the first successful fetch
for i in $(seq 1 20); do git svn fetch && break; done
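A slightly more defensive variant of such a loop pauses between attempts and reports whether the command finally succeeded. This is only a sketch; the function name retry and the numbers in the usage line are made up for this example.

```shell
# Run a command until it succeeds.
# $1: maximum number of attempts, $2: seconds to wait between attempts,
# remaining arguments: the command to run.
retry() {
    max_attempts=$1
    delay=$2
    shift 2
    attempt=1
    while :; do
        "$@" && return 0
        if [ "$attempt" -ge "$max_attempts" ]; then
            return 1   # give up and let the caller see the failure
        fi
        attempt=$((attempt + 1))
        sleep "$delay"
    done
}

# Hypothetical usage: up to 20 attempts, 30 seconds apart.
#   retry 20 30 git svn fetch
```

The pause gives a temporary connection issue some time to go away. Retrying does not help with errors that need manual intervention; in that case the loop simply fails on every attempt and returns a non-zero status.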
Another source of error is an author missing from the authors.txt file. In that case, git will refuse to continue. Simply add the missing mapping, and then continue with git svn fetch.
Ooops, I’ve started to clone, but forgot some branches and tags
First, stop cloning as soon as possible to avoid duplicated work. Adding those missing branches and/or tags to .git/config is correct, but not sufficient.
AFAIK, the only possibility currently is to remove .git/svn/.metadata. In this file, git tracks how many svn revisions it has already checked. As branches might start at an older revision, we need to tell git to check them all again.
After removing .git/svn/.metadata, do git svn fetch.
Convert branches and tags
Once git svn fetch has finished downloading all changes, it’s time to create "real" git branches and tags.
git already created the master branch (which is mapped to the trunk in SVN) and checked it out, but we need to take care of all the others.
# tags
for tag in $(git for-each-ref --format='%(refname:short)' refs/remotes/svn/tags); do :;
git tag "${tag/svn\/tags\//}" "$tag";
done
# branches
for branch in $(git for-each-ref --format='%(refname:short)' refs/remotes/svn/branches); do :;
git branch "${branch/svn\/branches\//}" "refs/remotes/$branch";
done
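Before creating anything, the planned commands can be printed and reviewed first. The sketch below is a dry-run variant of the loops above; the function name plan_local_refs is made up for this example. It also skips refs of the form "name@revision", which git-svn can create for branches or tags that were deleted and recreated; skipping them is an assumption about what is wanted, not a requirement.

```shell
# Read short remote ref names such as "svn/tags/v1.0" from stdin and
# print the git commands that would create the matching local refs.
plan_local_refs() {
    while IFS= read -r ref; do
        case $ref in
            *@*) continue ;;   # assumption: ignore "name@revision" refs
            svn/tags/*)
                printf 'git tag "%s" "refs/remotes/%s"\n' "${ref#svn/tags/}" "$ref" ;;
            svn/branches/*)
                printf 'git branch "%s" "refs/remotes/%s"\n' "${ref#svn/branches/}" "$ref" ;;
        esac
    done
}

# Hypothetical usage:
#   git for-each-ref --format='%(refname:short)' refs/remotes/svn | plan_local_refs
```

Once the list looks right, the output can be piped to sh to execute it.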
Contrary to the example given in the git book, I preferred not to delete the content of 'refs/remotes/svn', at least not immediately, in case I need to update my copy again (for example, because people might still be working with the current SVN repository).
All branches except those created by git-svn can be pushed to a normal git repository:
git remote add origin git@my-git-server:myrepository.git;
git push origin --all;
git push origin --tags;
In case the SVN repository got updated in the meantime, it’s sufficient to download the latest changes (git svn fetch) and use --force to update all branches and tags (and remember to update the master branch too). As long as those branches have not been changed on the git side, this won’t be an issue; otherwise, it might be better to create a new branch with the changes.
# as before, download svn changes
for i in $(seq 1 20); do git svn fetch && break; done;
# tags
for tag in $(git for-each-ref --format='%(refname:short)' refs/remotes/svn/tags); do :;
git tag --force "${tag/svn\/tags\//}" "$tag";
done
# branches
for branch in $(git for-each-ref --format='%(refname:short)' refs/remotes/svn/branches); do :;
git branch --force "${branch/svn\/branches\//}" "refs/remotes/$branch";
done
# or use "git branch --force master remotes/svn/trunk;" if master is not checked out and you do not want to check it out
git checkout master; git reset --hard remotes/svn/trunk;
git push origin --all;
git push origin --tags;
Note: At this point, one could use git checkout master && git svn rebase, which runs a git svn fetch and also updates the master branch. All other branches need to be updated as shown above.
Once the transition is over, it is possible to delete the SVN branches:
# tags
for tag in $(git for-each-ref --format='%(refname:short)' refs/remotes/svn/tags); do :;
git branch --delete --force --remotes "$tag";
done
# branches
for branch in $(git for-each-ref --format='%(refname:short)' refs/remotes/svn/branches); do :;
git branch --delete --force --remotes "$branch";
done
git branch --delete --force --remotes "svn/trunk";
Conclusion
At this point, you have a mirror of the Subversion repository that can be pushed to a git remote. It’s time to see if everything looks as expected, like branches and tags.
As, at least in my experience, git and Subversion workflows are quite different when handling feature branches and merges, the history will look linear with merge commits that have a lot of changes at once. Unfortunately, AFAIK, there is no way to have merge commits like those I’m accustomed to when working with feature branches, where the merge commit has two parents.
In git, the history looks like
o---A---o---o---o---o---B---o---C-- (branch #1)
     \                   \
      o----o----o----o----M----X-- (branch #2)
Here M has two parents, and ideally, it is an "empty" commit that just establishes the relation and brings the changes between A and B into M.
In Subversion the history looks more like
o---A---o---o---o---o---B---o---C-- (branch #1)
     \
      o----o----o----o----M----X-- (branch #2)
And M is a single gigantic commit (a squash in git terms) with all changes between A and B put together. The information about where the changes came from is stored in svn:mergeinfo, and while git-svn also takes advantage of it, the way it is implemented and used in practice (no central location, not always used, and, last but not least, different workflows) makes it hard to create the desired relation between branches.
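To see which converted commits did end up with more than one parent, the parent lists can be inspected. The sketch below filters the output of git rev-list --parents, where every line holds a commit followed by its parents; the function name list_merges is made up for this example.

```shell
# Read `git rev-list --parents <ref>` output from stdin and print the
# commits that have two or more parents (true merge commits).
list_merges() {
    # A line is "<commit> <parent1> <parent2> ...", so 3+ fields means a merge.
    awk 'NF >= 3 { print $1 }'
}

# Hypothetical usage on the converted trunk:
#   git rev-list --parents svn/trunk | list_merges
```

git rev-list --merges gives the same list directly; the explicit filter is only meant to show what "two parents" looks like in the raw data.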