Wondering how to migrate a Subversion repository to a Git repository? If so, this is the blog for you. We will cover that and the various issues you may encounter. The steps discussed here are a summary of Bitbucket’s step-by-step procedure for doing the migration, and they rely on some tools/scripts provided by Bitbucket (BB). Sometimes the SVN repository also needs additional preparation; that is discussed at the end of this article.
NOTE: you can have multiple branch paths specified.
NOTE: this differs from the instructions on BB (as of 2018-05-31), which do not include specifying an empty prefix. Git changed its default behavior between 1.x and 2.x: in 1.x, a git svn clone used an empty prefix by default; in 2.x, the default prefix is “origin/”. This does 2 things. First, it screws up the BB tools used later for converting SVN branches to Git branches (although there is technically a different way to fix that). Second, it adds the prefix “origin/” to the names of all of your branches and tags once they are converted to Git branches and tags. That may be desirable in some instances, since it reminds you that those branches and tags came from SVN, but it also adds unnecessary decoration if your true goal is to switch to Git permanently. Specifying an empty prefix is the recommended (Dan Santos) approach.
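For reference, here is a minimal sketch of what the clone step might look like with an explicit empty prefix and more than one branch path. The “feature-branches” folder and the authors.txt file (assumed to come from the BB authors step) are illustrative assumptions, not part of BB’s instructions; setting DRY_RUN=1 prints the command instead of running it.

```shell
#!/bin/sh
# Sketch only: "feature-branches" and authors.txt are assumed for illustration.
# Usage: svn_clone <svn-project-url> <target-dir>   (DRY_RUN=1 prints instead of running)
svn_clone() {
  svn_url=$1
  target=$2
  # --prefix="" restores the Git 1.x behavior (no "origin/" on branch and tag refs).
  # --branches may be given more than once when branches live under several paths.
  set -- git svn clone \
    --prefix="" \
    --trunk=trunk \
    --branches=branches \
    --branches=feature-branches \
    --tags=tags \
    --authors-file=authors.txt \
    "$svn_url" "$target"
  if [ -n "${DRY_RUN:-}" ]; then
    printf '%s\n' "$*"
  else
    "$@"
  fi
}
```

For example, DRY_RUN=1 svn_clone https://svn.example.com/repo/Project Project shows the exact command line before you commit to a long-running clone.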
NOTE: remember to put the git repo in a case-sensitive file system (see above)
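If you are not sure whether a given location is case-sensitive, a quick probe can tell you. The helper below is hypothetical (not one of the BB scripts): it creates a file named “a” in a scratch directory and checks whether “A” resolves to the same file.

```shell
#!/bin/sh
# Hypothetical helper, not part of the BB scripts.
# Returns 0 if the directory (default: current) is case-sensitive,
# 1 if it is case-insensitive, 2 if the probe could not be created.
is_case_sensitive() {
  probe_dir=$(mktemp -d "${1:-.}/case-probe.XXXXXX") || return 2
  touch "$probe_dir/a"
  # On a case-insensitive file system, "A" resolves to the file "a".
  if [ -e "$probe_dir/A" ]; then
    result=1
  else
    result=0
  fi
  rm -rf "$probe_dir"
  return "$result"
}
```

Run it against the directory where you plan to put the Git repo, e.g. is_case_sensitive /path/to/workdir && echo case-sensitive.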
The git utilities won’t actually convert your SVN branches and tags to true git branches and tags. The BB utilities will help you do that though.
NOTE: If you didn’t specify an empty prefix in the git svn clone step, you will first see all your branches being created, and then at the end the script will delete all of them. It also seems to screw up the tags. You can get around the branch deletion by passing the “--no-delete” option after the force option. However, as of this writing (2018-05-31), this still doesn’t fix the issues with tags. Also, the conversion will then keep every branch ever made, so you’ll have to go into Git and manually delete all the branches that were already merged back and deleted in the SVN repository.
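The manual cleanup can be semi-automated by comparing what currently exists under the SVN branches folder with the local branches the conversion created. The helper below is hypothetical (not one of the BB tools) and deliberately only prints candidates, so you can review the list before deleting anything with git branch -D. It assumes branch names without spaces and that master corresponds to trunk.

```shell
#!/bin/sh
# Hypothetical helper: print Git branches that no longer exist in SVN.
# $1: file with `svn ls <url>/branches` output (directories end in "/")
# $2: file with one local Git branch name per line
# Example inputs (run inside the converted repo; SVN_URL is a placeholder):
#   svn ls "$SVN_URL/branches" > svn-branches.txt
#   git for-each-ref --format='%(refname:short)' refs/heads > git-branches.txt
stale_branches() {
  svn_list=$(mktemp) || return 1
  git_list=$(mktemp) || { rm -f "$svn_list"; return 1; }
  # svn ls prints directories with a trailing slash; strip it before comparing.
  sed 's#/$##' "$1" | sort -u > "$svn_list"
  # master corresponds to trunk, so never report it.
  grep -vx 'master' "$2" | sort -u > "$git_list"
  # comm -13 keeps only the lines unique to the second (Git) list.
  comm -13 "$svn_list" "$git_list"
  rm -f "$svn_list" "$git_list"
}
```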
The above steps should work fine for both standard SVN layouts and simple non-standard layouts where the location of the trunk, branches, and tags folders never changed. If you ever changed, moved, or renamed the trunk, branches, or tags folders, you are going to have serious issues with the conversion. It is a good idea to consider whether you really need to preserve history, or whether it would be OK to just start a Git repository fresh from the latest SVN version, as if it were the first commit into Git. However, if you really want to preserve history, there is a way to do it, described in the rest of this article.
Basically, you need to “massage” your old SVN repository data into a new SVN repository structure that looks as if you started with the standard trunk, branches, and tags layout and never changed it.
When you use an include filter, the process leaves behind empty revisions for every revision that didn’t touch your project. It’s generally better to drop those and renumber the rest. The catch is that revision numbers change, so any references to specific SVN revision numbers (e.g. in commit messages or documentation) will no longer match. If you can live with that, dropping them is generally recommended.
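For reference, svndumpfilter’s --drop-empty-revs and --renumber-revs options handle the dropping and renumbering (the command is shown in a comment below, with placeholder names). If you want to see how many revisions would disappear before deciding, the hypothetical awk helper below lists the revisions that contain no node records, for example in a dump filtered without --drop-empty-revs.

```shell
#!/bin/sh
# The filtering step itself might look like this ("Project" and file names are placeholders):
#   svndumpfilter include Project --drop-empty-revs --renumber-revs < full.dump > project.dump
#
# Hypothetical helper: print revisions that contain no Node-path records.
# Revision 0 never contains nodes, so it is skipped.
list_empty_revs() {
  LC_ALL=C awk '
    function report_if_empty() {
      if (rev != "" && rev + 0 > 0 && nodes == 0) print rev
    }
    /^Revision-number: / { report_if_empty(); rev = $2; nodes = 0 }
    /^Node-path: /       { nodes++ }
    END                  { report_if_empty() }
  ' "$@"
}
```

Piping the result through wc -l gives a rough count of how far later revision numbers would shift.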
First, make a backup of the starting dumpfile. You will likely need to make multiple attempts at correcting all the issues you want to correct and creating the dumpfile can take a long time.
Next, start writing a script that uses the “sed” command to find and replace text in the dumpfile, so that it looks like your repository always had the plain old trunk, branches, and tags structure.
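Because you will likely iterate many times, it helps to keep the backup untouched and always restart from it, so a bad replacement never compounds across attempts. A minimal sketch, assuming your sed commands live in a script (hypothetically named fix-paths.sh) that takes the dumpfile path as its first argument:

```shell
#!/bin/sh
# Hypothetical edit-and-retry helper: each attempt starts from the pristine backup.
# $1: backup of the filtered dumpfile (never edited)
# $2: scratch copy that the fix-up script rewrites in place
# $3: fix-up script (e.g. fix-paths.sh) that receives the scratch path as $1
apply_fixes() {
  pristine=$1
  work=$2
  script=$3
  cp "$pristine" "$work" || return 1
  sh "$script" "$work"
}
```

For example, apply_fixes project.dump.orig project.dump fix-paths.sh can be rerun after every tweak to fix-paths.sh.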
There are 2 types of lines we need to start replacing in the SVN dump file. Type 1: lines that start with “Node-path: <x>” and Type 2: lines that start with “Node-copyfrom-path: <x>”.
The first type tells SVN where (at which path) an operation occurs.
An example of how this might get used in the dump file is:
A section like this is what you would see for an initial commit that creates the trunk folder in the root directory.
The second type of line (the copyfrom) occurs when you copy or move a file or folder in SVN (creating a branch or tag is also a copy). You might see something like:
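As a rough illustration, a node record that creates trunk typically looks like the reconstructed, trimmed sample below (real dumps surround it with revision headers). The hypothetical top_level_paths helper then counts Node-path headers per top-level folder, which is a quick way to see which prefixes your sed script will need to rewrite.

```shell
#!/bin/sh
# Representative (reconstructed, trimmed) node record creating /trunk.
sample_trunk_record() {
  cat <<'EOF'
Node-path: trunk
Node-kind: dir
Node-action: add
Prop-content-length: 10
Content-length: 10

PROPS-END

EOF
}

# Hypothetical helper: print "<top-level folder> <count>" for every Node-path header.
top_level_paths() {
  LC_ALL=C awk '
    /^Node-path: / {
      split(substr($0, 12), parts, "/")
      count[parts[1]]++
    }
    END { for (p in count) print p, count[p] }
  ' "$@" | LC_ALL=C sort
}
```

On a real dumpfile, run top_level_paths project.dump.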
A section like this is what you would see when you moved file1.h from folder2 to folder1. The SVN dumpfile keeps track of where and at what revision the file came from.
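Since SVN records a move as an add-with-history plus a delete of the old path, the copyfrom headers are a convenient way to find every move and copy you may need to account for. The sample below is a representative, trimmed reconstruction (not taken from a real repository), and list_copies is a hypothetical awk helper that prints each copy as “r<revision>: <source>@<source revision> -> <destination>”.

```shell
#!/bin/sh
# Representative (trimmed) records for moving folder2/file1.h to folder1/file1.h.
sample_move_records() {
  cat <<'EOF'
Revision-number: 42

Node-path: folder1/file1.h
Node-kind: file
Node-action: add
Node-copyfrom-rev: 41
Node-copyfrom-path: folder2/file1.h

Node-path: folder2/file1.h
Node-action: delete

EOF
}

# Hypothetical helper: list every copy/move recorded in a dumpfile.
list_copies() {
  LC_ALL=C awk '
    /^Revision-number: /    { rev = $2 }
    /^Node-path: /          { path = substr($0, 12) }
    /^Node-copyfrom-rev: /  { from_rev = $2 }
    /^Node-copyfrom-path: / { printf "r%s: %s@%s -> %s\n", rev, substr($0, 21), from_rev, path }
  ' "$@"
}
```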
Let’s say that the project you want to migrate lives in a subfolder of a “master” repository. In this particular case, you could simply use the built-in “git svn clone” options to specify the paths to the trunk, branches, and tags, but let’s instead edit the dumpfile to achieve the same thing. Suppose your repository has the following structure:
You’ve already created a dumpfile that only includes these folders (see above). Now we want to edit the dumpfile so it looks like none of the commits ever knew about the top-level folder.
We do that by doing the following in the script file:
# GNU sed syntax; with BSD/macOS sed, use sed -i '' instead of sed -i.
# The ^ anchor restricts each match to header lines, so file contents that happen to contain the same text are left alone.
sed -i 's/^Node-path: Project\/trunk/Node-path: trunk/g' <dumpfile>
sed -i 's/^Node-copyfrom-path: Project\/trunk/Node-copyfrom-path: trunk/g' <dumpfile>
sed -i 's/^Node-path: Project\/branches/Node-path: branches/g' <dumpfile>
sed -i 's/^Node-copyfrom-path: Project\/branches/Node-copyfrom-path: branches/g' <dumpfile>
sed -i 's/^Node-path: Project\/tags/Node-path: tags/g' <dumpfile>
sed -i 's/^Node-copyfrom-path: Project\/tags/Node-copyfrom-path: tags/g' <dumpfile>
What we are doing is replacing the path of every folder/file in every commit in the dumpfile with a new path that strips out the first folder. We also need to include the “copyfrom” versions so that, when things are moved, they match the new directory structure we are creating.
Now, that is a relatively trivial example, but you can also fix much less trivial issues. Let’s say you actually started with a root-level trunk/branches/tags and then moved it into a subfolder. So this:
Got converted to this with some folder SVN-moves:
This is where things start to get complicated. With “git svn clone” you cannot specify that the trunk moved around, so you must fix it in the dumpfile first. The good news is that you can. The script to do this is actually the same as the script above for the trivial case; however, we now need to make some additional tweaks to the dumpfile.
Things can get even more complicated. Imagine you moved a file to a location that was filtered out of the original dump because it isn’t part of the project you want to recreate, and then the file got moved back into the project you do want to recreate. In those cases, you actually need to create a dump that also contains the parts you (eventually) don’t want, at least for now. One suggestion is then to move those folder locations, using the tricks above, to a temporary folder in the new “trunk”, as in “trunk/svn-git-migration/<folder-to-delete-later>”. This preserves the file moves while the repo is recreated. After you make the new repo (see below), you can SVN-delete the temporary folder right before you convert to Git.
You can also use these tricks to deal with externals. This section describes how to handle the case where you moved some things around in a “master” repo to share them between projects. Git has submodules and subtrees if you want to keep an SVN external as its own repo, but let’s suppose that instead you want to make it look like the contents of the external never moved at all, so you can create one new repo that also contains the external’s full history. Let’s suppose you started with:
Then you took some things from Project1, moved them to a shared section of your repo (to share with Project2), and added an externals folder to your trunk to pull that code back into Project1. So now your repo looks like:
You can use the same path replacement techniques to make it look like all you did was reorganize some files that were in other areas of “trunk” into a new subfolder you named “Library”. Just do the following with some sed commands:
sed -i 's/^Node-path: Library\/trunk/Node-path: trunk\/Library/g' <dumpfile>
sed -i 's/^Node-copyfrom-path: Library\/trunk/Node-copyfrom-path: trunk\/Library/g' <dumpfile>
There is one other thing you need to do, though. Since “trunk/Library” is created via externals, the actual “Library” folder is never created in the dumpfile. So when recreating the repo, the load will fail when it tries to move things into the Library folder, because that folder doesn’t exist yet. You will need to go into the dumpfile and manually “add” the directory in an appropriate existing commit (probably the one right before you need it). This is similar to how we had to manually delete some parts of the dumpfile discussed above. You can do this with something like:
You generally only need to do this for “trunk”; however, this may not completely preserve the state of Library in any branches you created, since you probably pinned the revision of Library in those branches. You could (if you want) keep using these tricks to fix all of that up.
Things can also get tricky sometimes regarding the order of the path replacement operations. For example, consider a folder structure that evolved as such:
If you first try to replace “Project” with “trunk”, it will also replace “Project-trunk” with “trunk-trunk”, which will break later replacement operations. So, generally speaking, do the replacements from most specific to least specific. In this example, the order might be: