From: Stephen Bash <bash@genarts.com>
To: Ramkumar Ramachandra <artagnon@gmail.com>
Cc: Matt Stump <mstump@goatyak.com>,
git@vger.kernel.org, Jonathan Nieder <jrnieder@gmail.com>,
David Michael Barr <david.barr@cordelta.com>,
Sverre Rabbelier <srabbelier@gmail.com>,
Tomas Carnecky <tom@dbservice.com>
Subject: Re: Converting to Git using svn-fe (Was: Speeding up the initial git-svn fetch)
Date: Mon, 18 Oct 2010 21:42:56 -0400 (EDT)
Message-ID: <8043579.526738.1287452576766.JavaMail.root@mail.hq.genarts.com>
In-Reply-To: <20101018051702.GD22376@kytes>
----- Original Message -----
> From: "Ramkumar Ramachandra" <artagnon@gmail.com>
> To: "Stephen Bash" <bash@genarts.com>
> Sent: Monday, October 18, 2010 1:17:05 AM
> Subject: Re: Converting to Git using svn-fe (Was: Speeding up the initial git-svn fetch)
>
> [sorry about the delayed reply; was ill]
No problem! It's taken me more than 12 hours to actually compose a response (literally, I hit "Reply All" over 12 hours ago!), so I don't think I can complain :)
> Stephen Bash writes:
> > Converting to Git using svn-fe
> > ------------------------------
> > I was
> > pointed to David Barr's svn-dump-fast-export tool:
> > http://github.com/barrbrain/svn-dump-fast-export
>
> So you used the version that supports dumpfile v2 that's merged into
> git.git `master`.
Yes, thanks for the clarification.
> > Extracting SVN's History
> > ------------------------
> > First we want to understand SVN's branching/tagging history. Modify
> > buildSVNTree.pl as necessary, then run
> > perl buildSVNTree.pl > svnBranches.txt
>
> > ...
>
> Unnecessary
I'm going to collapse all these comments because I think we're coming at this from different angles. I agree, discovering the copies in git is "easy" (albeit an n^2 operation), and git will correctly identify file content. But when I was asked to preserve the SVN history, I decided to extract a DAG from SVN and migrate that DAG to Git. Thus the history itself is preserved (sans merges), not just the contents of the files. This is the purpose of buildSVNTree. I can elaborate further if requested.
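As a rough illustration of what "extracting a DAG" means here (a sketch only, not the actual buildSVNTree.pl logic): if each SVN copy is recorded as a (revision, source path, source revision, destination path) tuple, the branch/tag ancestry falls out of a simple parent map. The record format, the sample data, and the function names are all assumptions made for this example, and it assumes each destination path is created by exactly one copy.

```python
# Hypothetical copy records: (svn_rev, source_path, source_rev, dest_path).
copies = [
    (10, "/trunk", 9, "/branches/foo"),
    (15, "/branches/foo", 14, "/tags/foo-1.0"),
    (20, "/trunk", 19, "/branches/bar"),
]

def build_branch_dag(copies):
    """Map each copied-to path to the (path, revision) it was branched from."""
    parents = {}
    for rev, src, src_rev, dst in sorted(copies):
        parents[dst] = (src, src_rev)
    return parents

def ancestry(path, parents):
    """Follow copy edges from a branch or tag back to its root."""
    chain = [path]
    while path in parents:
        path = parents[path][0]
        chain.append(path)
    return chain

dag = build_branch_dag(copies)
print(ancestry("/tags/foo-1.0", dag))
```

The point of keeping the source revision on each edge is that the Git commit for a new branch can then be parented on the commit that corresponds to that exact SVN revision, rather than on whatever Git's content-based copy detection would guess.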
> > There's also some logic in buildSVNTree to determine if a branch/tag
> > is deleted in the SVN head. That information is used by
> > hideFromGit.
>
> It'll be in the revision history in Git anyway- it doesn't require
> special handling.
See below.
> > Ah, I should probably mention: svn-fe can produce "empty"
> > commits, and filterBranch does nothing to remove them. By "empty" I
> > mean there will be a commit object without any content changes. So
> > creating a branch/tag in SVN creates a commit, but doesn't change
> > content. That commit will be part of the new Git history.
> > Similarly, filterBranch will create git tags from svn tags, but they
> > point to one of these "empty" commits rather than the branch they
> > are tagged from. It's not very git-ish, but it seems to work...
>
> Oh, I didn't realize that fast-import allows the creation of empty
> commits. We should probably fix this?
To be precise: svn-fe creates commits where
git diff-tree treeA treeB
is empty with treeA being the tree object of /trunk/project and treeB being the tree of /branches/foo/project. This version of my tools does not squash these commits; a future version probably will (this may cause problems with two-way communication?).
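To make the "empty commit" idea concrete, here is a minimal sketch on a toy history. It assumes that, after per-branch filtering, an empty commit is simply one whose tree id equals its parent's tree id; the commit and tree ids are made up, and squash_empty is a hypothetical helper, not something my scripts currently do.

```python
# Toy model: each commit records its parent and the id of its tree object.
# The ids below are invented for illustration.
commits = {
    "c1": {"parent": None, "tree": "t1"},
    "c2": {"parent": "c1", "tree": "t2"},
    "c3": {"parent": "c2", "tree": "t2"},  # e.g. an SVN branch creation
    "c4": {"parent": "c3", "tree": "t3"},
}

def is_empty(commit_id, commits):
    """True when the commit's tree is identical to its parent's tree."""
    parent = commits[commit_id]["parent"]
    if parent is None:
        return False  # a root commit is never treated as empty here
    return commits[commit_id]["tree"] == commits[parent]["tree"]

def squash_empty(commits):
    """Return a copy of the history without empty commits, parents rewired."""
    def surviving(cid):
        # Walk up until we reach a commit that will be kept (or the root).
        while cid is not None and is_empty(cid, commits):
            cid = commits[cid]["parent"]
        return cid
    return {
        cid: {"parent": surviving(c["parent"]), "tree": c["tree"]}
        for cid, c in commits.items()
        if not is_empty(cid, commits)
    }

squashed = squash_empty(commits)
print(sorted(squashed))
```

Note that squashing drops the commit that carried the SVN revision's metadata, which is exactly why it might complicate a two-way mapping.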
> > filterBranch is probably the longest step of the process; there's a
> > lot of filtering going on. It will be very verbose on STDOUT, so I
> > recommend tee'ing to a file or a terminal with infinite scroll back.
> > It also involves a lot of disk hits (somewhat reduced if $tempdir is
> > a RAM disk), and potentially a lot of space (it will create a git
> > repo for every branch/tag in your subversion history). For our
> > repository this step took about 1.5-2 hours IIRC.
>
> Wow, this really brute-force.
Yes it is. If I get around to writing a new version, I'll at least advance to a single pass using commit-tree. Beyond that I'm probably into the fast-import code, which I'll happily leave to the rest of you :)
> > Note that SVN rev to Git commit can be one to many!
>
> Unless there's a one-to-one mapping between Git revisions and SVN
> revisions, a two-way bridge will become very difficult to build. Can
> you think of any scenarios where a one-to-one mapping doesn't make
> sense?
I have 32 SVN revs in my history that touch multiple Git commit objects. The simplest example is
svn mv svn://svnrepo/branches/badBranchName svn://svnrepo/branches/goodBranchName
which creates a single SVN commit that touches two branches (badBranchName will have all its contents deleted, goodBranchName will have an "empty commit" as described above). The more devious version is the SVN rev where a developer checked out / (yes, I'm not kidding) and proceeded to modify a single file on all branches in one commit. In our case, that one SVN rev touches 23 git commit objects. And while the latter is somewhat of a corner case, the former is common and probably needs to be dealt with appropriately (it's kind of a stupid operation in Git-land, so maybe it can just be squashed).
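A small sketch of why one SVN revision can fan out to several Git commits: group the paths changed in the revision by the branch root they fall under, and every distinct root needs its own commit. This assumes the conventional trunk/branches/tags layout; branch_of and branches_touched are names invented for the example.

```python
from collections import defaultdict

def branch_of(path):
    """Map an SVN path to a branch-like root, assuming trunk/branches/tags."""
    parts = path.strip("/").split("/")
    if parts[0] == "trunk":
        return "trunk"
    if parts[0] in ("branches", "tags") and len(parts) > 1:
        return "/".join(parts[:2])
    return None  # outside the standard layout

def branches_touched(changed_paths):
    """Group the paths changed in one SVN revision by their branch root."""
    touched = defaultdict(list)
    for path in changed_paths:
        root = branch_of(path)
        if root is not None:
            touched[root].append(path)
    return dict(touched)

# The rename from the example above: one SVN revision, two branches touched.
rename = ["/branches/badBranchName", "/branches/goodBranchName"]
print(sorted(branches_touched(rename)))
```

A revision-to-commit map built this way is a dictionary from SVN revision number to a list of Git commits, rather than to a single commit.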
> Grafts and filter-branch. db-svn-filter-root does this more elegantly.
I found a 'db-svn-filter-root' branch, but it was not entirely obvious to me what code I should be looking at...
> > Hiding 'Deleted' Branches
> > -------------------------
>
> Hm. You didn't include the history of deleted branches in the main
> repository. Why?
The commit objects are still there; I simply moved the refs to refs/hidden/{heads,tags}. Because my goal was to maintain the full SVN history, I needed to somehow protect the objects from garbage collection. At the time I didn't know about "git merge -s ours", so this strategy achieved my goal of protecting the objects. In this case, the refs are not cloned, but are fetch-able, so I found it to be a reasonable solution.
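The renaming itself is mechanical. A minimal sketch of the mapping, assuming the set of deleted branches/tags is already known (the function names and sample refs are hypothetical, and this is not the actual hideFromGit code):

```python
def hidden_ref(ref):
    """Rewrite refs/heads/X or refs/tags/X into the refs/hidden/ namespace."""
    for kind in ("heads", "tags"):
        prefix = f"refs/{kind}/"
        if ref.startswith(prefix):
            return f"refs/hidden/{kind}/{ref[len(prefix):]}"
    raise ValueError(f"not a branch or tag ref: {ref}")

def plan_moves(refs, deleted):
    """Return (old_ref, new_ref) pairs for refs whose SVN branch/tag is gone."""
    return [(ref, hidden_ref(ref)) for ref in refs if ref in deleted]

refs = ["refs/heads/master", "refs/heads/oldExperiment", "refs/tags/beta1"]
deleted = {"refs/heads/oldExperiment", "refs/tags/beta1"}
for old, new in plan_moves(refs, deleted):
    print(old, "->", new)
```

Each planned move could then be applied with git update-ref, and anyone who wants the hidden history can ask for it explicitly with a fetch refspec such as refs/hidden/*:refs/hidden/*.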
> Does it make sense to provide the user an option to
> exclude some (deleted) branches in the SVN history? It'll make the
> two-way mapping extremely difficult.
I think there are cases where a user could say "I don't care about dead development branches". In my current system, all branches, even those that do not contribute back to the trunk, are saved in the hidden namespace. But I could see users who don't care about some or all extraneous branches and would be happy to not convert them or to let them be garbage collected.
> Thanks for the interesting and insightful read :)
I'm glad it's stimulating conversation. I'm beginning to wonder if there might be competing design goals for one-way vs. two-way compatibility... Performance is one place where opinions probably differ greatly (I didn't mind taking an extra 30 minutes to mirror my SVN repo because it probably saved more than that in communication overhead later in the process, but that mirror operation is very taxing on your timeline); my exhaustive search of all SVN copies is another (I wanted to be *extremely* certain I knew about all the misplaced branches/tags, but it's inefficient for a casual developer who just wants to interact with an SVN server). It's all just food for thought, and I'm happy to carry on the conversation from my different point of view :)
Thanks,
Stephen