From: Ramkumar Ramachandra <artagnon@gmail.com>
To: Stephen Bash <bash@genarts.com>
Cc: Matt Stump <mstump@goatyak.com>,
git@vger.kernel.org, Jonathan Nieder <jrnieder@gmail.com>,
David Michael Barr <david.barr@cordelta.com>,
Sverre Rabbelier <srabbelier@gmail.com>,
Tomas Carnecky <tom@dbservice.com>
Subject: Re: Converting to Git using svn-fe (Was: Speeding up the initial git-svn fetch)
Date: Tue, 19 Oct 2010 12:12:15 +0530
Message-ID: <20101019064210.GA14309@kytes>
In-Reply-To: <8043579.526738.1287452576766.JavaMail.root@mail.hq.genarts.com>
Hi Stephen,
Stephen Bash writes:
> > From: "Ramkumar Ramachandra" <artagnon@gmail.com>
> > Stephen Bash writes:
> > > Extracting SVN's History
> > > ------------------------
> > > First we want to understand SVN's branching/tagging history. Modify
> > > buildSVNTree.pl as necessary, then run
> > > perl buildSVNTree.pl > svnBranches.txt
> >
> > > ...
> >
> > Unnecessary
>
> I'm going to collapse all these comments because I think we're
> coming at this from different angles. I agree, discovering the
> copies in git is "easy" (albeit an n^2 operation), and git will
> correctly identify file content. But when I was asked to preserve
> the SVN history, I decided to extract a DAG from SVN and migrate
> that DAG to Git. Thus the history itself is preserved (sans
> merges), not just the contents of the files. This is the purpose of
> buildSVNTree. I can elaborate further if requested.
Yep, they're certainly two different ways to approach the problem: I'd
be interested in investigating why they produce different
results. Since we both agree that it's easier (and faster) to do it in
Git-land, I'm looking into the areas where it falls short.
Yes, I understand your script (although I can't actually read Perl
:p), but the differences are still not very clear to me.
> > > Ah, I should probably mention: svn-fe can produce "empty"
> > > commits, and filterBranch does nothing to remove them. By "empty" I
> > > mean there will be a commit object without any content changes. So
> > > creating a branch/tag in SVN creates a commit, but doesn't change
> > > content. That commit will be part of the new Git history.
> > > Similarly, filterBranch will create git tags from svn tags, but they
> > > point to one of these "empty" commits rather than the branch they
> > > are tagged from. It's not very git-ish, but it seems to work...
> >
> > Oh, I didn't realize that fast-import allows the creation of empty
> > commits. We should probably fix this?
>
> To be precise: svn-fe creates commits where
> git diff-tree treeA treeB
> is empty with treeA being the tree object of /trunk/project and
> treeB being the tree of /branches/foo/project. This version of my
> tools does not squash these commits; a future version probably will
> (this may cause problems with two-way communication?).
Right, that IS expected behavior. Don't they correspond to separate
SVN revisions anyway? Why would you want to squash them?
[Ignore this; see later in the email]
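For reference, the "empty" commits described above can be picked out by comparing each commit's tree id with its parent's: identical content gives an identical tree id. A minimal Python sketch, assuming the branch has already been filtered down to its project subtree and that the (commit, tree, parent) triples have been collected beforehand (e.g. from `git rev-list --parents` and `git rev-parse <commit>^{tree}`). The function name and the sample ids are hypothetical, not part of svn-fe or filterBranch:

```python
def find_empty_commits(commits):
    """Return ids of commits whose tree equals their parent's tree.

    `commits` is an iterable of (commit_id, tree_id, parent_id) tuples;
    parent_id is None for a root commit. Only one parent is considered,
    which matches the merge-free history discussed in this thread.
    """
    commits = list(commits)
    tree_of = {commit_id: tree_id for commit_id, tree_id, _ in commits}
    empty = []
    for commit_id, tree_id, parent_id in commits:
        # A commit is "empty" when it changes nothing relative to its parent.
        if parent_id is not None and tree_of.get(parent_id) == tree_id:
            empty.append(commit_id)
    return empty


# Hypothetical history: c2 is a branch-creation commit with no content change.
history = [
    ("c1", "treeA", None),
    ("c2", "treeA", "c1"),
    ("c3", "treeB", "c2"),
]
print(find_empty_commits(history))
```

Whether such commits should then be squashed is a separate question; the sketch only identifies them.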
> > > filterBranch is probably the longest step of the process; there's a
> > > lot of filtering going on. It will be very verbose on STDOUT, so I
> > > recommend tee'ing to a file or a terminal with infinite scroll back.
> > > It also involves a lot of disk hits (somewhat reduced if $tempdir is
> > > a RAM disk), and potentially a lot of space (it will create a git
> > > repo for every branch/tag in your subversion history). For our
> > > repository this step took about 1.5-2 hours IIRC.
> >
> > Wow, this is really brute-force.
>
> Yes it is. If I get around to writing a new version, I'll at least
> advance to a single pass using commit-tree. Beyond that I'm
> probably into the fast-import code, which I'll happily leave to the
> rest of you :)
*nod*
> > > Note that SVN rev to Git commit can be one to many!
> >
> > Unless there's a one-to-one mapping between Git revisions and SVN
> > revisions, a two-way bridge will become very difficult to build. Can
> > you think of any scenarios where a one-to-one mapping doesn't make
> > sense?
>
> I have 32 SVN revs in my history that touch multiple Git commit
> objects. The simplest example is
> svn mv svn://svnrepo/branches/badBranchName svn://svnrepo/branches/goodBranchName
> which creates a single SVN commit that touches two branches
> (badBranchName will have all its contents deleted, goodBranchName
> will have an "empty commit" as described above). The more devious
> version is the SVN rev where a developer checked out / (yes, I'm not
> kidding) and proceeded to modify a single file on all branches in
> one commit. In our case, that one SVN rev touches 23 git commit
> objects. And while the latter is somewhat a corner case, the former
> is common and probably needs to be dealt with appropriately (it's
> kind of a stupid operation in Git-land, so maybe it can just be
> squashed).
Ouch! Thanks for the illustrative example; I understand now. We'd have
to bend over backwards to get a one-to-one mapping. It's finally struck
me: a one-to-one mapping is nearly impossible to achieve, and I don't
know if it makes sense to strive for it anymore. Looks like Jonathan
got it earlier.
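To make the one-to-many case concrete, here is a rough Python sketch that groups the paths changed in a single SVN revision by the branch they live on. It assumes the standard trunk/branches/tags layout; the function name is made up for illustration and nothing here is taken from buildSVNTree or svn-fe:

```python
from collections import defaultdict


def branches_touched(changed_paths):
    """Group the paths changed in one SVN revision by branch.

    Assumes the conventional layout: trunk/, branches/<name>/, tags/<name>/.
    Anything else is collected under "(other)".
    """
    by_branch = defaultdict(list)
    for path in changed_paths:
        parts = path.strip("/").split("/")
        if parts[0] == "trunk":
            branch = "trunk"
        elif parts[0] in ("branches", "tags") and len(parts) > 1:
            branch = "/".join(parts[:2])
        else:
            branch = "(other)"
        by_branch[branch].append(path)
    return dict(by_branch)


# The `svn mv` example from above: one revision, two branches.
rev_paths = [
    "/branches/badBranchName",
    "/branches/goodBranchName",
]
print(sorted(branches_touched(rev_paths)))
```

Any revision for which this returns more than one key would need more than one Git commit, which is exactly where the one-to-one mapping breaks down.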
> > Grafts and filter-branch. db-svn-filter-root does this more elegantly.
>
> I found a 'db-svn-filter-root' branch, but it was not entirely
> obvious to me what code I should be looking at...
Um, there's just one commit that deviates from the branch it's based
on (but you don't know that, and I should have been clearer): look at
contrib/svn-fe/svn-filter-root.py
It's just a minimalistic mapper, but it's fast and done nicely. You
can use ideas from it when you're building yours.
> > > Hiding 'Deleted' Branches
> > > -------------------------
> >
> > Hm. You didn't include the history of deleted branches in the main
> > repository. Why?
>
> The commit objects are still there, I simply moved the refs to
> refs/hidden/{heads,tags}. Because my goal was to maintain the full
> SVN history I needed to somehow protect the objects from garbage
> collection. At the time I didn't know about "git merge -s ours", so
> this strategy achieved my goal of protecting the objects. In this
> case, the refs are not cloned, but are fetch-able, so I found it to
> be a reasonable solution.
Oh.
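If I've understood the ref shuffle correctly, it amounts to something like the following. This Python sketch only builds the `git update-ref` command lines rather than running them; the function name and the sample commit id are hypothetical, and the refs/hidden/ layout is taken from the description above:

```python
def hide_ref_commands(ref, sha):
    """Return the git commands that move `ref` under refs/hidden/.

    For example refs/heads/foo becomes refs/hidden/heads/foo. The commands
    are returned as argv lists and are not executed here.
    """
    prefix = "refs/"
    if not ref.startswith(prefix) or ref.startswith("refs/hidden/"):
        raise ValueError("unexpected ref name: " + ref)
    hidden = "refs/hidden/" + ref[len(prefix):]
    return [
        # Create the hidden ref first so the objects stay reachable...
        ["git", "update-ref", hidden, sha],
        # ...then delete the original ref.
        ["git", "update-ref", "-d", ref],
    ]


# Placeholder commit id, for illustration only.
sha = "0123456789abcdef0123456789abcdef01234567"
for argv in hide_ref_commands("refs/heads/deadBranch", sha):
    print(" ".join(argv))
```

Creating the hidden ref before deleting the original means the commits are never left unreferenced in between.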
> > Does it make sense to provide the user an option to
> > exclude some (deleted) branches in the SVN history? It'll make the
> > two-way mapping extremely difficult.
>
> I think there are cases where a user could say "I don't care about
> dead development branches". In my current system, all branches,
> even those that do not contribute back to the trunk, are saved in the
> hidden namespace. But I could see users that don't care about some
> or all extraneous branches and would be happy to not convert them or
> to let them be garbage collected.
When I made this comment, I was thinking of the one-to-one mapping. It
makes much more sense now.
> > Thanks for the interesting and insightful read :)
>
> I'm glad it's stimulating conversation. I'm beginning to wonder if
> there might be competing design goals for one-way vs. two-way
> compatibility... Performance is one place where opinions probably
> greatly differ (I didn't mind taking an extra 30 minutes to mirror
> my SVN repo because it probably saved more than that in
> communication overhead later in the process, but that mirror
> operation is very taxing on your timeline); my exhaustive search of
> all SVN copies is another (I wanted to be *extremely* certain I knew
> about all the misplaced branches/tags, but it's inefficient for a
> casual developer who just wants to interact with an SVN server).
> It's all just food for thought, and I'm happy to carry on the
> conversation from my different point-of-view :)
Ok, I still don't get this part: why mirror at all? Can't all the
information be mined out of the in-memory tree that svn-fe builds
while parsing the dumpfile? From the SVN-side, all that's required is
a streaming dumpfile like the one that `svnrdump dump` produces.
-- Ram