From: Stephen Bash <bash@genarts.com>
To: Ramkumar Ramachandra <artagnon@gmail.com>
Cc: Matt Stump <mstump@goatyak.com>,
git@vger.kernel.org, Jonathan Nieder <jrnieder@gmail.com>,
David Michael Barr <david.barr@cordelta.com>,
Sverre Rabbelier <srabbelier@gmail.com>,
Tomas Carnecky <tom@dbservice.com>
Subject: Re: Converting to Git using svn-fe (Was: Speeding up the initial git-svn fetch)
Date: Tue, 19 Oct 2010 09:33:16 -0400 (EDT)
Message-ID: <6831849.526935.1287495195964.JavaMail.root@mail.hq.genarts.com>
In-Reply-To: <20101019064210.GA14309@kytes>
----- Original Message -----
> From: "Ramkumar Ramachandra" <artagnon@gmail.com>
> To: "Stephen Bash" <bash@genarts.com>
> Sent: Tuesday, October 19, 2010 2:42:15 AM
> Subject: Re: Converting to Git using svn-fe (Was: Speeding up the initial git-svn fetch)
>
> Stephen Bash writes:
> > I'm going to collapse all these comments because I think we're
> > coming at this from different angles. I agree, discovering the
> > copies in git is "easy" (albeit an n^2 operation), and git will
> > correctly identify file content. But when I was asked to preserve
> > the SVN history, I decided to extract a DAG from SVN and migrate
> > that DAG to Git. Thus the history itself is preserved (sans
> > merges), not just the contents of the files. This is the purpose of
> > buildSVNTree. I can elaborate further if requested.
>
> Yep, they're certainly two different ways to approach the problem: I'd
> be interested in investigating why it will produce different
> results. Since we both agree that it's easier (and faster) to do it in
> Git-land, I'm looking into the areas where it falls short.
Ack! I left my example at home this morning... I'll explain it here, but perhaps I can actually send out a test script tonight or tomorrow (if there's a need). The basic premise is that git's copy detection finds files with the same content, which is not necessarily the source of the SVN copy.
It's also possible you could do this in svn-fe or in fast-import, where more information may be available. I was only looking at the data before svn-fe or after fast-import...
Here's how I created a discrepancy between SVN and Git:
1) Create a new svn repo
2) Create the standard layout (trunk, branches, tags)
3) Create multiple files on the trunk
4) Create a branch (svn cp trunk branches/branchName)
5) Edit a file on the branch (leave some of the others alone)
6) (optional) edit a file on the trunk
7) Merge the branch back to the trunk
8) Create a tag from the trunk (svn cp trunk tags/tagName)
9) Import the repo into Git (svnadmin dump | svn-fe | git fast-import)
Now "svn log -v svn://svnrepo/tags/tagName" will show something like:
    A /tags/tagName (from /trunk:rev)
OTOH, "git log --name-status --find-copies-harder" will show something like:
    C100 /tags/tagName/foo (from /trunk/foo)
    C100 /tags/tagName/bar (from /branches/branchName/bar)
    C100 /tags/tagName/baz (from /trunk/baz)
This assumes bar is the file that was edited on the branch and then merged back to the trunk (this is all from memory, so please forgive me if the output isn't quite right). From Git's point of view, I think this copy information is correct. However, it doesn't describe SVN's history, and I'm not entirely sure how a Git-only solution could identify precisely what's going on there... (hopefully I'm just being naive)
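The following toy model in Python makes the ambiguity concrete. It is an illustration only and does not reproduce git's actual copy-detection code. The file contents and helper names (blob_id, svn_sources, copy_candidates) are made up. The model rebuilds the tree state just before the tag is created in step 8. It then compares the single directory copy that SVN records against every path whose content matches each tagged file:

```python
import hashlib

def blob_id(content: str) -> str:
    # Identify a file purely by its content (toy stand-in for a content hash).
    return hashlib.sha1(content.encode()).hexdigest()

# Hypothetical repository state just before step 8 (contents are made up).
# bar was edited on the branch and merged back, so both copies are identical;
# baz was edited on the trunk only (the optional step 6).
source_tree = {
    "trunk/foo": "foo v1\n",
    "trunk/bar": "bar v2 (edited on branch)\n",
    "trunk/baz": "baz v2 (edited on trunk)\n",
    "branches/branchName/foo": "foo v1\n",
    "branches/branchName/bar": "bar v2 (edited on branch)\n",
    "branches/branchName/baz": "baz v1\n",
}

# Step 8: SVN records ONE directory copy, tags/tagName from trunk.
svn_copy = ("tags/tagName", "trunk")
tag_tree = {
    path.replace(svn_copy[1], svn_copy[0], 1): content
    for path, content in source_tree.items()
    if path.startswith(svn_copy[1] + "/")
}

def svn_sources(tree, copy):
    # Per-file source implied by SVN's copy record: map the prefix back.
    dst, src = copy
    return {path: src + path[len(dst):] for path in tree}

def copy_candidates(src_tree, dst_tree):
    # Every source path with identical content is an equally valid "copy
    # source" from a content-only point of view; picking one is a tie-break.
    by_blob = {}
    for path, content in src_tree.items():
        by_blob.setdefault(blob_id(content), []).append(path)
    return {
        path: sorted(by_blob.get(blob_id(content), []))
        for path, content in dst_tree.items()
    }

if __name__ == "__main__":
    recorded = svn_sources(tag_tree, svn_copy)
    for path, cands in sorted(copy_candidates(source_tree, tag_tree).items()):
        print(f"{path}: svn says {recorded[path]}, content matches {cands}")
```

In this model, foo and bar each have two equally good content matches, and only baz has one, while SVN's record points at /trunk alone. The sketch does not try to model how a content-based tool breaks such ties.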
> > I found a 'db-svn-filter-root' branch, but it was not entirely
> > obvious to me what code I should be looking at...
>
> Um, there's just one commit that deviates from the branch it's based
> on (but you don't know that, and I should have been clearer): look at
> contrib/svn-fe/svn-filter-root.py
>
> It's just a minimalistic mapper, but it's fast and done nicely. You
> can use ideas from it when you're building yours.
Okay, David pointed me to that earlier, but I haven't dug into it yet. I'll take a look.
> > I'm glad it's stimulating conversation. I'm beginning to wonder if
> > there might be competing design goals for one-way vs. two-way
> > compatibility... Performance is one place where opinions probably
> > greatly differ (I didn't mind taking an extra 30 minutes to mirror
> > my SVN repo because it probably saved more than that in
> > communication overhead later in the process, but that mirror
> > operation is very taxing on your timeline); my exhaustive search of
> > all SVN copies is another (I wanted to be *extremely* certain I knew
> > about all the misplaced branches/tags, but it's inefficient for a
> > casual developer who just wants to interact with an SVN server).
> > It's all just food for thought, and I'm happy to carry on the
> > conversation from my different point-of-view :)
>
> Ok, I still don't get this part: why mirror at all? Can't all the
> information be mined out of the in-memory tree that svn-fe builds
> while parsing the dumpfile? From the SVN-side, all that's required is
> a streaming dumpfile like the one that `svnrdump dump` produces.
Oh, from that point of view the svn mirror is a bystander. I was developing these tools at the same time as svnrdump (or at least before a stable version of svnrdump existed). When I found that running "svnadmin dump | svn-fe | git fast-import" on the server was taxing the system, I decided it was better to create a dump file, copy it to my local machine, and run svn-fe and fast-import locally. Once I had the dump file, the local mirror sped up the SVN::Ra calls in buildSVNTree and made any "did that really happen in svn?!" questions a little easier to answer.
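On the "mine it out of the dumpfile" side, the copy sources SVN knows about are written directly into the dump stream as Node-copyfrom-path and Node-copyfrom-rev headers. Below is a minimal Python sketch that assumes an uncompressed stream from svnadmin dump or svnrdump dump. The helpers read_headers, copy_records, and copy_records_from_bytes are hypothetical and are not part of svn-fe or buildSVNTree:

```python
import io
import sys
from typing import BinaryIO, Dict, List, Optional, Tuple

def read_headers(stream: BinaryIO) -> Optional[Dict[str, str]]:
    """Read one block of 'Key: value' lines; return None at end of stream."""
    headers: Dict[str, str] = {}
    while True:
        line = stream.readline()
        if not line:  # end of stream
            return headers or None
        line = line.rstrip(b"\n")
        if not line:
            if headers:
                return headers  # a blank line ends the header block
            continue  # skip blank separator lines between records
        key, _, value = line.partition(b": ")
        headers[key.decode()] = value.decode()

def copy_records(stream: BinaryIO) -> List[Tuple[int, str, str, int]]:
    """Return (revision, path, copyfrom_path, copyfrom_rev) for each copied node."""
    copies = []
    rev = -1
    while (headers := read_headers(stream)) is not None:
        # Skip the record body (properties and text) by its declared length,
        # so file contents are never mistaken for headers.
        stream.read(int(headers.get("Content-length", "0")))
        if "Revision-number" in headers:
            rev = int(headers["Revision-number"])
        elif "Node-copyfrom-path" in headers:
            copies.append((rev, headers["Node-path"],
                           headers["Node-copyfrom-path"],
                           int(headers["Node-copyfrom-rev"])))
    return copies

def copy_records_from_bytes(data: bytes) -> List[Tuple[int, str, str, int]]:
    """Convenience wrapper for a dump that is already in memory."""
    return copy_records(io.BytesIO(data))

if __name__ == "__main__" and len(sys.argv) > 1:
    with open(sys.argv[1], "rb") as dump:  # e.g. the output of svnadmin dump
        for rev, path, src, src_rev in copy_records(dump):
            print(f"r{rev}: /{path} (from /{src}:{src_rev})")
```

For the repository in the example, the output would include the tags/tagName node with copy source trunk, which is the relationship that content matching alone cannot recover.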
Thanks,
Stephen