From: Jonathan Nieder <jrnieder@gmail.com>
To: David Barr <david.barr@cordelta.com>
Cc: Git Mailing List <git@vger.kernel.org>,
Sverre Rabbelier <srabbelier@gmail.com>,
Ramkumar Ramachandra <artagnon@gmail.com>,
Eric Wong <normalperson@yhbt.net>
Subject: Re: [PATCH] contrib/svn-fe: Fast script to remap svn history
Date: Sat, 20 Nov 2010 23:17:34 -0600
Message-ID: <20101121051734.GA11856@burratino>
In-Reply-To: <1286431561-24126-1-git-send-email-david.barr@cordelta.com>
Hi David,
David Barr wrote:
> This python script walks the commit sequence imported by svn-fe.
> For each commit, it tries to identify the branch that was changed.
> Commits are rewritten to be rooted according to the standard layout.
I like the idea and especially that the heuristics are simple.
Maybe this could be made git-agnostic using the new ls-tree command
you are introducing in fast-import? Though it would need to get a
revision list from somewhere. Alternatively, do you think it would
make sense for something like this to be implemented as a filter or
observer of the fast-import stream as it is generated during an
import?
> A basic heuristic of matching trees is used to find parents for the
> first commit in a branch and for tags.
More precisely, the rule used is:
> + # Find a common path prefix in the changes for the revision
> + subroot = ""
> + changes = Popen(["git","diff","--name-only",parent,git_commit], stdout=PIPE)
> + for path in changes.stdout:
> +     match = subroot_re.match(path)
> +     if match:
> +         subroot = match.group()
> +         changes.terminate()
> +         break
The first change lying in one of
    trunk
    branches/*
    tags/*
determines the branch. When a branch is renamed, the changed paths can
fall under both the old and the new name, so this has a 50/50 chance of
choosing the right branch.
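To make the rename problem concrete, here is a minimal sketch of that rule in Python. The pattern SUBROOT_RE is an assumption (the patch's actual subroot_re is not quoted here), and guess_subroot is a hypothetical helper that takes already-stripped path names rather than reading from git diff.

```python
import re

# Assumed pattern: the patch's real subroot_re is not shown in this mail.
SUBROOT_RE = re.compile(r"trunk(?=/|$)|(?:branches|tags)/[^/]+")

def guess_subroot(changed_paths):
    """Return the standard-layout prefix of the first matching path, or ""."""
    for path in changed_paths:
        match = SUBROOT_RE.match(path)
        if match:
            return match.group()
    return ""

# A rename of branches/old to branches/new touches both prefixes, so the
# result depends only on which path happens to be listed first.
print(guess_subroot(["branches/new/Makefile", "branches/old/Makefile"]))
print(guess_subroot(["branches/old/Makefile", "branches/new/Makefile"]))
```

The two calls differ only in the order of the paths and return different branches, which is the coin flip described above.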
> + # Choose a parent for the rewritten commit
> + if ref in ref_commit:
> +     parent = ref_commit[ref]
> + elif subtree in tree_commit:
> +     parent = tree_commit[subtree]
> + else:
> +     parent = ""
If this is a live branch, the parent is the last commit from that
branch. Otherwise, we take the last commit whose resulting tree
looked like this one. Or...
> + # Default to trunk if the branch is new
> + if parent == "" and "refs/heads/trunk" in ref_commit:
> +     parent = ref_commit["refs/heads/trunk"]
... if all else fails, we take the tip commit on the trunk.
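Put together, the three-step fallback can be sketched as a single function. This is only an illustration of the quoted logic: choose_parent is a hypothetical name, and the commit and tree ids in the usage lines are made up.

```python
def choose_parent(ref, subtree, ref_commit, tree_commit):
    """Pick a parent for a rewritten commit, in the order described above."""
    if ref in ref_commit:
        # Live branch: continue from its last commit.
        return ref_commit[ref]
    if subtree in tree_commit:
        # New ref: reuse the last commit that produced an identical tree.
        return tree_commit[subtree]
    # Otherwise fall back to the trunk tip, or no parent if trunk is unknown.
    return ref_commit.get("refs/heads/trunk", "")

# Made-up ids, for illustration only.
ref_commit = {"refs/heads/trunk": "c3"}
tree_commit = {"t1": "c1"}
print(choose_parent("refs/heads/trunk", "t9", ref_commit, tree_commit))
print(choose_parent("refs/tags/v1", "t1", ref_commit, tree_commit))
print(choose_parent("refs/heads/topic", "t9", ref_commit, tree_commit))
```

The third call shows the weak spot: a brand-new branch with an unseen tree is always attached to trunk, whatever it was really copied from.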
For comparison, here's the git-svn rule:
> # look for a parent from another branch:
> my @b_path_components = split m#/#, $self->{path};
Among the paths at or above this commit's base directory [if this is
branches/foo, examine first branches/foo, then branches]:
> while (@b_path_components) {
>     $i = $paths->{'/'.join('/', @b_path_components)};
>     last if $i && defined $i->{copyfrom_path};
>     unshift(@a_path_components, pop(@b_path_components));
> }
> return undef unless defined $i && defined $i->{copyfrom_path};
Find the first one with copyfrom information (i.e., that was
renamed or copied from another rev in this revision).
> my $branch_from = $i->{copyfrom_path};
> if (@a_path_components) {
>     print STDERR "branch_from: $branch_from => ";
>     $branch_from .= '/'.join('/', @a_path_components);
>     print STDERR $branch_from, "\n";
> }
Build back up the URL (so if branches was renamed to Branches but
branches/foo had no copyfrom information, we look for Branches/foo).
[...]
> my $gs = $self->other_gs($new_url, $url,
>                          $branch_from, $r, $self->{ref_id});
> my ($r0, $parent) = $gs->find_rev_before($r, 1);
Find the last revision that changed that path and record it.
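For readers more at home in Python, the ancestor walk from the quoted Perl can be sketched roughly as follows. This is a paraphrase, not git-svn's code: find_copy_source is a hypothetical name, `paths` stands in for git-svn's $paths as a plain dict, and the final find_rev_before lookup is left out.

```python
def find_copy_source(path, paths):
    """Return the path this branch was copied from, or None.

    `paths` maps absolute SVN paths changed in the revision to dicts that
    may carry a "copyfrom_path" entry (a stand-in for git-svn's $paths).
    """
    prefix = [c for c in path.split("/") if c]   # like @b_path_components
    suffix = []                                  # like @a_path_components
    while prefix:
        info = paths.get("/" + "/".join(prefix))
        if info and info.get("copyfrom_path") is not None:
            branch_from = info["copyfrom_path"]
            if suffix:
                # Re-append the components stripped off while walking up.
                branch_from += "/" + "/".join(suffix)
            return branch_from
        suffix.insert(0, prefix.pop())
    return None

# Made-up revision: the directory /releases was copied from /branches.
paths = {"/releases": {"copyfrom_path": "/branches"}}
print(find_copy_source("releases/1.0", paths))
```

With that input the walk misses on /releases/1.0, hits /releases, and rebuilds the source as /branches/1.0.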
Maybe we could benefit from including the copyfrom information in the
fast-import stream output by svn-fe somehow? The simplest way to do
this would be some specially formatted comments. An alternative (in
the spirit of Sam's earlier suggestions) might be to represent it in
the tree svn-fe creates, for example by introducing dummy
    foo.copiedfrom
symlinks.
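As a rough illustration of both options, here is what a frontend might emit. Everything about the payload is an assumption: the "# copyfrom" comment syntax and the "path@rev" symlink target are invented for this sketch, and paths are assumed simple enough not to need quoting. Only the stream syntax itself (comment lines starting with "#", and "M 120000 inline <path>" followed by a data command for a symlink) comes from fast-import.

```python
def copyfrom_comment(path, src_path, src_rev):
    """Format copy information as a comment line (hypothetical syntax)."""
    return ("# copyfrom %s %s@%d\n" % (path, src_path, src_rev)).encode("utf-8")

def copyfrom_symlink(path, src_path, src_rev):
    """Format copy information as a dummy <path>.copiedfrom symlink.

    The link target "src_path@src_rev" is an invented convention.
    """
    target = ("%s@%d" % (src_path, src_rev)).encode("utf-8")
    header = "M 120000 inline %s.copiedfrom\ndata %d\n" % (path, len(target))
    return header.encode("utf-8") + target + b"\n"

print(copyfrom_comment("branches/foo", "trunk", 7).decode("utf-8"), end="")
print(copyfrom_symlink("branches/foo", "trunk", 7).decode("utf-8"), end="")
```

The comment form leaves the imported tree untouched but is dropped by fast-import itself, so only a filter reading the stream would see it; the symlink form survives into the repository, where a later pass could read it and would have to remove it again.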
Thanks, that was interesting.
Jonathan