git.vger.kernel.org archive mirror
From: Jonathan Nieder <jrnieder@gmail.com>
To: David Barr <david.barr@cordelta.com>
Cc: Git Mailing List <git@vger.kernel.org>,
	Sverre Rabbelier <srabbelier@gmail.com>,
	Ramkumar Ramachandra <artagnon@gmail.com>,
	Eric Wong <normalperson@yhbt.net>
Subject: Re: [PATCH] contrib/svn-fe: Fast script to remap svn history
Date: Sat, 20 Nov 2010 23:17:34 -0600
Message-ID: <20101121051734.GA11856@burratino>
In-Reply-To: <1286431561-24126-1-git-send-email-david.barr@cordelta.com>

Hi David,

David Barr wrote:

> This python script walks the commit sequence imported by svn-fe.
> For each commit, it tries to identify the branch that was changed.
> Commits are rewritten to be rooted according to the standard layout.

I like the idea and especially that the heuristics are simple.

Maybe this could be made git-agnostic using the new ls-tree command
you are introducing in fast-import?  Though it would need to get a
revision list from somewhere.  Alternatively, do you think it would
make sense for something like this to be implemented as a filter or
observer of the fast-import stream as it is generated during an
import?

> A basic heuristic of matching trees is used to find parents for the
> first commit in a branch and for tags.

More precisely, the rule used is:

> +    # Find a common path prefix in the changes for the revision
> +    subroot = ""
> +    changes = Popen(["git","diff","--name-only",parent,git_commit], stdout=PIPE)
> +    for path in changes.stdout:
> +        match = subroot_re.match(path)
> +        if match:
> +            subroot = match.group()
> +            changes.terminate()
> +            break

The first change lying in one of

	trunk
	branches/*
	tags/*

determines the branch.  When a branch is renamed, the diff lists both
the old and the new path, so this has a 50/50 chance of choosing the
right branch.
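
A minimal sketch of that rule, for illustration only.  The pattern
below is an assumption based on the standard layout (with the branch
directory named branches); the patch's actual subroot_re is not shown
in the quoted hunk, and find_subroot is a hypothetical helper name.

```python
import re

# Assumed pattern for the standard layout; the real subroot_re from the
# patch is not part of the quoted hunk.
SUBROOT_RE = re.compile(r"(?:trunk|branches/[^/]+|tags/[^/]+)(?=/|$)")


def find_subroot(changed_paths):
    """Return the branch root of the first changed path that lies in
    trunk, branches/* or tags/*, or "" if no path matches."""
    for path in changed_paths:
        match = SUBROOT_RE.match(path.strip())
        if match:
            return match.group()
    return ""


# A branch rename touches both the old and the new path; whichever is
# listed first decides the result.
renamed = ["branches/new/Makefile", "branches/old/Makefile"]
print(find_subroot(renamed))
```

Because only the first match counts, the outcome for a rename depends
entirely on the order in which the changed paths are listed.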

> +        # Choose a parent for the rewritten commit
> +        if ref in ref_commit:
> +            parent = ref_commit[ref]
> +        elif subtree in tree_commit:
> +            parent = tree_commit[subtree]
> +        else:
> +            parent = ""

If this is a live branch, the parent is the last commit from that
branch.  Otherwise, we take the last commit whose resulting tree
looked like this one.  Or...

> +            # Default to trunk if the branch is new
> +            if parent == "" and "refs/heads/trunk" in ref_commit:
> +                parent = ref_commit["refs/heads/trunk"]

... if all else fails, we take the tip commit on the trunk.
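
Put together, the three steps might be sketched as a single function.
This is a simplified reading of the quoted hunks, not the script
itself: choose_parent is a hypothetical name, and the surrounding
conditions under which the trunk fallback applies are not visible in
the quote.

```python
def choose_parent(ref, subtree, ref_commit, tree_commit):
    """Pick a parent for a rewritten commit.

    ref_commit maps a ref name to the last rewritten commit on it;
    tree_commit maps a tree id to the last commit that produced it.
    Returns "" when no parent can be found.
    """
    if ref in ref_commit:
        # Live branch: continue its existing history.
        return ref_commit[ref]
    if subtree in tree_commit:
        # New branch or tag: reuse the last commit with an identical tree.
        return tree_commit[subtree]
    # Nothing matched: fall back to the trunk tip, if there is one.
    return ref_commit.get("refs/heads/trunk", "")


refs = {"refs/heads/trunk": "c1", "refs/heads/foo": "c2"}
trees = {"t9": "c0"}
print(choose_parent("refs/heads/bar", "t1", refs, trees))
```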

For comparison, here's the git-svn rule:

> 	# look for a parent from another branch:
> 	my @b_path_components = split m#/#, $self->{path};

Among the paths above this commit's base directory [if this is
branches/foo, examine first branches/foo, then branches, then /]:

> 	while (@b_path_components) {
> 		$i = $paths->{'/'.join('/', @b_path_components)};
> 		last if $i && defined $i->{copyfrom_path};
> 		unshift(@a_path_components, pop(@b_path_components));
> 	}
> 	return undef unless defined $i && defined $i->{copyfrom_path};

Find the first one with copyfrom information (i.e., one that was
renamed or copied from another path or rev in this revision).

> 	my $branch_from = $i->{copyfrom_path};
> 	if (@a_path_components) {
> 		print STDERR "branch_from: $branch_from => ";
> 		$branch_from .= '/'.join('/', @a_path_components);
> 		print STDERR $branch_from, "\n";
> 	}

Build back up the URL (so if branches was renamed to Branches but
branches/foo had no copyfrom information, we look for Branches/foo).

[...]
> 	my $gs = $self->other_gs($new_url, $url,
> 		                 $branch_from, $r, $self->{ref_id});
> 	my ($r0, $parent) = $gs->find_rev_before($r, 1);

Find the last revision that changed that path and record it.
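
For readers more at home in Python, the path walk in the quoted Perl
could be sketched roughly as follows.  It covers only the copyfrom
lookup and the rebuilding of the source path, not the find_rev_before
step; find_branch_source and the shape of the paths dict are
assumptions made for the example.

```python
def find_branch_source(path, paths):
    """Walk from `path` up toward the root looking for copyfrom data.

    `paths` maps each path changed in the revision (with a leading "/")
    to a dict that may contain "copyfrom_path".  Returns the source
    path corresponding to `path`, or None if no ancestor was copied.
    """
    remaining = [c for c in path.split("/") if c]
    stripped = []  # components removed from the end so far
    info = None
    while remaining:
        info = paths.get("/" + "/".join(remaining))
        if info and info.get("copyfrom_path") is not None:
            break
        stripped.insert(0, remaining.pop())
    if not info or info.get("copyfrom_path") is None:
        return None
    # Re-append the stripped components to the copy source.
    source = info["copyfrom_path"]
    if stripped:
        source += "/" + "/".join(stripped)
    return source


changed = {"/branches": {"copyfrom_path": "/old-branches"}}
print(find_branch_source("branches/foo", changed))
```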

Maybe we could benefit from including the copyfrom information in the
fast-import stream output by svn-fe somehow?  The simplest way to do
this would be some specially formatted comments.  An alternative (in
the spirit of Sam's earlier suggestions) might be to represent it in
the tree svn-fe creates, for example by introducing dummy

	foo.copiedfrom

symlinks.
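
To make the comment idea concrete, here is one possible shape for
it.  The "# svn-copyfrom" format is purely hypothetical (svn-fe
defines no such thing); the sketch relies only on fast-import
treating lines that start with "#" as comments.  Paths containing
whitespace would need quoting, which this sketch does not handle.

```python
import re

# Hypothetical comment format, invented for this example:
#   # svn-copyfrom <dest-path> <source-path>@<source-rev>
COPYFROM_RE = re.compile(r"^# svn-copyfrom (\S+) (\S+)@(\d+)$")


def format_copyfrom(dest, src, rev):
    """Render copyfrom information as a fast-import comment line."""
    return "# svn-copyfrom %s %s@%d\n" % (dest, src, rev)


def parse_copyfrom(line):
    """Return (dest, src, rev) for a copyfrom comment, else None."""
    match = COPYFROM_RE.match(line.rstrip("\n"))
    if not match:
        return None
    return match.group(1), match.group(2), int(match.group(3))


line = format_copyfrom("branches/foo", "trunk", 41)
print(parse_copyfrom(line))
```

A filter reading the stream could collect these lines per commit and
ignore everything else; fast-import itself would skip them.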

Thanks, that was interesting.
Jonathan

Thread overview: 7+ messages
2010-10-07  6:06 [PATCH] contrib/svn-fe: Fast script to remap svn history David Barr
2010-10-07  6:29 ` Sverre Rabbelier
2010-10-07  7:17   ` David Michael Barr
2010-10-07  8:28   ` Jonathan Nieder
2010-11-21  5:17 ` Jonathan Nieder [this message]
2010-11-22 14:01   ` Stephen Bash
2010-11-22 17:42     ` Jonathan Nieder
