git.vger.kernel.org archive mirror
 help / color / mirror / Atom feed
* [PATCH 1/2] git-svn: only look at the new parts of svn:mergeinfo
@ 2014-04-17  6:54 Jakob Stoklund Olesen
  2014-04-17  6:54 ` [PATCH 2/2] git-svn: only look at the root path for svn:mergeinfo Jakob Stoklund Olesen
  2014-04-22 18:47 ` [PATCH 1/2] git-svn: only look at the new parts of svn:mergeinfo Eric Wong
  0 siblings, 2 replies; 5+ messages in thread
From: Jakob Stoklund Olesen @ 2014-04-17  6:54 UTC (permalink / raw)
  To: git
  Cc: Eric Wong, Sam Vilain, Steven Walter, Peter Baumann,
	Andrew Myrick, Jakob Stoklund Olesen

In a Subversion repository where many feature branches are merged into a
trunk, the svn:mergeinfo property can grow very large. This severely
slows down git-svn's make_log_entry() because it is checking all
mergeinfo entries every time the property changes.

In most cases, the additions to svn:mergeinfo since the last commit are
pretty small, and there is nothing to gain by checking merges that were
already checked for the last commit in the branch.

Add a mergeinfo_changes() function which computes the set of interesting
changes to svn:mergeinfo since the last commit. Filter out merged
branches whose ranges haven't changed, and remove a common prefix of
ranges from other merged branches.

This speeds up "git svn fetch" by several orders of magnitude on a large
repository where thousands of feature branches have been merged.

Signed-off-by: Jakob Stoklund Olesen <stoklund@2pi.dk>
---
 perl/Git/SVN.pm | 84 ++++++++++++++++++++++++++++++++++++++++++++++++---------
 1 file changed, 72 insertions(+), 12 deletions(-)

diff --git a/perl/Git/SVN.pm b/perl/Git/SVN.pm
index a59564f..d3785ab 100644
--- a/perl/Git/SVN.pm
+++ b/perl/Git/SVN.pm
@@ -1178,7 +1178,7 @@ sub find_parent_branch {
 			  or die "SVN connection failed somewhere...\n";
 		}
 		print STDERR "Successfully followed parent\n" unless $::_q > 1;
-		return $self->make_log_entry($rev, [$parent], $ed);
+		return $self->make_log_entry($rev, [$parent], $ed, $r0, $branch_from);
 	}
 	return undef;
 }
@@ -1210,7 +1210,7 @@ sub do_fetch {
 	unless ($self->ra->gs_do_update($last_rev, $rev, $self, $ed)) {
 		die "SVN connection failed somewhere...\n";
 	}
-	$self->make_log_entry($rev, \@parents, $ed);
+	$self->make_log_entry($rev, \@parents, $ed, $last_rev);
 }
 
 sub mkemptydirs {
@@ -1478,9 +1478,9 @@ sub find_extra_svk_parents {
 sub lookup_svn_merge {
 	my $uuid = shift;
 	my $url = shift;
-	my $merge = shift;
+	my $source = shift;
+	my $revs = shift;
 
-	my ($source, $revs) = split ":", $merge;
 	my $path = $source;
 	$path =~ s{^/}{};
 	my $gs = Git::SVN->find_by_url($url.$source, $url, $path);
@@ -1702,6 +1702,62 @@ sub parents_exclude {
 	return @excluded;
 }
 
+# Compute what's new in svn:mergeinfo.
+sub mergeinfo_changes {
+	my ($self, $old_path, $old_rev, $path, $rev, $mergeinfo_prop) = @_;
+	my %minfo = map {split ":", $_ } split "\n", $mergeinfo_prop;
+	my $old_minfo = {};
+
+	# Initialize cache on the first call.
+	unless (defined $self->{cached_mergeinfo_rev}) {
+		$self->{cached_mergeinfo_rev} = {};
+		$self->{cached_mergeinfo} = {};
+	}
+
+	my $cached_rev = $self->{cached_mergeinfo_rev}{$old_path};
+	if (defined $cached_rev && $cached_rev == $old_rev) {
+		$old_minfo = $self->{cached_mergeinfo}{$old_path};
+	} else {
+		my $ra = $self->ra;
+		# Give up if $old_path isn't in the repo.
+		# This is probably a merge on a subtree.
+		if ($ra->check_path($old_path, $old_rev) != $SVN::Node::dir) {
+			warn "W: ignoring svn:mergeinfo on $old_path, ",
+				"directory didn't exist in r$old_rev\n";
+			return {};
+		}
+		my (undef, undef, $props) =
+			$self->ra->get_dir($old_path, $old_rev);
+		if (defined $props->{"svn:mergeinfo"}) {
+			my %omi = map {split ":", $_ } split "\n",
+				$props->{"svn:mergeinfo"};
+			$old_minfo = \%omi;
+		}
+		$self->{cached_mergeinfo}{$old_path} = $old_minfo;
+		$self->{cached_mergeinfo_rev}{$old_path} = $old_rev;
+	}
+
+	# Cache the new mergeinfo.
+	$self->{cached_mergeinfo}{$path} = \%minfo;
+	$self->{cached_mergeinfo_rev}{$path} = $rev;
+
+	my %changes = ();
+	foreach my $p (keys %minfo) {
+		my $a = $old_minfo->{$p} || "";
+		my $b = $minfo{$p};
+		# Omit merged branches whose ranges lists are unchanged.
+		next if $a eq $b;
+		# Remove any common range list prefix.
+		($a ^ $b) =~ /^[\0]*/;
+		my $common_prefix = rindex $b, ",", $+[0] - 1;
+		$changes{$p} = substr $b, $common_prefix + 1;
+	}
+	print STDERR "Checking svn:mergeinfo changes since r$old_rev: ",
+		scalar(keys %minfo), " sources, ",
+		scalar(keys %changes), " changed\n";
+
+	return \%changes;
+}
 
 # note: this function should only be called if the various dirprops
 # have actually changed
@@ -1715,14 +1771,15 @@ sub find_extra_svn_parents {
 	# history.  Then, we figure out which git revisions are in
 	# that tip, but not this revision.  If all of those revisions
 	# are now marked as merge, we can add the tip as a parent.
-	my @merges = split "\n", $mergeinfo;
+	my @merges = sort keys %$mergeinfo;
 	my @merge_tips;
 	my $url = $self->url;
 	my $uuid = $self->ra_uuid;
 	my @all_ranges;
 	for my $merge ( @merges ) {
 		my ($tip_commit, @ranges) =
-			lookup_svn_merge( $uuid, $url, $merge );
+			lookup_svn_merge( $uuid, $url,
+					  $merge, $mergeinfo->{$merge} );
 		unless (!$tip_commit or
 				grep { $_ eq $tip_commit } @$parents ) {
 			push @merge_tips, $tip_commit;
@@ -1738,8 +1795,9 @@ sub find_extra_svn_parents {
 	# check merge tips for new parents
 	my @new_parents;
 	for my $merge_tip ( @merge_tips ) {
-		my $spec = shift @merges;
+		my $merge = shift @merges;
 		next unless $merge_tip and $excluded{$merge_tip};
+		my $spec = "$merge:$mergeinfo->{$merge}";
 
 		# check out 'new' tips
 		my $merge_base;
@@ -1770,7 +1828,7 @@ sub find_extra_svn_parents {
 				.@incomplete." commit(s) (eg $incomplete[0])\n";
 		} else {
 			warn
-				"Found merge parent (svn:mergeinfo prop): ",
+				"Found merge parent ($spec): ",
 					$merge_tip, "\n";
 			push @new_parents, $merge_tip;
 		}
@@ -1797,7 +1855,7 @@ sub find_extra_svn_parents {
 }
 
 sub make_log_entry {
-	my ($self, $rev, $parents, $ed) = @_;
+	my ($self, $rev, $parents, $ed, $parent_rev, $parent_path) = @_;
 	my $untracked = $self->get_untracked($ed);
 
 	my @parents = @$parents;
@@ -1809,10 +1867,12 @@ sub make_log_entry {
 				($ed, $props->{"svk:merge"}, \@parents);
 		}
 		if ( $props->{"svn:mergeinfo"} ) {
+			my $mi_changes = $self->mergeinfo_changes
+				($parent_path || $path, $parent_rev,
+				 $path, $rev,
+				 $props->{"svn:mergeinfo"});
 			$self->find_extra_svn_parents
-				($ed,
-				 $props->{"svn:mergeinfo"},
-				 \@parents);
+				($ed, $mi_changes, \@parents);
 		}
 	}
 
-- 
1.8.5.2 (Apple Git-48)

^ permalink raw reply related	[flat|nested] 5+ messages in thread

* [PATCH 2/2] git-svn: only look at the root path for svn:mergeinfo
  2014-04-17  6:54 [PATCH 1/2] git-svn: only look at the new parts of svn:mergeinfo Jakob Stoklund Olesen
@ 2014-04-17  6:54 ` Jakob Stoklund Olesen
  2014-04-22 18:54   ` Eric Wong
  2014-04-22 18:47 ` [PATCH 1/2] git-svn: only look at the new parts of svn:mergeinfo Eric Wong
  1 sibling, 1 reply; 5+ messages in thread
From: Jakob Stoklund Olesen @ 2014-04-17  6:54 UTC (permalink / raw)
  To: git
  Cc: Eric Wong, Sam Vilain, Steven Walter, Peter Baumann,
	Andrew Myrick, Jakob Stoklund Olesen

Subversion can put mergeinfo on any sub-directory to track cherry-picks.
Since cherry-picks are not represented explicitly in git, git-svn should
just ignore it.

Signed-off-by: Jakob Stoklund Olesen <stoklund@2pi.dk>
---
 perl/Git/SVN.pm | 29 +++++++++++++----------------
 1 file changed, 13 insertions(+), 16 deletions(-)

diff --git a/perl/Git/SVN.pm b/perl/Git/SVN.pm
index d3785ab..0aa4dd3 100644
--- a/perl/Git/SVN.pm
+++ b/perl/Git/SVN.pm
@@ -1210,7 +1210,7 @@ sub do_fetch {
 	unless ($self->ra->gs_do_update($last_rev, $rev, $self, $ed)) {
 		die "SVN connection failed somewhere...\n";
 	}
-	$self->make_log_entry($rev, \@parents, $ed, $last_rev);
+	$self->make_log_entry($rev, \@parents, $ed, $last_rev, $self->path);
 }
 
 sub mkemptydirs {
@@ -1859,21 +1859,18 @@ sub make_log_entry {
 	my $untracked = $self->get_untracked($ed);
 
 	my @parents = @$parents;
-	my $ps = $ed->{path_strip} || "";
-	for my $path ( grep { m/$ps/ } %{$ed->{dir_prop}} ) {
-		my $props = $ed->{dir_prop}{$path};
-		if ( $props->{"svk:merge"} ) {
-			$self->find_extra_svk_parents
-				($ed, $props->{"svk:merge"}, \@parents);
-		}
-		if ( $props->{"svn:mergeinfo"} ) {
-			my $mi_changes = $self->mergeinfo_changes
-				($parent_path || $path, $parent_rev,
-				 $path, $rev,
-				 $props->{"svn:mergeinfo"});
-			$self->find_extra_svn_parents
-				($ed, $mi_changes, \@parents);
-		}
+	my $props = $ed->{dir_prop}{$self->path};
+	if ( $props->{"svk:merge"} ) {
+		$self->find_extra_svk_parents
+			($ed, $props->{"svk:merge"}, \@parents);
+	}
+	if ( $props->{"svn:mergeinfo"} ) {
+		my $mi_changes = $self->mergeinfo_changes
+			($parent_path, $parent_rev,
+			 $self->path, $rev,
+			 $props->{"svn:mergeinfo"});
+		$self->find_extra_svn_parents
+			($ed, $mi_changes, \@parents);
 	}
 
 	open my $un, '>>', "$self->{dir}/unhandled.log" or croak $!;
-- 
1.8.5.2 (Apple Git-48)

^ permalink raw reply related	[flat|nested] 5+ messages in thread

* Re: [PATCH 1/2] git-svn: only look at the new parts of svn:mergeinfo
  2014-04-17  6:54 [PATCH 1/2] git-svn: only look at the new parts of svn:mergeinfo Jakob Stoklund Olesen
  2014-04-17  6:54 ` [PATCH 2/2] git-svn: only look at the root path for svn:mergeinfo Jakob Stoklund Olesen
@ 2014-04-22 18:47 ` Eric Wong
  1 sibling, 0 replies; 5+ messages in thread
From: Eric Wong @ 2014-04-22 18:47 UTC (permalink / raw)
  To: Jakob Stoklund Olesen
  Cc: git, Sam Vilain, Steven Walter, Peter Baumann, Andrew Myrick

Thanks!  I still haven't gotten around to looking at svn:mergeinfo
things, but this passes tests so I'm inclined to merge this unless
somebody disagrees.

^ permalink raw reply	[flat|nested] 5+ messages in thread

* Re: [PATCH 2/2] git-svn: only look at the root path for svn:mergeinfo
  2014-04-17  6:54 ` [PATCH 2/2] git-svn: only look at the root path for svn:mergeinfo Jakob Stoklund Olesen
@ 2014-04-22 18:54   ` Eric Wong
  2014-04-27 19:00     ` Jakob Stoklund Olesen
  0 siblings, 1 reply; 5+ messages in thread
From: Eric Wong @ 2014-04-22 18:54 UTC (permalink / raw)
  To: Jakob Stoklund Olesen
  Cc: git, Sam Vilain, Steven Walter, Peter Baumann, Andrew Myrick

Jakob Stoklund Olesen <stoklund@2pi.dk> wrote:
> Subversion can put mergeinfo on any sub-directory to track cherry-picks.
> Since cherry-picks are not represented explicitly in git, git-svn should
> just ignore it.

Hi, was git-svn trying to track cherry-picks as merge before?

This changes behavior a bit, so two independent users of git-svn
may not have identical histories as a result, correct?

Can you add a test to ensure this behavior is preserved?
Thanks.

Sorry, I've never looked at mergeinfo myself, mainly relying on
Sam + tests for this.

[1] - Historically, git-svn (using defaults) has always tried to
      preserve identical histories for independent users across
      different git-svn versions.  However, mergeinfo may be
      enough of a corner-case where we can make an exception.

^ permalink raw reply	[flat|nested] 5+ messages in thread

* Re: [PATCH 2/2] git-svn: only look at the root path for svn:mergeinfo
  2014-04-22 18:54   ` Eric Wong
@ 2014-04-27 19:00     ` Jakob Stoklund Olesen
  0 siblings, 0 replies; 5+ messages in thread
From: Jakob Stoklund Olesen @ 2014-04-27 19:00 UTC (permalink / raw)
  To: Eric Wong; +Cc: git, Sam Vilain, Steven Walter, Peter Baumann, Andrew Myrick


On Apr 22, 2014, at 11:54 AM, Eric Wong <normalperson@yhbt.net> wrote:

> Jakob Stoklund Olesen <stoklund@2pi.dk> wrote:
>> Subversion can put mergeinfo on any sub-directory to track cherry-picks.
>> Since cherry-picks are not represented explicitly in git, git-svn should
>> just ignore it.
> 
> Hi, was git-svn trying to track cherry-picks as merge before?

It would try and fail. I didn't explain that properly in the commit message.

Suppose I have a standard svn layout with $url/trunk and $url/branches/topic1. My topic1 branch has a change in subdir1 that I want to cherry-pick into trunk:

% svn switch $url/trunk
% cd subdir1
% svn merge $url/branches/topic1/subdir1
% cd ..
% svn commit

This operation will set svn:mergeinfo on $url/trunk/subdir1 where a normal full merge would set it on $url/trunk:

% svn pg svn:mergeinfo subdir1 
/branches/topic1/subdir1:3-4

When git-svn fetches these changes, it currently does examine the svn:mergeinfo change on the subdirectory as if it were a full merge. It then fails to find a revmap for /branches/topic1/subdir1:

Couldn't find revmap for file:///tmp/sdb/branches/topic1/subdir1
r5 = 5ce1f687c30495deca40730fb7be3baa0e145479 (refs/remotes/trunk)

It is looking for refs/remotes/topic1/subdir1, but we only have the refs/remotes/topic1 branch in git.

This patch makes git-svn stop trying to reconstruct those subdirectory merges that we know will fail anyway.

> This changes behavior a bit, so two independent users of git-svn
> may not have identical histories as a result, correct?

For normal subdirectory cherry-picks as described above, the behavior doesn't change. This is just a performance optimization.

For weirder cases where a whole branch has been merged onto a subdirectory of trunk, behavior does change. Currently, git-svn will mark that as a full merge in git. With this change it won't.

> Can you add a test to ensure this behavior is preserved?
> Thanks.

I'll add a test for the subdirectory merge described above.

> Sorry, I've never looked at mergeinfo myself, mainly relying on
> Sam + tests for this.
> 
> [1] - Historically, git-svn (using defaults) has always tried to
>      preserve identical histories for independent users across
>      different git-svn versions.  However, mergeinfo may be
>      enough of a corner-case where we can make an exception.


I agree. It doesn't seem worthwhile to try to preserve git-svn's historical behavior in weird corner cases.

BTW, this performance optimization matters not because of sporadic manual cherry-picks, but because certain older svn releases would replicate svn:mergeinfo on every subdirectory in a standard merge. With hundreds of subdirectories and thousands of merged branches, git-svn gets completely stuck processing all those mergeinfo lines.

Thanks,
/jakob

^ permalink raw reply	[flat|nested] 5+ messages in thread

end of thread, other threads:[~2014-04-27 19:00 UTC | newest]

Thread overview: 5+ messages (download: mbox.gz follow: Atom feed
-- links below jump to the message on this page --
2014-04-17  6:54 [PATCH 1/2] git-svn: only look at the new parts of svn:mergeinfo Jakob Stoklund Olesen
2014-04-17  6:54 ` [PATCH 2/2] git-svn: only look at the root path for svn:mergeinfo Jakob Stoklund Olesen
2014-04-22 18:54   ` Eric Wong
2014-04-27 19:00     ` Jakob Stoklund Olesen
2014-04-22 18:47 ` [PATCH 1/2] git-svn: only look at the new parts of svn:mergeinfo Eric Wong

This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox;
as well as URLs for NNTP newsgroup(s).