* [PATCH 1/2] git-svn: only look at the new parts of svn:mergeinfo
@ 2014-04-17 6:54 Jakob Stoklund Olesen
2014-04-17 6:54 ` [PATCH 2/2] git-svn: only look at the root path for svn:mergeinfo Jakob Stoklund Olesen
2014-04-22 18:47 ` [PATCH 1/2] git-svn: only look at the new parts of svn:mergeinfo Eric Wong
0 siblings, 2 replies; 5+ messages in thread
From: Jakob Stoklund Olesen @ 2014-04-17 6:54 UTC (permalink / raw)
To: git
Cc: Eric Wong, Sam Vilain, Steven Walter, Peter Baumann,
Andrew Myrick, Jakob Stoklund Olesen
In a Subversion repository where many feature branches are merged into a
trunk, the svn:mergeinfo property can grow very large. This severely
slows down git-svn's make_log_entry() because it is checking all
mergeinfo entries every time the property changes.
In most cases, the additions to svn:mergeinfo since the last commit are
pretty small, and there is nothing to gain by checking merges that were
already checked for the last commit in the branch.
Add a mergeinfo_changes() function which computes the set of interesting
changes to svn:mergeinfo since the last commit. Filter out merged
branches whose ranges haven't changed, and remove a common prefix of
ranges from other merged branches.
This speeds up "git svn fetch" by several orders of magnitude on a large
repository where thousands of feature branches have been merged.
Signed-off-by: Jakob Stoklund Olesen <stoklund@2pi.dk>
---
perl/Git/SVN.pm | 84 ++++++++++++++++++++++++++++++++++++++++++++++++---------
1 file changed, 72 insertions(+), 12 deletions(-)
diff --git a/perl/Git/SVN.pm b/perl/Git/SVN.pm
index a59564f..d3785ab 100644
--- a/perl/Git/SVN.pm
+++ b/perl/Git/SVN.pm
@@ -1178,7 +1178,7 @@ sub find_parent_branch {
or die "SVN connection failed somewhere...\n";
}
print STDERR "Successfully followed parent\n" unless $::_q > 1;
- return $self->make_log_entry($rev, [$parent], $ed);
+ return $self->make_log_entry($rev, [$parent], $ed, $r0, $branch_from);
}
return undef;
}
@@ -1210,7 +1210,7 @@ sub do_fetch {
unless ($self->ra->gs_do_update($last_rev, $rev, $self, $ed)) {
die "SVN connection failed somewhere...\n";
}
- $self->make_log_entry($rev, \@parents, $ed);
+ $self->make_log_entry($rev, \@parents, $ed, $last_rev);
}
sub mkemptydirs {
@@ -1478,9 +1478,9 @@ sub find_extra_svk_parents {
sub lookup_svn_merge {
my $uuid = shift;
my $url = shift;
- my $merge = shift;
+ my $source = shift;
+ my $revs = shift;
- my ($source, $revs) = split ":", $merge;
my $path = $source;
$path =~ s{^/}{};
my $gs = Git::SVN->find_by_url($url.$source, $url, $path);
@@ -1702,6 +1702,62 @@ sub parents_exclude {
return @excluded;
}
+# Compute what's new in svn:mergeinfo.
+sub mergeinfo_changes {
+ my ($self, $old_path, $old_rev, $path, $rev, $mergeinfo_prop) = @_;
+ my %minfo = map {split ":", $_ } split "\n", $mergeinfo_prop;
+ my $old_minfo = {};
+
+ # Initialize cache on the first call.
+ unless (defined $self->{cached_mergeinfo_rev}) {
+ $self->{cached_mergeinfo_rev} = {};
+ $self->{cached_mergeinfo} = {};
+ }
+
+ my $cached_rev = $self->{cached_mergeinfo_rev}{$old_path};
+ if (defined $cached_rev && $cached_rev == $old_rev) {
+ $old_minfo = $self->{cached_mergeinfo}{$old_path};
+ } else {
+ my $ra = $self->ra;
+ # Give up if $old_path isn't in the repo.
+ # This is probably a merge on a subtree.
+ if ($ra->check_path($old_path, $old_rev) != $SVN::Node::dir) {
+ warn "W: ignoring svn:mergeinfo on $old_path, ",
+ "directory didn't exist in r$old_rev\n";
+ return {};
+ }
+ my (undef, undef, $props) =
+ $self->ra->get_dir($old_path, $old_rev);
+ if (defined $props->{"svn:mergeinfo"}) {
+ my %omi = map {split ":", $_ } split "\n",
+ $props->{"svn:mergeinfo"};
+ $old_minfo = \%omi;
+ }
+ $self->{cached_mergeinfo}{$old_path} = $old_minfo;
+ $self->{cached_mergeinfo_rev}{$old_path} = $old_rev;
+ }
+
+ # Cache the new mergeinfo.
+ $self->{cached_mergeinfo}{$path} = \%minfo;
+ $self->{cached_mergeinfo_rev}{$path} = $rev;
+
+ my %changes = ();
+ foreach my $p (keys %minfo) {
+ my $a = $old_minfo->{$p} || "";
+ my $b = $minfo{$p};
+ # Omit merged branches whose ranges lists are unchanged.
+ next if $a eq $b;
+ # Remove any common range list prefix.
+ ($a ^ $b) =~ /^[\0]*/;
+ my $common_prefix = rindex $b, ",", $+[0] - 1;
+ $changes{$p} = substr $b, $common_prefix + 1;
+ }
+ print STDERR "Checking svn:mergeinfo changes since r$old_rev: ",
+ scalar(keys %minfo), " sources, ",
+ scalar(keys %changes), " changed\n";
+
+ return \%changes;
+}
# note: this function should only be called if the various dirprops
# have actually changed
@@ -1715,14 +1771,15 @@ sub find_extra_svn_parents {
# history. Then, we figure out which git revisions are in
# that tip, but not this revision. If all of those revisions
# are now marked as merge, we can add the tip as a parent.
- my @merges = split "\n", $mergeinfo;
+ my @merges = sort keys %$mergeinfo;
my @merge_tips;
my $url = $self->url;
my $uuid = $self->ra_uuid;
my @all_ranges;
for my $merge ( @merges ) {
my ($tip_commit, @ranges) =
- lookup_svn_merge( $uuid, $url, $merge );
+ lookup_svn_merge( $uuid, $url,
+ $merge, $mergeinfo->{$merge} );
unless (!$tip_commit or
grep { $_ eq $tip_commit } @$parents ) {
push @merge_tips, $tip_commit;
@@ -1738,8 +1795,9 @@ sub find_extra_svn_parents {
# check merge tips for new parents
my @new_parents;
for my $merge_tip ( @merge_tips ) {
- my $spec = shift @merges;
+ my $merge = shift @merges;
next unless $merge_tip and $excluded{$merge_tip};
+ my $spec = "$merge:$mergeinfo->{$merge}";
# check out 'new' tips
my $merge_base;
@@ -1770,7 +1828,7 @@ sub find_extra_svn_parents {
.@incomplete." commit(s) (eg $incomplete[0])\n";
} else {
warn
- "Found merge parent (svn:mergeinfo prop): ",
+ "Found merge parent ($spec): ",
$merge_tip, "\n";
push @new_parents, $merge_tip;
}
@@ -1797,7 +1855,7 @@ sub find_extra_svn_parents {
}
sub make_log_entry {
- my ($self, $rev, $parents, $ed) = @_;
+ my ($self, $rev, $parents, $ed, $parent_rev, $parent_path) = @_;
my $untracked = $self->get_untracked($ed);
my @parents = @$parents;
@@ -1809,10 +1867,12 @@ sub make_log_entry {
($ed, $props->{"svk:merge"}, \@parents);
}
if ( $props->{"svn:mergeinfo"} ) {
+ my $mi_changes = $self->mergeinfo_changes
+ ($parent_path || $path, $parent_rev,
+ $path, $rev,
+ $props->{"svn:mergeinfo"});
$self->find_extra_svn_parents
- ($ed,
- $props->{"svn:mergeinfo"},
- \@parents);
+ ($ed, $mi_changes, \@parents);
}
}
--
1.8.5.2 (Apple Git-48)
^ permalink raw reply related [flat|nested] 5+ messages in thread
* [PATCH 2/2] git-svn: only look at the root path for svn:mergeinfo
2014-04-17 6:54 [PATCH 1/2] git-svn: only look at the new parts of svn:mergeinfo Jakob Stoklund Olesen
@ 2014-04-17 6:54 ` Jakob Stoklund Olesen
2014-04-22 18:54 ` Eric Wong
2014-04-22 18:47 ` [PATCH 1/2] git-svn: only look at the new parts of svn:mergeinfo Eric Wong
1 sibling, 1 reply; 5+ messages in thread
From: Jakob Stoklund Olesen @ 2014-04-17 6:54 UTC (permalink / raw)
To: git
Cc: Eric Wong, Sam Vilain, Steven Walter, Peter Baumann,
Andrew Myrick, Jakob Stoklund Olesen
Subversion can put mergeinfo on any sub-directory to track cherry-picks.
Since cherry-picks are not represented explicitly in git, git-svn should
just ignore it.
Signed-off-by: Jakob Stoklund Olesen <stoklund@2pi.dk>
---
perl/Git/SVN.pm | 29 +++++++++++++----------------
1 file changed, 13 insertions(+), 16 deletions(-)
diff --git a/perl/Git/SVN.pm b/perl/Git/SVN.pm
index d3785ab..0aa4dd3 100644
--- a/perl/Git/SVN.pm
+++ b/perl/Git/SVN.pm
@@ -1210,7 +1210,7 @@ sub do_fetch {
unless ($self->ra->gs_do_update($last_rev, $rev, $self, $ed)) {
die "SVN connection failed somewhere...\n";
}
- $self->make_log_entry($rev, \@parents, $ed, $last_rev);
+ $self->make_log_entry($rev, \@parents, $ed, $last_rev, $self->path);
}
sub mkemptydirs {
@@ -1859,21 +1859,18 @@ sub make_log_entry {
my $untracked = $self->get_untracked($ed);
my @parents = @$parents;
- my $ps = $ed->{path_strip} || "";
- for my $path ( grep { m/$ps/ } %{$ed->{dir_prop}} ) {
- my $props = $ed->{dir_prop}{$path};
- if ( $props->{"svk:merge"} ) {
- $self->find_extra_svk_parents
- ($ed, $props->{"svk:merge"}, \@parents);
- }
- if ( $props->{"svn:mergeinfo"} ) {
- my $mi_changes = $self->mergeinfo_changes
- ($parent_path || $path, $parent_rev,
- $path, $rev,
- $props->{"svn:mergeinfo"});
- $self->find_extra_svn_parents
- ($ed, $mi_changes, \@parents);
- }
+ my $props = $ed->{dir_prop}{$self->path};
+ if ( $props->{"svk:merge"} ) {
+ $self->find_extra_svk_parents
+ ($ed, $props->{"svk:merge"}, \@parents);
+ }
+ if ( $props->{"svn:mergeinfo"} ) {
+ my $mi_changes = $self->mergeinfo_changes
+ ($parent_path, $parent_rev,
+ $self->path, $rev,
+ $props->{"svn:mergeinfo"});
+ $self->find_extra_svn_parents
+ ($ed, $mi_changes, \@parents);
}
open my $un, '>>', "$self->{dir}/unhandled.log" or croak $!;
--
1.8.5.2 (Apple Git-48)
^ permalink raw reply related [flat|nested] 5+ messages in thread
* Re: [PATCH 1/2] git-svn: only look at the new parts of svn:mergeinfo
2014-04-17 6:54 [PATCH 1/2] git-svn: only look at the new parts of svn:mergeinfo Jakob Stoklund Olesen
2014-04-17 6:54 ` [PATCH 2/2] git-svn: only look at the root path for svn:mergeinfo Jakob Stoklund Olesen
@ 2014-04-22 18:47 ` Eric Wong
1 sibling, 0 replies; 5+ messages in thread
From: Eric Wong @ 2014-04-22 18:47 UTC (permalink / raw)
To: Jakob Stoklund Olesen
Cc: git, Sam Vilain, Steven Walter, Peter Baumann, Andrew Myrick
Thanks! I still haven't gotten around to looking at svn:mergeinfo
things, but this passes tests so I'm inclined to merge this unless
somebody disagrees.
^ permalink raw reply [flat|nested] 5+ messages in thread
* Re: [PATCH 2/2] git-svn: only look at the root path for svn:mergeinfo
2014-04-17 6:54 ` [PATCH 2/2] git-svn: only look at the root path for svn:mergeinfo Jakob Stoklund Olesen
@ 2014-04-22 18:54 ` Eric Wong
2014-04-27 19:00 ` Jakob Stoklund Olesen
0 siblings, 1 reply; 5+ messages in thread
From: Eric Wong @ 2014-04-22 18:54 UTC (permalink / raw)
To: Jakob Stoklund Olesen
Cc: git, Sam Vilain, Steven Walter, Peter Baumann, Andrew Myrick
Jakob Stoklund Olesen <stoklund@2pi.dk> wrote:
> Subversion can put mergeinfo on any sub-directory to track cherry-picks.
> Since cherry-picks are not represented explicitly in git, git-svn should
> just ignore it.
Hi, was git-svn trying to track cherry-picks as merge before?
This changes behavior a bit, so two independent users of git-svn
may not have identical histories as a result, correct?
Can you add a test to ensure this behavior is preserved?
Thanks.
Sorry, I've never looked at mergeinfo myself, mainly relying on
Sam + tests for this.
[1] - Historically, git-svn (using defaults) has always tried to
preserve identical histories for independent users across
different git-svn versions. However, mergeinfo may be
enough of a corner-case where we can make an exception.
^ permalink raw reply [flat|nested] 5+ messages in thread
* Re: [PATCH 2/2] git-svn: only look at the root path for svn:mergeinfo
2014-04-22 18:54 ` Eric Wong
@ 2014-04-27 19:00 ` Jakob Stoklund Olesen
0 siblings, 0 replies; 5+ messages in thread
From: Jakob Stoklund Olesen @ 2014-04-27 19:00 UTC (permalink / raw)
To: Eric Wong; +Cc: git, Sam Vilain, Steven Walter, Peter Baumann, Andrew Myrick
On Apr 22, 2014, at 11:54 AM, Eric Wong <normalperson@yhbt.net> wrote:
> Jakob Stoklund Olesen <stoklund@2pi.dk> wrote:
>> Subversion can put mergeinfo on any sub-directory to track cherry-picks.
>> Since cherry-picks are not represented explicitly in git, git-svn should
>> just ignore it.
>
> Hi, was git-svn trying to track cherry-picks as merge before?
It would try and fail. I didn't explain that properly in the commit message.
Suppose I have a standard svn layout with $url/trunk and $url/branches/topic1. My topic1 branch has a change in subdir1 that I want to cherry-pick into trunk:
% svn switch $url/trunk
% cd subdir1
% svn merge $url/branches/topic1/subdir1
% cd ..
% svn commit
This operation will set svn:mergeinfo on $url/trunk/subdir1 where a normal full merge would set it on $url/trunk:
% svn pg svn:mergeinfo subdir1
/branches/topic1/subdir1:3-4
When git-svn fetches these changes, it currently does examine the svn:mergeinfo change on the subdirectory as if it were a full merge. It then fails to find a revmap for /branches/topic1/subdir1:
Couldn't find revmap for file:///tmp/sdb/branches/topic1/subdir1
r5 = 5ce1f687c30495deca40730fb7be3baa0e145479 (refs/remotes/trunk)
It is looking for refs/remotes/topic1/subdir1, but we only have the refs/remotes/topic1 branch in git.
This patch makes git-svn stop trying to reconstruct those subdirectory merges that we know will fail anyway.
> This changes behavior a bit, so two independent users of git-svn
> may not have identical histories as a result, correct?
For normal subdirectory cherry-picks as described above, the behavior doesn't change. This is just a performance optimization.
For weirder cases where a whole branch has been merged onto a subdirectory of trunk, behavior does change. Currently, git-svn will mark that as a full merge in git. With this change it won't.
> Can you add a test to ensure this behavior is preserved?
> Thanks.
I'll add a test for the subdirectory merge described above.
> Sorry, I've never looked at mergeinfo myself, mainly relying on
> Sam + tests for this.
>
> [1] - Historically, git-svn (using defaults) has always tried to
> preserve identical histories for independent users across
> different git-svn versions. However, mergeinfo may be
> enough of a corner-case where we can make an exception.
I agree. It doesn't seem worthwhile to try to preserve git-svn's historical behavior in weird corner cases.
BTW, this performance optimization matters not because of sporadic manual cherry-picks, but because certain older svn releases would replicate svn:mergeinfo on every subdirectory in a standard merge. With hundreds of subdirectories and thousands of merged branches, git-svn gets completely stuck processing all those mergeinfo lines.
Thanks,
/jakob
^ permalink raw reply [flat|nested] 5+ messages in thread
end of thread, other threads:[~2014-04-27 19:00 UTC | newest]
Thread overview: 5+ messages (download: mbox.gz follow: Atom feed
-- links below jump to the message on this page --
2014-04-17 6:54 [PATCH 1/2] git-svn: only look at the new parts of svn:mergeinfo Jakob Stoklund Olesen
2014-04-17 6:54 ` [PATCH 2/2] git-svn: only look at the root path for svn:mergeinfo Jakob Stoklund Olesen
2014-04-22 18:54 ` Eric Wong
2014-04-27 19:00 ` Jakob Stoklund Olesen
2014-04-22 18:47 ` [PATCH 1/2] git-svn: only look at the new parts of svn:mergeinfo Eric Wong
This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox;
as well as URLs for NNTP newsgroup(s).