git.vger.kernel.org archive mirror
* [PATCH v2] improve git svn performance
@ 2014-01-22  8:08 manjian2006
  2014-01-22 20:25 ` Eric Wong
  0 siblings, 1 reply; 2+ messages in thread
From: manjian2006 @ 2014-01-22  8:08 UTC (permalink / raw)
  To: git, normalperson; +Cc: manjian2006

From: manjian2006 <manjian2006@gmail.com>


* perl/Git/SVN.pm
  Modified according to Eric Wong <normalperson@yhbt.net>

>Hi, I'm interested in this.  How much did performance improve by
>(and how many revisions is the repository)?
Our SVN server is on a LAN and the repository has 15152 revisions. Unoptimized git-svn took 10 hours or more to finish the import,
while the optimized one takes only 3-4 hours.


According to some profiling data, the _rev_list and rebuild subroutines consume a large proportion of the time.
So I improved _rev_list's performance by memoizing its results, and avoided subprocess invocations by memoizing the rebuild subroutine's key data.

Signed-off-by: manjian2006 <manjian2006@gmail.com>
---
 perl/Git/SVN.pm | 41 ++++++++++++++++++++++++++++++++++++++---
 1 file changed, 38 insertions(+), 3 deletions(-)

diff --git a/perl/Git/SVN.pm b/perl/Git/SVN.pm
index 5273ee8..dc7942b 100644
--- a/perl/Git/SVN.pm
+++ b/perl/Git/SVN.pm
@@ -1599,6 +1599,7 @@ sub tie_for_persistent_memoization {
 		my %lookup_svn_merge_cache;
 		my %check_cherry_pick_cache;
 		my %has_no_changes_cache;
+		my %_rev_list_cache;
 
 		tie_for_persistent_memoization(\%lookup_svn_merge_cache,
 		    "$cache_path/lookup_svn_merge");
@@ -1620,6 +1621,14 @@ sub tie_for_persistent_memoization {
 			SCALAR_CACHE => ['HASH' => \%has_no_changes_cache],
 			LIST_CACHE => 'FAULT',
 		;
+
+		tie_for_persistent_memoization(\%_rev_list_cache,
+		    "$cache_path/_rev_list");
+		memoize '_rev_list',
+			SCALAR_CACHE => 'FAULT',
+			LIST_CACHE => ['HASH' => \%_rev_list_cache],
+		;
+
 	}
 
 	sub unmemoize_svn_mergeinfo_functions {
@@ -1629,6 +1638,7 @@ sub tie_for_persistent_memoization {
 		Memoize::unmemoize 'lookup_svn_merge';
 		Memoize::unmemoize 'check_cherry_pick';
 		Memoize::unmemoize 'has_no_changes';
+		Memoize::unmemoize '_rev_list';
 	}
 
 	sub clear_memoized_mergeinfo_caches {
@@ -1959,11 +1969,25 @@ sub rebuild_from_rev_db {
 	unlink $path or croak "unlink: $!";
 }
 
+# global associative map recording rebuild status
+my %rebuild_status;
+# global associative map recording rebuild verify status
+my %rebuild_verify_status;
+
 sub rebuild {
 	my ($self) = @_;
 	my $map_path = $self->map_path;
 	my $partial = (-e $map_path && ! -z $map_path);
-	return unless ::verify_ref($self->refname.'^0');
+	my $verify_key = $self->refname.'^0';
+	if (! exists $rebuild_verify_status{$verify_key} || ! defined $rebuild_verify_status{$verify_key} ) {
+		my $verify_result = ::verify_ref($verify_key);
+		if ($verify_result) {
+			$rebuild_verify_status{$verify_key} = 1;
+		}
+	}
+	if (! exists $rebuild_verify_status{$verify_key}) {
+		return;
+	}
 	if (!$partial && ($self->use_svm_props || $self->no_metadata)) {
 		my $rev_db = $self->rev_db_path;
 		$self->rebuild_from_rev_db($rev_db);
@@ -1977,10 +2001,21 @@ sub rebuild {
 	print "Rebuilding $map_path ...\n" if (!$partial);
 	my ($base_rev, $head) = ($partial ? $self->rev_map_max_norebuild(1) :
 		(undef, undef));
+	my $key_value = ($head ? "$head.." : "") . $self->refname;
+	if (exists $rebuild_status{$key_value}) {
+		print "Done rebuilding $map_path\n" if (!$partial || !$head);
+		my $rev_db_path = $self->rev_db_path;
+		if (-f $self->rev_db_path) {
+			unlink $self->rev_db_path or croak "unlink: $!";
+		}
+		$self->unlink_rev_db_symlink;
+		return;
+	}
 	my ($log, $ctx) =
-	    command_output_pipe(qw/rev-list --pretty=raw --reverse/,
-				($head ? "$head.." : "") . $self->refname,
+		command_output_pipe(qw/rev-list --pretty=raw --reverse/,
+				$key_value,	
 				'--');
+	$rebuild_status{$key_value} = 1;
 	my $metadata_url = $self->metadata_url;
 	remove_username($metadata_url);
 	my $svn_uuid = $self->rewrite_uuid || $self->ra_uuid;
-- 
1.8.3.2


* Re: [PATCH v2] improve git svn performance
  2014-01-22  8:08 [PATCH v2] improve git svn performance manjian2006
@ 2014-01-22 20:25 ` Eric Wong
  0 siblings, 0 replies; 2+ messages in thread
From: Eric Wong @ 2014-01-22 20:25 UTC (permalink / raw)
  To: manjian2006; +Cc: git, Junio C Hamano

manjian2006@gmail.com wrote:
> * perl/Git/SVN.pm
>   Modified according to Eric Wong <normalperson@yhbt.net>
> 
> >Hi, I'm interested in this.  How much did performance improve by
> >(and how many revisions is the repository)?

> Our SVN server is on a LAN and the repository has 15152 revisions.
> Unoptimized git-svn took 10 hours or more to finish the import, while
> the optimized one takes only 3-4 hours.
> 
> According to some profiling data, the _rev_list and rebuild
> subroutines consume a large proportion of the time.  So I improved
> _rev_list's performance by memoizing its results, and avoided
> subprocess invocations by memoizing the rebuild subroutine's key data.

Impressive!  Thanks for that info.
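
For readers unfamiliar with the technique, here is a minimal sketch of
list-context memoization with the core Memoize module, using the same
SCALAR_CACHE/LIST_CACHE options as the patch.  The subroutine
slow_rev_list and the counter $calls are hypothetical stand-ins, not
git-svn code, and the cache is a plain in-memory hash, whereas the patch
ties its hash to a file under $cache_path.

```perl
#!/usr/bin/perl
use strict;
use warnings;
use Memoize;

my $calls = 0;    # counts how often the real work runs

# Hypothetical stand-in for an expensive subroutine such as _rev_list,
# which in git-svn spawns a "git rev-list" subprocess.
sub slow_rev_list {
	my (@args) = @_;
	$calls++;
	return map { "commit-for-$_" } @args;
}

# Cache list-context results in %list_cache.  'FAULT' makes a
# scalar-context call die instead of silently bypassing the cache.
my %list_cache;
memoize 'slow_rev_list',
	SCALAR_CACHE => 'FAULT',
	LIST_CACHE => ['HASH' => \%list_cache],
;

my @first = slow_rev_list('refs/remotes/trunk');
my @second = slow_rev_list('refs/remotes/trunk');    # served from cache

print "real calls: $calls\n";
```

Only the first call runs the body; the second one with the same
arguments is answered from %list_cache.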

> Signed-off-by: manjian2006 <manjian2006@gmail.com>

Real name is preferred by this project, I think.

A proper patch would start something like this:
-------------------------------8<------------------------------------
From: Your Name <manjian2006@gmail.com>
Subject: git-svn: memoize _rev_list and rebuild

According to profile data, _rev_list and rebuild consume a large
portion of time.  Memoize the results of _rev_list and memoize
rebuild internals to avoid subprocess invocation.

When importing 15152 revisions on a LAN, time improved from 10
hours to 3-4 hours.

Signed-off-by: Your Name <manjian2006@gmail.com>
---------------------- a few more comments below -------------------

>  sub rebuild {
>  	my ($self) = @_;
>  	my $map_path = $self->map_path;
>  	my $partial = (-e $map_path && ! -z $map_path);
> -	return unless ::verify_ref($self->refname.'^0');
> +	my $verify_key = $self->refname.'^0';
> +	if (! exists $rebuild_verify_status{$verify_key} || ! defined $rebuild_verify_status{$verify_key} ) {

80 column wrap, please.

However, I think just having a single
"!$rebuild_verify_status{$verify_key}" check is enough, no need for
extra defined/exists checks for %rebuild_verify_status nor %rebuild_status.
Neither of them load untrusted data.
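
A rough sketch of the simplified check, under the assumption that the
hash only ever stores a true value for verified refs.  fake_verify_ref,
ref_is_verified and $verify_calls are hypothetical names used only for
illustration; in the patch the lookup calls ::verify_ref inside rebuild.

```perl
#!/usr/bin/perl
use strict;
use warnings;

my %rebuild_verify_status;
my $verify_calls = 0;    # counts calls to the stand-in verifier

# Hypothetical stand-in for ::verify_ref, which runs a git subprocess.
sub fake_verify_ref {
	my ($ref) = @_;
	$verify_calls++;
	return $ref =~ m{^refs/} ? 1 : 0;
}

sub ref_is_verified {
	my ($verify_key) = @_;
	# A single truthiness test covers the missing, undef and false
	# cases, and reading a missing key this way does not create it.
	if (!$rebuild_verify_status{$verify_key}) {
		$rebuild_verify_status{$verify_key} = 1
			if fake_verify_ref($verify_key);
	}
	return $rebuild_verify_status{$verify_key} ? 1 : 0;
}

ref_is_verified('refs/remotes/trunk^0');
ref_is_verified('refs/remotes/trunk^0');    # no second verifier call
print "verifier calls: $verify_calls\n";
```

As in the patch, only successful verifications are remembered, so a ref
that fails to verify is checked again on the next call.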

> -	    command_output_pipe(qw/rev-list --pretty=raw --reverse/,
> -				($head ? "$head.." : "") . $self->refname,
> +		command_output_pipe(qw/rev-list --pretty=raw --reverse/,
> +				$key_value,	

Please do not leave trailing whitespace.  Thanks.
