git.vger.kernel.org archive mirror
 help / color / mirror / Atom feed
* git-svn metadata commands performance issue
@ 2015-01-13 10:14 Niluge kiwi
  2015-01-14 20:30 ` Eric Wong
  0 siblings, 1 reply; 5+ messages in thread
From: Niluge kiwi @ 2015-01-13 10:14 UTC (permalink / raw)
  To: git

Hi all,

In magit (http://magit.github.io/), a popular git frontend within
emacs, there is a git-svn frontend.  With a recent refactoring, it was
discovered that git-svn metadata commands (like "git-svn info") are
much slower than git ones:
git svn info: 130-150ms (after warmup): get the svn revision and url.
git svn rebase --dry-run: 150-170ms (after warmup): get the remote
branch.

Whereas in pure git:
git rev-parse --abbrev-ref HEAD@{upstream}: 2-3ms (after warmup): get
the remote branch
Other git commands alike take all less than 10ms after warmup.

This is an issue for the magit developers and users: just getting a
git-svn status with some metadata easily take ~500ms, which is really
slow for a UI. The equivalent UI with a pure git repository in magit
takes much less than 100ms to generate although more than 30 git
process are forked for it.

A previous version of magit-svn was much faster because it
re-implemented the logic of git-svn from perl to elisp (the
programming language in emacs), and to get the 3 previously mentioned
values it took less than 10ms.


What could be done about this?
Could git-svn performance be dramatically improved?
Even git svn --version takes ~100ms, is perl the bottleneck?
Or should each git-svn frontend developer re-implement the git-svn
metadata commands themselves for better performance?
Also, wouldn't it be better for those frontend developers if there
were some git-svn porcelain commands like git has? Fast, easy to parse
and stable input & output format?

For reference, here is the discussion about the performance issue on
magit-svn: https://github.com/magit/magit-svn/issues/1

And I'm using git-svn version 2.2.1 (svn 1.6.17) on an ubuntu 12.04
64bits Intel machine with an HDD (no SSD).

Thanks,
Thomas Riccardi

^ permalink raw reply	[flat|nested] 5+ messages in thread

* Re: git-svn metadata commands performance issue
  2015-01-13 10:14 git-svn metadata commands performance issue Niluge kiwi
@ 2015-01-14 20:30 ` Eric Wong
  2015-01-15 10:14   ` [PATCH] git-svn: lazy load some modules Eric Wong
                     ` (2 more replies)
  0 siblings, 3 replies; 5+ messages in thread
From: Eric Wong @ 2015-01-14 20:30 UTC (permalink / raw)
  To: Niluge kiwi; +Cc: git

Niluge kiwi <kiwiiii@gmail.com> wrote:
> Hi all,
> 
> In magit (http://magit.github.io/), a popular git frontend within
> emacs, there is a git-svn frontend.  With a recent refactoring, it was
> discovered that git-svn metadata commands (like "git-svn info") are
> much slower than git ones:
> git svn info: 130-150ms (after warmup): get the svn revision and url.
> git svn rebase --dry-run: 150-170ms (after warmup): get the remote
> branch.
> 
> Whereas in pure git:
> git rev-parse --abbrev-ref HEAD@{upstream}: 2-3ms (after warmup): get
> the remote branch
> Other git commands alike take all less than 10ms after warmup.

Thanks for the bug report.  I actually see worse performance from
my old machines, but I'm not a very heavy git-svn user anymore.
100ms is an eternity :<

> This is an issue for the magit developers and users: just getting a
> git-svn status with some metadata easily take ~500ms, which is really
> slow for a UI. The equivalent UI with a pure git repository in magit
> takes much less than 100ms to generate although more than 30 git
> process are forked for it.

How big is the parent process which forks the git commands?  On Linux at
least, fork() performance is negatively impacted by parent process
memory size.  To avoid spawn performance problems with large parent
processes, vfork() should be used, but there does not seem to be an
easy/portable way to use vfork() from Perl.

> A previous version of magit-svn was much faster because it
> re-implemented the logic of git-svn from perl to elisp (the
> programming language in emacs), and to get the 3 previously mentioned
> values it took less than 10ms.

I've never worked with elisp, but we can probably figure out why it's
faster.  Can you give us a pointer to the old elisp code?

> What could be done about this?
> Could git-svn performance be dramatically improved?
> Even git svn --version takes ~100ms, is perl the bottleneck?

The Linux 'perf' tool reports much time is spent is from the Perl
parser.  So we may implement lazy loading, so simple commands such as
"git svn info" do not need to load and parse all the Perl code.

> Or should each git-svn frontend developer re-implement the git-svn
> metadata commands themselves for better performance?

I prefer git-svn be fast enough; but you're free to reimplement
and optimize your own code as you see fit.

> Also, wouldn't it be better for those frontend developers if there
> were some git-svn porcelain commands like git has? Fast, easy to parse
> and stable input & output format?

Sure, but we don't know what you'd need beyond the current command set.
Of course we need to be careful about adding even more code to git-svn
as that impacts startup time, too.

^ permalink raw reply	[flat|nested] 5+ messages in thread

* [PATCH] git-svn: lazy load some modules
  2015-01-14 20:30 ` Eric Wong
@ 2015-01-15 10:14   ` Eric Wong
  2015-01-15 11:26   ` git-svn metadata commands performance issue Niluge kiwi
  2015-01-15 11:53   ` David Kastrup
  2 siblings, 0 replies; 5+ messages in thread
From: Eric Wong @ 2015-01-15 10:14 UTC (permalink / raw)
  To: Niluge kiwi; +Cc: git

Baby steps, pushed to my master on git://bogomips.org/git-svn

--------------------------8<----------------------------
Subject: [PATCH] git-svn: lazy load some modules

We can delay loading some modules until we need them for uncommon
code paths.  For example, persistent memoization is not often
needed, so we can avoid loading the modules for it until we
encounter svn::mergeinfo during fetch.

This gives a tiny reduction in syscalls (from 15641 to 15305) when
running "git svn info" and counting via "strace -fc".  Further,
more invasive work will be needed to noticeably improve performance.

Signed-off-by: Eric Wong <normalperson@yhbt.net>
---
 git-svn.perl            | 13 +++++++------
 perl/Git/SVN.pm         | 22 +++++++++++++---------
 perl/Git/SVN/Editor.pm  |  3 +--
 perl/Git/SVN/Fetcher.pm |  3 +--
 perl/Git/SVN/Ra.pm      |  5 ++++-
 5 files changed, 26 insertions(+), 20 deletions(-)

diff --git a/git-svn.perl b/git-svn.perl
index 32d109e..36f7240 100755
--- a/git-svn.perl
+++ b/git-svn.perl
@@ -11,14 +11,10 @@ $AUTHOR = 'Eric Wong <normalperson@yhbt.net>';
 $VERSION = '@@GIT_VERSION@@';
 
 use Carp qw/croak/;
-use Digest::MD5;
-use IO::File qw//;
 use File::Basename qw/dirname basename/;
 use File::Path qw/mkpath/;
 use File::Spec;
-use File::Find;
 use Getopt::Long qw/:config gnu_getopt no_ignore_case auto_abbrev/;
-use IPC::Open3;
 use Memoize;
 
 use Git::SVN;
@@ -298,7 +294,6 @@ my %cmd = (
 		{} ],
 );
 
-use Term::ReadLine;
 package FakeTerm;
 sub new {
 	my ($class, $reason) = @_;
@@ -313,6 +308,7 @@ package main;
 my $term;
 sub term_init {
 	$term = eval {
+		require Term::ReadLine;
 		$ENV{"GIT_SVN_NOTTY"}
 			? new Term::ReadLine 'git-svn', \*STDIN, \*STDOUT
 			: new Term::ReadLine 'git-svn';
@@ -1173,6 +1169,7 @@ sub cmd_branch {
 	}
 
 	::_req_svn();
+	require SVN::Client;
 
 	my $ctx = SVN::Client->new(
 		config => SVN::Core::config_get_config(
@@ -1693,11 +1690,13 @@ sub cmd_reset {
 }
 
 sub cmd_gc {
+	require File::Find;
 	if (!can_compress()) {
 		warn "Compress::Zlib could not be found; unhandled.log " .
 		     "files will not be compressed.\n";
 	}
-	find({ wanted => \&gc_directory, no_chdir => 1}, "$ENV{GIT_DIR}/svn");
+	File::Find::find({ wanted => \&gc_directory, no_chdir => 1},
+			 "$ENV{GIT_DIR}/svn");
 }
 
 ########################### utility functions #########################
@@ -2122,6 +2121,7 @@ sub find_file_type_and_diff_status {
 sub md5sum {
 	my $arg = shift;
 	my $ref = ref $arg;
+	require Digest::MD5;
 	my $md5 = Digest::MD5->new();
         if ($ref eq 'GLOB' || $ref eq 'IO::File' || $ref eq 'File::Temp') {
 		$md5->addfile($arg) or croak $!;
@@ -2148,6 +2148,7 @@ sub gc_directory {
 			$gz->gzwrite($str) or
 				die "Unable to write: ".$gz->gzerror()."!\n";
 		}
+		no warnings 'once'; # $File::Find::name would warn
 		unlink $_ or die "unlink $File::Find::name: $!\n";
 	} elsif (-f $_ && basename($_) eq "index") {
 		unlink $_ or die "unlink $_: $!\n";
diff --git a/perl/Git/SVN.pm b/perl/Git/SVN.pm
index 8e4af71..afa562c 100644
--- a/perl/Git/SVN.pm
+++ b/perl/Git/SVN.pm
@@ -9,10 +9,8 @@ use vars qw/$_no_metadata
 	    $_use_log_author $_add_author_from $_localtime/;
 use Carp qw/croak/;
 use File::Path qw/mkpath/;
-use File::Copy qw/copy/;
 use IPC::Open3;
 use Memoize;  # core since 5.8.0, Jul 2002
-use Memoize::Storable;
 use POSIX qw(:signal_h);
 
 use Git qw(
@@ -32,11 +30,7 @@ use Git::SVN::Utils qw(
 	add_path_to_url
 );
 
-my $can_use_yaml;
-BEGIN {
-	$can_use_yaml = eval { require Git::SVN::Memoize::YAML; 1};
-}
-
+my $memo_backend;
 our $_follow_parent  = 1;
 our $_minimize_url   = 'unset';
 our $default_repo_id = 'svn';
@@ -1578,7 +1572,16 @@ sub tie_for_persistent_memoization {
 	my $hash = shift;
 	my $path = shift;
 
-	if ($can_use_yaml) {
+	unless ($memo_backend) {
+		if (eval { require Git::SVN::Memoize::YAML; 1}) {
+			$memo_backend = 1;
+		} else {
+			require Memoize::Storable;
+			$memo_backend = -1;
+		}
+	}
+
+	if ($memo_backend > 0) {
 		tie %$hash => 'Git::SVN::Memoize::YAML', "$path.yaml";
 	} else {
 		tie %$hash => 'Memoize::Storable', "$path.db", 'nstore';
@@ -2188,8 +2191,9 @@ sub rev_map_set {
 	# both of these options make our .rev_db file very, very important
 	# and we can't afford to lose it because rebuild() won't work
 	if ($self->use_svm_props || $self->no_metadata) {
+		require File::Copy;
 		$sync = 1;
-		copy($db, $db_lock) or die "rev_map_set(@_): ",
+		File::Copy::copy($db, $db_lock) or die "rev_map_set(@_): ",
 					   "Failed to copy: ",
 					   "$db => $db_lock ($!)\n";
 	} else {
diff --git a/perl/Git/SVN/Editor.pm b/perl/Git/SVN/Editor.pm
index 4088f13..c50176e 100644
--- a/perl/Git/SVN/Editor.pm
+++ b/perl/Git/SVN/Editor.pm
@@ -5,7 +5,6 @@ use warnings;
 use SVN::Core;
 use SVN::Delta;
 use Carp qw/croak/;
-use IO::File;
 use Git qw/command command_oneline command_noisy command_output_pipe
            command_input_pipe command_close_pipe
            command_bidi_pipe command_close_bidi_pipe/;
@@ -586,7 +585,7 @@ The interface will change as git-svn evolves.
 =head1 DEPENDENCIES
 
 Subversion perl bindings,
-the core L<Carp> and L<IO::File> modules,
+the core L<Carp> module,
 and git's L<Git> helper module.
 
 C<Git::SVN::Editor> has not been tested using callers other than
diff --git a/perl/Git/SVN/Fetcher.pm b/perl/Git/SVN/Fetcher.pm
index 10edb27..6b9c6e0 100644
--- a/perl/Git/SVN/Fetcher.pm
+++ b/perl/Git/SVN/Fetcher.pm
@@ -7,7 +7,6 @@ use warnings;
 use SVN::Delta;
 use Carp qw/croak/;
 use File::Basename qw/dirname/;
-use IO::File qw//;
 use Git qw/command command_oneline command_noisy command_output_pipe
            command_input_pipe command_close_pipe
            command_bidi_pipe command_close_bidi_pipe/;
@@ -600,7 +599,7 @@ developing git-svn.
 =head1 DEPENDENCIES
 
 L<SVN::Delta> from the Subversion perl bindings,
-the core L<Carp>, L<File::Basename>, and L<IO::File> modules,
+the core L<Carp> and L<File::Basename> modules,
 and git's L<Git> helper module.
 
 C<Git::SVN::Fetcher> has not been tested using callers other than
diff --git a/perl/Git/SVN/Ra.pm b/perl/Git/SVN/Ra.pm
index 622535e..cf36b9d 100644
--- a/perl/Git/SVN/Ra.pm
+++ b/perl/Git/SVN/Ra.pm
@@ -3,7 +3,6 @@ use vars qw/@ISA $config_dir $_ignore_refs_regex $_log_window_size/;
 use strict;
 use warnings;
 use Memoize;
-use SVN::Client;
 use Git::SVN::Utils qw(
 	canonicalize_url
 	canonicalize_path
@@ -42,6 +41,7 @@ END {
 }
 
 sub _auth_providers () {
+	require SVN::Client;
 	my @rv = (
 	  SVN::Client::get_simple_provider(),
 	  SVN::Client::get_ssl_server_trust_file_provider(),
@@ -247,7 +247,10 @@ sub get_log {
 	$ret;
 }
 
+# uncommon, only for ancient SVN (<= 1.4.2)
 sub trees_match {
+	require IO::File;
+	require SVN::Client;
 	my ($self, $url1, $rev1, $url2, $rev2) = @_;
 	my $ctx = SVN::Client->new(auth => _auth_providers);
 	my $out = IO::File->new_tmpfile;
-- 
EW

^ permalink raw reply related	[flat|nested] 5+ messages in thread

* Re: git-svn metadata commands performance issue
  2015-01-14 20:30 ` Eric Wong
  2015-01-15 10:14   ` [PATCH] git-svn: lazy load some modules Eric Wong
@ 2015-01-15 11:26   ` Niluge kiwi
  2015-01-15 11:53   ` David Kastrup
  2 siblings, 0 replies; 5+ messages in thread
From: Niluge kiwi @ 2015-01-15 11:26 UTC (permalink / raw)
  To: Eric Wong; +Cc: git

On Wed, Jan 14, 2015 at 9:30 PM, Eric Wong <normalperson@yhbt.net> wrote:
> Niluge kiwi <kiwiiii@gmail.com> wrote:
>> Hi all,
>>
>> In magit (http://magit.github.io/), a popular git frontend within
>> emacs, there is a git-svn frontend.  With a recent refactoring, it was
>> discovered that git-svn metadata commands (like "git-svn info") are
>> much slower than git ones:
>> git svn info: 130-150ms (after warmup): get the svn revision and url.
>> git svn rebase --dry-run: 150-170ms (after warmup): get the remote
>> branch.
>>
>> Whereas in pure git:
>> git rev-parse --abbrev-ref HEAD@{upstream}: 2-3ms (after warmup): get
>> the remote branch
>> Other git commands alike take all less than 10ms after warmup.
>
> Thanks for the bug report.  I actually see worse performance from
> my old machines, but I'm not a very heavy git-svn user anymore.
> 100ms is an eternity :<
>
>> This is an issue for the magit developers and users: just getting a
>> git-svn status with some metadata easily take ~500ms, which is really
>> slow for a UI. The equivalent UI with a pure git repository in magit
>> takes much less than 100ms to generate although more than 30 git
>> process are forked for it.
>
> How big is the parent process which forks the git commands?  On Linux at
> least, fork() performance is negatively impacted by parent process
> memory size.  To avoid spawn performance problems with large parent
> processes, vfork() should be used, but there does not seem to be an
> easy/portable way to use vfork() from Perl.

My emacs process uses ~500MB in RAM, but all the timings from my first
email come from commands run from bash that uses 8Mo in RAM. I don't
see any significant difference between timings from emacs and bash.

>
>> A previous version of magit-svn was much faster because it
>> re-implemented the logic of git-svn from perl to elisp (the
>> programming language in emacs), and to get the 3 previously mentioned
>> values it took less than 10ms.
>
> I've never worked with elisp, but we can probably figure out why it's
> faster.  Can you give us a pointer to the old elisp code?

The commit that removes the old code and replaces it by simple calls
to git-svn is here:
https://github.com/magit/magit/commit/b583e43a57dee164fdd53e0772220e32c1b3fab3

It may be faster because we don't have to start perl each time...

>
>> What could be done about this?
>> Could git-svn performance be dramatically improved?
>> Even git svn --version takes ~100ms, is perl the bottleneck?
>
> The Linux 'perf' tool reports much time is spent is from the Perl
> parser.  So we may implement lazy loading, so simple commands such as
> "git svn info" do not need to load and parse all the Perl code.
>
>> Or should each git-svn frontend developer re-implement the git-svn
>> metadata commands themselves for better performance?
>
> I prefer git-svn be fast enough; but you're free to reimplement
> and optimize your own code as you see fit.

The magit maintainer also prefers git-svn to be fast enough, because
it means less code to write and maintain.

>
>> Also, wouldn't it be better for those frontend developers if there
>> were some git-svn porcelain commands like git has? Fast, easy to parse
>> and stable input & output format?
>
> Sure, but we don't know what you'd need beyond the current command set.
> Of course we need to be careful about adding even more code to git-svn
> as that impacts startup time, too.
I'll ask the magit maintainer to weight in here for the ideal git-svn
command line features.

Also, couldn't the code be split into multiple files to have no global
impact on startup time when adding new features?

^ permalink raw reply	[flat|nested] 5+ messages in thread

* Re: git-svn metadata commands performance issue
  2015-01-14 20:30 ` Eric Wong
  2015-01-15 10:14   ` [PATCH] git-svn: lazy load some modules Eric Wong
  2015-01-15 11:26   ` git-svn metadata commands performance issue Niluge kiwi
@ 2015-01-15 11:53   ` David Kastrup
  2 siblings, 0 replies; 5+ messages in thread
From: David Kastrup @ 2015-01-15 11:53 UTC (permalink / raw)
  To: Eric Wong; +Cc: Niluge kiwi, git

Eric Wong <normalperson@yhbt.net> writes:

> How big is the parent process which forks the git commands?  On Linux at
> least, fork() performance is negatively impacted by parent process
> memory size.

Huh.  I thought with the advent of demand-paging, at the very least with
copy-on-write, this was supposed to be sort of a non-issue.

The old original UNIX version, in contrast, consisted of swapping out
the current process without removing the in-memory copy.  But since the
in-memory copy then did the exec call and since usually the exec call
was happy about every page of free memory (we _are_ talking about
something like 64kB of total available memory here), that tended to work
reasonably well.

-- 
David Kastrup

^ permalink raw reply	[flat|nested] 5+ messages in thread

end of thread, other threads:[~2015-01-15 11:54 UTC | newest]

Thread overview: 5+ messages (download: mbox.gz follow: Atom feed
-- links below jump to the message on this page --
2015-01-13 10:14 git-svn metadata commands performance issue Niluge kiwi
2015-01-14 20:30 ` Eric Wong
2015-01-15 10:14   ` [PATCH] git-svn: lazy load some modules Eric Wong
2015-01-15 11:26   ` git-svn metadata commands performance issue Niluge kiwi
2015-01-15 11:53   ` David Kastrup

This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox;
as well as URLs for NNTP newsgroup(s).