git.vger.kernel.org archive mirror
 help / color / mirror / Atom feed
From: Eric Wong <normalperson@yhbt.net>
To: Adam Roben <aroben@apple.com>
Cc: git@vger.kernel.org, Junio C Hamano <gitster@pobox.com>
Subject: Re: [PATCH 9/9] git-svn: Make fetch ~1.7x faster
Date: Tue, 23 Oct 2007 23:34:01 -0700	[thread overview]
Message-ID: <20071024063401.GB10916@soma> (raw)
In-Reply-To: <1193118397-4696-10-git-send-email-aroben@apple.com>

Adam Roben <aroben@apple.com> wrote:
> We were spending a lot of time forking/execing git-cat-file and
> git-hash-object. We now use command_bidi_pipe to keep one instance of each
> running and feed it input on stdin.

Nice job!  I just got access to a very fast SVN repository for a project
I'm working on (not working on git-svn itself, unfortunately).

A few comments and small nitpicks below:

> Signed-off-by: Adam Roben <aroben@apple.com>
> ---
>  git-svn.perl |   94 ++++++++++++++++++++++++++++++++++++++++++++-------------
>  1 files changed, 72 insertions(+), 22 deletions(-)

> +package Git::Commands;

Can this be a separate file, or a part of Git.pm?  I'm sure other
scripts can eventually use this and I've been meaning to split
git-svn.perl into separate files so it's easier to follow.

> +use vars qw/$_cat_blob_pid $_cat_blob_in $_cat_blob_out $_cat_blob_ctx $_cat_blob_separator
> +	    $_hash_object_pid $_hash_object_in $_hash_object_out $_hash_object_ctx/;

I have trouble following long lines, and most of the git code also wraps
at 80-columns.  Dead-tree publishers got this concept right a long
time ago :)

> +use strict;
> +use warnings;
> +use File::Temp qw/tempfile/;
> +use Git qw/command_bidi_pipe command_close_bidi_pipe/;
> +
> +sub _open_cat_blob_if_needed {
> +	return if defined($_cat_blob_pid);
> +	$_cat_blob_separator = "--------------GITCATFILESEPARATOR-----------";

Brian brought this up already, but yes, having pre-defined separators
instead of explicitly-specified sizes makes it all too easy for a
malicious user to commit code that will break things for git-svn users.

> +sub hash_object {
> +	my (undef, $fh) = @_;
> +
> +	my ($tmp_fh, $tmp_filename) = tempfile(UNLINK => 1);
> +	while (my $line = <$fh>) {
> +		print $tmp_fh $line;
> +	}
> +	close($tmp_fh);

Related to the above.  It's better to sysread()/syswrite() or
read()/print() in a loop with a predefined buffer size rather than to
use a readline() since you could be dealing with files with very long
lines or binaries with no newline characters in them at all.

> +	_open_hash_object_if_needed();
> +	print $_hash_object_out $tmp_filename . "\n";

Minor, but

	print $_hash_object_out $tmp_filename, "\n";

avoids creating a new string.

-- 
Eric Wong

  parent reply	other threads:[~2007-10-24  6:34 UTC|newest]

Thread overview: 25+ messages / expand[flat|nested]  mbox.gz  Atom feed  top
2007-10-23  5:46 [PATCH 0/9] Make git-svn fetch ~1.7x faster Adam Roben
2007-10-23  5:46 ` [PATCH 1/9] Add tests for git cat-file Adam Roben
2007-10-23  5:46   ` [PATCH 2/9] git-cat-file: Small refactor of cmd_cat_file Adam Roben
2007-10-23  5:46     ` [PATCH 3/9] git-cat-file: Make option parsing a little more flexible Adam Roben
2007-10-23  5:46       ` [PATCH 4/9] git-cat-file: Add --stdin option Adam Roben
2007-10-23  5:46         ` [PATCH 5/9] git-cat-file: Add --separator option Adam Roben
2007-10-23  5:46           ` [PATCH 6/9] Add tests for git hash-object Adam Roben
2007-10-23  5:46             ` [PATCH 7/9] git-hash-object: Add --stdin-paths option Adam Roben
2007-10-23  5:46               ` [PATCH 8/9] Git.pm: Add command_bidi_pipe and command_close_bidi_pipe Adam Roben
2007-10-23  5:46                 ` [PATCH 9/9] git-svn: Make fetch ~1.7x faster Adam Roben
2007-10-23  7:01                   ` Johannes Sixt
2007-10-24  6:34                   ` Eric Wong [this message]
2007-10-24  6:48                     ` Adam Roben
2007-10-23  5:53               ` [PATCH 7/9] git-hash-object: Add --stdin-paths option Shawn O. Pearce
2007-10-23  5:57                 ` Adam Roben
2007-10-23  6:10                   ` Shawn O. Pearce
2007-10-24  6:11                     ` Eric Wong
2007-10-23  6:59             ` [PATCH 6/9] Add tests for git hash-object Johannes Sixt
2007-10-24  3:43           ` [PATCH 5/9] git-cat-file: Add --separator option Brian Downing
2007-10-24  4:26             ` Adam Roben
2007-10-23  6:59   ` [PATCH 1/9] Add tests for git cat-file Johannes Sixt
2007-10-23  6:08 ` [PATCH 0/9] Make git-svn fetch ~1.7x faster Mike Hommey
2007-10-23  6:13   ` Adam Roben
2007-10-24  0:43   ` Sam Vilain
  -- strict thread matches above, loose matches on Subject: below --
2007-10-25 10:25 [RESEND PATCH " Adam Roben
2007-10-25 10:25 ` [PATCH 1/9] Add tests for git cat-file Adam Roben
2007-10-25 10:25   ` [PATCH 2/9] git-cat-file: Small refactor of cmd_cat_file Adam Roben
2007-10-25 10:25     ` [PATCH 3/9] git-cat-file: Make option parsing a little more flexible Adam Roben
2007-10-25 10:25       ` [PATCH 4/9] git-cat-file: Add --stdin option Adam Roben
2007-10-25 10:25         ` [PATCH 5/9] Add tests for git hash-object Adam Roben
2007-10-25 10:25           ` [PATCH 6/9] git-hash-object: Add --stdin-paths option Adam Roben
2007-10-25 10:25             ` [PATCH 7/9] Git.pm: Add command_bidi_pipe and command_close_bidi_pipe Adam Roben
2007-10-25 10:25               ` [PATCH 8/9] Git.pm: Add hash_and_insert_object and cat_blob Adam Roben
2007-10-25 10:25                 ` [PATCH 9/9] git-svn: Make fetch ~1.7x faster Adam Roben

Reply instructions:

You may reply publicly to this message via plain-text email
using any one of the following methods:

* Save the following mbox file, import it into your mail client,
  and reply-to-all from there: mbox

  Avoid top-posting and favor interleaved quoting:
  https://en.wikipedia.org/wiki/Posting_style#Interleaved_style

* Reply using the --to, --cc, and --in-reply-to
  switches of git-send-email(1):

  git send-email \
    --in-reply-to=20071024063401.GB10916@soma \
    --to=normalperson@yhbt.net \
    --cc=aroben@apple.com \
    --cc=git@vger.kernel.org \
    --cc=gitster@pobox.com \
    /path/to/YOUR_REPLY

  https://kernel.org/pub/software/scm/git/docs/git-send-email.html

* If your mail client supports setting the In-Reply-To header
  via mailto: links, try the mailto: link
Be sure your reply has a Subject: header at the top and a blank line before the message body.
This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox;
as well as URLs for NNTP newsgroup(s).