All of lore.kernel.org
 help / color / mirror / Atom feed
From: Eric Wong <normalperson@yhbt.net>
To: Adam Roben <aroben@apple.com>
Cc: git@vger.kernel.org, Junio C Hamano <gitster@pobox.com>
Subject: Re: [PATCH 9/9] git-svn: Make fetch ~1.7x faster
Date: Tue, 23 Oct 2007 23:34:01 -0700	[thread overview]
Message-ID: <20071024063401.GB10916@soma> (raw)
In-Reply-To: <1193118397-4696-10-git-send-email-aroben@apple.com>

Adam Roben <aroben@apple.com> wrote:
> We were spending a lot of time forking/execing git-cat-file and
> git-hash-object. We now use command_bidi_pipe to keep one instance of each
> running and feed it input on stdin.

Nice job!  I just got access to a very fast SVN repository for a project
I'm working on (not working on git-svn itself, unfortunately).

A few comments and small nitpicks below:

> Signed-off-by: Adam Roben <aroben@apple.com>
> ---
>  git-svn.perl |   94 ++++++++++++++++++++++++++++++++++++++++++++-------------
>  1 files changed, 72 insertions(+), 22 deletions(-)

> +package Git::Commands;

Can this be a separate file, or a part of Git.pm?  I'm sure other
scripts can eventually use this and I've been meaning to split
git-svn.perl into separate files so it's easier to follow.

> +use vars qw/$_cat_blob_pid $_cat_blob_in $_cat_blob_out $_cat_blob_ctx $_cat_blob_separator
> +	    $_hash_object_pid $_hash_object_in $_hash_object_out $_hash_object_ctx/;

I have trouble following long lines, and most of the git code also wraps
at 80-columns.  Dead-tree publishers got this concept right a long
time ago :)

> +use strict;
> +use warnings;
> +use File::Temp qw/tempfile/;
> +use Git qw/command_bidi_pipe command_close_bidi_pipe/;
> +
> +sub _open_cat_blob_if_needed {
> +	return if defined($_cat_blob_pid);
> +	$_cat_blob_separator = "--------------GITCATFILESEPARATOR-----------";

Brian brought this up already, but yes, having pre-defined separators
instead of explicitly-specified sizes makes it all too easy for a
malicious user to commit code that will break things for git-svn users.

> +sub hash_object {
> +	my (undef, $fh) = @_;
> +
> +	my ($tmp_fh, $tmp_filename) = tempfile(UNLINK => 1);
> +	while (my $line = <$fh>) {
> +		print $tmp_fh $line;
> +	}
> +	close($tmp_fh);

Related to the above.  It's better to sysread()/syswrite() or
read()/print() in a loop with a predefined buffer size rather than to
use a readline() since you could be dealing with files with very long
lines or binaries with no newline characters in them at all.

> +	_open_hash_object_if_needed();
> +	print $_hash_object_out $tmp_filename . "\n";

Minor, but

	print $_hash_object_out $tmp_filename, "\n";

avoids creating a new string.

-- 
Eric Wong

  parent reply	other threads:[~2007-10-24  6:34 UTC|newest]

Thread overview: 25+ messages / expand[flat|nested]  mbox.gz  Atom feed  top
2007-10-23  5:46 [PATCH 0/9] Make git-svn fetch ~1.7x faster Adam Roben
2007-10-23  5:46 ` [PATCH 1/9] Add tests for git cat-file Adam Roben
2007-10-23  5:46   ` [PATCH 2/9] git-cat-file: Small refactor of cmd_cat_file Adam Roben
2007-10-23  5:46     ` [PATCH 3/9] git-cat-file: Make option parsing a little more flexible Adam Roben
2007-10-23  5:46       ` [PATCH 4/9] git-cat-file: Add --stdin option Adam Roben
2007-10-23  5:46         ` [PATCH 5/9] git-cat-file: Add --separator option Adam Roben
2007-10-23  5:46           ` [PATCH 6/9] Add tests for git hash-object Adam Roben
2007-10-23  5:46             ` [PATCH 7/9] git-hash-object: Add --stdin-paths option Adam Roben
2007-10-23  5:46               ` [PATCH 8/9] Git.pm: Add command_bidi_pipe and command_close_bidi_pipe Adam Roben
2007-10-23  5:46                 ` [PATCH 9/9] git-svn: Make fetch ~1.7x faster Adam Roben
2007-10-23  7:01                   ` Johannes Sixt
2007-10-24  6:34                   ` Eric Wong [this message]
2007-10-24  6:48                     ` Adam Roben
2007-10-23  5:53               ` [PATCH 7/9] git-hash-object: Add --stdin-paths option Shawn O. Pearce
2007-10-23  5:57                 ` Adam Roben
2007-10-23  6:10                   ` Shawn O. Pearce
2007-10-24  6:11                     ` Eric Wong
2007-10-23  6:59             ` [PATCH 6/9] Add tests for git hash-object Johannes Sixt
2007-10-24  3:43           ` [PATCH 5/9] git-cat-file: Add --separator option Brian Downing
2007-10-24  4:26             ` Adam Roben
2007-10-23  6:59   ` [PATCH 1/9] Add tests for git cat-file Johannes Sixt
2007-10-23  6:08 ` [PATCH 0/9] Make git-svn fetch ~1.7x faster Mike Hommey
2007-10-23  6:13   ` Adam Roben
2007-10-24  0:43   ` Sam Vilain
  -- strict thread matches above, loose matches on Subject: below --
2007-10-25 10:25 [RESEND PATCH " Adam Roben
2007-10-25 10:25 ` [PATCH 1/9] Add tests for git cat-file Adam Roben
2007-10-25 10:25   ` [PATCH 2/9] git-cat-file: Small refactor of cmd_cat_file Adam Roben
2007-10-25 10:25     ` [PATCH 3/9] git-cat-file: Make option parsing a little more flexible Adam Roben
2007-10-25 10:25       ` [PATCH 4/9] git-cat-file: Add --stdin option Adam Roben
2007-10-25 10:25         ` [PATCH 5/9] Add tests for git hash-object Adam Roben
2007-10-25 10:25           ` [PATCH 6/9] git-hash-object: Add --stdin-paths option Adam Roben
2007-10-25 10:25             ` [PATCH 7/9] Git.pm: Add command_bidi_pipe and command_close_bidi_pipe Adam Roben
2007-10-25 10:25               ` [PATCH 8/9] Git.pm: Add hash_and_insert_object and cat_blob Adam Roben
2007-10-25 10:25                 ` [PATCH 9/9] git-svn: Make fetch ~1.7x faster Adam Roben

Reply instructions:

You may reply publicly to this message via plain-text email
using any one of the following methods:

* Save the following mbox file, import it into your mail client,
  and reply-to-all from there: mbox

  Avoid top-posting and favor interleaved quoting:
  https://en.wikipedia.org/wiki/Posting_style#Interleaved_style

* Reply using the --to, --cc, and --in-reply-to
  switches of git-send-email(1):

  git send-email \
    --in-reply-to=20071024063401.GB10916@soma \
    --to=normalperson@yhbt.net \
    --cc=aroben@apple.com \
    --cc=git@vger.kernel.org \
    --cc=gitster@pobox.com \
    /path/to/YOUR_REPLY

  https://kernel.org/pub/software/scm/git/docs/git-send-email.html

* If your mail client supports setting the In-Reply-To header
  via mailto: links, try the mailto: link
Be sure your reply has a Subject: header at the top and a blank line before the message body.
This is an external index of several public inboxes,
see mirroring instructions on how to clone and mirror
all data and code used by this external index.