From: Eric Wong <normalperson@yhbt.net>
To: Adam Roben <aroben@apple.com>
Cc: git@vger.kernel.org, Junio C Hamano <gitster@pobox.com>
Subject: Re: [PATCH 9/9] git-svn: Make fetch ~1.7x faster
Date: Tue, 23 Oct 2007 23:34:01 -0700 [thread overview]
Message-ID: <20071024063401.GB10916@soma> (raw)
In-Reply-To: <1193118397-4696-10-git-send-email-aroben@apple.com>
Adam Roben <aroben@apple.com> wrote:
> We were spending a lot of time forking/execing git-cat-file and
> git-hash-object. We now use command_bidi_pipe to keep one instance of each
> running and feed it input on stdin.
Nice job! I just got access to a very fast SVN repository for a project
I'm working on (not working on git-svn itself, unfortunately).
A few comments and small nitpicks below:
> Signed-off-by: Adam Roben <aroben@apple.com>
> ---
> git-svn.perl | 94 ++++++++++++++++++++++++++++++++++++++++++++-------------
> 1 files changed, 72 insertions(+), 22 deletions(-)
> +package Git::Commands;
Can this be a separate file, or a part of Git.pm? I'm sure other
scripts can eventually use this and I've been meaning to split
git-svn.perl into separate files so it's easier to follow.
> +use vars qw/$_cat_blob_pid $_cat_blob_in $_cat_blob_out $_cat_blob_ctx $_cat_blob_separator
> + $_hash_object_pid $_hash_object_in $_hash_object_out $_hash_object_ctx/;
I have trouble following long lines, and most of the git code also wraps
at 80-columns. Dead-tree publishers got this concept right a long
time ago :)
> +use strict;
> +use warnings;
> +use File::Temp qw/tempfile/;
> +use Git qw/command_bidi_pipe command_close_bidi_pipe/;
> +
> +sub _open_cat_blob_if_needed {
> + return if defined($_cat_blob_pid);
> + $_cat_blob_separator = "--------------GITCATFILESEPARATOR-----------";
Brian brought this up already, but yes, having pre-defined separators
instead of explicitly-specified sizes makes it all too easy for a
malicious user to commit code that will break things for git-svn users.
> +sub hash_object {
> + my (undef, $fh) = @_;
> +
> + my ($tmp_fh, $tmp_filename) = tempfile(UNLINK => 1);
> + while (my $line = <$fh>) {
> + print $tmp_fh $line;
> + }
> + close($tmp_fh);
Related to the above. It's better to sysread()/syswrite() or
read()/print() in a loop with a predefined buffer size rather than to
use a readline() since you could be dealing with files with very long
lines or binaries with no newline characters in them at all.
> + _open_hash_object_if_needed();
> + print $_hash_object_out $tmp_filename . "\n";
Minor, but
print $_hash_object_out $tmp_filename, "\n";
avoids creating a new string.
--
Eric Wong
next prev parent reply other threads:[~2007-10-24 6:34 UTC|newest]
Thread overview: 25+ messages / expand[flat|nested] mbox.gz Atom feed top
2007-10-23 5:46 [PATCH 0/9] Make git-svn fetch ~1.7x faster Adam Roben
2007-10-23 5:46 ` [PATCH 1/9] Add tests for git cat-file Adam Roben
2007-10-23 5:46 ` [PATCH 2/9] git-cat-file: Small refactor of cmd_cat_file Adam Roben
2007-10-23 5:46 ` [PATCH 3/9] git-cat-file: Make option parsing a little more flexible Adam Roben
2007-10-23 5:46 ` [PATCH 4/9] git-cat-file: Add --stdin option Adam Roben
2007-10-23 5:46 ` [PATCH 5/9] git-cat-file: Add --separator option Adam Roben
2007-10-23 5:46 ` [PATCH 6/9] Add tests for git hash-object Adam Roben
2007-10-23 5:46 ` [PATCH 7/9] git-hash-object: Add --stdin-paths option Adam Roben
2007-10-23 5:46 ` [PATCH 8/9] Git.pm: Add command_bidi_pipe and command_close_bidi_pipe Adam Roben
2007-10-23 5:46 ` [PATCH 9/9] git-svn: Make fetch ~1.7x faster Adam Roben
2007-10-23 7:01 ` Johannes Sixt
2007-10-24 6:34 ` Eric Wong [this message]
2007-10-24 6:48 ` Adam Roben
2007-10-23 5:53 ` [PATCH 7/9] git-hash-object: Add --stdin-paths option Shawn O. Pearce
2007-10-23 5:57 ` Adam Roben
2007-10-23 6:10 ` Shawn O. Pearce
2007-10-24 6:11 ` Eric Wong
2007-10-23 6:59 ` [PATCH 6/9] Add tests for git hash-object Johannes Sixt
2007-10-24 3:43 ` [PATCH 5/9] git-cat-file: Add --separator option Brian Downing
2007-10-24 4:26 ` Adam Roben
2007-10-23 6:59 ` [PATCH 1/9] Add tests for git cat-file Johannes Sixt
2007-10-23 6:08 ` [PATCH 0/9] Make git-svn fetch ~1.7x faster Mike Hommey
2007-10-23 6:13 ` Adam Roben
2007-10-24 0:43 ` Sam Vilain
-- strict thread matches above, loose matches on Subject: below --
2007-10-25 10:25 [RESEND PATCH " Adam Roben
2007-10-25 10:25 ` [PATCH 1/9] Add tests for git cat-file Adam Roben
2007-10-25 10:25 ` [PATCH 2/9] git-cat-file: Small refactor of cmd_cat_file Adam Roben
2007-10-25 10:25 ` [PATCH 3/9] git-cat-file: Make option parsing a little more flexible Adam Roben
2007-10-25 10:25 ` [PATCH 4/9] git-cat-file: Add --stdin option Adam Roben
2007-10-25 10:25 ` [PATCH 5/9] Add tests for git hash-object Adam Roben
2007-10-25 10:25 ` [PATCH 6/9] git-hash-object: Add --stdin-paths option Adam Roben
2007-10-25 10:25 ` [PATCH 7/9] Git.pm: Add command_bidi_pipe and command_close_bidi_pipe Adam Roben
2007-10-25 10:25 ` [PATCH 8/9] Git.pm: Add hash_and_insert_object and cat_blob Adam Roben
2007-10-25 10:25 ` [PATCH 9/9] git-svn: Make fetch ~1.7x faster Adam Roben
Reply instructions:
You may reply publicly to this message via plain-text email
using any one of the following methods:
* Save the following mbox file, import it into your mail client,
and reply-to-all from there: mbox
Avoid top-posting and favor interleaved quoting:
https://en.wikipedia.org/wiki/Posting_style#Interleaved_style
* Reply using the --to, --cc, and --in-reply-to
switches of git-send-email(1):
git send-email \
--in-reply-to=20071024063401.GB10916@soma \
--to=normalperson@yhbt.net \
--cc=aroben@apple.com \
--cc=git@vger.kernel.org \
--cc=gitster@pobox.com \
/path/to/YOUR_REPLY
https://kernel.org/pub/software/scm/git/docs/git-send-email.html
* If your mail client supports setting the In-Reply-To header
via mailto: links, try the mailto: link
Be sure your reply has a Subject: header at the top and a blank line
before the message body.
This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox;
as well as URLs for NNTP newsgroup(s).