From: Eric Wong <normalperson@yhbt.net>
To: Adam Roben <aroben@apple.com>
Cc: git@vger.kernel.org, Junio Hamano <gitster@pobox.com>
Subject: Re: [PATCH 8/9] Git.pm: Add hash_and_insert_object and cat_blob
Date: Fri, 26 Oct 2007 08:11:07 -0700 [thread overview]
Message-ID: <20071026151107.GA29522@muzzle> (raw)
In-Reply-To: <1193307927-3592-9-git-send-email-aroben@apple.com>
Adam Roben <aroben@apple.com> wrote:
> These functions are more efficient ways of executing `git hash-object -w` and
> `git cat-file blob` when you are dealing with many files/objects.
>
> Signed-off-by: Adam Roben <aroben@apple.com>
> ---
> Eric Wong wrote:
> > > +package Git::Commands;
> >
> > Can this be a separate file, or a part of Git.pm? I'm sure other
> > scripts can eventually use this and I've been meaning to split
> > git-svn.perl into separate files so it's easier to follow.
>
> I ended up making it part of Git.pm, because I realized that made far more
> sense than splitting it into a separate file.
>
> perl/Git.pm | 97 +++++++++++++++++++++++++++++++++++++++++++++++++++++++++-
> 1 files changed, 95 insertions(+), 2 deletions(-)
Hi Adam,
Thanks.
> diff --git a/perl/Git.pm b/perl/Git.pm
> index 46c5d10..f23edef 100644
> --- a/perl/Git.pm
> +++ b/perl/Git.pm
> @@ -39,6 +39,9 @@ $VERSION = '0.01';
> my $lastrev = $repo->command_oneline( [ 'rev-list', '--all' ],
> STDERR => 0 );
>
> + my $sha1 = $repo->hash_and_insert_object('file.txt');
> + my $contents = $repo->cat_blob($sha1);
I missed this the first time around. But I'd rather be able to pass a
file handle to cat_blob for writing, instead of returning a potentially
huge string in memory.
> @@ -675,6 +677,93 @@ sub hash_object {
> }
>
>
> +=item hash_and_insert_object ( FILENAME )
> +
> +Compute the SHA1 object id of the given C<FILENAME> and add the object to the
> +object database.
> +
> +The function returns the SHA1 hash.
> +
> +=cut
> +
> +# TODO: Support for passing FILEHANDLE instead of FILENAME
Filenames are fine for this input since they (are/should be) generated
by File::Temp and not from an untrusted repo.
We should, however assert that the caller of this function
isn't using a stupid filename with "\n" in it.
> +sub hash_and_insert_object {
> + my ($self, $filename) = @_;
> +
> + $self->_open_hash_and_insert_object_if_needed();
> + my ($in, $out) = ($self->{hash_object_in}, $self->{hash_object_out});
> +
> + print $out $filename, "\n";
> + chomp(my $hash = <$in>);
> + return $hash;
> +}
> +
> +sub _open_hash_and_insert_object_if_needed {
> + my ($self) = @_;
> +
> + return if defined($self->{hash_object_pid});
> +
> + ($self->{hash_object_pid}, $self->{hash_object_in},
> + $self->{hash_object_out}, $self->{hash_object_ctx}) =
> + command_bidi_pipe(qw(hash-object -w --stdin-paths));
> +}
> +
> +sub _close_hash_and_insert_object {
> + my ($self) = @_;
> +
> + return unless defined($self->{hash_object_pid});
> +
> + my @vars = map { 'hash_object' . $_ } qw(pid in out ctx);
It looks like you're missing a '_' in there.
> + command_close_bidi_pipe($self->{@vars});
> + delete $self->{@vars};
> +}
> +
> +=item cat_blob ( SHA1 )
> +
> +Returns the contents of the blob identified by C<SHA1>.
> +
> +=cut
> +
> +sub cat_blob {
> + my ($self, $sha1) = @_;
> +
> + $self->_open_cat_blob_if_needed();
> + my ($in, $out) = ($self->{cat_blob_in}, $self->{cat_blob_out});
> +
> + print $out $sha1, "\n";
> + chomp(my $size = <$in>);
> +
> + my $blob;
> + my $result = read($in, $blob, $size);
> + defined $result or carp $!;
> +
> + # Skip past the trailing newline.
> + read($in, my $newline, 1);
> +
> + return $blob;
> +}
However, I'd very much like to be able to pass a file handle to this
function. This should read()/print() to a file handle passed to it in a
loop rather than slurping all of $size at once, since the files we're
receiving can be huge.
I'd also be happier if we checked that we actually read $size bytes in
the loop, and that $newline is actually "\n" to safeguard against bugs
in cat-blob.
> +sub _open_cat_blob_if_needed {
> + my ($self) = @_;
> +
> + return if defined($self->{cat_blob_pid});
> +
> + ($self->{cat_blob_pid}, $self->{cat_blob_in},
> + $self->{cat_blob_out}, $self->{cat_blob_ctx}) =
> + command_bidi_pipe(qw(cat-file blob --stdin));
> +}
> +
> +sub _close_cat_blob {
> + my ($self) = @_;
> +
> + return unless defined($self->{cat_blob_pid});
> +
> + my @vars = map { 'cat_blob' . $_ } qw(pid in out ctx);
It looks like you're missing a '_' here, too.
> + command_close_bidi_pipe($self->{@vars});
> + delete $self->{@vars};
> +}
>
One more nit, I'm a bit paranoid, but I personally like to die/croak if
the result of every print()/syswrite() to make sure the pipe we're
writing to didn't die or if there were other error indicators.
Hopefully that's the last of tweaks I'd like to see :)
--
Eric Wong
next prev parent reply other threads:[~2007-10-26 15:11 UTC|newest]
Thread overview: 17+ messages / expand[flat|nested] mbox.gz Atom feed top
2007-10-25 10:25 [RESEND PATCH 0/9] Make git-svn fetch ~1.7x faster Adam Roben
2007-10-25 10:25 ` [PATCH 1/9] Add tests for git cat-file Adam Roben
2007-10-25 10:25 ` [PATCH 2/9] git-cat-file: Small refactor of cmd_cat_file Adam Roben
2007-10-25 10:25 ` [PATCH 3/9] git-cat-file: Make option parsing a little more flexible Adam Roben
2007-10-25 10:25 ` [PATCH 4/9] git-cat-file: Add --stdin option Adam Roben
2007-10-25 10:25 ` [PATCH 5/9] Add tests for git hash-object Adam Roben
2007-10-25 10:25 ` [PATCH 6/9] git-hash-object: Add --stdin-paths option Adam Roben
2007-10-25 10:25 ` [PATCH 7/9] Git.pm: Add command_bidi_pipe and command_close_bidi_pipe Adam Roben
2007-10-25 10:25 ` [PATCH 8/9] Git.pm: Add hash_and_insert_object and cat_blob Adam Roben
2007-10-25 10:25 ` [PATCH 9/9] git-svn: Make fetch ~1.7x faster Adam Roben
2007-10-26 15:11 ` Eric Wong [this message]
2007-10-26 21:00 ` [PATCH 6/9] git-hash-object: Add --stdin-paths option Junio C Hamano
2007-10-26 23:19 ` Brian Downing
2007-10-27 1:02 ` Junio C Hamano
2007-10-26 21:00 ` [PATCH 5/9] Add tests for git hash-object Junio C Hamano
2007-10-26 20:59 ` [PATCH 4/9] git-cat-file: Add --stdin option Junio C Hamano
2007-10-26 20:56 ` [PATCH 3/9] git-cat-file: Make option parsing a little more flexible Junio C Hamano
Reply instructions:
You may reply publicly to this message via plain-text email
using any one of the following methods:
* Save the following mbox file, import it into your mail client,
and reply-to-all from there: mbox
Avoid top-posting and favor interleaved quoting:
https://en.wikipedia.org/wiki/Posting_style#Interleaved_style
* Reply using the --to, --cc, and --in-reply-to
switches of git-send-email(1):
git send-email \
--in-reply-to=20071026151107.GA29522@muzzle \
--to=normalperson@yhbt.net \
--cc=aroben@apple.com \
--cc=git@vger.kernel.org \
--cc=gitster@pobox.com \
/path/to/YOUR_REPLY
https://kernel.org/pub/software/scm/git/docs/git-send-email.html
* If your mail client supports setting the In-Reply-To header
via mailto: links, try the mailto: link
Be sure your reply has a Subject: header at the top and a blank line
before the message body.
This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox;
as well as URLs for NNTP newsgroup(s).