git.vger.kernel.org archive mirror
 help / color / mirror / Atom feed
From: Adam Roben <aroben@apple.com>
To: git@vger.kernel.org
Cc: Adam Roben <aroben@apple.com>, Eric Wong <normalperson@yhbt.net>
Subject: [PATCH 10/11] Git.pm: Add hash_and_insert_object and cat_blob
Date: Wed, 23 Apr 2008 15:17:52 -0400	[thread overview]
Message-ID: <1208978273-98146-11-git-send-email-aroben@apple.com> (raw)
In-Reply-To: <1208978273-98146-10-git-send-email-aroben@apple.com>

These functions are more efficient ways of executing `git hash-object -w` and
`git cat-file blob` when you are dealing with many files/objects.

Signed-off-by: Adam Roben <aroben@apple.com>
---
Eric Wong <normalperson@yhbt.net> wrote:
> > diff --git a/perl/Git.pm b/perl/Git.pm
> > index 46c5d10..f23edef 100644
> > --- a/perl/Git.pm
> > +++ b/perl/Git.pm
> > @@ -39,6 +39,9 @@ $VERSION = '0.01';
> >    my $lastrev = $repo->command_oneline( [ 'rev-list', '--all' ],
> >                                          STDERR => 0 );
> >  
> > +  my $sha1 = $repo->hash_and_insert_object('file.txt');
> > +  my $contents = $repo->cat_blob($sha1);
> 
> I missed this the first time around.  But I'd rather be able to pass a
> file handle to cat_blob for writing, instead of returning a potentially
> huge string in memory.

Fixed.

> > @@ -675,6 +677,93 @@ sub hash_object {
> >  }
> >  
> >  
> > +=item hash_and_insert_object ( FILENAME )
> > +
> > +Compute the SHA1 object id of the given C<FILENAME> and add the object to the
> > +object database.
> > +
> > +The function returns the SHA1 hash.
> > +
> > +=cut
> > +
> > +# TODO: Support for passing FILEHANDLE instead of FILENAME
> 
> Filenames are fine for this input since they (are/should be) generated
> by File::Temp and not from an untrusted repo.
> 
> We should, however assert that the caller of this function
> isn't using a stupid filename with "\n" in it.

Fixed.

> > +sub hash_and_insert_object {
> > +   my ($self, $filename) = @_;
> > +
> > +   $self->_open_hash_and_insert_object_if_needed();
> > +   my ($in, $out) = ($self->{hash_object_in}, $self->{hash_object_out});
> > +
> > +   print $out $filename, "\n";
> > +   chomp(my $hash = <$in>);
> > +   return $hash;
> > +}
> > +
> > +sub _open_hash_and_insert_object_if_needed {
> > +   my ($self) = @_;
> > +
> > +   return if defined($self->{hash_object_pid});
> > +
> > +   ($self->{hash_object_pid}, $self->{hash_object_in},
> > +    $self->{hash_object_out}, $self->{hash_object_ctx}) =
> > +           command_bidi_pipe(qw(hash-object -w --stdin-paths));
> > +}
> > +
> > +sub _close_hash_and_insert_object {
> > +   my ($self) = @_;
> > +
> > +   return unless defined($self->{hash_object_pid});
> > +
> > +   my @vars = map { 'hash_object' . $_ } qw(pid in out ctx);
> 
> It looks like you're missing a '_' in there.

Fixed.

> > +=item cat_blob ( SHA1 )
> > +
> > +Returns the contents of the blob identified by C<SHA1>.
> > +
> > +=cut
> > +
> > +sub cat_blob {
> > +   my ($self, $sha1) = @_;
> > +
> > +   $self->_open_cat_blob_if_needed();
> > +   my ($in, $out) = ($self->{cat_blob_in}, $self->{cat_blob_out});
> > +
> > +   print $out $sha1, "\n";
> > +   chomp(my $size = <$in>);
> > +
> > +   my $blob;
> > +   my $result = read($in, $blob, $size);
> > +   defined $result or carp $!;
> > +
> > +   # Skip past the trailing newline.
> > +   read($in, my $newline, 1);
> > +
> > +   return $blob;
> > +}
> 
> However, I'd very much like to be able to pass a file handle to this
> function.  This should read()/print() to a file handle passed to it in a
> loop rather than slurping all of $size at once, since the files we're
> receiving can be huge.

Fixed.

> I'd also be happier if we checked that we actually read $size bytes in
> the loop, and that $newline is actually "\n" to safeguard against bugs
> in cat-blob.

Fixed.

> > +sub _open_cat_blob_if_needed {
> > +   my ($self) = @_;
> > +
> > +   return if defined($self->{cat_blob_pid});
> > +
> > +   ($self->{cat_blob_pid}, $self->{cat_blob_in},
> > +    $self->{cat_blob_out}, $self->{cat_blob_ctx}) =
> > +           command_bidi_pipe(qw(cat-file blob --stdin));
> > +}
> > +
> > +sub _close_cat_blob {
> > +   my ($self) = @_;
> > +
> > +   return unless defined($self->{cat_blob_pid});
> > +
> > +   my @vars = map { 'cat_blob' . $_ } qw(pid in out ctx);
> 
> It looks like you're missing a '_' here, too.

Fixed.

> One more nit, I'm a bit paranoid, but I personally like to die/croak if
> the result of every print()/syswrite() to make sure the pipe we're
> writing to didn't die or if there were other error indicators.

Fixed.

 perl/Git.pm |  152 ++++++++++++++++++++++++++++++++++++++++++++++++++++++++++-
 1 files changed, 150 insertions(+), 2 deletions(-)

diff --git a/perl/Git.pm b/perl/Git.pm
index d766974..6ba8ee5 100644
--- a/perl/Git.pm
+++ b/perl/Git.pm
@@ -39,6 +39,10 @@ $VERSION = '0.01';
   my $lastrev = $repo->command_oneline( [ 'rev-list', '--all' ],
                                         STDERR => 0 );
 
+  my $sha1 = $repo->hash_and_insert_object('file.txt');
+  my $tempfile = tempfile();
+  my $size = $repo->cat_blob($sha1, $tempfile);
+
 =cut
 
 
@@ -218,7 +222,6 @@ sub repository {
 	bless $self, $class;
 }
 
-
 =back
 
 =head1 METHODS
@@ -734,6 +737,147 @@ sub hash_object {
 }
 
 
+=item hash_and_insert_object ( FILENAME )
+
+Compute the SHA1 object id of the given C<FILENAME> and add the object to the
+object database.
+
+The function returns the SHA1 hash.
+
+=cut
+
+# TODO: Support for passing FILEHANDLE instead of FILENAME
+sub hash_and_insert_object {
+	my ($self, $filename) = @_;
+
+	carp "Bad filename \"$filename\"" if $filename =~ /[\r\n]/;
+
+	$self->_open_hash_and_insert_object_if_needed();
+	my ($in, $out) = ($self->{hash_object_in}, $self->{hash_object_out});
+
+	unless (print $out $filename, "\n") {
+		$self->_close_hash_and_insert_object();
+		throw Error::Simple("out pipe went bad");
+	}
+
+	chomp(my $hash = <$in>);
+	unless (defined($hash)) {
+		$self->_close_hash_and_insert_object();
+		throw Error::Simple("in pipe went bad");
+	}
+
+	return $hash;
+}
+
+sub _open_hash_and_insert_object_if_needed {
+	my ($self) = @_;
+
+	return if defined($self->{hash_object_pid});
+
+	($self->{hash_object_pid}, $self->{hash_object_in},
+	 $self->{hash_object_out}, $self->{hash_object_ctx}) =
+		command_bidi_pipe(qw(hash-object -w --stdin-paths));
+}
+
+sub _close_hash_and_insert_object {
+	my ($self) = @_;
+
+	return unless defined($self->{hash_object_pid});
+
+	my @vars = map { 'hash_object_' . $_ } qw(pid in out ctx);
+
+	command_close_bidi_pipe($self->{@vars});
+	delete $self->{@vars};
+}
+
+=item cat_blob ( SHA1, FILEHANDLE )
+
+Prints the contents of the blob identified by C<SHA1> to C<FILEHANDLE> and
+returns the number of bytes printed.
+
+=cut
+
+sub cat_blob {
+	my ($self, $sha1, $fh) = @_;
+
+	$self->_open_cat_blob_if_needed();
+	my ($in, $out) = ($self->{cat_blob_in}, $self->{cat_blob_out});
+
+	unless (print $out $sha1, "\n") {
+		$self->_close_cat_blob();
+		throw Error::Simple("out pipe went bad");
+	}
+
+	my $description = <$in>;
+	if ($description =~ / missing$/) {
+		carp "$sha1 doesn't exist in the repository";
+		return 0;
+	}
+
+	if ($description !~ /^[0-9a-fA-F]{40} \S+ (\d+)$/) {
+		carp "Unexpected result returned from git cat-file";
+		return 0;
+	}
+
+	my $size = $1;
+
+	my $blob;
+	my $bytesRead = 0;
+
+	while (1) {
+		my $bytesLeft = $size - $bytesRead;
+		last unless $bytesLeft;
+
+		my $bytesToRead = $bytesLeft < 1024 ? $bytesLeft : 1024;
+		my $read = read($in, $blob, $bytesToRead, $bytesRead);
+		unless (defined($read)) {
+			$self->_close_cat_blob();
+			throw Error::Simple("in pipe went bad");
+		}
+
+		$bytesRead += $read;
+	}
+
+	# Skip past the trailing newline.
+	my $newline;
+	my $read = read($in, $newline, 1);
+	unless (defined($read)) {
+		$self->_close_cat_blob();
+		throw Error::Simple("in pipe went bad");
+	}
+	unless ($read == 1 && $newline eq "\n") {
+		$self->_close_cat_blob();
+		throw Error::Simple("didn't find newline after blob");
+	}
+
+	unless (print $fh $blob) {
+		$self->_close_cat_blob();
+		throw Error::Simple("couldn't write to passed in filehandle");
+	}
+
+	return $size;
+}
+
+sub _open_cat_blob_if_needed {
+	my ($self) = @_;
+
+	return if defined($self->{cat_blob_pid});
+
+	($self->{cat_blob_pid}, $self->{cat_blob_in},
+	 $self->{cat_blob_out}, $self->{cat_blob_ctx}) =
+		command_bidi_pipe(qw(cat-file --batch));
+}
+
+sub _close_cat_blob {
+	my ($self) = @_;
+
+	return unless defined($self->{cat_blob_pid});
+
+	my @vars = map { 'cat_blob_' . $_ } qw(pid in out ctx);
+
+	command_close_bidi_pipe($self->{@vars});
+	delete $self->{@vars};
+}
 
 =back
 
@@ -951,7 +1095,11 @@ sub _cmd_close {
 }
 
 
-sub DESTROY { }
+sub DESTROY {
+	my ($self) = @_;
+	$self->_close_hash_and_insert_object();
+	$self->_close_cat_blob();
+}
 
 
 # Pipe implementation for ActiveState Perl.
-- 
1.5.5.1.152.g9aeb7

  reply	other threads:[~2008-04-23 19:19 UTC|newest]

Thread overview: 19+ messages / expand[flat|nested]  mbox.gz  Atom feed  top
2008-04-23 19:17 Speed up git-svn fetch Adam Roben
2008-04-23 19:17 ` [PATCH 01/11] Add tests for git cat-file Adam Roben
2008-04-23 19:17   ` [PATCH 02/11] git-cat-file: Small refactor of cmd_cat_file Adam Roben
2008-04-23 19:17     ` [PATCH 03/11] git-cat-file: Make option parsing a little more flexible Adam Roben
2008-04-23 19:17       ` [PATCH 04/11] git-cat-file: Add --batch-check option Adam Roben
2008-04-23 19:17         ` [PATCH 05/11] git-cat-file: Add --batch option Adam Roben
2008-04-23 19:17           ` [PATCH 06/11] Move git-hash-object tests from t5303 to t1007 Adam Roben
2008-04-23 19:17             ` [PATCH 07/11] Add more tests for git hash-object Adam Roben
2008-04-23 19:17               ` [PATCH 08/11] git-hash-object: Add --stdin-paths option Adam Roben
2008-04-23 19:17                 ` [PATCH 09/11] Git.pm: Add command_bidi_pipe and command_close_bidi_pipe Adam Roben
2008-04-23 19:17                   ` Adam Roben [this message]
2008-04-23 19:17                     ` [PATCH 11/11] git-svn: Speed up fetch Adam Roben
2008-04-25 18:04       ` [PATCH 03/11] git-cat-file: Make option parsing a little more flexible Junio C Hamano
2008-04-25  6:56   ` [PATCH 01/11] Add tests for git cat-file Eric Wong
2008-04-25 18:06     ` Junio C Hamano
2008-04-25 18:03   ` Junio C Hamano
2008-05-06  6:41     ` Junio C Hamano
2008-04-23 19:19 ` Speed up git-svn fetch Adam Roben
2008-04-25  7:15   ` Eric Wong

Reply instructions:

You may reply publicly to this message via plain-text email
using any one of the following methods:

* Save the following mbox file, import it into your mail client,
  and reply-to-all from there: mbox

  Avoid top-posting and favor interleaved quoting:
  https://en.wikipedia.org/wiki/Posting_style#Interleaved_style

* Reply using the --to, --cc, and --in-reply-to
  switches of git-send-email(1):

  git send-email \
    --in-reply-to=1208978273-98146-11-git-send-email-aroben@apple.com \
    --to=aroben@apple.com \
    --cc=git@vger.kernel.org \
    --cc=normalperson@yhbt.net \
    /path/to/YOUR_REPLY

  https://kernel.org/pub/software/scm/git/docs/git-send-email.html

* If your mail client supports setting the In-Reply-To header
  via mailto: links, try the mailto: link
Be sure your reply has a Subject: header at the top and a blank line before the message body.
This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox;
as well as URLs for NNTP newsgroup(s).