git.vger.kernel.org archive mirror
 help / color / mirror / Atom feed
From: Jeff King <peff@peff.net>
To: Joshua Clayton <stillcompiling@gmail.com>
Cc: git@vger.kernel.org
Subject: Re: [PATCH] Fix in Git.pm cat_blob crashes on large files
Date: Thu, 21 Feb 2013 17:43:20 -0500	[thread overview]
Message-ID: <20130221224319.GA19021@sigill.intra.peff.net> (raw)
In-Reply-To: <CAMB+bfKYLjmDavcLaO7scBPfTLmzqAmH+k9uBj0WJ+dzj9vuyA@mail.gmail.com>

On Thu, Feb 21, 2013 at 02:13:32PM -0800, Joshua Clayton wrote:

> Greetings.
> This is my first patch here. Hopefully I get the stylistic & political
> details right... :)
> Patch applies against maint and master

I have some comments. :)

The body of your email should contain the commit message (i.e., whatever
people reading "git log" a year from now would see). Cover letter bits
like this should go after the "---". That way "git am" knows which part
is which.

>         Developer's Certificate of Origin 1.1

You don't need to include the DCO. Your "Signed-off-by" is an indication
that you agree to it.

> Affects git svn clone/fetch
> Original code loaded entire file contents into a variable
> before writing to disk. If the offset within the variable passed
> 2 GiB, it becrame negative, resulting in a crash.

Interesting. I didn't think perl had signed wrap-around issues like
this, as its numeric variables are not strictly integers. But I don't
have a 32-bit machine to test on (and numbers larger than 2G obviously
work on 64-bit machines). At any rate, though:

> On a 32 bit system, or a system with low memory it may crash before
> reaching 2 GiB due to memory exhaustion.

Yeah, it is stupid to read the whole thing into memory if we are just
going to dump it to another filehandle.

> @@ -949,13 +951,21 @@ sub cat_blob {
>  		last unless $bytesLeft;
> 
>  		my $bytesToRead = $bytesLeft < 1024 ? $bytesLeft : 1024;
> -		my $read = read($in, $blob, $bytesToRead, $bytesRead);
> +		my $read = read($in, $blob, $bytesToRead, $blobSize);
>  		unless (defined($read)) {
>  			$self->_close_cat_blob();
>  			throw Error::Simple("in pipe went bad");
>  		}

Hmph. The existing code already reads in 1024-byte chunks. For no
reason, as far as I can tell, since we are just loading the blob buffer
incrementally into memory, only to then flush it all out at once.

Why do you read at the $blobSize offset? If we are just reading in
chunks, we be able to just keep writing to the start of our small
buffer, as we flush each chunk out before trying to read more.

IOW, shouldn't the final code look like this:

  my $bytesLeft = $size;
  while ($bytesLeft > 0) {
          my $buf;
          my $bytesToRead = $bytesLeft < 1024 ? $bytesLeft : 1024;
          my $read = read($in, $buf, $bytesToRead);
          unless (defined($read)) {
                  $self->_close_cat_blob();
                  throw Error::Simple("unable to read cat-blob pipe");
          }
          unless (print $fh $buf) {
                  $self->_close_cat_blob();
                  throw Error::Simple("unable to write blob output");
          }

          $bytesLeft -= $read;
  }

By having the read and flush size be the same, it's much simpler.

Your change (and my proposed code) do mean that an error during the read
operation will result in a truncated output file, rather than an empty
one. I think that is OK, though. That can happen anyway in the original
due to a failure in the "print" step. Any caller who wants to be careful
that they leave only a full file in place must either:

  1. Check the return value of cat_blob and verify that the result has
     $size bytes, and otherwise delete it.

  2. Write to a temporary file, then once success has been returned from
     cat_blob, rename the result into place.

Neither of which is affected by this change.

-Peff

  reply	other threads:[~2013-02-21 22:43 UTC|newest]

Thread overview: 6+ messages / expand[flat|nested]  mbox.gz  Atom feed  top
2013-02-21 22:13 [PATCH] Fix in Git.pm cat_blob crashes on large files Joshua Clayton
2013-02-21 22:43 ` Jeff King [this message]
2013-02-21 23:18   ` Joshua Clayton
2013-02-21 23:24     ` Jeff King
2013-02-22 15:11       ` Joshua Clayton
2013-02-22 15:38         ` Jeff King

Reply instructions:

You may reply publicly to this message via plain-text email
using any one of the following methods:

* Save the following mbox file, import it into your mail client,
  and reply-to-all from there: mbox

  Avoid top-posting and favor interleaved quoting:
  https://en.wikipedia.org/wiki/Posting_style#Interleaved_style

* Reply using the --to, --cc, and --in-reply-to
  switches of git-send-email(1):

  git send-email \
    --in-reply-to=20130221224319.GA19021@sigill.intra.peff.net \
    --to=peff@peff.net \
    --cc=git@vger.kernel.org \
    --cc=stillcompiling@gmail.com \
    /path/to/YOUR_REPLY

  https://kernel.org/pub/software/scm/git/docs/git-send-email.html

* If your mail client supports setting the In-Reply-To header
  via mailto: links, try the mailto: link
Be sure your reply has a Subject: header at the top and a blank line before the message body.
This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox;
as well as URLs for NNTP newsgroup(s).