From: Joshua Clayton <stillcompiling@gmail.com>
To: Jeff King <peff@peff.net>
Cc: git@vger.kernel.org
Subject: Re: [PATCH] Fix in Git.pm cat_blob crashes on large files
Date: Thu, 21 Feb 2013 15:18:40 -0800
Message-ID: <CAMB+bf+whVFD03neCh-gBORXOBoNjgaCbfP_mh8HgDy6UqGFZA@mail.gmail.com>
In-Reply-To: <20130221224319.GA19021@sigill.intra.peff.net>
On Thu, Feb 21, 2013 at 2:43 PM, Jeff King <peff@peff.net> wrote:
> On Thu, Feb 21, 2013 at 02:13:32PM -0800, Joshua Clayton wrote:
>
>> Greetings.
>> This is my first patch here. Hopefully I get the stylistic & political
>> details right... :)
>> Patch applies against maint and master
>
> I have some comments. :)
>
> The body of your email should contain the commit message (i.e., whatever
> people reading "git log" a year from now would see). Cover letter bits
> like this should go after the "---". That way "git am" knows which part
> is which.
>
>> Developer's Certificate of Origin 1.1
>
> You don't need to include the DCO. Your "Signed-off-by" is an indication
> that you agree to it.
>
>> Affects git svn clone/fetch
>> Original code loaded entire file contents into a variable
>> before writing to disk. If the offset within the variable passed
>> 2 GiB, it became negative, resulting in a crash.
>
> Interesting. I didn't think perl had signed wrap-around issues like
> this, as its numeric variables are not strictly integers. But I don't
> have a 32-bit machine to test on (and numbers larger than 2G obviously
> work on 64-bit machines). At any rate, though:
>
>> On a 32 bit system, or a system with low memory it may crash before
>> reaching 2 GiB due to memory exhaustion.
>
> Yeah, it is stupid to read the whole thing into memory if we are just
> going to dump it to another filehandle.
>
>> @@ -949,13 +951,21 @@ sub cat_blob {
>> last unless $bytesLeft;
>>
>> my $bytesToRead = $bytesLeft < 1024 ? $bytesLeft : 1024;
>> - my $read = read($in, $blob, $bytesToRead, $bytesRead);
>> + my $read = read($in, $blob, $bytesToRead, $blobSize);
>> unless (defined($read)) {
>> $self->_close_cat_blob();
>> throw Error::Simple("in pipe went bad");
>> }
>
> Hmph. The existing code already reads in 1024-byte chunks. For no
> reason, as far as I can tell, since we are just loading the blob buffer
> incrementally into memory, only to then flush it all out at once.
>
> Why do you read at the $blobSize offset? If we are just reading in
> chunks, we should be able to just keep writing to the start of our small
> buffer, as we flush each chunk out before trying to read more.
>
> IOW, shouldn't the final code look like this:
>
> my $bytesLeft = $size;
> while ($bytesLeft > 0) {
> my $buf;
> my $bytesToRead = $bytesLeft < 1024 ? $bytesLeft : 1024;
> my $read = read($in, $buf, $bytesToRead);
> unless (defined($read)) {
> $self->_close_cat_blob();
> throw Error::Simple("unable to read cat-blob pipe");
> }
> unless (print $fh $buf) {
> $self->_close_cat_blob();
> throw Error::Simple("unable to write blob output");
> }
>
> $bytesLeft -= $read;
> }
>
> By having the read and flush size be the same, it's much simpler.
My original bugfix did just read 1024, and write 1024. That works fine
and, yes, is simpler.
I changed it to be more similar to the original code in case there
were performance reasons for doing it that way.
That was the only reason I could think of for the design, and adding
the $flushSize variable means that
some motivated person could easily optimize it.
So far I have been too lazy to profile the two versions....
I guess I'll try a trivial git svn init; git svn fetch and check back in.
It's running now.
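For what it's worth, the two strategies can be compared without a full
git svn run by pointing them at an in-memory filehandle standing in for
the cat-blob pipe. This is only a hypothetical micro-benchmark sketch
(the 4 MiB test size, 1024-byte chunks, and sub names are made up, not
from Git.pm):

```perl
use strict;
use warnings;
use Benchmark qw(timethese);

# Stand-in for the cat-blob pipe: a 4 MiB in-memory "file".
my $size = 4 * 1024 * 1024;
my $data = 'x' x $size;

# Original style: read at an ever-growing offset into one big buffer.
sub copy_growing {
    open my $in, '<', \$data or die "open: $!";
    my ($buf, $off) = ('', 0);
    while ((my $r = read($in, $buf, 1024, $off)) > 0) { $off += $r }
    return $off;
}

# Simpler style: reuse a small buffer, as each chunk is flushed anyway.
sub copy_small {
    open my $in, '<', \$data or die "open: $!";
    my ($buf, $total) = (undef, 0);
    while ((my $r = read($in, $buf, 1024)) > 0) { $total += $r }
    return $total;
}

# Sanity check: both strategies consume the whole input.
die "size mismatch" unless copy_growing() == $size
                       and copy_small()   == $size;

timethese(10, { growing => \&copy_growing, small => \&copy_small });
```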
>
> Your change (and my proposed code) do mean that an error during the read
> operation will result in a truncated output file, rather than an empty
> one. I think that is OK, though. That can happen anyway in the original
> due to a failure in the "print" step. Any caller who wants to be careful
> that they leave only a full file in place must either:
>
> 1. Check the return value of cat_blob and verify that the result has
> $size bytes, and otherwise delete it.
>
> 2. Write to a temporary file, then once success has been returned from
> cat_blob, rename the result into place.
>
> Neither of which is affected by this change.
>
> -Peff
In git svn fetch (which is how I discovered it) the file being passed
to cat_blob is a temporary file, which is checksummed before being put
into place.
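That temp-file approach is option 2 from the list above: stream the blob
somewhere disposable and only rename it into place once cat_blob has
succeeded, so readers never see a truncated file. A rough sketch of the
pattern (careful_write, blob.out, and the contents are invented for
illustration; rename is atomic only within one filesystem, hence
DIR => '.'):

```perl
use strict;
use warnings;
use File::Temp qw(tempfile);

sub careful_write {
    my ($dest, $content) = @_;
    # Create the temp file next to the destination so rename() stays
    # on the same filesystem and is atomic.
    my ($fh, $tmp) = tempfile('blob-XXXXXX', DIR => '.');
    print {$fh} $content or die "write $tmp: $!";
    close $fh            or die "close $tmp: $!";
    # Only now expose the file under its real name.
    rename $tmp, $dest   or die "rename $tmp -> $dest: $!";
}

careful_write('blob.out', "example blob contents\n");
```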
Thread overview: 6+ messages
2013-02-21 22:13 [PATCH] Fix in Git.pm cat_blob crashes on large files Joshua Clayton
2013-02-21 22:43 ` Jeff King
2013-02-21 23:18 ` Joshua Clayton [this message]
2013-02-21 23:24 ` Jeff King
2013-02-22 15:11 ` Joshua Clayton
2013-02-22 15:38 ` Jeff King