* [PATCH] Fix in Git.pm cat_blob crashes on large files (resubmit with reviewed-by)
@ 2013-02-22 17:30 Joshua Clayton
2013-02-22 18:34 ` Jeff King
` (2 more replies)
0 siblings, 3 replies; 5+ messages in thread
From: Joshua Clayton @ 2013-02-22 17:30 UTC (permalink / raw)
To: git; +Cc: Jeff King, Erik Faye-Lund
Read and write each 1024 byte buffer, rather than trying to buffer
the entire content of the file.
Previous code would crash on all files > 2 Gib, when the offset variable
became negative (perhaps below the level of perl), resulting in a crash.
On a 32 bit system, or a system with low memory it might crash before
reaching 2 GiB due to memory exhaustion.
Signed-off-by: Joshua Clayton <stillcompiling@gmail.com>
Reviewed-by: Jeff King <peff@peff.net>
---
perl/Git.pm | 12 +++++-------
1 file changed, 5 insertions(+), 7 deletions(-)
diff --git a/perl/Git.pm b/perl/Git.pm
index 931047c..cc91288 100644
--- a/perl/Git.pm
+++ b/perl/Git.pm
@@ -949,13 +949,16 @@ sub cat_blob {
last unless $bytesLeft;
my $bytesToRead = $bytesLeft < 1024 ? $bytesLeft : 1024;
- my $read = read($in, $blob, $bytesToRead, $bytesRead);
+ my $read = read($in, $blob, $bytesToRead);
unless (defined($read)) {
$self->_close_cat_blob();
throw Error::Simple("in pipe went bad");
}
-
$bytesRead += $read;
+ unless (print $fh $blob) {
+ $self->_close_cat_blob();
+ throw Error::Simple("couldn't write to passed
in filehandle");
+ }
}
# Skip past the trailing newline.
@@ -970,11 +973,6 @@ sub cat_blob {
throw Error::Simple("didn't find newline after blob");
}
- unless (print $fh $blob) {
- $self->_close_cat_blob();
- throw Error::Simple("couldn't write to passed in filehandle");
- }
-
return $size;
}
--
1.7.10.4
^ permalink raw reply related [flat|nested] 5+ messages in thread
* Re: [PATCH] Fix in Git.pm cat_blob crashes on large files (resubmit with reviewed-by)
2013-02-22 17:30 [PATCH] Fix in Git.pm cat_blob crashes on large files (resubmit with reviewed-by) Joshua Clayton
@ 2013-02-22 18:34 ` Jeff King
2013-02-22 19:17 ` Joshua Clayton
2013-02-22 18:37 ` [PATCH] Fix in Git.pm cat_blob crashes on large files (resubmit Jeff King
2013-02-22 18:49 ` [PATCH] Fix in Git.pm cat_blob crashes on large files (resubmit with reviewed-by) Junio C Hamano
2 siblings, 1 reply; 5+ messages in thread
From: Jeff King @ 2013-02-22 18:34 UTC (permalink / raw)
To: Joshua Clayton; +Cc: git, Erik Faye-Lund
On Fri, Feb 22, 2013 at 09:30:57AM -0800, Joshua Clayton wrote:
> Read and write each 1024 byte buffer, rather than trying to buffer
> the entire content of the file.
OK. Did you ever repeat your timing with a larger symmetric buffer? That
should probably be a separate patch on top, but it might be worth doing
while we are thinking about it.
> Previous code would crash on all files > 2 Gib, when the offset variable
> became negative (perhaps below the level of perl), resulting in a crash.
I'm still slightly dubious of this, just because it doesn't match my
knowledge of perl (which is admittedly imperfect). I'm curious how you
diagnosed it?
> On a 32 bit system, or a system with low memory it might crash before
> reaching 2 GiB due to memory exhaustion.
>
> Signed-off-by: Joshua Clayton <stillcompiling@gmail.com>
> Reviewed-by: Jeff King <peff@peff.net>
The commit message is a good place to mention any side effects, and why
they are not a problem. Something like:
The previous code buffered the whole blob before writing, so any error
reading from cat-file would result in zero bytes being written to the
output stream. After this change, the output may be left in a
partially written state (or even fully written, if we fail when
parsing the final newline from cat-file). However, it's not reasonable
for callers to expect anything about the state of the output when we
return an error (after all, even with full buffering, we might fail
during the writing process). So any caller which cares about this is
broken already, and we do not have to worry about them.
> ---
> perl/Git.pm | 12 +++++-------
> 1 file changed, 5 insertions(+), 7 deletions(-)
The patch itself looks fine to me.
-Peff
^ permalink raw reply [flat|nested] 5+ messages in thread
* Re: [PATCH] Fix in Git.pm cat_blob crashes on large files (resubmit
2013-02-22 17:30 [PATCH] Fix in Git.pm cat_blob crashes on large files (resubmit with reviewed-by) Joshua Clayton
2013-02-22 18:34 ` Jeff King
@ 2013-02-22 18:37 ` Jeff King
2013-02-22 18:49 ` [PATCH] Fix in Git.pm cat_blob crashes on large files (resubmit with reviewed-by) Junio C Hamano
2 siblings, 0 replies; 5+ messages in thread
From: Jeff King @ 2013-02-22 18:37 UTC (permalink / raw)
To: Joshua Clayton; +Cc: git, Erik Faye-Lund
On Fri, Feb 22, 2013 at 09:30:57AM -0800, Joshua Clayton wrote:
> Subject: Re: [PATCH] Fix in Git.pm cat_blob crashes on large files
> (resubmit with reviewed-by)
One thing I forgot to note in my other response: the subject is part of
the commit message, so information like "resubmit with..." should go
with other cover letter material after the "---".
It's also customary to say something like "PATCHv2" in the brackets
(which does get stripped off), and mention what is changed since v1
after the "---". That can help reviewers remember what happened, and it
can help the maintainer understand where in the process of review the
patch is.
-Peff
^ permalink raw reply [flat|nested] 5+ messages in thread
* Re: [PATCH] Fix in Git.pm cat_blob crashes on large files (resubmit with reviewed-by)
2013-02-22 17:30 [PATCH] Fix in Git.pm cat_blob crashes on large files (resubmit with reviewed-by) Joshua Clayton
2013-02-22 18:34 ` Jeff King
2013-02-22 18:37 ` [PATCH] Fix in Git.pm cat_blob crashes on large files (resubmit Jeff King
@ 2013-02-22 18:49 ` Junio C Hamano
2 siblings, 0 replies; 5+ messages in thread
From: Junio C Hamano @ 2013-02-22 18:49 UTC (permalink / raw)
To: Joshua Clayton, Adam Roben; +Cc: git, Jeff King, Erik Faye-Lund
Joshua Clayton <stillcompiling@gmail.com> writes:
> Read and write each 1024 byte buffer, rather than trying to buffer
> the entire content of the file.
> Previous code would crash on all files > 2 Gib, when the offset variable
> became negative (perhaps below the level of perl), resulting in a crash.
> On a 32 bit system, or a system with low memory it might crash before
> reaching 2 GiB due to memory exhaustion.
>
> Signed-off-by: Joshua Clayton <stillcompiling@gmail.com>
> Reviewed-by: Jeff King <peff@peff.net>
> ---
Thanks.
>> Subject: Re: [PATCH] Fix in Git.pm cat_blob crashes on large files (resubmit with reviewed-by)
Please drop the () part. A rule of thumb is to make "git show"
output understandable by people who read it 6 months from now. They
do not care if the commit is a re-submission.
It seems that this issue was with us since the very beginning of
this sub since it was introduced at 7182530d8cad (Git.pm: Add
hash_and_insert_object and cat_blob, 2008-05-23).
> perl/Git.pm | 12 +++++-------
> 1 file changed, 5 insertions(+), 7 deletions(-)
>
> diff --git a/perl/Git.pm b/perl/Git.pm
> index 931047c..cc91288 100644
> --- a/perl/Git.pm
> +++ b/perl/Git.pm
> @@ -949,13 +949,16 @@ sub cat_blob {
> last unless $bytesLeft;
>
> my $bytesToRead = $bytesLeft < 1024 ? $bytesLeft : 1024;
> - my $read = read($in, $blob, $bytesToRead, $bytesRead);
> + my $read = read($in, $blob, $bytesToRead);
> unless (defined($read)) {
> $self->_close_cat_blob();
> throw Error::Simple("in pipe went bad");
> }
> -
> $bytesRead += $read;
> + unless (print $fh $blob) {
> + $self->_close_cat_blob();
> + throw Error::Simple("couldn't write to passed
> in filehandle");
> + }
Corrupt patch, line-wrapped by your MUA.
I wonder if we still need $bytesRead variable. You have $size that
is the size of the whole blob, so
my $bytesLeft = $size;
while (1) {
my $bytesToRead = $bytesLeft < 1024 ? $bytesLeft : 1024;
my $bytesRead = read($in, $blob, $bytesToRead);
... check errors and use the $blob ...
$bytesLeft -= $bytesRead;
}
may be simpler and easier to read, no?
^ permalink raw reply [flat|nested] 5+ messages in thread
* Re: [PATCH] Fix in Git.pm cat_blob crashes on large files (resubmit with reviewed-by)
2013-02-22 18:34 ` Jeff King
@ 2013-02-22 19:17 ` Joshua Clayton
0 siblings, 0 replies; 5+ messages in thread
From: Joshua Clayton @ 2013-02-22 19:17 UTC (permalink / raw)
To: Jeff King; +Cc: git, Erik Faye-Lund
Thanks for all the input and patience.
On Fri, Feb 22, 2013 at 10:34 AM, Jeff King <peff@peff.net> wrote:
> On Fri, Feb 22, 2013 at 09:30:57AM -0800, Joshua Clayton wrote:
>
>> Read and write each 1024 byte buffer, rather than trying to buffer
>> the entire content of the file.
>
> OK. Did you ever repeat your timing with a larger symmetric buffer? That
> should probably be a separate patch on top, but it might be worth doing
> while we are thinking about it.
>
>> Previous code would crash on all files > 2 Gib, when the offset variable
>> became negative (perhaps below the level of perl), resulting in a crash.
>
> I'm still slightly dubious of this, just because it doesn't match my
> knowledge of perl (which is admittedly imperfect). I'm curious how you
> diagnosed it?
I first had the memory exhaustion problem running my git repo on a 32 vm.
After bumping the memory from 512 to 4 GiB, and that failing to fix it
I moved to my workstation with 16 GiB
...reproduced
After the initial crash, I added
print $size, " ", $bytesToRead, " ", $bytesRead, "\n";
right before the read command, and it does indeed crash right after
the $bytesRead variable crosses LONG_MAX
...
2567089913 1024 2147482624
2567089913 1024 2147483648
2567089913 1024 2147484672
Offset outside string at /usr/share/perl5/Git.pm line 901, <GEN36> line 2604.
Note that $bytesRead is still positive.
I know very little perl, but that symptom seems pretty clear
>
>> On a 32 bit system, or a system with low memory it might crash before
>> reaching 2 GiB due to memory exhaustion.
>>
>> Signed-off-by: Joshua Clayton <stillcompiling@gmail.com>
>> Reviewed-by: Jeff King <peff@peff.net>
>
> The commit message is a good place to mention any side effects, and why
> they are not a problem. Something like:
>
> The previous code buffered the whole blob before writing, so any error
> reading from cat-file would result in zero bytes being written to the
> output stream. After this change, the output may be left in a
> partially written state (or even fully written, if we fail when
> parsing the final newline from cat-file). However, it's not reasonable
> for callers to expect anything about the state of the output when we
> return an error (after all, even with full buffering, we might fail
> during the writing process). So any caller which cares about this is
> broken already, and we do not have to worry about them.
>
>> ---
>> perl/Git.pm | 12 +++++-------
>> 1 file changed, 5 insertions(+), 7 deletions(-)
>
> The patch itself looks fine to me.
>
> -Peff
^ permalink raw reply [flat|nested] 5+ messages in thread
end of thread, other threads:[~2013-02-22 19:17 UTC | newest]
Thread overview: 5+ messages (download: mbox.gz follow: Atom feed
-- links below jump to the message on this page --
2013-02-22 17:30 [PATCH] Fix in Git.pm cat_blob crashes on large files (resubmit with reviewed-by) Joshua Clayton
2013-02-22 18:34 ` Jeff King
2013-02-22 19:17 ` Joshua Clayton
2013-02-22 18:37 ` [PATCH] Fix in Git.pm cat_blob crashes on large files (resubmit Jeff King
2013-02-22 18:49 ` [PATCH] Fix in Git.pm cat_blob crashes on large files (resubmit with reviewed-by) Junio C Hamano
This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox;
as well as URLs for NNTP newsgroup(s).