* [BUG] serious inflate inconsistency on master
@ 2012-07-03 22:19 Jeff King
2012-07-03 22:40 ` Junio C Hamano
0 siblings, 1 reply; 7+ messages in thread
From: Jeff King @ 2012-07-03 22:19 UTC (permalink / raw)
To: Nguyễn Thái Ngọc Duy; +Cc: git
I'm getting a 'serious inflate consistency' error while running "git
verify-pack" (actually, "git index-pack --verify" under the hood). It
bisects to 4614043 (index-pack: use streaming interface for collision
test on large blobs, 2012-05-24).
The interesting thing about this repository is that it has a 2.8G text
file in it which compresses down to only about 420M. I'm not sure that
4614043 actually introduces the bug, but rather just triggers the code
path.
I'm able to reproduce it with the following script:
# empty repo...
git init repo &&
cd repo &&
# set this low to make sure we follow the unpack_data code-path
git config core.bigfilethreshold 100k &&
# now make a file bigger than our threshold, but that will compress
# well
perl -le 'print for (1..100000)' >file &&
# and then make a commit
git add file &&
git commit -m file &&
# and a pack with it
git repack -ad &&
# and then verify that pack
git verify-pack .git/objects/pack/*.pack
The problem seems to be in index-pack.c:unpack_data, which does this:
> git_inflate_init(&stream);
> stream.next_out = data;
> stream.avail_out = consume ? 64*1024 : obj->size;
>
> do {
> unsigned char *last_out = stream.next_out;
> ssize_t n = (len < 64*1024) ? len : 64*1024;
> n = pread(pack_fd, inbuf, n, from);
> if (n < 0)
> die_errno(_("cannot pread pack file"));
> if (!n)
> die(Q_("premature end of pack file, %lu byte missing",
> "premature end of pack file, %lu bytes missing",
> len),
> len);
> from += n;
> len -= n;
> stream.next_in = inbuf;
> stream.avail_in = n;
> status = git_inflate(&stream, 0);
> if (consume) {
> if (consume(last_out, stream.next_out - last_out, cb_data)) {
> free(inbuf);
> free(data);
> return NULL;
> }
> stream.next_out = data;
> stream.avail_out = 64*1024;
> }
> } while (len && status == Z_OK && !stream.avail_in);
>
> /* This has been inflated OK when first encountered, so... */
> if (status != Z_STREAM_END || stream.total_out != obj->size)
> die(_("serious inflate inconsistency"));
We limit ourselves to handling just 64K at a time. So we read in 64K and
stuff it in the next_in/avail_in buffer. And then we make 64K of buffer
available for zlib to write into via the next_out/avail_out buffer. So
zlib reads the first chunk, and after reading 28K or so fills up the 64K
output buffer and returns. We call consume on the chunk, but when we hit
the outer loop condition, stream.avail_in still mentions the 36K we
haven't processed yet, and the loop ends with status == Z_OK, which
triggers the assertion below it.
So I don't really understand what this !stream.avail_in is doing there
in the do-while loop. Don't we instead need to have an inner loop that
keeps feeding the result of pread into git_inflate until we don't have
any available data left?
Something like the patch below, which seems to work for me, but I still
don't understand the function of the !stream.avail_in check in the outer
loop.
-Peff
diff --git a/builtin/index-pack.c b/builtin/index-pack.c
index 8b5c1eb..0db1923 100644
--- a/builtin/index-pack.c
+++ b/builtin/index-pack.c
@@ -538,15 +538,19 @@ static void *unpack_data(struct object_entry *obj,
len -= n;
stream.next_in = inbuf;
stream.avail_in = n;
- status = git_inflate(&stream, 0);
- if (consume) {
- if (consume(last_out, stream.next_out - last_out, cb_data)) {
- free(inbuf);
- free(data);
- return NULL;
- }
- stream.next_out = data;
- stream.avail_out = 64*1024;
+ if (!consume)
+ status = git_inflate(&stream, 0);
+ else {
+ do {
+ status = git_inflate(&stream, 0);
+ if (consume(last_out, stream.next_out - last_out, cb_data)) {
+ free(inbuf);
+ free(data);
+ return NULL;
+ }
+ stream.next_out = data;
+ stream.avail_out = 64*1024;
+ } while (status == Z_OK && stream.avail_in);
}
} while (len && status == Z_OK && !stream.avail_in);
^ permalink raw reply related [flat|nested] 7+ messages in thread
* Re: [BUG] serious inflate inconsistency on master
2012-07-03 22:19 [BUG] serious inflate inconsistency on master Jeff King
@ 2012-07-03 22:40 ` Junio C Hamano
2012-07-04 5:35 ` Nguyen Thai Ngoc Duy
0 siblings, 1 reply; 7+ messages in thread
From: Junio C Hamano @ 2012-07-03 22:40 UTC (permalink / raw)
To: Jeff King; +Cc: Nguyễn Thái Ngọc Duy, git
Jeff King <peff@peff.net> writes:
> The problem seems to be in index-pack.c:unpack_data, which does this:
>
>> git_inflate_init(&stream);
>> stream.next_out = data;
>> stream.avail_out = consume ? 64*1024 : obj->size;
>>
>> do {
>> unsigned char *last_out = stream.next_out;
>> ssize_t n = (len < 64*1024) ? len : 64*1024;
>> n = pread(pack_fd, inbuf, n, from);
>> if (n < 0)
>> die_errno(_("cannot pread pack file"));
>> if (!n)
>> die(Q_("premature end of pack file, %lu byte missing",
>> "premature end of pack file, %lu bytes missing",
>> len),
>> len);
>> from += n;
>> len -= n;
>> stream.next_in = inbuf;
>> stream.avail_in = n;
>> status = git_inflate(&stream, 0);
>> if (consume) {
>> if (consume(last_out, stream.next_out - last_out, cb_data)) {
>> free(inbuf);
>> free(data);
>> return NULL;
>> }
>> stream.next_out = data;
>> stream.avail_out = 64*1024;
>> }
>> } while (len && status == Z_OK && !stream.avail_in);
>>
>> /* This has been inflated OK when first encountered, so... */
>> if (status != Z_STREAM_END || stream.total_out != obj->size)
>> die(_("serious inflate inconsistency"));
Yeah, that "if (consume)" part is clearly bogus. It should feed
what it read in "inbuf" fully to git_inflate() in a (missing) inner
loop and keep feeding consume() with the inflated data, but instead
happily goes on to read more from the packfile once the initial part
of the inflated data is fed to consume, ignoring the remainder.
> So I don't really understand what this !stream.avail_in is doing there
> in the do-while loop. Don't we instead need to have an inner loop that
> keeps feeding the result of pread into git_inflate until we don't have
> any available data left?
Exactly. I do not think the avain_in check should be done at the
end of the outer loop at all. When we are buffering the entire
inflated data in core without using consume, we allocate enough
memory to "data" to hold the whole thing, so in that case, it may be
OK to expect that git_inflate() would never return without consuming
the input bytes (i.e. stream.avail_in would always be zero at the
site of the check), but with consume(), we give small piece of
memory as a temporary output area and call git_inflate(), and it is
entirely possible and normal for it to run out of output before
inflating all the input bytes.
^ permalink raw reply [flat|nested] 7+ messages in thread
* Re: [BUG] serious inflate inconsistency on master
2012-07-03 22:40 ` Junio C Hamano
@ 2012-07-04 5:35 ` Nguyen Thai Ngoc Duy
2012-07-04 6:31 ` Junio C Hamano
0 siblings, 1 reply; 7+ messages in thread
From: Nguyen Thai Ngoc Duy @ 2012-07-04 5:35 UTC (permalink / raw)
To: Junio C Hamano; +Cc: Jeff King, git
On Wed, Jul 4, 2012 at 5:40 AM, Junio C Hamano <gitster@pobox.com> wrote:
> Jeff King <peff@peff.net> writes:
>
>> The problem seems to be in index-pack.c:unpack_data, which does this:
>>
>>> git_inflate_init(&stream);
>>> stream.next_out = data;
>>> stream.avail_out = consume ? 64*1024 : obj->size;
>>>
>>> do {
>>> unsigned char *last_out = stream.next_out;
>>> ssize_t n = (len < 64*1024) ? len : 64*1024;
>>> n = pread(pack_fd, inbuf, n, from);
>>> if (n < 0)
>>> die_errno(_("cannot pread pack file"));
>>> if (!n)
>>> die(Q_("premature end of pack file, %lu byte missing",
>>> "premature end of pack file, %lu bytes missing",
>>> len),
>>> len);
>>> from += n;
>>> len -= n;
>>> stream.next_in = inbuf;
>>> stream.avail_in = n;
>>> status = git_inflate(&stream, 0);
>>> if (consume) {
>>> if (consume(last_out, stream.next_out - last_out, cb_data)) {
>>> free(inbuf);
>>> free(data);
>>> return NULL;
>>> }
>>> stream.next_out = data;
>>> stream.avail_out = 64*1024;
>>> }
>>> } while (len && status == Z_OK && !stream.avail_in);
>>>
>>> /* This has been inflated OK when first encountered, so... */
>>> if (status != Z_STREAM_END || stream.total_out != obj->size)
>>> die(_("serious inflate inconsistency"));
>
> Yeah, that "if (consume)" part is clearly bogus. It should feed
> what it read in "inbuf" fully to git_inflate() in a (missing) inner
> loop and keep feeding consume() with the inflated data, but instead
> happily goes on to read more from the packfile once the initial part
> of the inflated data is fed to consume, ignoring the remainder.
Agreed.
>> So I don't really understand what this !stream.avail_in is doing there
>> in the do-while loop. Don't we instead need to have an inner loop that
>> keeps feeding the result of pread into git_inflate until we don't have
>> any available data left?
>
> Exactly. I do not think the avain_in check should be done at the
> end of the outer loop at all. When we are buffering the entire
> inflated data in core without using consume, we allocate enough
> memory to "data" to hold the whole thing, so in that case, it may be
> OK to expect that git_inflate() would never return without consuming
> the input bytes (i.e. stream.avail_in would always be zero at the
> site of the check), but with consume(), we give small piece of
> memory as a temporary output area and call git_inflate(), and it is
> entirely possible and normal for it to run out of output before
> inflating all the input bytes.
Maybe the !stream.avail_in check is for safety? obj->size could be
wrong due to an index-pack bug, leading to insufficient avail_out..
By the way I searched the commit that introduces that check with "git
log --follow -p builtin/index-pack.c" but I could not find it. What
did I do wrong?
--
Duy
^ permalink raw reply [flat|nested] 7+ messages in thread
* Re: [BUG] serious inflate inconsistency on master
2012-07-04 5:35 ` Nguyen Thai Ngoc Duy
@ 2012-07-04 6:31 ` Junio C Hamano
2012-07-04 7:01 ` Nguyen Thai Ngoc Duy
2012-07-04 7:12 ` Jeff King
0 siblings, 2 replies; 7+ messages in thread
From: Junio C Hamano @ 2012-07-04 6:31 UTC (permalink / raw)
To: Nguyen Thai Ngoc Duy; +Cc: Jeff King, git
Nguyen Thai Ngoc Duy <pclouds@gmail.com> writes:
> By the way I searched the commit that introduces that check with "git
> log --follow -p builtin/index-pack.c" but I could not find it. What
> did I do wrong?
Your commit 8a2e163cc shows changes to the file at ll.535-540; these
come from 776ea370 builtin-index-pack.c ll.383-388.
$ git show 776ea370:builtin-index-pack.c
The get_data_from_pack() function in that commit gives sufficient
buffer to output side (avail_out starts out as obj->size), and feeds
the data from the packfile in chunks. With the arrangement this
commit makes to call git_inflate(), it should never get stuck
because it ran out of output buffer. In each iteration of the loop,
when the function returns, status should read Z_OK and the function
should have consumed all input.
But the version that uses consume() function does not give
sufficient output buffer to ensure that the input will always be
inflated fully (avoiding to use large output buffer is the whole
point of your patch after all), so with your patch, that no longer
holds true.
^ permalink raw reply [flat|nested] 7+ messages in thread
* Re: [BUG] serious inflate inconsistency on master
2012-07-04 6:31 ` Junio C Hamano
@ 2012-07-04 7:01 ` Nguyen Thai Ngoc Duy
2012-07-04 7:24 ` Jeff King
2012-07-04 7:12 ` Jeff King
1 sibling, 1 reply; 7+ messages in thread
From: Nguyen Thai Ngoc Duy @ 2012-07-04 7:01 UTC (permalink / raw)
To: Junio C Hamano; +Cc: Jeff King, git
On Wed, Jul 4, 2012 at 1:31 PM, Junio C Hamano <gitster@pobox.com> wrote:
> Nguyen Thai Ngoc Duy <pclouds@gmail.com> writes:
>
>> By the way I searched the commit that introduces that check with "git
>> log --follow -p builtin/index-pack.c" but I could not find it. What
>> did I do wrong?
>
> Your commit 8a2e163cc shows changes to the file at ll.535-540; these
> come from 776ea370 builtin-index-pack.c ll.383-388.
>
> $ git show 776ea370:builtin-index-pack.c
git newbie's hat's on. How do you find 776ea370, git-blame? Another
question is why doesn't git-log show that commit?
--
Duy
^ permalink raw reply [flat|nested] 7+ messages in thread
* Re: [BUG] serious inflate inconsistency on master
2012-07-04 6:31 ` Junio C Hamano
2012-07-04 7:01 ` Nguyen Thai Ngoc Duy
@ 2012-07-04 7:12 ` Jeff King
1 sibling, 0 replies; 7+ messages in thread
From: Jeff King @ 2012-07-04 7:12 UTC (permalink / raw)
To: Junio C Hamano; +Cc: Nguyen Thai Ngoc Duy, git
On Tue, Jul 03, 2012 at 11:31:43PM -0700, Junio C Hamano wrote:
> Nguyen Thai Ngoc Duy <pclouds@gmail.com> writes:
>
> > By the way I searched the commit that introduces that check with "git
> > log --follow -p builtin/index-pack.c" but I could not find it. What
> > did I do wrong?
>
> Your commit 8a2e163cc shows changes to the file at ll.535-540; these
> come from 776ea370 builtin-index-pack.c ll.383-388.
>
> $ git show 776ea370:builtin-index-pack.c
>
> The get_data_from_pack() function in that commit gives sufficient
> buffer to output side (avail_out starts out as obj->size), and feeds
> the data from the packfile in chunks. With the arrangement this
> commit makes to call git_inflate(), it should never get stuck
> because it ran out of output buffer. In each iteration of the loop,
> when the function returns, status should read Z_OK and the function
> should have consumed all input.
>
> But the version that uses consume() function does not give
> sufficient output buffer to ensure that the input will always be
> inflated fully (avoiding to use large output buffer is the whole
> point of your patch after all), so with your patch, that no longer
> holds true.
Yeah, that makes sense. I was wondering if we could get rid of the
avail_in check (and instead just _always_ loop to drain avail_in,
whether we are using consume() or not). The intent would be that the
loop would be a harmless no-op in the non-consume case, because we would
always drain the input completely on the first call.
But that's not right; if we _didn't_ drain it, it's probably because the
input is malformed (i.e., the deflated data is larger than the object
size claims), and we would loop infinitely (because we are not extending
the output buffer during each run of the loop).
So the patch I posted earlier is the right direction. I didn't properly
deal with moving last_out into the inner loop in that patch, but after
checking to see where it goes, I'm pretty sure it's superfluous. Here it
is with last_out removed and a proper commit message:
-- >8 --
Subject: index-pack: loop while inflating objects in unpack_data
When the unpack_data function is given a consume() callback,
it unpacks only 64K of the input at a time, feeding it to
git_inflate along with a 64K output buffer. However,
because we are inflating, there is a good chance that the
output buffer will fill before consuming all of the input.
In this case, we need to loop on git_inflate until we have
fed the whole input buffer, feeding each chunk of output to
the consume buffer.
The current code does not do this, and as a result, will
fail the loop condition and trigger a fatal "serious inflate
inconsistency" error in this case.
While we're rearranging the loop, let's get rid of the
extra last_out pointer. It is meant to point to the
beginning of the buffer that we feed to git_inflate, but in
practice this is always the beginning of our same 64K
buffer, because:
1. At the beginning of the loop, we are feeding the
buffer.
2. At the end of the loop, if we are using a consume()
function, we reset git_inflate's pointer to the
beginning of the buffer. If we are not using a
consume() function, then we do not care about the value
of last_out at all.
Signed-off-by: Jeff King <peff@peff.net>
---
builtin/index-pack.c | 23 +++++++++++++----------
1 file changed, 13 insertions(+), 10 deletions(-)
diff --git a/builtin/index-pack.c b/builtin/index-pack.c
index 8b5c1eb..50d3876 100644
--- a/builtin/index-pack.c
+++ b/builtin/index-pack.c
@@ -524,7 +524,6 @@ static void *unpack_data(struct object_entry *obj,
stream.avail_out = consume ? 64*1024 : obj->size;
do {
- unsigned char *last_out = stream.next_out;
ssize_t n = (len < 64*1024) ? len : 64*1024;
n = pread(pack_fd, inbuf, n, from);
if (n < 0)
@@ -538,15 +537,19 @@ static void *unpack_data(struct object_entry *obj,
len -= n;
stream.next_in = inbuf;
stream.avail_in = n;
- status = git_inflate(&stream, 0);
- if (consume) {
- if (consume(last_out, stream.next_out - last_out, cb_data)) {
- free(inbuf);
- free(data);
- return NULL;
- }
- stream.next_out = data;
- stream.avail_out = 64*1024;
+ if (!consume)
+ status = git_inflate(&stream, 0);
+ else {
+ do {
+ status = git_inflate(&stream, 0);
+ if (consume(data, stream.next_out - data, cb_data)) {
+ free(inbuf);
+ free(data);
+ return NULL;
+ }
+ stream.next_out = data;
+ stream.avail_out = 64*1024;
+ } while (status == Z_OK && stream.avail_in);
}
} while (len && status == Z_OK && !stream.avail_in);
--
1.7.11.rc1.21.g3c8d91e
^ permalink raw reply related [flat|nested] 7+ messages in thread
* Re: [BUG] serious inflate inconsistency on master
2012-07-04 7:01 ` Nguyen Thai Ngoc Duy
@ 2012-07-04 7:24 ` Jeff King
0 siblings, 0 replies; 7+ messages in thread
From: Jeff King @ 2012-07-04 7:24 UTC (permalink / raw)
To: Nguyen Thai Ngoc Duy; +Cc: Junio C Hamano, git
On Wed, Jul 04, 2012 at 02:01:06PM +0700, Nguyen Thai Ngoc Duy wrote:
> On Wed, Jul 4, 2012 at 1:31 PM, Junio C Hamano <gitster@pobox.com> wrote:
> > Nguyen Thai Ngoc Duy <pclouds@gmail.com> writes:
> >
> >> By the way I searched the commit that introduces that check with "git
> >> log --follow -p builtin/index-pack.c" but I could not find it. What
> >> did I do wrong?
> >
> > Your commit 8a2e163cc shows changes to the file at ll.535-540; these
> > come from 776ea370 builtin-index-pack.c ll.383-388.
> >
> > $ git show 776ea370:builtin-index-pack.c
>
> git newbie's hat's on. How do you find 776ea370, git-blame? Another
> question is why doesn't git-log show that commit?
I used git-blame to find it. As to your second question, I believe it is
one of the side-effects of the way --follow is bolted onto the revision
traversal. Look at:
gitk -- builtin/index-pack.c builtin-index-pack.c
and you will see that the commit in question happened on a simultaneous
branch with the big builtin rename commit. Since we process 776ea370
before we hit the rename commit, we do not yet realize that
builtin-index-pack.c is of interest to us.
-Peff
^ permalink raw reply [flat|nested] 7+ messages in thread
end of thread, other threads:[~2012-07-04 7:24 UTC | newest]
Thread overview: 7+ messages (download: mbox.gz follow: Atom feed
-- links below jump to the message on this page --
2012-07-03 22:19 [BUG] serious inflate inconsistency on master Jeff King
2012-07-03 22:40 ` Junio C Hamano
2012-07-04 5:35 ` Nguyen Thai Ngoc Duy
2012-07-04 6:31 ` Junio C Hamano
2012-07-04 7:01 ` Nguyen Thai Ngoc Duy
2012-07-04 7:24 ` Jeff King
2012-07-04 7:12 ` Jeff King
This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox;
as well as URLs for NNTP newsgroup(s).