* [BUG] serious inflate inconsistency on master @ 2012-07-03 22:19 Jeff King 2012-07-03 22:40 ` Junio C Hamano 0 siblings, 1 reply; 7+ messages in thread From: Jeff King @ 2012-07-03 22:19 UTC (permalink / raw) To: Nguyễn Thái Ngọc Duy; +Cc: git I'm getting a 'serious inflate consistency' error while running "git verify-pack" (actually, "git index-pack --verify" under the hood). It bisects to 4614043 (index-pack: use streaming interface for collision test on large blobs, 2012-05-24). The interesting thing about this repository is that it has a 2.8G text file in it which compresses down to only about 420M. I'm not sure that 4614043 actually introduces the bug, but rather just triggers the code path. I'm able to reproduce it with the following script: # empty repo... git init repo && cd repo && # set this low to make sure we follow the unpack_data code-path git config core.bigfilethreshold 100k && # now make a file bigger than our threshold, but that will compress # well perl -le 'print for (1..100000)' >file && # and then make a commit git add file && git commit -m file && # and a pack with it git repack -ad && # and then verify that pack git verify-pack .git/objects/pack/*.pack The problem seems to be in index-pack.c:unpack_data, which does this: > git_inflate_init(&stream); > stream.next_out = data; > stream.avail_out = consume ? 64*1024 : obj->size; > > do { > unsigned char *last_out = stream.next_out; > ssize_t n = (len < 64*1024) ? len : 64*1024; > n = pread(pack_fd, inbuf, n, from); > if (n < 0) > die_errno(_("cannot pread pack file")); > if (!n) > die(Q_("premature end of pack file, %lu byte missing", > "premature end of pack file, %lu bytes missing", > len), > len); > from += n; > len -= n; > stream.next_in = inbuf; > stream.avail_in = n; > status = git_inflate(&stream, 0); > if (consume) { > if (consume(last_out, stream.next_out - last_out, cb_data)) { > free(inbuf); > free(data); > return NULL; > } > stream.next_out = data; > stream.avail_out = 64*1024; > } > } while (len && status == Z_OK && !stream.avail_in); > > /* This has been inflated OK when first encountered, so... */ > if (status != Z_STREAM_END || stream.total_out != obj->size) > die(_("serious inflate inconsistency")); We limit ourselves to handling just 64K at a time. So we read in 64K and stuff it in the next_in/avail_in buffer. And then we make 64K of buffer available for zlib to write into via the next_out/avail_out buffer. So zlib reads the first chunk, and after reading 28K or so fills up the 64K output buffer and returns. We call consume on the chunk, but when we hit the outer loop condition, stream.avail_in still mentions the 36K we haven't processed yet, and the loop ends with status == Z_OK, which triggers the assertion below it. So I don't really understand what this !stream.avail_in is doing there in the do-while loop. Don't we instead need to have an inner loop that keeps feeding the result of pread into git_inflate until we don't have any available data left? Something like the patch below, which seems to work for me, but I still don't understand the function of the !stream.avail_in check in the outer loop. -Peff diff --git a/builtin/index-pack.c b/builtin/index-pack.c index 8b5c1eb..0db1923 100644 --- a/builtin/index-pack.c +++ b/builtin/index-pack.c @@ -538,15 +538,19 @@ static void *unpack_data(struct object_entry *obj, len -= n; stream.next_in = inbuf; stream.avail_in = n; - status = git_inflate(&stream, 0); - if (consume) { - if (consume(last_out, stream.next_out - last_out, cb_data)) { - free(inbuf); - free(data); - return NULL; - } - stream.next_out = data; - stream.avail_out = 64*1024; + if (!consume) + status = git_inflate(&stream, 0); + else { + do { + status = git_inflate(&stream, 0); + if (consume(last_out, stream.next_out - last_out, cb_data)) { + free(inbuf); + free(data); + return NULL; + } + stream.next_out = data; + stream.avail_out = 64*1024; + } while (status == Z_OK && stream.avail_in); } } while (len && status == Z_OK && !stream.avail_in); ^ permalink raw reply related [flat|nested] 7+ messages in thread
* Re: [BUG] serious inflate inconsistency on master 2012-07-03 22:19 [BUG] serious inflate inconsistency on master Jeff King @ 2012-07-03 22:40 ` Junio C Hamano 2012-07-04 5:35 ` Nguyen Thai Ngoc Duy 0 siblings, 1 reply; 7+ messages in thread From: Junio C Hamano @ 2012-07-03 22:40 UTC (permalink / raw) To: Jeff King; +Cc: Nguyễn Thái Ngọc Duy, git Jeff King <peff@peff.net> writes: > The problem seems to be in index-pack.c:unpack_data, which does this: > >> git_inflate_init(&stream); >> stream.next_out = data; >> stream.avail_out = consume ? 64*1024 : obj->size; >> >> do { >> unsigned char *last_out = stream.next_out; >> ssize_t n = (len < 64*1024) ? len : 64*1024; >> n = pread(pack_fd, inbuf, n, from); >> if (n < 0) >> die_errno(_("cannot pread pack file")); >> if (!n) >> die(Q_("premature end of pack file, %lu byte missing", >> "premature end of pack file, %lu bytes missing", >> len), >> len); >> from += n; >> len -= n; >> stream.next_in = inbuf; >> stream.avail_in = n; >> status = git_inflate(&stream, 0); >> if (consume) { >> if (consume(last_out, stream.next_out - last_out, cb_data)) { >> free(inbuf); >> free(data); >> return NULL; >> } >> stream.next_out = data; >> stream.avail_out = 64*1024; >> } >> } while (len && status == Z_OK && !stream.avail_in); >> >> /* This has been inflated OK when first encountered, so... */ >> if (status != Z_STREAM_END || stream.total_out != obj->size) >> die(_("serious inflate inconsistency")); Yeah, that "if (consume)" part is clearly bogus. It should feed what it read in "inbuf" fully to git_inflate() in a (missing) inner loop and keep feeding consume() with the inflated data, but instead happily goes on to read more from the packfile once the initial part of the inflated data is fed to consume, ignoring the remainder. > So I don't really understand what this !stream.avail_in is doing there > in the do-while loop. Don't we instead need to have an inner loop that > keeps feeding the result of pread into git_inflate until we don't have > any available data left? Exactly. I do not think the avain_in check should be done at the end of the outer loop at all. When we are buffering the entire inflated data in core without using consume, we allocate enough memory to "data" to hold the whole thing, so in that case, it may be OK to expect that git_inflate() would never return without consuming the input bytes (i.e. stream.avail_in would always be zero at the site of the check), but with consume(), we give small piece of memory as a temporary output area and call git_inflate(), and it is entirely possible and normal for it to run out of output before inflating all the input bytes. ^ permalink raw reply [flat|nested] 7+ messages in thread
* Re: [BUG] serious inflate inconsistency on master 2012-07-03 22:40 ` Junio C Hamano @ 2012-07-04 5:35 ` Nguyen Thai Ngoc Duy 2012-07-04 6:31 ` Junio C Hamano 0 siblings, 1 reply; 7+ messages in thread From: Nguyen Thai Ngoc Duy @ 2012-07-04 5:35 UTC (permalink / raw) To: Junio C Hamano; +Cc: Jeff King, git On Wed, Jul 4, 2012 at 5:40 AM, Junio C Hamano <gitster@pobox.com> wrote: > Jeff King <peff@peff.net> writes: > >> The problem seems to be in index-pack.c:unpack_data, which does this: >> >>> git_inflate_init(&stream); >>> stream.next_out = data; >>> stream.avail_out = consume ? 64*1024 : obj->size; >>> >>> do { >>> unsigned char *last_out = stream.next_out; >>> ssize_t n = (len < 64*1024) ? len : 64*1024; >>> n = pread(pack_fd, inbuf, n, from); >>> if (n < 0) >>> die_errno(_("cannot pread pack file")); >>> if (!n) >>> die(Q_("premature end of pack file, %lu byte missing", >>> "premature end of pack file, %lu bytes missing", >>> len), >>> len); >>> from += n; >>> len -= n; >>> stream.next_in = inbuf; >>> stream.avail_in = n; >>> status = git_inflate(&stream, 0); >>> if (consume) { >>> if (consume(last_out, stream.next_out - last_out, cb_data)) { >>> free(inbuf); >>> free(data); >>> return NULL; >>> } >>> stream.next_out = data; >>> stream.avail_out = 64*1024; >>> } >>> } while (len && status == Z_OK && !stream.avail_in); >>> >>> /* This has been inflated OK when first encountered, so... */ >>> if (status != Z_STREAM_END || stream.total_out != obj->size) >>> die(_("serious inflate inconsistency")); > > Yeah, that "if (consume)" part is clearly bogus. It should feed > what it read in "inbuf" fully to git_inflate() in a (missing) inner > loop and keep feeding consume() with the inflated data, but instead > happily goes on to read more from the packfile once the initial part > of the inflated data is fed to consume, ignoring the remainder. Agreed. >> So I don't really understand what this !stream.avail_in is doing there >> in the do-while loop. Don't we instead need to have an inner loop that >> keeps feeding the result of pread into git_inflate until we don't have >> any available data left? > > Exactly. I do not think the avain_in check should be done at the > end of the outer loop at all. When we are buffering the entire > inflated data in core without using consume, we allocate enough > memory to "data" to hold the whole thing, so in that case, it may be > OK to expect that git_inflate() would never return without consuming > the input bytes (i.e. stream.avail_in would always be zero at the > site of the check), but with consume(), we give small piece of > memory as a temporary output area and call git_inflate(), and it is > entirely possible and normal for it to run out of output before > inflating all the input bytes. Maybe the !stream.avail_in check is for safety? obj->size could be wrong due to an index-pack bug, leading to insufficient avail_out.. By the way I searched the commit that introduces that check with "git log --follow -p builtin/index-pack.c" but I could not find it. What did I do wrong? -- Duy ^ permalink raw reply [flat|nested] 7+ messages in thread
* Re: [BUG] serious inflate inconsistency on master 2012-07-04 5:35 ` Nguyen Thai Ngoc Duy @ 2012-07-04 6:31 ` Junio C Hamano 2012-07-04 7:01 ` Nguyen Thai Ngoc Duy 2012-07-04 7:12 ` Jeff King 0 siblings, 2 replies; 7+ messages in thread From: Junio C Hamano @ 2012-07-04 6:31 UTC (permalink / raw) To: Nguyen Thai Ngoc Duy; +Cc: Jeff King, git Nguyen Thai Ngoc Duy <pclouds@gmail.com> writes: > By the way I searched the commit that introduces that check with "git > log --follow -p builtin/index-pack.c" but I could not find it. What > did I do wrong? Your commit 8a2e163cc shows changes to the file at ll.535-540; these come from 776ea370 builtin-index-pack.c ll.383-388. $ git show 776ea370:builtin-index-pack.c The get_data_from_pack() function in that commit gives sufficient buffer to output side (avail_out starts out as obj->size), and feeds the data from the packfile in chunks. With the arrangement this commit makes to call git_inflate(), it should never get stuck because it ran out of output buffer. In each iteration of the loop, when the function returns, status should read Z_OK and the function should have consumed all input. But the version that uses consume() function does not give sufficient output buffer to ensure that the input will always be inflated fully (avoiding to use large output buffer is the whole point of your patch after all), so with your patch, that no longer holds true. ^ permalink raw reply [flat|nested] 7+ messages in thread
* Re: [BUG] serious inflate inconsistency on master 2012-07-04 6:31 ` Junio C Hamano @ 2012-07-04 7:01 ` Nguyen Thai Ngoc Duy 2012-07-04 7:24 ` Jeff King 2012-07-04 7:12 ` Jeff King 1 sibling, 1 reply; 7+ messages in thread From: Nguyen Thai Ngoc Duy @ 2012-07-04 7:01 UTC (permalink / raw) To: Junio C Hamano; +Cc: Jeff King, git On Wed, Jul 4, 2012 at 1:31 PM, Junio C Hamano <gitster@pobox.com> wrote: > Nguyen Thai Ngoc Duy <pclouds@gmail.com> writes: > >> By the way I searched the commit that introduces that check with "git >> log --follow -p builtin/index-pack.c" but I could not find it. What >> did I do wrong? > > Your commit 8a2e163cc shows changes to the file at ll.535-540; these > come from 776ea370 builtin-index-pack.c ll.383-388. > > $ git show 776ea370:builtin-index-pack.c git newbie's hat's on. How do you find 776ea370, git-blame? Another question is why doesn't git-log show that commit? -- Duy ^ permalink raw reply [flat|nested] 7+ messages in thread
* Re: [BUG] serious inflate inconsistency on master 2012-07-04 7:01 ` Nguyen Thai Ngoc Duy @ 2012-07-04 7:24 ` Jeff King 0 siblings, 0 replies; 7+ messages in thread From: Jeff King @ 2012-07-04 7:24 UTC (permalink / raw) To: Nguyen Thai Ngoc Duy; +Cc: Junio C Hamano, git On Wed, Jul 04, 2012 at 02:01:06PM +0700, Nguyen Thai Ngoc Duy wrote: > On Wed, Jul 4, 2012 at 1:31 PM, Junio C Hamano <gitster@pobox.com> wrote: > > Nguyen Thai Ngoc Duy <pclouds@gmail.com> writes: > > > >> By the way I searched the commit that introduces that check with "git > >> log --follow -p builtin/index-pack.c" but I could not find it. What > >> did I do wrong? > > > > Your commit 8a2e163cc shows changes to the file at ll.535-540; these > > come from 776ea370 builtin-index-pack.c ll.383-388. > > > > $ git show 776ea370:builtin-index-pack.c > > git newbie's hat's on. How do you find 776ea370, git-blame? Another > question is why doesn't git-log show that commit? I used git-blame to find it. As to your second question, I believe it is one of the side-effects of the way --follow is bolted onto the revision traversal. Look at: gitk -- builtin/index-pack.c builtin-index-pack.c and you will see that the commit in question happened on a simultaneous branch with the big builtin rename commit. Since we process 776ea370 before we hit the rename commit, we do not yet realize that builtin-index-pack.c is of interest to us. -Peff ^ permalink raw reply [flat|nested] 7+ messages in thread
* Re: [BUG] serious inflate inconsistency on master 2012-07-04 6:31 ` Junio C Hamano 2012-07-04 7:01 ` Nguyen Thai Ngoc Duy @ 2012-07-04 7:12 ` Jeff King 1 sibling, 0 replies; 7+ messages in thread From: Jeff King @ 2012-07-04 7:12 UTC (permalink / raw) To: Junio C Hamano; +Cc: Nguyen Thai Ngoc Duy, git On Tue, Jul 03, 2012 at 11:31:43PM -0700, Junio C Hamano wrote: > Nguyen Thai Ngoc Duy <pclouds@gmail.com> writes: > > > By the way I searched the commit that introduces that check with "git > > log --follow -p builtin/index-pack.c" but I could not find it. What > > did I do wrong? > > Your commit 8a2e163cc shows changes to the file at ll.535-540; these > come from 776ea370 builtin-index-pack.c ll.383-388. > > $ git show 776ea370:builtin-index-pack.c > > The get_data_from_pack() function in that commit gives sufficient > buffer to output side (avail_out starts out as obj->size), and feeds > the data from the packfile in chunks. With the arrangement this > commit makes to call git_inflate(), it should never get stuck > because it ran out of output buffer. In each iteration of the loop, > when the function returns, status should read Z_OK and the function > should have consumed all input. > > But the version that uses consume() function does not give > sufficient output buffer to ensure that the input will always be > inflated fully (avoiding to use large output buffer is the whole > point of your patch after all), so with your patch, that no longer > holds true. Yeah, that makes sense. I was wondering if we could get rid of the avail_in check (and instead just _always_ loop to drain avail_in, whether we are using consume() or not). The intent would be that the loop would be a harmless no-op in the non-consume case, because we would always drain the input completely on the first call. But that's not right; if we _didn't_ drain it, it's probably because the input is malformed (i.e., the deflated data is larger than the object size claims), and we would loop infinitely (because we are not extending the output buffer during each run of the loop). So the patch I posted earlier is the right direction. I didn't properly deal with moving last_out into the inner loop in that patch, but after checking to see where it goes, I'm pretty sure it's superfluous. Here it is with last_out removed and a proper commit message: -- >8 -- Subject: index-pack: loop while inflating objects in unpack_data When the unpack_data function is given a consume() callback, it unpacks only 64K of the input at a time, feeding it to git_inflate along with a 64K output buffer. However, because we are inflating, there is a good chance that the output buffer will fill before consuming all of the input. In this case, we need to loop on git_inflate until we have fed the whole input buffer, feeding each chunk of output to the consume buffer. The current code does not do this, and as a result, will fail the loop condition and trigger a fatal "serious inflate inconsistency" error in this case. While we're rearranging the loop, let's get rid of the extra last_out pointer. It is meant to point to the beginning of the buffer that we feed to git_inflate, but in practice this is always the beginning of our same 64K buffer, because: 1. At the beginning of the loop, we are feeding the buffer. 2. At the end of the loop, if we are using a consume() function, we reset git_inflate's pointer to the beginning of the buffer. If we are not using a consume() function, then we do not care about the value of last_out at all. Signed-off-by: Jeff King <peff@peff.net> --- builtin/index-pack.c | 23 +++++++++++++---------- 1 file changed, 13 insertions(+), 10 deletions(-) diff --git a/builtin/index-pack.c b/builtin/index-pack.c index 8b5c1eb..50d3876 100644 --- a/builtin/index-pack.c +++ b/builtin/index-pack.c @@ -524,7 +524,6 @@ static void *unpack_data(struct object_entry *obj, stream.avail_out = consume ? 64*1024 : obj->size; do { - unsigned char *last_out = stream.next_out; ssize_t n = (len < 64*1024) ? len : 64*1024; n = pread(pack_fd, inbuf, n, from); if (n < 0) @@ -538,15 +537,19 @@ static void *unpack_data(struct object_entry *obj, len -= n; stream.next_in = inbuf; stream.avail_in = n; - status = git_inflate(&stream, 0); - if (consume) { - if (consume(last_out, stream.next_out - last_out, cb_data)) { - free(inbuf); - free(data); - return NULL; - } - stream.next_out = data; - stream.avail_out = 64*1024; + if (!consume) + status = git_inflate(&stream, 0); + else { + do { + status = git_inflate(&stream, 0); + if (consume(data, stream.next_out - data, cb_data)) { + free(inbuf); + free(data); + return NULL; + } + stream.next_out = data; + stream.avail_out = 64*1024; + } while (status == Z_OK && stream.avail_in); } } while (len && status == Z_OK && !stream.avail_in); -- 1.7.11.rc1.21.g3c8d91e ^ permalink raw reply related [flat|nested] 7+ messages in thread
end of thread, other threads:[~2012-07-04 7:24 UTC | newest] Thread overview: 7+ messages (download: mbox.gz follow: Atom feed -- links below jump to the message on this page -- 2012-07-03 22:19 [BUG] serious inflate inconsistency on master Jeff King 2012-07-03 22:40 ` Junio C Hamano 2012-07-04 5:35 ` Nguyen Thai Ngoc Duy 2012-07-04 6:31 ` Junio C Hamano 2012-07-04 7:01 ` Nguyen Thai Ngoc Duy 2012-07-04 7:24 ` Jeff King 2012-07-04 7:12 ` Jeff King
This is a public inbox, see mirroring instructions for how to clone and mirror all data and code used for this inbox; as well as URLs for NNTP newsgroup(s).