From: Jeff King <peff@peff.net>
To: "Nguyễn Thái Ngọc Duy" <pclouds@gmail.com>
Cc: git@vger.kernel.org
Subject: [BUG] serious inflate inconsistency on master
Date: Tue, 3 Jul 2012 18:19:01 -0400
Message-ID: <20120703221900.GA28897@sigill.intra.peff.net>
I'm getting a 'serious inflate inconsistency' error while running
"git verify-pack" (actually, "git index-pack --verify" under the hood). It
bisects to 4614043 (index-pack: use streaming interface for collision
test on large blobs, 2012-05-24).
The interesting thing about this repository is that it has a 2.8G text
file in it which compresses down to only about 420M. I'm not sure that
4614043 actually introduces the bug, but rather just triggers the code
path.
I'm able to reproduce it with the following script:
# empty repo...
git init repo &&
cd repo &&
# set this low to make sure we follow the unpack_data code-path
git config core.bigfilethreshold 100k &&
# now make a file bigger than our threshold, but that will compress
# well
perl -le 'print for (1..100000)' >file &&
# and then make a commit
git add file &&
git commit -m file &&
# and a pack with it
git repack -ad &&
# and then verify that pack
git verify-pack .git/objects/pack/*.pack
The problem seems to be in index-pack.c:unpack_data, which does this:
>	git_inflate_init(&stream);
>	stream.next_out = data;
>	stream.avail_out = consume ? 64*1024 : obj->size;
>
>	do {
>		unsigned char *last_out = stream.next_out;
>		ssize_t n = (len < 64*1024) ? len : 64*1024;
>		n = pread(pack_fd, inbuf, n, from);
>		if (n < 0)
>			die_errno(_("cannot pread pack file"));
>		if (!n)
>			die(Q_("premature end of pack file, %lu byte missing",
>			       "premature end of pack file, %lu bytes missing",
>			       len),
>			    len);
>		from += n;
>		len -= n;
>		stream.next_in = inbuf;
>		stream.avail_in = n;
>		status = git_inflate(&stream, 0);
>		if (consume) {
>			if (consume(last_out, stream.next_out - last_out, cb_data)) {
>				free(inbuf);
>				free(data);
>				return NULL;
>			}
>			stream.next_out = data;
>			stream.avail_out = 64*1024;
>		}
>	} while (len && status == Z_OK && !stream.avail_in);
>
>	/* This has been inflated OK when first encountered, so... */
>	if (status != Z_STREAM_END || stream.total_out != obj->size)
>		die(_("serious inflate inconsistency"));
We limit ourselves to handling just 64K at a time. So we read in 64K and
stuff it in the next_in/avail_in buffer. And then we make 64K of buffer
available for zlib to write into via the next_out/avail_out buffer. So
zlib reads the first chunk, and after reading 28K or so fills up the 64K
output buffer and returns. We call consume on the chunk, but when we hit
the outer loop condition, stream.avail_in still mentions the 36K we
haven't processed yet, and the loop ends with status == Z_OK, which
triggers the assertion below it.
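
To make the failure mode concrete, here is a toy model of that loop. It
is only a sketch: toy_stream, toy_inflate and buggy_unpack are
hypothetical names, not git or zlib code, and the "inflater" simply
assumes every input byte expands to a fixed number of output bytes. It
keeps only the byte counters, which is enough to show the outer loop
giving up while input is still pending.

```c
#include <assert.h>
#include <stddef.h>

#define CHUNK ((size_t)64 * 1024)

enum { TOY_OK, TOY_STREAM_END };

/* Toy stand-in for a z_stream: only the byte counters matter here. */
struct toy_stream {
	size_t avail_in;   /* input bytes not yet consumed */
	size_t avail_out;  /* free space left in the output buffer */
	size_t total_out;  /* bytes produced so far */
};

/*
 * Hypothetical inflater (an assumption, not zlib): each input byte
 * expands to `ratio` output bytes. Like inflate(), it stops when it
 * runs out of input or of output space.
 */
static int toy_inflate(struct toy_stream *s, size_t ratio, size_t expected)
{
	size_t fits, n;

	assert(ratio >= 1 && ratio <= CHUNK);
	fits = s->avail_out / ratio;
	n = s->avail_in < fits ? s->avail_in : fits;
	s->avail_in -= n;
	s->avail_out -= n * ratio;
	s->total_out += n * ratio;
	return s->total_out >= expected ? TOY_STREAM_END : TOY_OK;
}

/*
 * Same shape as the loop quoted above when consume is set.
 * Returns 1 if the final check would pass, 0 if it would die().
 */
int buggy_unpack(size_t len, size_t ratio)
{
	struct toy_stream s = { 0, CHUNK, 0 };
	size_t expected = len * ratio;
	int status;

	do {
		size_t n = len < CHUNK ? len : CHUNK; /* pread() stand-in */
		len -= n;
		s.avail_in = n;
		status = toy_inflate(&s, ratio, expected);
		s.avail_out = CHUNK; /* consume(), then reset the buffer */
	} while (len && status == TOY_OK && !s.avail_in);

	return status == TOY_STREAM_END && s.total_out == expected;
}
```

In this model, buggy_unpack(200 * 1024, 1) passes because each 64K read
is fully consumed, while buggy_unpack(200 * 1024, 2) fails after the
first call: the output buffer fills with half the input still unread.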
So I don't really understand what this !stream.avail_in is doing there
in the do-while loop. Don't we instead need to have an inner loop that
keeps feeding the result of pread into git_inflate until we don't have
any available data left?
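
As a rough illustration of what I mean by an inner loop, here is a
self-contained toy model. Again, this is a sketch under assumptions:
toy_stream, toy_inflate and drained_unpack are made-up names, and the
fake inflater just expands every input byte to a fixed number of output
bytes. The point is only the control flow: keep calling the inflater on
the same input chunk, resetting the output buffer each time, until the
chunk is used up.

```c
#include <assert.h>
#include <stddef.h>

#define CHUNK ((size_t)64 * 1024)

enum { TOY_OK, TOY_STREAM_END };

/* Toy stand-in for a z_stream: only the byte counters matter here. */
struct toy_stream {
	size_t avail_in;   /* input bytes not yet consumed */
	size_t avail_out;  /* free space left in the output buffer */
	size_t total_out;  /* bytes produced so far */
};

/*
 * Hypothetical inflater (an assumption, not zlib): each input byte
 * expands to `ratio` output bytes, stopping when input or output
 * space runs out.
 */
static int toy_inflate(struct toy_stream *s, size_t ratio, size_t expected)
{
	size_t fits, n;

	assert(ratio >= 1 && ratio <= CHUNK);
	fits = s->avail_out / ratio;
	n = s->avail_in < fits ? s->avail_in : fits;
	s->avail_in -= n;
	s->avail_out -= n * ratio;
	s->total_out += n * ratio;
	return s->total_out >= expected ? TOY_STREAM_END : TOY_OK;
}

/*
 * Outer loop reads a chunk; inner loop drains it completely.
 * Returns 1 if the final check would pass, 0 if it would die().
 */
int drained_unpack(size_t len, size_t ratio)
{
	struct toy_stream s = { 0, CHUNK, 0 };
	size_t expected = len * ratio;
	int status;

	do {
		size_t n = len < CHUNK ? len : CHUNK; /* pread() stand-in */
		len -= n;
		s.avail_in = n;
		do {
			status = toy_inflate(&s, ratio, expected);
			s.avail_out = CHUNK; /* consume(), then reset */
		} while (status == TOY_OK && s.avail_in);
	} while (len && status == TOY_OK && !s.avail_in);

	return status == TOY_STREAM_END && s.total_out == expected;
}
```

With the inner loop in place, drained_unpack(200 * 1024, 2) reaches the
end of the stream with the expected total, where a single inflate call
per read would have stopped early.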
Something like the patch below, which seems to work for me, but I still
don't understand the function of the !stream.avail_in check in the outer
loop.
-Peff
diff --git a/builtin/index-pack.c b/builtin/index-pack.c
index 8b5c1eb..0db1923 100644
--- a/builtin/index-pack.c
+++ b/builtin/index-pack.c
@@ -538,15 +538,19 @@ static void *unpack_data(struct object_entry *obj,
 		len -= n;
 		stream.next_in = inbuf;
 		stream.avail_in = n;
-		status = git_inflate(&stream, 0);
-		if (consume) {
-			if (consume(last_out, stream.next_out - last_out, cb_data)) {
-				free(inbuf);
-				free(data);
-				return NULL;
-			}
-			stream.next_out = data;
-			stream.avail_out = 64*1024;
+		if (!consume)
+			status = git_inflate(&stream, 0);
+		else {
+			do {
+				status = git_inflate(&stream, 0);
+				if (consume(last_out, stream.next_out - last_out, cb_data)) {
+					free(inbuf);
+					free(data);
+					return NULL;
+				}
+				stream.next_out = data;
+				stream.avail_out = 64*1024;
+			} while (status == Z_OK && stream.avail_in);
 		}
 	} while (len && status == Z_OK && !stream.avail_in);