From: Samuel Bronson <naesten@gmail.com>
To: git@vger.kernel.org
Subject: Re: [PATCH] Add failing test: "fsck survives inflate errors"
Date: Sun, 20 Jul 2014 16:43:16 -0400 [thread overview]
Message-ID: <87mwc37od7.fsf@naesten.mooo.com> (raw)
In-Reply-To: <1405831383-22477-1-git-send-email-naesten@gmail.com> (Samuel Bronson's message of "Sun, 20 Jul 2014 00:43:03 -0400")
The following message is a courtesy copy of an article
that has been posted to gmane.comp.version-control.git as well.
Oh, I forgot to provide any analysis of the problem. Oops.
It may be just as well, though; I was tired enough that I might have
botched it in any case. So, have an analysis:
While inflate errors are obviously NOT GOOD, and should perhaps be
instantly fatal for most commands, git fsck is something of a special
case because it is useful to have *it* report as many corrupt objects as
possible in one run.
Unfortunately, this is not currently the case, as shown by the provided
testcase.
The output for this testcase is:
,----
| checking known breakage:
| hash1=ffffffffffffffffffffffffffffffffffffffff &&
| hash2=fffffffffffffffffffffffffffffffffffffffe &&
| mkdir -p .git/objects/ff &&
| echo not-zlib >$(sha1_file $hash1) &&
| test_when_finished "remove_object $hash1" &&
| echo not-zlib >$(sha1_file $hash2) &&
| test_when_finished "remove_object $hash2" &&
|
| # Return value is not documented
| test_might_fail git fsck 2>out &&
| cat out && echo ====== &&
| grep "$hash1.*corrupt" out &&
| grep "$hash2.*corrupt" out
|
| error: inflate: data stream error (incorrect header check)
| error: unable to unpack ffffffffffffffffffffffffffffffffffffffff header
| error: inflate: data stream error (incorrect header check)
| fatal: loose object ffffffffffffffffffffffffffffffffffffffff (stored in .git/objects/ff/ffffffffffffffffffffffffffffffffffffff) is corrupt
| ======
| fatal: loose object ffffffffffffffffffffffffffffffffffffffff (stored in .git/objects/ff/ffffffffffffffffffffffffffffffffffffff) is corrupt
| not ok 5 - fsck survives inflate errors # TODO known breakage
`----
If I flip it from expect_failure to expect_success and run the test with
-i, then go into the trash directory and run "gdb ../../git-fsck", I can
obtain this (thoroughly rehearsed & trimmed) gdb transcript:
,----
| % gdb ../../git-fsck
| GNU gdb (Debian 7.7.1-3) 7.7.1
...
| Reading symbols from ../../git-fsck...done.
| (gdb) break error
| Breakpoint 1 at 0x813d24c: file usage.c, line 143.
| (gdb) break die
| Breakpoint 2 at 0x813d152: file usage.c, line 94.
| (gdb) run
| Starting program: /home/naesten/hacking/git/git-fsck
| [Thread debugging using libthread_db enabled]
| Using host libthread_db library "/lib/i386-linux-gnu/i686/cmov/libthread_db.so.1".
| Checking object directories: 100% (256/256), done.
|
| Breakpoint 1, error (err=0x8182f7a "inflate: %s (%s)") at usage.c:143
| 143 {
| (gdb) bt
| #0 error (err=0x8182f7a "inflate: %s (%s)") at usage.c:143
| #1 0x081452ff in git_inflate (strm=0xbfffe6b8, flush=0)
| at zlib.c:144
| #2 0x08125367 in unpack_sha1_header (stream=<optimized out>,
| map=<optimized out>, mapsize=<optimized out>,
| buffer=<optimized out>, bufsiz=<optimized out>)
| at sha1_file.c:1515
| #3 0x08125546 in sha1_loose_object_info (
| sha1=sha1@entry=0x82659d4 '\377' <repeats 20 times>,
| oi=oi@entry=0xbfffe788) at sha1_file.c:2528
| #4 0x08126b2d in sha1_object_info_extended (
| sha1=0x82659d4 '\377' <repeats 20 times>, oi=0xbfffe788, flags=1)
| at sha1_file.c:2565
| #5 0x0812666f in sha1_object_info (
| sha1=0x82659d4 '\377' <repeats 20 times>, sizep=0x0)
| at sha1_file.c:2601
| #6 0x080f6941 in parse_object (
| sha1=0x82659d4 '\377' <repeats 20 times>) at object.c:247
| #7 0x080758ac in fsck_sha1 (
| sha1=sha1@entry=0x82659d4 '\377' <repeats 20 times>)
| at builtin/fsck.c:333
...
| (gdb) c
| Continuing.
| error: inflate: data stream error (incorrect header check)
|
| Breakpoint 1, error (err=0x817c525 "unable to unpack %s header")
| at usage.c:143
| 143 {
| (gdb) bt
| #0 error (err=0x817c525 "unable to unpack %s header") at usage.c:143
| #1 0x08125564 in sha1_loose_object_info (
| sha1=sha1@entry=0x82659d4 '\377' <repeats 20 times>,
| oi=oi@entry=0xbfffe788) at sha1_file.c:2529
| #2 0x08126b2d in sha1_object_info_extended (
| sha1=0x82659d4 '\377' <repeats 20 times>, oi=0xbfffe788, flags=1)
| at sha1_file.c:2565
| #3 0x0812666f in sha1_object_info (
| sha1=0x82659d4 '\377' <repeats 20 times>, sizep=0x0)
| at sha1_file.c:2601
| #4 0x080f6941 in parse_object (
| sha1=0x82659d4 '\377' <repeats 20 times>) at object.c:247
...
| (gdb) frame 4
| #4 0x080f6941 in parse_object (
| sha1=0x82659d4 '\377' <repeats 20 times>) at object.c:247
| warning: Source file is more recent than executable.
| 247 sha1_object_info(sha1, NULL) == OBJ_BLOB)) { // <-- first error
| (gdb) down
| #3 0x0812666f in sha1_object_info (
| sha1=0x82659d4 '\377' <repeats 20 times>, sizep=0x0)
| at sha1_file.c:2601
| 2601 if (sha1_object_info_extended(sha1, &oi, LOOKUP_REPLACE_OBJECT) < 0)
| (gdb) fin
| Run till exit from #3 0x0812666f in sha1_object_info (
| sha1=0x82659d4 '\377' <repeats 20 times>, sizep=0x0)
| at sha1_file.c:2601
| error: unable to unpack ffffffffffffffffffffffffffffffffffffffff header
| parse_object (sha1=0x82659d4 '\377' <repeats 20 times>)
| at object.c:246
| 246 (!obj && has_sha1_file(sha1) &&
| Value returned is $1 = -1
| (gdb) c
| Continuing.
|
| Breakpoint 1, error (err=0x8182f7a "inflate: %s (%s)") at usage.c:143
| 143 {
| (gdb) bt
| #0 error (err=0x8182f7a "inflate: %s (%s)") at usage.c:143
| #1 0x081452ff in git_inflate (strm=0xbfffe710, flush=0)
| at zlib.c:144
| #2 0x08125367 in unpack_sha1_header (stream=<optimized out>,
| map=<optimized out>, mapsize=<optimized out>,
| buffer=<optimized out>, bufsiz=<optimized out>)
| at sha1_file.c:1515
| #3 0x08125429 in unpack_sha1_file (map=map@entry=0xb7fdc000,
| mapsize=<optimized out>, type=type@entry=0xbfffe7d8,
| size=0xbfffe7dc, sha1=0x82659d4 '\377' <repeats 20 times>)
| at sha1_file.c:1620
| #4 0x08126024 in read_object (
| sha1=sha1@entry=0x82659d4 '\377' <repeats 20 times>,
| type=type@entry=0xbfffe7d8, size=size@entry=0xbfffe7dc)
| at sha1_file.c:2667
| #5 0x08126c33 in read_sha1_file_extended (
| sha1=0x82659d4 '\377' <repeats 20 times>, type=0xbfffe7d8,
| size=0xbfffe7dc, flag=1) at sha1_file.c:2690
| #6 0x080f69b8 in read_sha1_file (size=0xbfffe7dc, type=0xbfffe7d8,
| sha1=0x82659d4 '\377' <repeats 20 times>) at cache.h:853
| #7 parse_object (sha1=0x82659d4 '\377' <repeats 20 times>)
| at object.c:256
...
| (gdb) c
| Continuing.
| error: inflate: data stream error (incorrect header check)
|
| Breakpoint 2, die (
| err=0x817cdbc "loose object %s (stored in %s) is corrupt")
| at usage.c:94
| 94 {
| (gdb) bt
| #0 die (err=0x817cdbc "loose object %s (stored in %s) is corrupt")
| at usage.c:94
| #1 0x08126cc1 in read_sha1_file_extended (
| sha1=0x82659d4 '\377' <repeats 20 times>, type=0xbfffe7d8,
| size=0xbfffe7dc, flag=1) at sha1_file.c:2705
| #2 0x080f69b8 in read_sha1_file (size=0xbfffe7dc, type=0xbfffe7d8,
| sha1=0x82659d4 '\377' <repeats 20 times>) at cache.h:853
| #3 parse_object (sha1=0x82659d4 '\377' <repeats 20 times>)
| at object.c:256
...
| (gdb) frame 3
| #3 parse_object (sha1=0x82659d4 '\377' <repeats 20 times>)
| at object.c:256
| 256 buffer = read_sha1_file(sha1, &type, &size); // <-- death
| (gdb)
`----
It's probably clearest what's happened here if I just show you this ...
>From object.c:
--8<---------------cut here---------------start------------->8---
struct object *parse_object(const unsigned char *sha1)
{
unsigned long size;
enum object_type type;
int eaten;
const unsigned char *repl = lookup_replace_object(sha1);
void *buffer;
struct object *obj;
obj = lookup_object(sha1);
if (obj && obj->parsed)
return obj;
if ((obj && obj->type == OBJ_BLOB) ||
(!obj && has_sha1_file(sha1) &&
sha1_object_info(sha1, NULL) == OBJ_BLOB)) { // <-- first error
if (check_sha1_signature(repl, NULL, 0, NULL) < 0) {
error("sha1 mismatch %s", sha1_to_hex(repl));
return NULL;
}
parse_blob_buffer(lookup_blob(sha1), NULL, 0);
return lookup_object(sha1);
}
buffer = read_sha1_file(sha1, &type, &size); // <-- death
if (buffer) {
if (check_sha1_signature(repl, buffer, size, typename(type)) < 0) {
free(buffer);
error("sha1 mismatch %s", sha1_to_hex(repl));
return NULL;
}
obj = parse_object_buffer(sha1, type, size, buffer, &eaten);
if (!eaten)
free(buffer);
return obj;
}
return NULL;
}
--8<---------------cut here---------------end--------------->8---
(I've clearly added some marginal comments to my copy. ;-)
The first two lines of output
> error: inflate: data stream error (incorrect header check)
> error: unable to unpack ffffffffffffffffffffffffffffffffffffffff header
come from deep within the call "sha1_object_info(sha1, NULL)".
The next two lines
> error: inflate: data stream error (incorrect header check)
> fatal: loose object ffffffffffffffffffffffffffffffffffffffff (stored in .git/objects/ff/ffffffffffffffffffffffffffffffffffffff) is corrupt
and death come from the call "read_sha1_file(sha1, &type, &size)", which
is fair because that's exactly what the documentation comment above
read_sha1_file_extended() *says* will happen:
--8<---------------cut here---------------start------------->8---
/*
* This function dies on corrupt objects; the callers who want to
* deal with them should arrange to call read_object() and give error
* messages themselves.
*/
--8<---------------cut here---------------end--------------->8---
So, given that parse_object()'s documentation is:
--8<---------------cut here---------------start------------->8---
/*
* Returns the object, having parsed it to find out what it is.
*
* Returns NULL if the object is missing or corrupt.
*/
--8<---------------cut here---------------end--------------->8---
it probably should not call read_sha1_file() on a corrupt object.
Options for fixing this would appear to include:
1. Saving the result of sha1_object_info(sha1, NULL) to a variable and
returning early if the object is corrupt. (But what happens if there
is corruption far enough in that it isn't seen when trying to grab
the object header?)
2. Calling read_object() and giving our own error messages.
3. Making read_sha1_file_extended only *optionally* die; since it's
calling die() directly.
I'm also a bit nervous about this, though:
--8<---------------cut here---------------start------------->8---
static inline void *read_sha1_file(const unsigned char *sha1, enum object_type *type, unsigned long *size)
{
return read_sha1_file_extended(sha1, type, size, LOOKUP_REPLACE_OBJECT);
}
--8<---------------cut here---------------end--------------->8---
Do we really want that happening while scanning the loose objects?
--
Hi! I'm a .signature virus! Copy me into your ~/.signature to help me spread!
next prev parent reply other threads:[~2014-07-20 20:43 UTC|newest]
Thread overview: 5+ messages / expand[flat|nested] mbox.gz Atom feed top
2014-07-20 4:43 [PATCH] Add failing test: "fsck survives inflate errors" Samuel Bronson
2014-07-20 20:43 ` Samuel Bronson [this message]
2014-07-30 2:42 ` [BUG] parse_object() does not behave as documented Samuel Bronson
2014-08-09 5:43 ` Samuel Bronson
2014-08-09 6:53 ` [PATCH] Add failing test: "fsck survives inflate errors" Duy Nguyen
Reply instructions:
You may reply publicly to this message via plain-text email
using any one of the following methods:
* Save the following mbox file, import it into your mail client,
and reply-to-all from there: mbox
Avoid top-posting and favor interleaved quoting:
https://en.wikipedia.org/wiki/Posting_style#Interleaved_style
* Reply using the --to, --cc, and --in-reply-to
switches of git-send-email(1):
git send-email \
--in-reply-to=87mwc37od7.fsf@naesten.mooo.com \
--to=naesten@gmail.com \
--cc=git@vger.kernel.org \
/path/to/YOUR_REPLY
https://kernel.org/pub/software/scm/git/docs/git-send-email.html
* If your mail client supports setting the In-Reply-To header
via mailto: links, try the mailto: link
Be sure your reply has a Subject: header at the top and a blank line
before the message body.
This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox;
as well as URLs for NNTP newsgroup(s).