git.vger.kernel.org archive mirror
From: Vegard Nossum <vegard.nossum@oracle.com>
To: Jeff King <peff@peff.net>
Cc: gitster@pobox.com, git@vger.kernel.org
Subject: Re: [RFC][PATCH] index-pack: add testcases found using AFL
Date: Fri, 10 Mar 2017 20:34:45 +0100
Message-ID: <eec5ab2a-7fe7-b47f-8073-a8212a9634f1@oracle.com>
In-Reply-To: <20170310190641.i7geazhrlmzzfna6@sigill.intra.peff.net>

On 10/03/2017 20:06, Jeff King wrote:
> On Fri, Mar 10, 2017 at 04:15:56PM +0100, Vegard Nossum wrote:
>
>> I've used AFL to generate a corpus of pack files that maximises the edge
>> coverage for 'git index-pack'.
>>
>> This is a supplement to (and not a replacement for) the regular test cases
>> where we know exactly what each test is checking for. These testcases are
>> more useful for avoiding regressions in edge cases or as a starting point
>> for future fuzzing efforts.
>>
>> To see the output of running 'git index-pack' on each file, you can do
>> something like this:
>>
>>   make -C t GIT_TEST_OPTS="--run=34 --verbose" t5300-pack-object.sh
>>
>> I observe the following coverage changes (for t5300 only):
>>
>>   path                  old%  new%    pp
>>   ----------------------------------------
>>   builtin/index-pack.c  74.3  76.6   2.3
>>   pack-write.c          79.8  80.4    .6
>>   patch-delta.c         67.4  81.4  14.0
>>   usage.c               26.6  35.5   8.9
>>   wrapper.c             42.0  46.1   4.1
>>   zlib.c                58.7  64.1   5.4
>
> I'm not sure how I feel about this. More coverage is good, I guess, but
> we don't have any idea what these packfiles are doing, or whether
> index-pack is behaving sanely in the new lines. The most we can say is
> that we tested more lines of code and that nothing segfaulted or
> triggered something like ASAN.
>
> That's something I guess, but I'm not enthused by the idea of just
> dumping a bunch of binary test cases that nobody, not even the author,
> understands.

I understand your concern. This is how I see it:

Negatives:

  - 'make test' takes 1 second longer to run

  - 548K of data added to git.git

Positives:

  - regularly exercising more of the code, especially some of the corner
    cases which are not caught by the rest of the test suite, possibly
    catching bugs in a security-critical bit of git before they make it
    into a release

  - no impact to existing code, everything self-contained in 1 directory

  - giving more people access to the testcases I discovered without
    having to repeat the effort of setting up AFL, fixing up SHA1
    checksums, minimising the corpus, running AFL for a week, etc. (each
    step by itself is pretty small, but taken altogether I think it's
    worthwhile not to have to repeat that).

Then I guess you have to weigh the negatives and positives. For me it's
a clear net win, but others may see it differently.

For sure, I (or somebody else) can go through each testcase and figure
out what it's doing, what it's doing differently from the existing
manual testcases in t5300, and what its expected output should be. It's
not that I couldn't understand what each testcase is doing if I tried,
but I don't think it's worth the effort. I've run everything under
valgrind and the only things that turned up were some suspicious-looking
allocations (which AFAICT should be safe to ignore because of git's
built-in limits). Otherwise it's mostly hitting sanity checks:

  error: inflate: data stream error (incorrect header check)
  fatal: pack has 4 unresolved deltas
  fatal: pack has bad object at offset 12: delta base offset is out of bound
  fatal: invalid tag
  ...

These are errors you wouldn't normally see and which the existing
testcases don't check for (or at least not exhaustively or
systematically).

If somebody were to look at the code and say: "hey, that check looks a
bit off" (which is something I personally do all the time), then being
able to quickly find an input to execute exactly that line of code is
extremely valuable -- and you can do that simply by running the
testcases through gcov.

Anyway, the patch/data is there, use it or don't.


Vegard

