[PATCH v5] kernel: add kcov code coverage

public inbox for linux-arm-kernel@lists.infradead.org

From: kirill@shutemov.name (Kirill A. Shutemov)
To: linux-arm-kernel@lists.infradead.org
Subject: [PATCH v5] kernel: add kcov code coverage
Date: Tue, 19 Jan 2016 16:17:23 +0200
Message-ID: <20160119141723.GA21596@node.shutemov.name>
In-Reply-To: <CACT4Y+YeM+_12e9WzG3306QPEDspNiHFmCBO4yjW_B-jzPrzdw@mail.gmail.com>

On Tue, Jan 19, 2016 at 03:06:10PM +0100, Dmitry Vyukov wrote:
> On Tue, Jan 19, 2016 at 1:42 PM, Dmitry Vyukov <dvyukov@google.com> wrote:
> > kcov provides code coverage collection for coverage-guided fuzzing
> > (randomized testing). Coverage-guided fuzzing is a testing technique
> > that uses coverage feedback to determine new interesting inputs to a
> > system. A notable user-space example is AFL
> > (http://lcamtuf.coredump.cx/afl/). However, this technique is not
> > widely used for kernel testing due to missing compiler and kernel
> > support.
> >
> > kcov does not aim to collect as much coverage as possible. It aims
> > to collect more or less stable coverage that is a function of syscall
> > inputs. To achieve this goal it does not collect coverage in
> > soft/hard interrupts, and instrumentation of some inherently
> > non-deterministic or non-interesting parts of the kernel is disabled
> > (e.g. scheduler, locking).
> >
> > Currently there is a single coverage collection mode (tracing),
> > but the API anticipates additional collection modes.
> > Initially I also implemented a second mode which exposes
> > coverage in a fixed-size hash table of counters (what Quentin
> > used in his original patch). I've dropped the second mode for
> > simplicity.
> >
> > This patch adds the necessary support on the kernel side.
> > The complementary compiler support was added in gcc revision 231296.
> >
> > We've used this support to build the syzkaller system call fuzzer,
> > which has found 90 kernel bugs in just 2 months:
> > https://github.com/google/syzkaller/wiki/Found-Bugs
> > We've also found 30+ bugs in our internal systems with syzkaller.
> > Another (yet unexplored) direction where kcov coverage would greatly
> > help is more traditional "blob mutation". For example, mounting
> > a random blob as a filesystem, or receiving a random blob over wire.
> >
> > Why not gcov? A typical fuzzing loop looks as follows: (1) reset
> > coverage, (2) execute a bit of code, (3) collect coverage, repeat.
> > Typical coverage can be just a dozen basic blocks (e.g. for an
> > invalid input). In such a context gcov becomes prohibitively expensive,
> > as the reset/collect coverage steps depend on the total number of basic
> > blocks/edges in the program (in the case of the kernel it is about 2M).
> > The cost of kcov depends only on the number of executed basic
> > blocks/edges. On top of that, the kernel requires per-thread coverage
> > because there are always background threads and unrelated processes
> > that also produce coverage. With inlined gcov instrumentation,
> > per-thread coverage is not possible.
> >
> > kcov exposes kernel PCs and control flow to user-space which
> > is insecure. But debugfs should not be mapped as user accessible.
> >
> > Based on a patch by Quentin Casasnovas.
> > Signed-off-by: Dmitry Vyukov <dvyukov@google.com>
> > Reviewed-by: Kees Cook <keescook@chromium.org>
> > ---
> > Anticipating reasonable questions regarding usage of this feature.
> > Quentin Casasnovas and Vegard Nossum also plan to use kcov for
> > coverage-guided fuzzing. Currently they use a custom kernel patch
> > for their fuzzer and have found several dozen bugs.
> > There is also interest from Intel 0-DAY kernel test infrastructure.
> >
> > Based on commit a200dcb34693084e56496960d855afdeaaf9578f.
> >
> > v2: - added note to commit description that kcov is insecure,
> >       but debugfs should not be mapped as user accessible.
> >     - make CONFIG_KCOV depend on CONFIG_ARCH_HAS_KCOV
> >       instead of conditional inclusion with if/endif
> >       (as per Kees' comments).
> >
> > v3: - disabled instrumentation of lib/hweight.c
> >     - changed task_struct.kcov_size type to unsigned
> >     - moved kcov.c from kernel/kcov/ to kernel/
> >     - fixed multi-line comment formatting
> >     - changed BUG_ONs to WARN_ONs
> >     - added kcov_get() helper
> >
> > v4: - pre-populate mapping with pages in kcov_mmap()
> >     - don't get kcov references on vma open/copy,
> >       vma holds a reference to the file which is enough
> >     - extend KCOV_INIT_TRACE to support both compressed
> >       4-byte PCs and full 8-byte PCs (it now accepts a struct)
> >     - update example in Documentation/kcov.txt
> >
> > v5: - export only unsigned long PCs (no compression to 4 bytes)
> >     - remove KCOV dependency on !RANDOMIZE_BASE
> 
> 
> I've made some measurements. Currently I have ~30MB of coverage data.
> Let's say it will grow 2x over time, that's 60MB. I also use a GC
> language so it actually consumes 2x = 120MB. If PCs are doubled,
> that's 240MB. I think I can live with this. Or I can somehow compress
> PCs to 4 bytes in user-space.
> So I changed kcov to expose only unsigned-long-sized PCs as is. This
> makes the interface much cleaner. It also removes all potential
> issues wrt other archs and KASLR (user-space can canonicalize PCs
> using /proc/modules and the kaslr base for text).

I wanted to mention one problem with 'long's: it will not work with 32-bit
userspace on a 64-bit kernel. I think we need to have a way to communicate
the size of a PC.
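
A minimal sketch of the mismatch, and of one way user space could sidestep
it, follows. It is not taken from the patch. It assumes the kernel writes
the buffer as native-endian 64-bit words with word 0 holding the number of
PCs that follow; that layout and the helper names cover_word() and
cover_decode() are hypothetical, chosen only for illustration. A 32-bit
program indexing the same buffer through 'unsigned long *' would step in
4-byte units and see each PC split in half, so the sketch reads
fixed-width uint64_t words instead.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>

/*
 * Assumed layout (hypothetical, for illustration only): the buffer is an
 * array of native-endian 64-bit words; word 0 is the number of PCs and
 * words 1..n are the PCs themselves.
 */

/* Read 64-bit word number idx; memcpy avoids alignment and aliasing issues. */
static uint64_t cover_word(const unsigned char *buf, size_t idx)
{
	uint64_t w;

	memcpy(&w, buf + idx * sizeof(w), sizeof(w));
	return w;
}

/*
 * Copy at most max PCs from buf into pcs and return how many were copied.
 * The count in word 0 is clamped to what the buffer can actually hold, so
 * a bogus count cannot cause an out-of-bounds read.
 */
static size_t cover_decode(const unsigned char *buf, size_t buf_bytes,
			   uint64_t *pcs, size_t max)
{
	size_t words = buf_bytes / sizeof(uint64_t);
	uint64_t n;
	size_t i;

	assert(buf != NULL && pcs != NULL);
	if (words == 0)
		return 0;

	n = cover_word(buf, 0);
	if (n > words - 1)
		n = words - 1;
	if (n > max)
		n = max;

	for (i = 0; i < n; i++)
		pcs[i] = cover_word(buf, i + 1);
	return (size_t)n;
}
```

Because the element width is fixed by uint64_t rather than by the
program's own 'long', the same decoder gives the same result whether it is
built as a 32-bit or a 64-bit binary. It still relies on the kernel side
promising 64-bit entries, which is the size information that would have to
be communicated somehow.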

-- 
 Kirill A. Shutemov

Thread overview: 4+ messages
2016-01-19 12:42 [PATCH v5] kernel: add kcov code coverage Dmitry Vyukov
2016-01-19 14:06 ` Dmitry Vyukov
2016-01-19 14:17   ` Kirill A. Shutemov [this message]
2016-01-19 14:20     ` Dmitry Vyukov