Re: Talk proposal: What 125K kernel bugs tell us about testing gaps

From: Greg KH <gregkh@linuxfoundation.org>
To: Jenny Qu <jenny@pebblebed.com>
Cc: kernelci@lists.linux.dev
Subject: Re: Talk proposal: What 125K kernel bugs tell us about testing gaps
Date: Thu, 5 Feb 2026 15:22:05 +0100
Message-ID: <2026020530-cilantro-resisting-c65e@gregkh>
In-Reply-To: <CAPBP3tTV6CCB5-wmFhWKkyKJ1Cpj0EkmOH73By-zkOKDW8vYtQ@mail.gmail.com>

On Thu, Feb 05, 2026 at 12:58:20AM -0800, Jenny Qu wrote:
> [resending to list - accidentally replied off-list]
> 
> On Wed, Feb 04, 2026 at 11:00:00PM, Greg KH wrote:
> > I hate to say "your ai model could be replaced with a sql statement"
> 
> Fair point on the descriptive statistics. I should have been clearer:
> the 125K bug analysis was training data, not the contribution. verhaal
> and the LWN employer reports (Jonathan Corbet's per-release stats
> using the gitdm database) already cover the descriptive side well.
> 
> The part SQL can't do is the predictive model. VulnBERT takes a raw
> git diff *before merge* and predicts whether it introduces a
> vulnerability. The evaluation is a strict temporal holdout: trained
> on commits with Fixes: tags from <=2023, tested on 2024 commits that
> later received Fixes: tags. 92% recall, 1.2% FPR on that split.

Cool!  So you have re-implemented Sasha's AUTOSEL bot? :)

Note that there have been papers and presentations about how that works
over the past 10 years.  You might want to look into them, as it seems
that your model does the same thing here (predicting which commits are
fixes).
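
For readers following along, the temporal-holdout evaluation described
in the quoted text can be sketched roughly as below.  This is only an
illustration and not the VulnBERT code: the Commit record, its score
field, and the 0.5 threshold in the usage note are assumptions made for
the example.

```python
from dataclasses import dataclass


@dataclass
class Commit:
    year: int      # year the commit was authored
    buggy: bool    # True if a later commit carries a Fixes: tag naming it
    score: float   # hypothetical model output in [0, 1]


def temporal_split(commits, cutoff_year):
    """Train on commits up to cutoff_year, test on strictly later ones."""
    train = [c for c in commits if c.year <= cutoff_year]
    test = [c for c in commits if c.year > cutoff_year]
    return train, test


def recall_and_fpr(commits, threshold):
    """Return (recall, false positive rate) at the given score threshold."""
    tp = sum(1 for c in commits if c.buggy and c.score >= threshold)
    fn = sum(1 for c in commits if c.buggy and c.score < threshold)
    fp = sum(1 for c in commits if not c.buggy and c.score >= threshold)
    tn = sum(1 for c in commits if not c.buggy and c.score < threshold)
    recall = tp / (tp + fn) if (tp + fn) else 0.0
    fpr = fp / (fp + tn) if (fp + tn) else 0.0
    return recall, fpr
```

Calling temporal_split(commits, 2023) and then recall_and_fpr(test, 0.5)
mirrors the "<=2023 train, 2024 test" split; the point of splitting by
time is that no test commit is older than any training commit.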

> To be direct about limitations: those numbers are on historical data
> where we know ground truth. The model catches patterns it's seen
> before (unbalanced refcounts, missing NULL checks, lock/unlock
> mismatches). It will miss novel bug classes it hasn't been trained on.
> It's a triage tool and not yet an oracle.

That's fine, we need that.  And if you have a pattern that it matches,
let's add it to our coccinelle ruleset so that it does not come back in!

> And it's not ready for production use yet. I'm reworking the
> architecture. The current approach uses CodeBERT embeddings with
> handcrafted features, and I think incorporating LLM reasoning traces
> over diffs will do substantially better. I don't want to hand anyone
> a tool that generates false confidence.

Look at the eBPF "AI" patch reviews that are already happening on the
mailing list today if you want an example of how this could work.  Try
running it on the lore.kernel.org git repos (the email is stored in git
so that others can easily work with it, including through the 'lei'
tool).  Then if your tool catches problems, email them to the patch
authors and the list to let them know!

That's the best thing we can do now, catch bugs before they are
committed.
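
As a rough sketch of the first step in that workflow, the snippet below
pulls the diff and the reply recipients out of one raw patch email
using only the Python standard library.  It assumes the message is
already available as a string (for example, read from an mbox that was
exported some other way); extract_patch is a hypothetical helper name,
and fetching from lore or invoking lei is deliberately left out.

```python
import email
from email import policy
from email.utils import getaddresses


def extract_patch(raw_message):
    """Return the subject, reply recipients, and diff of a patch email."""
    msg = email.message_from_string(raw_message, policy=policy.default)
    body = msg.get_body(preferencelist=("plain",))
    text = body.get_content() if body is not None else ""

    # git format-patch output starts the patch at the first "diff --git" line.
    # Anything after it (including a trailing signature) is kept as-is.
    lines = text.splitlines()
    start = next(
        (i for i, line in enumerate(lines) if line.startswith("diff --git ")),
        None,
    )
    diff = "\n".join(lines[start:]) if start is not None else ""

    # Reply to the author plus everyone on To: and Cc:.
    headers = []
    for name in ("From", "To", "Cc"):
        headers.extend(str(h) for h in msg.get_all(name, []))
    recipients = [addr for _, addr in getaddresses(headers) if addr]

    return {
        "subject": str(msg["Subject"] or ""),
        "recipients": recipients,
        "diff": diff,
    }
```

The returned diff is what a model would score, and the recipients list
is who a finding would be mailed to.  Messages without a patch come
back with an empty diff and can simply be skipped.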

> 1. Subsystem-level test prioritization. The lifetime gap between
>    CAN bus (4.2 years) and gpu/i915 (1.4 years) almost certainly
>    reflects testing coverage differences. i915 has dedicated
>    fuzzing infrastructure and active reviewers like Chris Wilson
>    and Ville Syrjala. KernelCI could use lifetime data as a signal
>    for where to invest in test enablement. This is actionable now,
>    no ML required.

Yes, that is directly due to fuzzing issues.  Fuzzers work on a "layer
by layer" basis, working deeper into the kernel and adding different
subsystems all the time.  That's why you will see "waves" of bugfixes
happening like this.  It's normal and to be expected.
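
The per-subsystem lifetime signal from point 1 needs nothing more than
grouping and a summary statistic.  A minimal sketch follows, assuming
each bug is a (subsystem, introduced, fixed) tuple of dates; the thread
does not say whether the quoted lifetimes are means or medians, so the
use of the median here is an assumption.

```python
from collections import defaultdict
from datetime import date
from statistics import median


def median_lifetime_years(bugs):
    """Rank subsystems by median bug lifetime, longest first.

    bugs: iterable of (subsystem, introduced, fixed), where introduced
    and fixed are datetime.date values for the buggy commit and its fix.
    """
    by_subsystem = defaultdict(list)
    for subsystem, introduced, fixed in bugs:
        by_subsystem[subsystem].append((fixed - introduced).days / 365.25)
    ranked = sorted(
        ((median(years), name) for name, years in by_subsystem.items()),
        reverse=True,
    )
    return [(name, round(years, 2)) for years, name in ranked]
```

Subsystems at the top of the returned list are the ones where bugs
survive longest.  Given the fuzzing "waves" described above, a long
lifetime may just mean the fuzzers have not reached that layer yet, so
the ranking is a hint about coverage and not proof of a gap.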

> 2. Longer-term: commit-level risk scoring to allocate CI resources.
>    Flag high-risk commits for extra sanitizer runs, longer fuzzing
>    passes. Low-risk commits get the standard pipeline. But this
>    needs a model I trust enough to deploy, and I'm not there yet.

Again, look at what is already happening with these types of reviews and
perhaps plug your model into that as well and see what happens?  We
always want more code review to help relieve our most limited resource:
maintainers with time to review changes.
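
The risk-tiered pipeline from point 2 can be sketched as a simple
threshold on a per-commit score.  Everything named here is hypothetical:
the job names, the 0.8 threshold, and the plan_ci helper are
placeholders for illustration, not KernelCI configuration.

```python
# Hypothetical job names; a real pipeline would define its own.
STANDARD_JOBS = ["build", "boot"]
EXTRA_JOBS = ["kasan", "ubsan", "extended-fuzz"]


def plan_ci(scores, threshold=0.8):
    """Map each commit to the list of CI jobs it should receive.

    scores: dict of commit id -> risk score in [0, 1] from some model.
    Commits at or above the threshold get the extra jobs on top of the
    standard ones; everything else gets the standard pipeline only.
    """
    plan = {}
    for commit_id, score in scores.items():
        jobs = list(STANDARD_JOBS)
        if score >= threshold:
            jobs.extend(EXTRA_JOBS)
        plan[commit_id] = jobs
    return plan
```

Where the threshold sits is the real design question: it trades extra
CI time against the chance of missing a risky commit, which is why it
depends on having a model whose scores can be trusted.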

thanks!

greg k-h
