Re: bug-introducing patches (or: -rc cycles suck)

From: Sasha Levin <Alexander.Levin@microsoft.com>
To: Julia Lawall <julia.lawall@lip6.fr>
Cc: Greg KH <gregkh@linuxfoundation.org>,
	"linux-kernel@vger.kernel.org" <linux-kernel@vger.kernel.org>
Subject: Re: bug-introducing patches (or: -rc cycles suck)
Date: Tue, 1 May 2018 16:24:20 +0000
Message-ID: <20180501162418.GC1468@sasha-vm>
In-Reply-To: <alpine.DEB.2.20.1804302109140.2269@hadrien>

On Mon, Apr 30, 2018 at 09:12:08PM +0200, Julia Lawall wrote:
>
>
>On Mon, 30 Apr 2018, Sasha Levin wrote:
>
>> Working on AUTOSEL, it became even more obvious to me how difficult it is for a patch to get a proper review. Maintainers found it difficult to keep up with the upstream work for their subsystem, and reviewing additional -stable patches put even more load on them which some suggested would be more than what they can handle.
>>
>> While AUTOSEL tries to understand if a patch fixes a bug, this was a bit late: the bug was already introduced, folks already have to deal with it, and the kernel is broken. I was wondering if I can do a similar process to AUTOSEL, but teach the AI about bug-introducing patches.
>>
>> When someone fixes a bug, he would describe the patch differently than he would if he was writing a new feature. This lets AUTOSEL build on different commit message constructs, among various inputs, to recognize bug fixes. However, people are unaware that they introduce a bug, so the commit message for bug introducing patches is essentially the same as for commits that don't introduce a bug. This meant that I had to try and source data out of different sources.
>>
>> A few of the parameters I ended up using are:
>>  - -next data (days spent in -next, changes in the patch between -next trees, ...)
>>  - Mailing list data (was this patch ever sent to a ML? How long before it was merged? How many replies did it get? ...)
>>  - Author/committer/maintainer chain data. Just like in sports, some folks are more likely to produce better results than others. This goes beyond just "skill", but also looks at things such as whether the author patches a subsystem he's "familiar with" (== subsystem where most of his patches usually go), or is he modifying a subsystem he never sent a patch for.
>>  - Patch complexity metrics - various code metrics to indicate how "complex" a patch is. Think 100 lines of whitespace fixes vs 100 lines that significantly change a subsystem.
>>  - Kernel process correctness - I tried using "violations" of the kernel process (patch formatting, correctness of the mailing to lkml, etc) as an indicator of how familiar the author is with the kernel, with the presumption that folks who are newer to kernel development are more likely to introduce bugs.
>
>I'm not completely sure I understand what you are doing.  Is there also
>some connection to things that are identified in some way as being bug
>introducing patches?  Or are you just using these as metrics of low
>quality?

Yes! My theory is that the things I listed above are actually better at
identifying bug introducing commits than plain code patterns or metrics.
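
For illustration only, a minimal sketch of how a few of the metadata
inputs listed above could be turned into numbers for a model. This is
not the code used for AUTOSEL; the PatchMeta record, its fields, and the
familiarity() and features() helpers are all hypothetical names, and the
"familiarity" score is just one simple way to express "subsystem where
most of his patches usually go".

```python
from collections import Counter
from dataclasses import dataclass


@dataclass
class PatchMeta:
    """Hypothetical per-patch metadata record (not a real AUTOSEL type)."""
    author: str
    subsystem: str
    days_in_next: int   # days the patch spent in -next
    ml_replies: int     # replies the patch got on the mailing list


def familiarity(history, author, subsystem):
    """Fraction of the author's earlier patches that went to this subsystem."""
    counts = Counter(p.subsystem for p in history if p.author == author)
    total = sum(counts.values())
    # An author with no history gets 0.0: never sent a patch anywhere.
    return counts[subsystem] / total if total else 0.0


def features(patch, history):
    """Build a plain numeric feature vector for one patch."""
    return [
        float(patch.days_in_next),
        float(patch.ml_replies),
        familiarity(history, patch.author, patch.subsystem),
    ]
```

A vector like this could then be fed to whatever classifier is in use;
which inputs actually carry signal is an empirical question.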

To some extent, Coccinelle, smatch, etc already deal with identifying
problematic code patterns and addressing them.

>I wonder how far one could get by just collecting the set of patches that
>are referenced with fixes tags by stable patches, and then using machine
>learning taking into account only the code to find other patches that make
>similar changes.

This is exactly the training set I used. I didn't try looking at the
code itself because I don't have a good idea about how to turn code
patterns into something meaningful for ML. Code metrics didn't prove
to be too useful for AUTOSEL so I sort of ignored them here (I only used
the same metrics we use for AUTOSEL).
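
As a rough sketch of how such a training set could be collected: walk
the commit messages of stable patches, pull out the commit each
"Fixes:" tag points at, and label those commits as bug-introducing. The
function names and the regular expression below are assumptions made for
illustration, not the actual tooling, and getting the messages out of
git is left out.

```python
import re

# Matches lines such as:  Fixes: 1234567890ab ("subsys: some subject")
# Accepts abbreviated or full hashes (7 to 40 hex digits).
FIXES_RE = re.compile(r'^Fixes:\s*([0-9a-f]{7,40})\b',
                      re.IGNORECASE | re.MULTILINE)


def fixed_commits(message):
    """Return the hashes referenced by Fixes: tags in one commit message."""
    return [sha.lower() for sha in FIXES_RE.findall(message)]


def label_bug_introducers(stable_messages):
    """Collect every commit that some stable patch claims to fix."""
    labeled = set()
    for message in stable_messages:
        labeled.update(fixed_commits(message))
    # Note: abbreviated hashes of different lengths are not unified here;
    # resolving them to full hashes would need the repository.
    return labeled
```

Commits in the returned set would be the positive examples; everything
else in the same range would be the (noisy) negative examples.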
