Re: on ai generated and code provenance

From: Kevin Wolf <kwolf@redhat.com>
To: Warner Losh <imp@bsdimp.com>
Cc: "Michael S. Tsirkin" <mst@redhat.com>,
	qemu-devel@nongnu.org, stefanha@redhat.com
Subject: Re: on ai generated and code provenance
Date: Wed, 27 May 2026 10:41:36 +0200
Message-ID: <ahauQKLOU1tzDtbb@redhat.com>
In-Reply-To: <CANCZdfonroZmdRRpPdHzTKR_m8qyVdSG14gXB-K3BTuv=Qgw9g@mail.gmail.com>

On 26.05.2026 at 21:52, Warner Losh wrote:
> On Tue, May 26, 2026 at 1:32 PM Michael S. Tsirkin <mst@redhat.com> wrote:
> 
> > On Tue, May 26, 2026 at 08:59:55PM +0200, Kevin Wolf wrote:
> > > So yes, trivial patches are another obvious starting point. The challenge
> > > there is defining the line where a patch stops being trivial. So I'm not
> > > completely sure if making this distinction in a policy is a good idea;
> > > maybe practically speaking it has to be all or nothing in terms of
> > > creativity (for lack of a better word).
> >
> > Let the maintainers decide?
> >
> > Or we can enumerate things:
> > - fixing tool (compiler/checkpatch/smatch) errors/warnings in obvious ways
> >   (e.g. as suggested by the tool itself, such as initializing an
> >   uninitialized variable)
> > - propagating API changes (e.g. rebasing a patch after an API change)
> > - anything that could be done by a perl/sed/coccinelle script
> > - adding or fixing code comments
> >
> 
> Those are good examples. Perhaps the following words are a good place to start
> to frame what I've seen expressed here:
> 
> The QEMU Project currently may accept limited uses of AI that produce
> high-quality patches that are limited in the creative content added.
> While maintainers will ultimately decide, changes like the following
> fall within this policy:
> 1. Fixing obvious warnings in the obvious ways suggested by the tool
> 2. Tree wide API changes, and other similar mechanical changes done
>    today with perl/python/sed/coccinelle

As I said in the paragraph you quoted below, I don't think we should
encourage using AI for tasks that a deterministic tool could do. If you
can use a deterministic tool like sed or Coccinelle for the job, you
should. I know that writing Coccinelle spatches can be challenging; that
is the part that you can ask AI to help with. (Perl and Python follow
the same logic as long as the script is simple, but obviously you have
to stop when the helper script becomes almost as complex as the change
itself.)

Letting AI perform the change directly instead may be an acceptable
shortcut for a one-man hobby project that nobody else will ever look at,
but in the context of a community project like QEMU in which your
changes have to be reviewed and understood by others, it matters a lot
that the output of the tool is reproducible. Otherwise, you're creating
unnecessary work for others, and that isn't acceptable.

So maybe we should even explicitly mention a recommendation like the
following:

    If you can use a deterministic tool, don't use AI instead. If you
    don't know how to use the deterministic tool, use the AI to tell you
    how to use it instead of trying to replace it.
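
To illustrate the reproducibility point, here is a minimal sketch of the
kind of simple helper script that could be quoted in a commit message so
that reviewers can rerun it and compare the result. The function
`rename_identifier` and the identifiers `old_api_call`/`new_api_call` are
hypothetical and not taken from QEMU; a real tree-wide change would more
likely use sed or a Coccinelle spatch.

```python
import re


def rename_identifier(source: str, old: str, new: str) -> str:
    """Replace whole-word occurrences of `old` with `new`.

    Hypothetical helper: the same input always yields the same output,
    so a reviewer can reproduce the change instead of trusting it.
    """
    # \b keeps longer identifiers such as my_old_api_call untouched,
    # because "_" counts as a word character.
    pattern = re.compile(r"\b" + re.escape(old) + r"\b")
    # A function replacement avoids interpreting backslashes in `new`.
    return pattern.sub(lambda _match: new, source)


# Illustrative input only, not real QEMU code.
sample = (
    "int ret = old_api_call(dev);\n"
    "ret = my_old_api_call(dev);\n"
)
result = rename_identifier(sample, "old_api_call", "new_api_call")
print(result, end="")
```

Because the script is short and deterministic, reviewing it is cheaper
than reviewing every hunk it produces, which is the property an AI prompt
cannot offer.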

> 3. Limited, small changes to fix bugs or add a small new feature whose
>    scope is less than about 100 lines and the originator can explain
>    them all or the meta issues about the patch.

Not sure if mentioning a number of lines is wise. 100 lines can be
mostly boilerplate and simple sequential code or they can be a deeply
nested complex algorithm.

> Maintainers are free to accept or reject changes outside these
> guidelines, but please check with the maintainers before sending to
> keep the load from AI content to something they can manage. Large and
> Very Large patches, especially ones that have not been deeply
> analyzed and tested by humans, should be avoided.
> 
> Though maybe the list of 'exceptions' needs work. But the basic
> framing is that we will accept some high-quality patches. Maintainers
> have some discretion for larger pieces to a point, and we still don't
> want to drown in AI slop.

Yes, if we decide that we do want to make patch complexity/creative
expression/whatever you may call it part of the criteria, then having a
list like this looks like a possible approach. The details of what
exactly should be in it would certainly lead to more discussion, though.

Kevin

> Warner
> 
> 
> >
> > > As an aside, personally, I'm not convinced that AI can be a "better
> > > sed". If it's really about mechanical changes, I think the resulting
> > > patch is much more reviewable if the agent doesn't modify the code, but
> > > just generates the sed command line or the Coccinelle patch and that is
> > > included in the commit message. Reviewers can then just review that and
> > > then reproduce the result themselves for comparison. This is impossible
> > > with AI prompts and agents do tend to forget an instance of something to
> > > replace here and there, so you do have to review the result carefully.
> > >
> > > But none of these "better sed" problems need to be handled in an AI policy.
> > > If a patch is hard to review, the maintainer will already reject it on
> > > those grounds.
> >
> > Absolutely.
