Re: on ai generated and code provenance

From: Kevin Wolf <kwolf@redhat.com>
To: "Alex Bennée" <alex.bennee@linaro.org>
Cc: Paolo Bonzini <pbonzini@redhat.com>, Warner Losh <imp@bsdimp.com>,
	"Michael S. Tsirkin" <mst@redhat.com>,
	qemu-devel@nongnu.org, stefanha@redhat.com
Subject: Re: on ai generated and code provenance
Date: Wed, 27 May 2026 14:49:21 +0200
Message-ID: <ahboUSAiArue3tTF@redhat.com>
In-Reply-To: <87se7dxhd4.fsf@draig.linaro.org>

On 27.05.2026 at 12:43, Alex Bennée wrote:
> Paolo Bonzini <pbonzini@redhat.com> writes:
> 
> > On 5/27/26 10:41, Kevin Wolf wrote:
> >> On 26.05.2026 at 21:52, Warner Losh wrote:
> >>> The QEMU Project currently may accept limited uses of AI that produce
> >>> high quality patches that are limited in the creative content added.
> >>> While maintainers will ultimately decide, changes like the following
> >>> fall within this policy
> >>> 1. Fixing obvious warnings in the obvious ways suggested by the tool
> >>> 2. Tree wide API changes, and other similar mechanical changes done
> >>>     today with perl/python/sed/coccinelle
> >> As I said in the paragraph you quoted below, I don't think we should
> >> encourage using AI for tasks that a deterministic tool could do.
> >
> > In some cases such a tool does not exist.  Much to my surprise, there
> > is no tool to do static type inference on Python code, but AI is very
> > good at doing it.
> >
> >> Letting AI perform the change directly instead may be an acceptable
> >> shortcut for a one-man hobby project that nobody else will ever look at,
> >> but in the context of a community project like QEMU in which your
> >> changes have to be reviewed and understood by others, it matters a lot
> >> that the output of the tool is reproducible. Otherwise, you're creating
> >> unnecessary work for others, and that isn't acceptable.
> >
> > When applicable, going through coccinelle (with the aid of AI if
> > needed!) is indeed a good middle ground as it helps reviewers for large
> > changes. If you have many slightly different but easily separated
> > changes (e.g. you can split the patch by struct field), it may make
> > things worse.
> >
> > It's also worth noting that in other cases even sed or coccinelle,
> > while deterministic, cannot produce 100% of the patch.
> >
> >> So maybe we should even explicitly mention a recommendation like the
> >> following:
> >>      If you can use a deterministic tool, don't use AI instead. If you
> >>      don't know how to use the deterministic tool, use the AI to tell you
> >>      how to use it instead of trying to replace it.
> >
> > I like it.
> >
> >>> 3. Limited, small changes to fix bugs or add a small new feature whose
> >>>     scope is less than about 100 lines and the originator can explain
> >>>     them all or the meta issues about the patch.
> >> Not sure if mentioning a number of lines is wise. 100 lines can be
> >> mostly boilerplate and simple sequential code or they can be a deeply
> >> nested complex algorithm.
> >
> > I'd put the threshold at 20-50 at most.
> >
> >> I think I would see more use in a tag like (better name welcome):
> >>     AI-used-for: [code|tests|docs|commit message]...
> >
> > I like this *a lot*.  No need for free advertisement, but some
> > traceability is useful.
> >
> > For tools such as sed or coccinelle, having the exact script in the
> > patch or commit message is useful.  Plus, the execution of the script
> > more or less delimits the commit by itself (or 90%+ of it).  For LLMs
> > it's a bit less clear cut because separating docs makes little sense.
> > And the exact model is pointless, it will be obsolete in 6 months and
> > provide no useful information.
> >
> > So, something like:
> >
> > ------------------- 8< -------------------
> > Use of AI-generated content
> > ~~~~~~~~~~~~~~~~~~~~~~~~~~~
> >
> > The QEMU project currently allows using AI/LLM tools to produce
> > patches in scenarios with limited creative content:
> >
> > Mechanical changes
> >   If you can use a deterministic tool or a script, don't use AI instead.
> >   If you don't know how to do the change deterministically, you may
> >   ask the AI for help, rather than having it stand in for the tools.
> 
> I like the idea of pointing people towards tools but I wouldn't be quite
> so prescriptive. The series MST referred to was easily eyeball-able and
> I suspect the extra steps would generate friction for contributions.
> That said the wider the change to the code base the more likely a random
> hallucination can get lost in the noise.
> 
> Maybe:
> 
>   Mechanical changes
>     Using AI tools to make simple mechanical changes is allowed. For larger
>     tree-wide changes it is strongly recommended to use a deterministic
>     tool like `sed` or `coccinelle`. You can use AI to help you craft the
>     invocation for you.

I think we do want to discourage the direct use of AI in such cases,
while not outright banning it. So maybe just a minor tweak to Paolo's
wording?

    Mechanical changes
      If you can use a deterministic tool or a script, it is preferred
      that you use it and not replace it with AI. If you don't know how
      to do the change deterministically, you can ask the AI for help,
      rather than having it stand in for the tools.

Kevin
