Re: on ai generated and code provenance

From: "Alex Bennée" <alex.bennee@linaro.org>
To: Paolo Bonzini <pbonzini@redhat.com>
Cc: Kevin Wolf <kwolf@redhat.com>,  Warner Losh <imp@bsdimp.com>,
	 "Michael S. Tsirkin" <mst@redhat.com>,
	 qemu-devel@nongnu.org,  stefanha@redhat.com
Subject: Re: on ai generated and code provenance
Date: Wed, 27 May 2026 11:43:35 +0100
Message-ID: <87se7dxhd4.fsf@draig.linaro.org>
In-Reply-To: <f8791a2d-257b-4233-aafb-ccd45e695542@redhat.com> (Paolo Bonzini's message of "Wed, 27 May 2026 12:01:10 +0200")

Paolo Bonzini <pbonzini@redhat.com> writes:

> On 5/27/26 10:41, Kevin Wolf wrote:
>> On 26.05.2026 at 21:52, Warner Losh wrote:
>>> The QEMU Project currently may accept limited uses of AI that produce
>>> high quality patches that are limited in the creative content added.
>>> While maintainers will ultimately decide, changes like the following
>>> fall within this policy
>>> 1. Fixing obvious warnings in the obvious ways suggested by the tool
>>> 2. Tree wide API changes, and other similar mechanical changes done
>>>     today with perl/python/sed/coccinelle
>> As I said in the paragraph you quoted below, I don't think we should
>> encourage using AI for tasks that a deterministic tool could do.
>
> In some cases such a tool does not exist.  Much to my surprise, there
> is no tool to do static type inference on Python code, but AI is very
> good at doing it.
>
>> Letting AI perform the change directly instead may be an acceptable
>> shortcut for a one-man hobby project that nobody else will ever look at,
>> but in the context of a community project like QEMU in which your
>> changes have to be reviewed and understood by others, it matters a lot
>> that the output of the tool is reproducible. Otherwise, you're creating
>> unnecessary work for others, and that isn't acceptable.
>
> When applicable, going through coccinelle (with the aid of AI if
> needed!) is indeed a good middle ground as it helps reviewers for large
> changes. If you have many slightly different but easily separated
> changes (e.g. you can split the patch by struct field), it may make
> things worse.
>
> It's also worth noting that in other cases even sed or coccinelle,
> while deterministic, cannot produce 100% of the patch.
>
>> So maybe we should even explicitly mention a recommendation like the
>> following:
>>      If you can use a deterministic tool, don't use AI instead. If you
>>      don't know how to use the deterministic tool, use the AI to tell you
>>      how to use it instead of trying to replace it.
>
> I like it.
>
>>> 3. Limited, small changes to fix bugs or add a small new feature whose
>>>     scope is less than about 100 lines and the originator can explain
>>>     them all or the meta issues about the patch.
>> Not sure if mentioning a number of lines is wise. 100 lines can be
>> mostly boilerplate and simple sequential code or they can be a deeply
>> nested complex algorithm.
>
> I'd put the threshold at 20-50 at most.
>
>> I think I would see more use in a tag like (better name welcome):
>>     AI-used-for: [code|tests|docs|commit message]...
>
> I like this *a lot*.  No need for free advertisement, but some
> traceability is useful.
>
> For tools such as sed or coccinelle, having the exact script in the
> patch or commit message is useful.  Plus, the execution of the script
> more or less delimits the commit by itself (or 90%+ of it).  For LLMs
> it's a bit less clear cut because separating docs makes little sense.
> And the exact model is pointless, it will be obsolete in 6 months and
> provide no useful information.
>
> So, something like:
>
> ------------------- 8< -------------------
> Use of AI-generated content
> ~~~~~~~~~~~~~~~~~~~~~~~~~~~
>
> The QEMU project currently allows using AI/LLM tools to produce
> patches in scenarios with limited creative content:
>
> Mechanical changes
>   If you can use a deterministic tool or a script, don't use AI instead.
>   If you don't know how to do the change deterministically, you may
>   ask the AI for help, rather than having it stand in for the tools.

I like the idea of pointing people towards tools, but I wouldn't be quite
so prescriptive. The series MST referred to was easily eyeball-able and
I suspect the extra steps would generate friction for contributions.
That said, the wider the change to the code base, the more likely it is
that a random hallucination gets lost in the noise.

Maybe:

  Mechanical changes
    Using AI tools to make simple mechanical changes is allowed. For larger
    tree-wide changes it is strongly recommended to use a deterministic
    tool like `sed` or `coccinelle`. You can use AI to help you craft the
    invocation.

?
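
To make "deterministic tool" a bit more concrete, here is a minimal
sketch of the kind of script that could be pasted into a commit message
so a reviewer can re-run it. It is only an illustration: the identifiers
`old_api_call` and `new_api_call` are made up, and a real tree-wide
change would more likely be a `sed` or `coccinelle` invocation.

```python
import re

# Hypothetical identifiers, purely for illustration.
OLD_NAME = "old_api_call"
NEW_NAME = "new_api_call"

# \b restricts the match to whole identifiers, so a longer name that
# merely contains OLD_NAME (e.g. "my_old_api_call2") is left alone.
PATTERN = re.compile(r"\b" + re.escape(OLD_NAME) + r"\b")


def rename_identifier(source: str) -> str:
    """Return source with every whole-word OLD_NAME replaced by NEW_NAME."""
    return PATTERN.sub(NEW_NAME, source)
```

The same input always gives the same output, which is the property
being asked for: the reviewer checks the script once rather than every
hunk.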

> Small bug fixes
>   These should be limited to 20 lines of code or less, not including
>   tests.  You are still expected to understand and explain your changes
>   and the rationale behind them.
>
> These boundaries do not apply to other uses of AI, such as researching
> APIs or algorithms, static analysis, or debugging, provided their output
> is not included in contributions.  Larger uses of AI are allowed as an
> experiment, but they should be agreed upon with the maintainer prior
> to submission.
>
> Use of AI does not remove the need for authors to comply with all other
> requirements for contribution.  In particular, the "Signed-off-by"
> label in a patch submission is a statement that the author takes
> responsibility for the entire contents of the patch, certifying that
> their patch submission is made in accordance with the rules of the
> `Developer's Certificate of Origin (DCO) <dco>`.
>
> Commit messages for AI-assisted changes
> ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
>
> When AI/LLM tools produce or substantively shape your patch, add an
> ``AI-used-for:`` trailer.  The text of the trailer could be one or
> more of ``code``, ``tests``, ``docs``, ``research``, possibly followed
> by an explanation in parentheses::
>
>     AI-used-for: tests, docs
>     AI-used-for: code
>     AI-used-for: code (refactoring)
>     AI-used-for: code (prototype)
>     AI-used-for: research
>
> The trailer is intended as a clarification of your DCO obligations as
> well as to guide reviewers.  It is not intended for minimal presence
> such as autocomplete or asking for a pre-review of the patch, and it
> does not remove your responsibility to understand the changes that you
> are submitting.
>
> Include the prompt in the commit message if it helps a reviewer judge
> the result:
>
> * yes: "move field ``foo`` from ``struct aa`` to ``struct bb``.  If a
>   function already has a local variable or parameter of type ``struct
>   bb``, use it instead of accessing ``aa.bb``."
>
> * yes: "add an implementation of the trait for ``Mutex<T: MyTrait>``,
>   forwarding the member functions to ``T`` while taking the lock
>   around the calls".
>
> * no: "write user-facing documentation for the new tool"
>
> * no: "write testcases for the new functions"
>
> Deterministic tooling (sed, coccinelle, formatters) is out of scope
> for the trailer, but should be mentioned in the commit message.
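
For what it's worth, the trailer format proposed above looks simple
enough to check mechanically. A rough sketch, assuming the four
categories listed in the draft and an optional parenthesised
explanation; the function name `parse_ai_trailers` is hypothetical and
nothing like this exists in the tree today:

```python
import re

# Categories taken from the draft text quoted above.
ALLOWED = {"code", "tests", "docs", "research"}

# One trailer per line; [ \t]* keeps the match from spilling onto the next line.
TRAILER_RE = re.compile(r"^AI-used-for:[ \t]*(.+)$", re.MULTILINE)
# A category, optionally followed by an explanation in parentheses.
ITEM_RE = re.compile(r"^(\w+)(?:\s*\(([^)]*)\))?$")


def parse_ai_trailers(message):
    """Return a list of (category, explanation-or-None) pairs.

    Limitation of this sketch: values are split on commas, so an
    explanation that itself contains a comma is rejected.
    """
    found = []
    for line in TRAILER_RE.findall(message):
        for item in line.split(","):
            item = item.strip()
            m = ITEM_RE.match(item)
            if m is None or m.group(1) not in ALLOWED:
                raise ValueError(f"unrecognised AI-used-for value: {item!r}")
            found.append((m.group(1), m.group(2)))
    return found
```

Whether a check like this should reject unknown categories or merely
warn is a policy question the draft leaves open.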

-- 
Alex Bennée
Virtualisation Tech Lead @ Linaro
