Re: [PATCH v2] docs/devel: relax policy on AI-generated contributions

From: "Michael S. Tsirkin" <mst@redhat.com>
To: Paolo Bonzini <pbonzini@redhat.com>
Cc: qemu-devel@nongnu.org, "Alex Bennée" <alex.bennee@linaro.org>,
	"Alistair Francis" <alistair.francis@wdc.com>,
	"BALATON Zoltan" <balaton@eik.bme.hu>,
	"Daniel P. Berrangé" <berrange@redhat.com>,
	"Fabiano Rosas" <farosas@suse.de>,
	"Kevin Wolf" <kwolf@redhat.com>,
	"Peter Maydell" <peter.maydell@linaro.org>,
	"Warner Losh" <imp@bsdimp.com>,
	"Philippe Mathieu-Daudé" <philmd@linaro.org>,
	"Paolo Bonzini" <bonzini@gnu.org>
Subject: Re: [PATCH v2] docs/devel: relax policy on AI-generated contributions
Date: Tue, 23 Jun 2026 15:26:55 -0400
Message-ID: <20260623150758-mutt-send-email-mst@kernel.org>
In-Reply-To: <20260529094619.1034458-1-pbonzini@redhat.com>

On Fri, May 29, 2026 at 11:46:19AM +0200, Paolo Bonzini wrote:
> Until now QEMU's code provenance policy declined any contribution
> believed to include or derive from AI-generated content.  A blanket ban
> was easy to maintain while LLM output was rarely usable on its own, but
> as the tools improved an absolute prohibition has become harder to
> justify.

In the hope of moving this forward, here's an attempt to get all the
feedback in one place. It was summarized by GPT-5.4, so it is unreliable
of course, but we have the contributors here after all :). Does anyone
feel that any of their feedback got missed or misstated?


## Alex Bennée

- Pointed out two text nits in the patch: a stray closing `**` and the wording `deterministic tool`, suggesting `deterministic tool or script`.
- Wanted an explicit rule that AI must not write commit messages, because writing the summary and rationale is part of demonstrating that the human author understands the change.
- Was okay with AI helping only with grammar and spelling correction of human-written commit messages.
- Later posted an experimental AI-generated rewrite that:
  - split the AI policy into a dedicated `ai-usage.rst`,
  - added explicit human-accountability language,
  - made `Signed-off-by` human-only,
  - discouraged prompt dumping in commit messages,
  - and banned AI-attribution tags other than `AI-used-for:`.
- Noted that the model was good at extracting text from the discussion but was not applying real judgment, so he normally prefers to review and reword AI-generated documentation hunks by hand.
- In the later security-report side discussion, said the interesting part is whether AI audit tooling finds useful issues compared to fuzzing/static analysis, not which model/vendor produced the report.

## BALATON Zoltan

- Said the terminology around `trailers` was confusing because elsewhere in the docs these are referred to as `tags`; that mismatch made the draft harder to follow.
- Otherwise thought the revised wording was clearer.
- Later objected to treating LLM output as presumptively public domain:
  - generated output may still contain copied or derived GPL code,
  - or code originating from incompatible or proprietary sources,
  - so "no human copyright holder" does not automatically make the result safe or public domain.
- Also noted that code added to QEMU without an explicit license is still governed by QEMU's licensing rules, so simplistic public-domain assumptions are risky.

## Peter Maydell

- Objected to allowing an individual maintainer to decide that larger AI-generated contributions are acceptable:
  - if the project concern is legal/provenance blast radius, that should be a project-wide rule,
  - not something that varies by maintainer preference.
- Was especially skeptical of allowing AI-generated documentation and comments:
  - code at least has compile/test guardrails,
  - prose docs have only human review,
  - and documentation/comments are supposed to reflect intended behavior, not auto-generated explanations.
- Drew a distinction between:
  - acceptable assistance such as grammar correction or translation of human-authored text,
  - versus asking AI to draft documentation from scratch, which he was much less happy with.
- In the later Coverity-style discussion, said issue identifiers can occasionally be useful for refinding patches/commits, though not as part of an everyday workflow.

## Stefan Hajnoczi

- Flagged one sentence in the proposal as potentially suggesting that submitters no longer need to understand the code they send.
- Wanted the policy to stay firmly in the model where the human contributor still understands and is responsible for the submission.
- Suggested wording along the lines of "since the risk of bugs not discovered by the submitter increases".
- Suggested moving the AI policy into a separate document and referencing it from `AGENTS.md`, so coding agents operating in-tree are explicitly told to refuse tasks that violate the policy.

## Daniel P. Berrangé

- Objected to using "projects accepting AI-assisted content have not run into serious legal trouble so far" as reassuring evidence:
  - copyright risk is a slow-burn issue,
  - lack of lawsuits so far is not strong evidence that the legal risk is low.
- Said `small bug fixes` does not line up well with the separate concern about `core code`:
  - the real policy goal is closer to low originality / low copyrightability risk / easy reversibility,
  - not "bug fix" as a category.
- Strongly preferred putting the AI rules into a dedicated `ai-usage` document for easier linking and clarity.
- Wanted the policy to cover social expectations as well as legal/technical ones:
  - QEMU collaboration should remain human-to-human,
  - contributors should not feed review mail into an LLM and paste the answer back,
  - reviewers using AI should disclose that fact,
  - and contributor identities should still represent real humans even when pseudonymous.
- Said the policy's "spirit" needs to live in the policy text itself, not only in a commit message, because later readers of the policy will never see the commit message rationale.
- Wanted stronger and earlier wording that only a human may add `Signed-off-by`, and pointed to Linux kernel wording as a model.
- Wanted explicit human authorship of commit messages and cover letters where non-trivial explanation is required.
- Liked `AI-used-for:` because it gives reviewers useful information without advertising a vendor.
- Wanted `shape your patch` tightened to something like `shape the content of the submitted patch`, so background AI use is excluded more clearly.
- Wanted unconditional disclosure of AI use, because provenance matters even when the surviving AI-generated portion is small.
- Thought prompts generally should not be included:
  - if the information matters to reviewers, it belongs in the human-written commit message,
  - otherwise it just adds clutter.
- Thought `QEMU does not use Assisted-by / Co-authored-by / Generated-by` was too weak:
  - if those tags are unwanted, the policy should explicitly forbid them,
  - and possibly enforce that in `checkpatch.pl`.
- Said the general tag rules should also be documented earlier in the provenance docs, not only in the AI section.
- Was especially wary of prose documentation under `docs/`:
  - AI prose can become convincing-sounding slop,
  - review of prose is already expensive,
  - and non-expert contributors may not actually be able to fact-check the text.
- Proposed handling documentation more incrementally:
  - start with a tightly constrained initial docs policy,
  - then relax it later only if experience shows that broader allowances are worth it.
- Was more comfortable with inline API docs/comments than with prose documentation.
- Opposed leaving larger exceptions to individual maintainer discretion because that would create inconsistent standards across subsystems and confuse contributors.
- Repeatedly pushed back on the `20 lines` rule:
  - it is not measuring the right thing,
  - and if there are already larger low-risk categories like mechanical or boilerplate code, the rule is the wrong policy center.
- Raised licensing concerns around AI-generated new files and SPDX handling:
  - some guidance suggests whole-file AI output should not automatically get a license header unless human edits make it copyrightable,
  - but QEMU should still make clear that human edits to AI-generated code are assumed GPL-2.0-or-later unless explicitly stated otherwise.
- Also clarified that any "public domain" argument for LLM output only makes sense when the output is not credibly a derived work:
  - cloning QEMU into another language would likely still be a derived work,
  - and some non-trivial feature code that follows established QEMU design patterns could also plausibly still be GPL-derived.
- Rejected the idea that "mechanical" should be left to personal taste:
  - if reasonable people might disagree whether a change is mechanical,
  - the policy should assume it is not mechanical.
- Said that if mechanical changes or boilerplate are allowed, the policy should define them clearly enough that contributors can understand what is allowed without having to ask permission in advance.
- In the later security-report side discussion, argued against giving AI tools `Reported-by` credit:
  - the accountable party is the human reporter,
  - and the project should not provide free advertising to tool vendors.
- When Alex posted an AI-generated rewrite of the policy, said it had incorporated comments too indiscriminately, become more verbose, lost structure, and was drifting toward slop.
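
As a rough illustration of the `checkpatch.pl` enforcement idea above, here is a minimal sketch in Python of what such a check could look like. It is not how `checkpatch.pl` (a Perl script) works today, and the function name and tag list are hypothetical, taken only from the tags named in this thread.

```python
import re

# Hypothetical list, based only on the tags mentioned in this thread.
# The final policy text may forbid a different set.
FORBIDDEN_TAGS = ("Assisted-by", "Co-authored-by", "Generated-by")

# A tag line looks like "Some-Tag: value" at the start of a line.
TAG_RE = re.compile(r"^([A-Za-z][A-Za-z-]*):\s*(.*)$")


def find_forbidden_tags(commit_message):
    """Return (line_number, tag) pairs for each forbidden tag found."""
    forbidden = {tag.lower() for tag in FORBIDDEN_TAGS}
    hits = []
    for lineno, line in enumerate(commit_message.splitlines(), start=1):
        match = TAG_RE.match(line.strip())
        # Compare case-insensitively so "generated-by:" is also caught.
        if match and match.group(1).lower() in forbidden:
            hits.append((lineno, match.group(1)))
    return hits
```

Under this sketch, a message carrying `AI-used-for:` and `Signed-off-by:` passes, while one carrying `Generated-by:` is reported with its line number so the submitter can remove it.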

## Kevin Wolf

- Said `20 lines or less` is a poor proxy for what the project is actually trying to allow:
  - the real target is trivial or low-complexity code,
  - and it is easy to write 20 lines that are not trivial at all.
- Suggested line count could at most be an example, not the entire rule.
- Noted that "just say no to slop" is easy to say but not especially comfortable in practice for maintainers.
- Was skeptical that LLM workflows are meaningfully reproducible anyway.
- In the later security-report discussion, said AI-found issues feel analogous to Coverity:
  - they generally do not deserve a `Reported-by` trailer,
  - but mention in commit-message prose can make sense if useful.

## Michael S. Tsirkin

- Suggested explicitly allowing AI to correct grammar and spelling in text the contributor already wrote, as long as AI is not writing the text from scratch.
- Argued there are cases where a maintainer may reasonably judge generated code to be so QEMU-specific or so tightly coupled to the current tree that accidental copying risk is negligible.
- Repeatedly emphasized that AI is especially useful for helping non-native English speakers.
- Questioned how effective the `20 lines` rule would be if many small AI-assisted contributions simply accumulate over time.
- Later suggested that if mechanical changes are allowed, the policy should say `clearly mechanical` or `obviously mechanical` and include examples.
- Also suggested that for borderline `mechanical` changes, contributors should check with maintainers up front because what counts as mechanical is still a maintainer judgment.
- In the licensing discussion, argued that if something truly is public domain, a human can still submit it under GPL terms, and that the policy could explicitly say contributing it to QEMU implies appropriate GPL licensing.
- Floated declining whole new AI-generated files for now unless they are just reorganizations of existing code that already inherit SPDX/licensing context.
- Also suggested maintainers can warn and eventually ignore repeat slop submitters.
- In response to the later security-report question, said AI-assisted security scanning was already allowed under the current policy.

## Christian Borntraeger

- Asked how the policy should treat a human-submitted patch that is based on an AI-generated security report.
- Asked whether, if such reports are allowed, the project should add something like `Reported-by: Claude` or `Reported-by: ChatGPT`.
