* [PATCH] docs/code-provenance: add an exception for non-creative AI changes

From: Paolo Bonzini @ 2025-09-25 7:56 UTC
To: qemu-devel
Cc: peter.maydell, stefanha, berrange, alex.bennee

AI tools can be used as a natural language refactoring engine for simple
tasks such as modifying all callers of a given function or all accesses
to a variable. These tasks are interesting for an exception because:

* it is credible for a contributor to claim DCO compliance. If the
  contributor can reasonably make the same change with different tools or
  with just an editor, which tool is used (including an LLM) should have
  no bearing on compliance. This also applies to less simple tasks such
  as adding Python type annotations.

* they are relatively easy to test and review, and can provide noticeable
  time savings;

* this kind of change is easily separated from more complex non-AI-generated
  ones, which we encourage people to do anyway. It is therefore natural
  to highlight them as AI-generated.

Make an exception for patches that have "limited creative content" - that
is, mechanical transformations where the creativity lies in deciding what
to change rather than in how to implement the change.

Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
---
 docs/devel/code-provenance.rst | 27 +++++++++++++++++++++++----
 1 file changed, 23 insertions(+), 4 deletions(-)

diff --git a/docs/devel/code-provenance.rst b/docs/devel/code-provenance.rst
index 8cdc56f6649..d6e86636964 100644
--- a/docs/devel/code-provenance.rst
+++ b/docs/devel/code-provenance.rst
@@ -290,9 +290,11 @@ Use of AI-generated content
 
 TL;DR:
 
-  **Current QEMU project policy is to DECLINE any contributions which are
+  **The general QEMU project policy is to DECLINE any contributions which are
   believed to include or derive from AI generated content. This includes
-  ChatGPT, Claude, Copilot, Llama and similar tools.**
+  ChatGPT, Claude, Copilot, Llama and similar tools.** The following exceptions
+  are acceptable:
+  * **Limited creative content** (e.g., mechanical transformations)
 
 **This policy does not apply to other uses of AI, such as researching APIs
 or algorithms, static analysis, or debugging, provided their output is not
@@ -323,8 +325,9 @@ content generators commonly available today is unclear. The QEMU project is
 not willing or able to accept the legal risks of non-compliance.
 
 The QEMU project thus requires that contributors refrain from using AI content
-generators on patches intended to be submitted to the project, and will
-decline any contribution if use of AI is either known or suspected.
+generators on patches intended to be submitted to the project, with exceptions
+outlined below. If use of AI is known or suspected to go beyond the exceptions,
+QEMU will decline a contribution.
 
 Examples of tools impacted by this policy includes GitHub's CoPilot, OpenAI's
 ChatGPT, Anthropic's Claude, and Meta's Code Llama, and code/content
@@ -347,3 +350,19 @@ requirements for contribution. In particular, the "Signed-off-by"
 label in a patch submission is a statement that the author takes
 responsibility for the entire contents of the patch, including any parts
 that were generated or assisted by AI tools or other tools.
+
+The following exceptions are currently in place:
+
+**Limited creative content**
+  Mechanical transformations where there is reasonably only one way to
+  implement the change. Any tool, as well as a manual change, would
+  produce substantially the same modifications to the code. Examples
+  include adjustments to data structures, mechanical API migrations,
+  or applying non-functional changes uniformly across a codebase.
+
+It is highly encouraged to provide background information such as the
+prompts that were used, and to not mix AI- and human-written code in the
+same commit, as much as possible.
+
+Maintainers should ask for a second opinion and avoid applying the
+exception to their own patch submissions.
--
2.51.0
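As an illustration of the "limited creative content" category proposed in
the patch above, a mechanical change of the kind the commit message
mentions - adding Python type annotations without touching behaviour -
might look like the following sketch. The function and the data shapes
are invented for this example and are not QEMU code.

    from typing import Optional

    # Before (unannotated):
    #
    #   def find_device(devices, name):
    #       for dev in devices:
    #           if dev["name"] == name:
    #               return dev
    #       return None

    # After: the only edit is the annotations; behaviour is unchanged.
    def find_device(devices: list[dict[str, str]],
                    name: str) -> Optional[dict[str, str]]:
        """Return the first device whose "name" field matches, or None."""
        for dev in devices:
            if dev["name"] == name:
                return dev
        return None

    if __name__ == "__main__":
        devs = [{"name": "virtio-net"}, {"name": "virtio-blk"}]
        print(find_device(devs, "virtio-blk"))

The annotations are fully determined by the code that is already there,
so any contributor, conventional tool, or LLM asked to annotate this
function should converge on substantially the same patch - which is the
property the proposed exception relies on.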
* Re: [PATCH] docs/code-provenance: add an exception for non-creative AI changes

From: Peter Maydell @ 2025-09-26 14:38 UTC
To: Paolo Bonzini
Cc: qemu-devel, stefanha, berrange, alex.bennee

On Thu, 25 Sept 2025 at 08:56, Paolo Bonzini <pbonzini@redhat.com> wrote:
>
> AI tools can be used as a natural language refactoring engine for simple
> tasks such as modifying all callers of a given function or all accesses
> to a variable. These tasks are interesting for an exception because:
>
> * it is credible for a contributor to claim DCO compliance. If the
> contributor can reasonably make the same change with different tools or
> with just an editor, which tool is used (including an LLM) should have
> no bearing on compliance. This also applies to less simple tasks such
> as adding Python type annotations.
>
> * they are relatively easy to test and review, and can provide noticeable
> time savings;
>
> * this kind of change is easily separated from more complex non-AI-generated
> ones, which we encourage people to do anyway. It is therefore natural
> to highlight them as AI-generated.
>
> Make an exception for patches that have "limited creative content" - that
> is, mechanical transformations where the creativity lies in deciding what
> to change rather than in how to implement the change.

I figure I'll state my personal opinion on this one. This isn't
intended to be any kind of 'veto' on the question: I don't
feel that strongly about it (and I don't think I ought to
have a personal veto in any case).

I'm not enthusiastic. The current policy is essentially
"the legal risks are unclear and the project isn't willing
to accept them". That's a straightforward rule to follow
that doesn't require either the contributor or the reviewer
or the project to make a possibly difficult judgement call on
what counts as not in fact risky. As soon as we start adding
exceptions then either we the project are making those
judgement calls, or else we're pushing them on contributors
or reviewers. I prefer the simple "'no' until the legal
picture becomes less murky" rule we have currently.

-- PMM
* Re: [PATCH] docs/code-provenance: add an exception for non-creative AI changes

From: Paolo Bonzini @ 2025-09-26 19:26 UTC
To: Peter Maydell
Cc: qemu-devel, Stefan Hajnoczi, Daniel P. Berrangé, Alex Bennée

On Fri, Sep 26, 2025, 16:39 Peter Maydell <peter.maydell@linaro.org> wrote:
> I figure I'll state my personal opinion on this one. This isn't
> intended to be any kind of 'veto' on the question: I don't
> feel that strongly about it (and I don't think I ought to
> have a personal veto in any case).
>
> I'm not enthusiastic. The current policy is essentially
> "the legal risks are unclear and the project isn't willing
> to accept them". That's a straightforward rule to follow
> that doesn't require either the contributor or the reviewer
> or the project to make a possibly difficult judgement call on
> what counts as not in fact risky. As soon as we start adding
> exceptions then either we the project are making those
> judgement calls, or else we're pushing them on contributors
> or reviewers. I prefer the simple "'no' until the legal
> picture becomes less murky" rule we have currently.

In principle I agree. I am not enthusiastic either. There are however
two problems in the current policy.

First, the policy is based on an honor code; in some cases the use of AI
can be easily spotted, but in general it's anything but trivial,
especially in capable hands where, for example, code is generated by AI
but commit messages are not. As such, the policy cannot prevent
inclusion of AI generated code, it only tells you who is to blame.

Second, for this specific kind of change it is, pretty much, impossible
to tell whether it's generated with AI or by a specialized tool or by
hand. If you provide a way for people to be honest about their tool
usage, and allow it at least in some cases, there's a nonzero chance
they will be; if you just tell them a hard no, and lying by omission has
more than plausible deniability, there's a relatively high chance that
they will just stay silent on the matter while still using the tool.

In other words, as much as I would also like a simple policy, I expect
fewer undiscovered violations with the exception in place - even beyond
what the exception allows. And given the stated goal of using proposals
and actual usage to inform future policy, this approach could serve that
goal better than plain prohibition.

That said, I am okay with having no exception if that's the consensus.

Thanks,

Paolo

> -- PMM
* Re: [PATCH] docs/code-provenance: add an exception for non-creative AI changes

From: Daniel P. Berrangé @ 2025-09-29 9:51 UTC
To: Paolo Bonzini
Cc: Peter Maydell, qemu-devel, Stefan Hajnoczi, Alex Bennée

On Fri, Sep 26, 2025 at 09:26:47PM +0200, Paolo Bonzini wrote:
> On Fri, Sep 26, 2025, 16:39 Peter Maydell <peter.maydell@linaro.org> wrote:
>
> > I figure I'll state my personal opinion on this one. This isn't
> > intended to be any kind of 'veto' on the question: I don't
> > feel that strongly about it (and I don't think I ought to
> > have a personal veto in any case).
> >
> > I'm not enthusiastic. The current policy is essentially
> > "the legal risks are unclear and the project isn't willing
> > to accept them". That's a straightforward rule to follow
> > that doesn't require either the contributor or the reviewer
> > or the project to make a possibly difficult judgement call on
> > what counts as not in fact risky. As soon as we start adding
> > exceptions then either we the project are making those
> > judgement calls, or else we're pushing them on contributors
> > or reviewers. I prefer the simple "'no' until the legal
> > picture becomes less murky" rule we have currently.
>
> In principle I agree. I am not enthusiastic either. There are however
> two problems in the current policy.
>
> First, the policy is based on an honor code; in some cases the use of AI
> can be easily spotted, but in general it's anything but trivial,
> especially in capable hands where, for example, code is generated by AI
> but commit messages are not. As such, the policy cannot prevent
> inclusion of AI generated code, it only tells you who is to blame.

The policy is intentionally based on an honour code, because trust in
contributors' intentions is a fundamental foundation of a well-functioning
OSS project. When projects start to view contributors as untrustworthy,
then IME they end up with burdensome processes (often pushed by corporate
demands), such as copyright assignment / CLA, instead of the lightweight
DCO (self-certification, honour based) process we have today.

The policy was never intended to require the project or our maintainers
to take any active steps to identify and block AI contributions from
contributors with ill intent. That would be a Sisyphean task IMHO.

We're in a situation where many organizations are strongly pushing their
employees to use AI tools where possible. In the absence of any written
policy, a project has effectively created an implicit policy that it is
willing to accept AI contributions. Contributors have few sources of
information on the implications of AI in the context of OSS projects
that are not written by companies with a vested (biased) interest in
maximising use of AI.

So the QEMU policy serves several purposes IMHO:

 * Provides an interpretation of the DCO wrt LLM generated content that
   reflects a community rather than corporate viewpoint

 * Provides cover to the project itself by stating our intent wrt what
   we are willing to accept, pushing liability to the contributor (as is
   the case with DCO in general)

 * Gives contributors clear guidance on whether or not they can submit
   AI generated content to QEMU

 * Gives contributors a policy to point their employer to when asked why
   they didn't use any AI tools in the context of QEMU

> Second, for this specific kind of change it is, pretty much, impossible
> to tell whether it's generated with AI or by a specialized tool or by
> hand. If you provide a way for people to be honest about their tool
> usage, and allow it at least in some cases, there's a nonzero chance
> they will be; if you just tell them a hard no, and lying by omission
> has more than plausible deniability, there's a relatively high chance
> that they will just stay silent on the matter while still using the
> tool.

I find this to be a somewhat distasteful view of our contributors, as it
gives a strong implication that they are likely to act dishonestly wrt
our contribution policies. I don't think that is a fair reflection of
our contributors in general.

I'll never say never, as it is quite possible for any community to have
a malicious contributor, and someone may well do this just out of spite
to try and prove our policy "wrong" because of the high profile of this
policy. I don't think we should design our contribution policies around
the worst of people though.

Policies are a social contract and there is little we can do to force
compliance against a motivated person with ill intent. If we ever did
identify someone willfully ignoring the policies (whether on AI, or any
other aspect), our main recourse is limited to no longer accepting their
work / participation in the project.

With regards,
Daniel
--
|: https://berrange.com -o- https://www.flickr.com/photos/dberrange :|
|: https://libvirt.org -o- https://fstop138.berrange.com :|
|: https://entangle-photo.org -o- https://www.instagram.com/dberrange :|
* Re: [PATCH] docs/code-provenance: add an exception for non-creative AI changes

From: Peter Maydell @ 2025-09-29 11:52 UTC
To: Daniel P. Berrangé
Cc: Paolo Bonzini, qemu-devel, Stefan Hajnoczi, Alex Bennée

On Mon, 29 Sept 2025 at 10:51, Daniel P. Berrangé <berrange@redhat.com> wrote:
>
> On Fri, Sep 26, 2025 at 09:26:47PM +0200, Paolo Bonzini wrote:
> > On Fri, Sep 26, 2025, 16:39 Peter Maydell <peter.maydell@linaro.org> wrote:
> >
> > > I figure I'll state my personal opinion on this one. This isn't
> > > intended to be any kind of 'veto' on the question: I don't
> > > feel that strongly about it (and I don't think I ought to
> > > have a personal veto in any case).
> > >
> > > I'm not enthusiastic. The current policy is essentially
> > > "the legal risks are unclear and the project isn't willing
> > > to accept them". That's a straightforward rule to follow
> > > that doesn't require either the contributor or the reviewer
> > > or the project to make a possibly difficult judgement call on
> > > what counts as not in fact risky. As soon as we start adding
> > > exceptions then either we the project are making those
> > > judgement calls, or else we're pushing them on contributors
> > > or reviewers. I prefer the simple "'no' until the legal
> > > picture becomes less murky" rule we have currently.
> >
> > In principle I agree. I am not enthusiastic either. There are however
> > two problems in the current policy.
> >
> > First, the policy is based on an honor code; in some cases the use of
> > AI can be easily spotted, but in general it's anything but trivial,
> > especially in capable hands where, for example, code is generated by
> > AI but commit messages are not. As such, the policy cannot prevent
> > inclusion of AI generated code, it only tells you who is to blame.
>
> The policy is intentionally based on an honour code, because trust in
> contributors' intentions is a fundamental foundation of a well-functioning
> OSS project. When projects start to view contributors as untrustworthy,
> then IME they end up with burdensome processes (often pushed by corporate
> demands), such as copyright assignment / CLA, instead of the lightweight
> DCO (self-certification, honour based) process we have today.

Mmm. I think there's a difference between:

 * we think this category of AI generated changes is sufficiently
   low-risk to the project and sufficiently useful to be worth awarding
   it an exception

and

 * we think that this category of AI generated changes is one we can't
   trust contributors not to just send in anyway, so we give it an
   exception in the hope they might at least tell us when they're
   doing it

The commit message for this patch is making the first argument; if we
really think that, that's fine, but I don't think we should make the
change with the former argument as justification if really we're doing
it because we're worried about the second. And I'm definitely sceptical
that we should change our policy just because we think people are going
to deliberately breach it if we do not...

thanks
-- PMM
* Re: [PATCH] docs/code-provenance: add an exception for non-creative AI changes

From: Daniel P. Berrangé @ 2025-09-29 18:36 UTC
To: Peter Maydell
Cc: Paolo Bonzini, qemu-devel, stefanha, alex.bennee

On Fri, Sep 26, 2025 at 03:38:49PM +0100, Peter Maydell wrote:
>
> I'm not enthusiastic. The current policy is essentially
> "the legal risks are unclear and the project isn't willing
> to accept them".

Broadly speaking the legal risks are unclear. The challenge from Paolo,
though, is that there are some usage scenarios where the legal risks are
negligible, even in today's murky situation wrt training material
license laundering.

> That's a straightforward rule to follow
> that doesn't require either the contributor or the reviewer
> or the project to make a possibly difficult judgement call on
> what counts as not in fact risky. As soon as we start adding
> exceptions then either we the project are making those
> judgement calls, or else we're pushing them on contributors
> or reviewers. I prefer the simple "'no' until the legal
> picture becomes less murky" rule we have currently.

The simplicity of the current rule is very appealing, but at the same
time I find it hard to justify why we should reject usage in some of
these scenarios.

So we have a choice of deciding we're going to accept the collateral
damage of rejecting what are almost certainly low-risk contributions, or
tolerate a little more complexity in our policy via exceptions.

I'm willing to entertain the idea of exceptions, as long as we don't
make it too onerous for our maintainers to evaluate patches with a
reasonable consistency across our different maintainers. Something
should be able to pass an obvious & simple "sniff test" to be able to
qualify under an exception. If we find ourselves having to debate &
ponder applicability, then the exception would be unworkable.

With regards,
Daniel
--
|: https://berrange.com -o- https://www.flickr.com/photos/dberrange :|
|: https://libvirt.org -o- https://fstop138.berrange.com :|
|: https://entangle-photo.org -o- https://www.instagram.com/dberrange :|
* Re: [PATCH] docs/code-provenance: add an exception for non-creative AI changes

From: Daniel P. Berrangé @ 2025-09-29 18:28 UTC
To: Paolo Bonzini
Cc: qemu-devel, peter.maydell, stefanha, alex.bennee

On Thu, Sep 25, 2025 at 09:56:30AM +0200, Paolo Bonzini wrote:
> AI tools can be used as a natural language refactoring engine for simple
> tasks such as modifying all callers of a given function or all accesses
> to a variable. These tasks are interesting for an exception because:
>
> * it is credible for a contributor to claim DCO compliance. If the
> contributor can reasonably make the same change with different tools or
> with just an editor, which tool is used (including an LLM) should have
> no bearing on compliance. This also applies to less simple tasks such
> as adding Python type annotations.

When I read refactoring, I consider

 * No functional change
 * The change is a logical transformation of text

The creative act is defining the rule for what the transformation will
do. The resulting patch is a derivative work of the existing code, along
with the rules for the transformation. In traditional cases, the rules
are an awk prompt, or a semantic patch (coccinelle), but in this case,
the rules are the AI agent prompt.

The AI training material will guide the tool on how to interpret your
natural language rules, but overall it looks implausible to claim the
resulting patch would be a derivative work of any training material. The
only exception seems to be if some QEMU fork had made exactly the same
refactoring already, which seems unlikely in general, when we disregard
malicious contributors.

Type annotations get more complex to rationalize, because that is about
writing net new functional code. As an analogy though, if there was a
clean room dev team, who were fed the commit message, their output would
likely be identical. The defining characteristic is the lack of
plausible (& correct) alternative approaches. All tools, whether AI or
not, should converge on the same result. It again looks implausible to
suggest the resulting patch could be a derivative work of any particular
agent training material, unless some QEMU fork had made exactly the same
functional change already. This feels unlikely.

TL;DR: even in today's world where the laundering of training material
licenses vs AI output licenses is murky, there is likely sufficient
confidence in these particular scenarios to claim the training material
imposes negligible risk to QEMU.

> * they are relatively easy to test and review, and can provide noticeable
> time savings;
>
> * this kind of change is easily separated from more complex non-AI-generated
> ones, which we encourage people to do anyway. It is therefore natural
> to highlight them as AI-generated.
>
> Make an exception for patches that have "limited creative content" - that
> is, mechanical transformations where the creativity lies in deciding what
> to change rather than in how to implement the change.
>
> Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
> ---
>  docs/devel/code-provenance.rst | 27 +++++++++++++++++++++++----
>  1 file changed, 23 insertions(+), 4 deletions(-)
>
> diff --git a/docs/devel/code-provenance.rst b/docs/devel/code-provenance.rst
> index 8cdc56f6649..d6e86636964 100644
> --- a/docs/devel/code-provenance.rst
> +++ b/docs/devel/code-provenance.rst
> @@ -290,9 +290,11 @@ Use of AI-generated content
>
>  TL;DR:
>
> -  **Current QEMU project policy is to DECLINE any contributions which are
> +  **The general QEMU project policy is to DECLINE any contributions which are
>    believed to include or derive from AI generated content. This includes
> -  ChatGPT, Claude, Copilot, Llama and similar tools.**
> +  ChatGPT, Claude, Copilot, Llama and similar tools.** The following exceptions
> +  are acceptable:
> +  * **Limited creative content** (e.g., mechanical transformations)

Ought to link to the detailed description further down.

>
>  **This policy does not apply to other uses of AI, such as researching APIs
>  or algorithms, static analysis, or debugging, provided their output is not
> @@ -323,8 +325,9 @@ content generators commonly available today is unclear. The QEMU project is
>  not willing or able to accept the legal risks of non-compliance.
>
>  The QEMU project thus requires that contributors refrain from using AI content
> -generators on patches intended to be submitted to the project, and will
> -decline any contribution if use of AI is either known or suspected.
> +generators on patches intended to be submitted to the project, with exceptions
> +outlined below. If use of AI is known or suspected to go beyond the exceptions,
> +QEMU will decline a contribution.
>
>  Examples of tools impacted by this policy includes GitHub's CoPilot, OpenAI's
>  ChatGPT, Anthropic's Claude, and Meta's Code Llama, and code/content
> @@ -347,3 +350,19 @@ requirements for contribution. In particular, the "Signed-off-by"
>  label in a patch submission is a statement that the author takes
>  responsibility for the entire contents of the patch, including any parts
>  that were generated or assisted by AI tools or other tools.
> +
> +The following exceptions are currently in place:
> +
> +**Limited creative content**
> +  Mechanical transformations where there is reasonably only one way to
> +  implement the change. Any tool, as well as a manual change, would
> +  produce substantially the same modifications to the code. Examples
> +  include adjustments to data structures, mechanical API migrations,
> +  or applying non-functional changes uniformly across a codebase.

I can rationalize in my mind what I would be willing to accept under
this description, but at the same time this is (intentionally) fairly
fuzzy and open to interpretation.

> +It is highly encouraged to provide background information such as the
> +prompts that were used, and to not mix AI- and human-written code in the
> +same commit, as much as possible.

It would be nice to require full separation, but if an AI gets you 95%
of the way there, and the remaining 5% is just stupid typos/mistakes, it
is better for a human to just fix it immediately, than to continue
re-prompting the agent to try again and hoping you win.

> +Maintainers should ask for a second opinion and avoid applying the
> +exception to their own patch submissions.

Could it be simpler as:

  Sign off is mandatory from a maintainer who is not the author.

Or were you implying that we need SoB from two maintainers?

In theory non-author S-o-B is best practice already, but analyzing git
shows we have about 10% of our commits with no independent S-o-B at all.
Often such commits are precisely the NFC refactorings, or the simple
mechanical additions. So perhaps going from zero-or-one independent SoBs
to at least one SoB is sufficiently strong?

With regards,
Daniel
--
|: https://berrange.com -o- https://www.flickr.com/photos/dberrange :|
|: https://libvirt.org -o- https://fstop138.berrange.com :|
|: https://entangle-photo.org -o- https://www.instagram.com/dberrange :|
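Daniel's comparison above between an awk or Coccinelle rule and an AI
agent prompt can be made concrete with a small sketch: for a mechanical
API migration, the whole creative act fits in a single substitution
rule, and any tool applying that rule - an LLM, sed, Coccinelle, or the
hypothetical script below - should produce substantially the same patch.
The API names here are invented for illustration and do not exist in
QEMU.

    #!/usr/bin/env python3
    # Hypothetical sketch: the whole "creative act" is the one
    # substitution rule below; the rest is mechanical application.
    import pathlib
    import re
    import sys

    # Rule: old_timer_new(cb, opaque) becomes new_timer_init(cb, opaque, 0).
    # Naive pattern: assumes no nested parentheses inside the call.
    RULE = re.compile(r"\bold_timer_new\(([^)]*)\)")

    def rewrite(text: str) -> str:
        return RULE.sub(r"new_timer_init(\1, 0)", text)

    def main(paths: list[str]) -> None:
        for arg in paths:
            path = pathlib.Path(arg)
            original = path.read_text()
            updated = rewrite(original)
            if updated != original:
                path.write_text(updated)
                print(f"rewrote {path}")

    if __name__ == "__main__":
        main(sys.argv[1:])

Seen this way, the provenance question reduces to the rule itself rather
than the tool that applied it, which is roughly the distinction the
proposed "limited creative content" exception tries to capture.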