* [RFC PATCH 0/4] docs/code-provenance: make AI policy clearer and more practical
@ 2025-09-22 11:32 Paolo Bonzini
2025-09-22 11:32 ` [RFC PATCH 1/4] docs/code-provenance: clarify scope very early Paolo Bonzini
` (3 more replies)
0 siblings, 4 replies; 20+ messages in thread
From: Paolo Bonzini @ 2025-09-22 11:32 UTC (permalink / raw)
To: qemu-devel
Cc: Alex Bennée, Daniel P. Berrangé, Markus Armbruster,
Peter Maydell, Stefan Hajnoczi
These patches contain three changes to QEMU's code provenance policy
with respect to AI-generated content. I am sorting them from least to
most controversial.
First, I am emphasizing the intended scope: the policy is not about
content generators, it is about generated content (patch 1).
Second, I am adding some procedural requirements and liability boundaries
to the exception process (patches 2-3). These changes provide a structure
for the process and clarify that the process is not an expansion of the
maintainers' responsibilities.
On top of these changes, however, I am also expanding the exception
process so that it is actually feasible to request and obtain an
exception. Requesting "clarity of the license and copyright status
for the tool's output" is almost asking for the impossible; a problem
that is also shared by other AI policies such as the Linux Foundation's
(https://www.linuxfoundation.org/legal/generative-ai). Therefore, add a
second case for an exception, limited but practical, which is "limited
or non-existing creative content" (patch 4).
Paolo
Paolo Bonzini (4):
docs/code-provenance: clarify scope very early
docs/code-provenance: make the exception process more prominent
docs/code-provenance: clarify the scope of AI exceptions
docs/code-provenance: make the exception process feasible
docs/devel/code-provenance.rst | 46 +++++++++++++++++++++++-----------
1 file changed, 31 insertions(+), 15 deletions(-)
--
2.51.0
^ permalink raw reply [flat|nested] 20+ messages in thread
* [RFC PATCH 1/4] docs/code-provenance: clarify scope very early
2025-09-22 11:32 [RFC PATCH 0/4] docs/code-provenance: make AI policy clearer and more practical Paolo Bonzini
@ 2025-09-22 11:32 ` Paolo Bonzini
2025-09-22 11:34 ` Daniel P. Berrangé
2025-09-22 12:52 ` Alex Bennée
2025-09-22 11:32 ` [RFC PATCH 2/4] docs/code-provenance: make the exception process more prominent Paolo Bonzini
` (2 subsequent siblings)
3 siblings, 2 replies; 20+ messages in thread
From: Paolo Bonzini @ 2025-09-22 11:32 UTC (permalink / raw)
To: qemu-devel
Cc: Alex Bennée, Daniel P. Berrangé, Markus Armbruster,
Peter Maydell, Stefan Hajnoczi
The AI policy in QEMU is not about content generators, it is about
generated content. Other uses are explicitly not covered. Rename the
policy and mention its scope only as a matter of convenience to the
reader, in the TL;DR section.
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
---
docs/devel/code-provenance.rst | 11 ++++++-----
1 file changed, 6 insertions(+), 5 deletions(-)
diff --git a/docs/devel/code-provenance.rst b/docs/devel/code-provenance.rst
index b5aae2e2532..dba99a26f64 100644
--- a/docs/devel/code-provenance.rst
+++ b/docs/devel/code-provenance.rst
@@ -285,8 +285,8 @@ Such tools are acceptable to use, provided there is clearly defined copyright
and licensing for their output. Note in particular the caveats applying to AI
content generators below.
-Use of AI content generators
-~~~~~~~~~~~~~~~~~~~~~~~~~~~~
+Use of AI-generated content
+~~~~~~~~~~~~~~~~~~~~~~~~~~~
TL;DR:
@@ -294,6 +294,10 @@ TL;DR:
believed to include or derive from AI generated content. This includes
ChatGPT, Claude, Copilot, Llama and similar tools.**
+ **This policy does not apply to other uses of AI, such as researching APIs
+ or algorithms, static analysis, or debugging, provided their output is not
+ included in contributions.**
+
The increasing prevalence of AI-assisted software development results in a
number of difficult legal questions and risks for software projects, including
QEMU. Of particular concern is content generated by `Large Language Models
@@ -322,9 +326,6 @@ The QEMU project thus requires that contributors refrain from using AI content
generators on patches intended to be submitted to the project, and will
decline any contribution if use of AI is either known or suspected.
-This policy does not apply to other uses of AI, such as researching APIs or
-algorithms, static analysis, or debugging, provided their output is not to be
-included in contributions.
Examples of tools impacted by this policy includes GitHub's CoPilot, OpenAI's
ChatGPT, Anthropic's Claude, and Meta's Code Llama, and code/content
--
2.51.0
^ permalink raw reply related [flat|nested] 20+ messages in thread
* [RFC PATCH 2/4] docs/code-provenance: make the exception process more prominent
2025-09-22 11:32 [RFC PATCH 0/4] docs/code-provenance: make AI policy clearer and more practical Paolo Bonzini
2025-09-22 11:32 ` [RFC PATCH 1/4] docs/code-provenance: clarify scope very early Paolo Bonzini
@ 2025-09-22 11:32 ` Paolo Bonzini
2025-09-22 13:24 ` Daniel P. Berrangé
2025-09-22 11:32 ` [RFC PATCH 3/4] docs/code-provenance: clarify the scope of AI exceptions Paolo Bonzini
2025-09-22 11:32 ` [RFC PATCH 4/4] docs/code-provenance: make the exception process feasible Paolo Bonzini
3 siblings, 1 reply; 20+ messages in thread
From: Paolo Bonzini @ 2025-09-22 11:32 UTC (permalink / raw)
To: qemu-devel
Cc: Alex Bennée, Daniel P. Berrangé, Markus Armbruster,
Peter Maydell, Stefan Hajnoczi
The exception process is an afterthought in QEMU's policy for AI-generated
content. It is not really possible to understand how people want to use
these tools without formalizing it a bit more and encouraging people to
request exceptions if they see a good use for AI-generated content.
Note that right now, in my opinion, the exception process remains
infeasible, because there is no agreement on how to "demonstrate
clarity of the license and copyright status for the tool's output".
This will be sorted out separately.
What is missing: do we want a formal way to identify commits for which an
exception to the AI policy was granted? The common way to do so seems to
be "Generated-by" or "Assisted-by" but I don't want to turn commit message
into an ad space. I would lean more towards something like
AI-exception-granted-by: Mary Maintainer <mary.maintainer@mycorp.test>
but at the same time I don't want to invent something just for QEMU.
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
---
docs/devel/code-provenance.rst | 22 ++++++++++++----------
1 file changed, 12 insertions(+), 10 deletions(-)
diff --git a/docs/devel/code-provenance.rst b/docs/devel/code-provenance.rst
index dba99a26f64..d435ab145cf 100644
--- a/docs/devel/code-provenance.rst
+++ b/docs/devel/code-provenance.rst
@@ -292,7 +292,8 @@ TL;DR:
**Current QEMU project policy is to DECLINE any contributions which are
believed to include or derive from AI generated content. This includes
- ChatGPT, Claude, Copilot, Llama and similar tools.**
+ ChatGPT, Claude, Copilot, Llama and similar tools. Exceptions may be
+ requested on a case-by-case basis.**
**This policy does not apply to other uses of AI, such as researching APIs
or algorithms, static analysis, or debugging, provided their output is not
@@ -322,18 +323,19 @@ How contributors could comply with DCO terms (b) or (c) for the output of AI
content generators commonly available today is unclear. The QEMU project is
not willing or able to accept the legal risks of non-compliance.
-The QEMU project thus requires that contributors refrain from using AI content
-generators on patches intended to be submitted to the project, and will
-decline any contribution if use of AI is either known or suspected.
+The QEMU project requires contributors to refrain from using AI content
+generators without going through an exception request process.
+AI-generated code will only be included in the project after the
+exception request has been evaluated by the QEMU project. To be
+granted an exception, a contributor will need to demonstrate clarity of
+the license and copyright status for the tool's output in relation to its
+training model and code, to the satisfaction of the project maintainers.
+Maintainers are not allowed to grant an exception on their own patch
+submissions.
Examples of tools impacted by this policy includes GitHub's CoPilot, OpenAI's
ChatGPT, Anthropic's Claude, and Meta's Code Llama, and code/content
generation agents which are built on top of such tools.
-
This policy may evolve as AI tools mature and the legal situation is
-clarifed. In the meanwhile, requests for exceptions to this policy will be
-evaluated by the QEMU project on a case by case basis. To be granted an
-exception, a contributor will need to demonstrate clarity of the license and
-copyright status for the tool's output in relation to its training model and
-code, to the satisfaction of the project maintainers.
+clarified.
--
2.51.0
^ permalink raw reply related [flat|nested] 20+ messages in thread
* [RFC PATCH 3/4] docs/code-provenance: clarify the scope of AI exceptions
2025-09-22 11:32 [RFC PATCH 0/4] docs/code-provenance: make AI policy clearer and more practical Paolo Bonzini
2025-09-22 11:32 ` [RFC PATCH 1/4] docs/code-provenance: clarify scope very early Paolo Bonzini
2025-09-22 11:32 ` [RFC PATCH 2/4] docs/code-provenance: make the exception process more prominent Paolo Bonzini
@ 2025-09-22 11:32 ` Paolo Bonzini
2025-09-22 13:02 ` Alex Bennée
2025-09-22 11:32 ` [RFC PATCH 4/4] docs/code-provenance: make the exception process feasible Paolo Bonzini
3 siblings, 1 reply; 20+ messages in thread
From: Paolo Bonzini @ 2025-09-22 11:32 UTC (permalink / raw)
To: qemu-devel
Cc: Alex Bennée, Daniel P. Berrangé, Markus Armbruster,
Peter Maydell, Stefan Hajnoczi
Using phrasing from https://openinfra.org/legal/ai-policy (with just
"commit" replaced by "submission", because we do not submit changes
as commits but rather emails), clarify that the maintainer who bestows
their blessing on the AI-generated contribution is not responsible
for its copyright or license status beyond what is required by the
Developer's Certificate of Origin.
[This is not my preferred phrasing. I would prefer something lighter
like "the "Signed-off-by" label in the contribution gives the author
responsibility". But for the sake of not reinventing the wheel I am
keeping the exact words from the OpenInfra policy.]
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
---
docs/devel/code-provenance.rst | 5 +++++
1 file changed, 5 insertions(+)
diff --git a/docs/devel/code-provenance.rst b/docs/devel/code-provenance.rst
index d435ab145cf..a5838f63649 100644
--- a/docs/devel/code-provenance.rst
+++ b/docs/devel/code-provenance.rst
@@ -334,6 +334,11 @@ training model and code, to the satisfaction of the project maintainers.
Maintainers are not allowed to grant an exception on their own patch
submissions.
+Even after an exception is granted, the "Signed-off-by" label in the
+contribution is a statement that the author takes responsibility for the
+entire contents of the submission, including any parts that were generated
+or assisted by AI tools or other tools.
+
Examples of tools impacted by this policy includes GitHub's CoPilot, OpenAI's
ChatGPT, Anthropic's Claude, and Meta's Code Llama, and code/content
generation agents which are built on top of such tools.
--
2.51.0
^ permalink raw reply related [flat|nested] 20+ messages in thread
* [RFC PATCH 4/4] docs/code-provenance: make the exception process feasible
2025-09-22 11:32 [RFC PATCH 0/4] docs/code-provenance: make AI policy clearer and more practical Paolo Bonzini
` (2 preceding siblings ...)
2025-09-22 11:32 ` [RFC PATCH 3/4] docs/code-provenance: clarify the scope of AI exceptions Paolo Bonzini
@ 2025-09-22 11:32 ` Paolo Bonzini
2025-09-22 11:46 ` Peter Maydell
3 siblings, 1 reply; 20+ messages in thread
From: Paolo Bonzini @ 2025-09-22 11:32 UTC (permalink / raw)
To: qemu-devel
Cc: Alex Bennée, Daniel P. Berrangé, Markus Armbruster,
Peter Maydell, Stefan Hajnoczi
I do not think that anyone knows how to demonstrate "clarity of the
copyright status in relation to training". This makes the exception
process for AI-generated code both impossible to use and useless as a
way to inform future changes to QEMU's code provenance policies.
On the other hand, AI tools can be used as a natural language refactoring
engine for simple tasks such as modifying all callers of a given function
or even less simple ones such as adding Python type annotations.
These tasks have a very low risk of introducing training material in
the code base, and can provide noticeable time savings because they are
easily tested and reviewed; for the lack of a better term, I will call
these "tasks with limited or non-existing creative content".
Allow requesting an exception on the grounds of lack of creative content,
while keeping it clear that maintainers can deny it.
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
---
docs/devel/code-provenance.rst | 14 +++++++++++---
1 file changed, 11 insertions(+), 3 deletions(-)
diff --git a/docs/devel/code-provenance.rst b/docs/devel/code-provenance.rst
index a5838f63649..bfc659d2b4e 100644
--- a/docs/devel/code-provenance.rst
+++ b/docs/devel/code-provenance.rst
@@ -327,9 +327,17 @@ The QEMU project requires contributors to refrain from using AI content
generators without going through an exception request process.
AI-generated code will only be included in the project after the
exception request has been evaluated by the QEMU project. To be
-granted an exception, a contributor will need to demonstrate clarity of
-the license and copyright status for the tool's output in relation to its
-training model and code, to the satisfaction of the project maintainers.
+granted an exception, a contributor will need to demonstrate one of the
+following, to the satisfaction of the project maintainers:
+
+* clarity of the license and copyright status for the tool's output in
+ relation to its training model and code;
+
+* limited or non-existing creative content of the contribution.
+
+Contributors are highly encouraged to provide background information, such
+as the prompts that were used, and to avoid mixing AI- and human-written
+code in the same commit as much as possible.
Maintainers are not allowed to grant an exception on their own patch
submissions.
--
2.51.0
^ permalink raw reply related [flat|nested] 20+ messages in thread
* Re: [RFC PATCH 1/4] docs/code-provenance: clarify scope very early
2025-09-22 11:32 ` [RFC PATCH 1/4] docs/code-provenance: clarify scope very early Paolo Bonzini
@ 2025-09-22 11:34 ` Daniel P. Berrangé
2025-09-22 12:52 ` Alex Bennée
1 sibling, 0 replies; 20+ messages in thread
From: Daniel P. Berrangé @ 2025-09-22 11:34 UTC (permalink / raw)
To: Paolo Bonzini
Cc: qemu-devel, Alex Bennée, Markus Armbruster, Peter Maydell,
Stefan Hajnoczi
On Mon, Sep 22, 2025 at 01:32:16PM +0200, Paolo Bonzini wrote:
> The AI policy in QEMU is not about content generators, it is about
> generated content. Other uses are explicitly not covered. Rename the
> policy and mention its scope only as a matter of convenience to the
> reader, in the TL;DR section.
>
> Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
> ---
> docs/devel/code-provenance.rst | 11 ++++++-----
> 1 file changed, 6 insertions(+), 5 deletions(-)
Reviewed-by: Daniel P. Berrangé <berrange@redhat.com>
With regards,
Daniel
--
|: https://berrange.com -o- https://www.flickr.com/photos/dberrange :|
|: https://libvirt.org -o- https://fstop138.berrange.com :|
|: https://entangle-photo.org -o- https://www.instagram.com/dberrange :|
^ permalink raw reply [flat|nested] 20+ messages in thread
* Re: [RFC PATCH 4/4] docs/code-provenance: make the exception process feasible
2025-09-22 11:32 ` [RFC PATCH 4/4] docs/code-provenance: make the exception process feasible Paolo Bonzini
@ 2025-09-22 11:46 ` Peter Maydell
2025-09-22 12:06 ` Paolo Bonzini
2025-09-22 13:04 ` Daniel P. Berrangé
0 siblings, 2 replies; 20+ messages in thread
From: Peter Maydell @ 2025-09-22 11:46 UTC (permalink / raw)
To: Paolo Bonzini
Cc: qemu-devel, Alex Bennée, Daniel P. Berrangé,
Markus Armbruster, Stefan Hajnoczi
On Mon, 22 Sept 2025 at 12:32, Paolo Bonzini <pbonzini@redhat.com> wrote:
>
> I do not think that anyone knows how to demonstrate "clarity of the
> copyright status in relation to training".
Yes; to me this is the whole driving force behind the policy.
> On the other hand, AI tools can be used as a natural language refactoring
> engine for simple tasks such as modifying all callers of a given function
> or even less simple ones such as adding Python type annotations.
> These tasks have a very low risk of introducing training material in
> the code base, and can provide noticeable time savings because they are
> easily tested and reviewed; for the lack of a better term, I will call
> these "tasks with limited or non-existing creative content".
Does anybody know how to demonstrate "limited or non-existing
creative content", which I assume is a standin here for
"not copyrightable" ?
-- PMM
^ permalink raw reply [flat|nested] 20+ messages in thread
* Re: [RFC PATCH 4/4] docs/code-provenance: make the exception process feasible
2025-09-22 11:46 ` Peter Maydell
@ 2025-09-22 12:06 ` Paolo Bonzini
2025-09-22 13:04 ` Daniel P. Berrangé
1 sibling, 0 replies; 20+ messages in thread
From: Paolo Bonzini @ 2025-09-22 12:06 UTC (permalink / raw)
To: Peter Maydell
Cc: qemu-devel, Alex Bennée, Daniel P. Berrangé,
Markus Armbruster, Stefan Hajnoczi
On Mon, Sep 22, 2025 at 1:47 PM Peter Maydell <peter.maydell@linaro.org> wrote:
> > On the other hand, AI tools can be used as a natural language refactoring
> > engine for simple tasks such as modifying all callers of a given function
> > or even less simple ones such as adding Python type annotations.
> > These tasks have a very low risk of introducing training material in
> > the code base, and can provide noticeable time savings because they are
> > easily tested and reviewed; for the lack of a better term, I will call
> > these "tasks with limited or non-existing creative content".
>
> Does anybody know how to demonstrate "limited or non-existing
> creative content", which I assume is a standin here for
> "not copyrightable" ?
The way *I* would demonstrate it is "there is exactly (or pretty much)
one way to do this change". Any way to do that change (sed,
coccinelle, AI or by hand) would result in the same modification to
the code, with no real freedom to pick an algorithm, a data structure,
or even a way to organize the code.
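To make this concrete with a deliberately artificial sketch (hypothetical
names, nothing from a real code base): suppose a timeout argument becomes
mandatory. Then the rewrite at every call site is forced:

    DEFAULT_TIMEOUT = 30.0

    # hypothetical helper whose signature just grew a required argument
    def connect(host: str, timeout: float) -> str:
        return f"connected to {host} within {timeout}s"

    # before: connect("localhost")
    # after, the only possible mechanical rewrite:
    print(connect("localhost", timeout=DEFAULT_TIMEOUT))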
I wouldn't say, however, that this is equivalent to non-copyrightable.
It's more that the creativity lies in "deciding to do it" rather than
in "coming up with the code to do it". This is also why I mention
having prompts in the commit message; the prompt tells you whether the
AI is making design decisions or just executing a mechanical
transformation.
There's still a substantial amount of grey and I'm okay with treating
anything grey as a "no". If something like "convert this script from
bash to Python" comes up, I'd not try to claim it as "limited creative
content". It may be a boring task with limited variability in output;
but it's still creative and has substantially more copyright
infringement risk.
Paolo
^ permalink raw reply [flat|nested] 20+ messages in thread
* Re: [RFC PATCH 1/4] docs/code-provenance: clarify scope very early
2025-09-22 11:32 ` [RFC PATCH 1/4] docs/code-provenance: clarify scope very early Paolo Bonzini
2025-09-22 11:34 ` Daniel P. Berrangé
@ 2025-09-22 12:52 ` Alex Bennée
1 sibling, 0 replies; 20+ messages in thread
From: Alex Bennée @ 2025-09-22 12:52 UTC (permalink / raw)
To: Paolo Bonzini
Cc: qemu-devel, Daniel P. Berrangé, Markus Armbruster,
Peter Maydell, Stefan Hajnoczi
Paolo Bonzini <pbonzini@redhat.com> writes:
> The AI policy in QEMU is not about content generators, it is about
> generated content. Other uses are explicitly not covered. Rename the
> policy and mention its scope only as a matter of convenience to the
> reader, in the TL;DR section.
>
> Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
Reviewed-by: Alex Bennée <alex.bennee@linaro.org>
--
Alex Bennée
Virtualisation Tech Lead @ Linaro
^ permalink raw reply [flat|nested] 20+ messages in thread
* Re: [RFC PATCH 3/4] docs/code-provenance: clarify the scope of AI exceptions
2025-09-22 11:32 ` [RFC PATCH 3/4] docs/code-provenance: clarify the scope of AI exceptions Paolo Bonzini
@ 2025-09-22 13:02 ` Alex Bennée
2025-09-22 13:38 ` Daniel P. Berrangé
0 siblings, 1 reply; 20+ messages in thread
From: Alex Bennée @ 2025-09-22 13:02 UTC (permalink / raw)
To: Paolo Bonzini
Cc: qemu-devel, Daniel P. Berrangé, Markus Armbruster,
Peter Maydell, Stefan Hajnoczi
Paolo Bonzini <pbonzini@redhat.com> writes:
> Using phrasing from https://openinfra.org/legal/ai-policy (with just
> "commit" replaced by "submission", because we do not submit changes
> as commits but rather emails), clarify that the maintainer who bestows
> their blessing on the AI-generated contribution is not responsible
> for its copyright or license status beyond what is required by the
> Developer's Certificate of Origin.
>
> [This is not my preferred phrasing. I would prefer something lighter
> like "the "Signed-off-by" label in the contribution gives the author
> responsibility". But for the sake of not reinventing the wheel I am
> keeping the exact words from the OpenInfra policy.]
>
> Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
> ---
> docs/devel/code-provenance.rst | 5 +++++
> 1 file changed, 5 insertions(+)
>
> diff --git a/docs/devel/code-provenance.rst b/docs/devel/code-provenance.rst
> index d435ab145cf..a5838f63649 100644
> --- a/docs/devel/code-provenance.rst
> +++ b/docs/devel/code-provenance.rst
> @@ -334,6 +334,11 @@ training model and code, to the satisfaction of the project maintainers.
> > Maintainers are not allowed to grant an exception on their own patch
> submissions.
>
> +Even after an exception is granted, the "Signed-off-by" label in the
> +contribution is a statement that the author takes responsibility for the
> +entire contents of the submission, including any parts that were generated
> +or assisted by AI tools or other tools.
> +
I quite like the LLVM wording which makes expectations clear to the
submitter:
While the LLVM project has a liberal policy on AI tool use, contributors
are considered responsible for their contributions. We encourage
contributors to review all generated code before sending it for review
to verify its correctness and to understand it so that they can answer
questions during code review. Reviewing and maintaining generated code
that the original contributor does not understand is not a good use of
limited project resources.
It could perhaps be even stronger (must rather than encourage). The key
point to emphasise is that we don't want submissions that the user of
the generative AI doesn't understand.
While we don't see them because our GitHub lockdown policy auto-closes
PRs, we are already seeing growth in submissions where the authors seem
to have YOLO'd the code generator without really understanding the
changes.
> Examples of tools impacted by this policy includes GitHub's CoPilot, OpenAI's
> ChatGPT, Anthropic's Claude, and Meta's Code Llama, and code/content
> generation agents which are built on top of such tools.
--
Alex Bennée
Virtualisation Tech Lead @ Linaro
^ permalink raw reply [flat|nested] 20+ messages in thread
* Re: [RFC PATCH 4/4] docs/code-provenance: make the exception process feasible
2025-09-22 11:46 ` Peter Maydell
2025-09-22 12:06 ` Paolo Bonzini
@ 2025-09-22 13:04 ` Daniel P. Berrangé
2025-09-22 13:26 ` Peter Maydell
1 sibling, 1 reply; 20+ messages in thread
From: Daniel P. Berrangé @ 2025-09-22 13:04 UTC (permalink / raw)
To: Peter Maydell
Cc: Paolo Bonzini, qemu-devel, Alex Bennée, Markus Armbruster,
Stefan Hajnoczi
On Mon, Sep 22, 2025 at 12:46:51PM +0100, Peter Maydell wrote:
> On Mon, 22 Sept 2025 at 12:32, Paolo Bonzini <pbonzini@redhat.com> wrote:
> >
> > I do not think that anyone knows how to demonstrate "clarity of the
> > copyright status in relation to training".
>
> Yes; to me this is the whole driving force behind the policy.
>
> > On the other hand, AI tools can be used as a natural language refactoring
> > engine for simple tasks such as modifying all callers of a given function
> > or even less simple ones such as adding Python type annotations.
> > These tasks have a very low risk of introducing training material in
> > the code base, and can provide noticeable time savings because they are
> > easily tested and reviewed; for the lack of a better term, I will call
> > these "tasks with limited or non-existing creative content".
>
> Does anybody know how to demonstrate "limited or non-existing
> creative content", which I assume is a standin here for
> "not copyrightable" ?
That was something we intentionally aimed to avoid specifying in the
policy. It is very hard to define it in a way that will be clearly
understood by all contributors.
Furthermore, by defining it explicitly QEMU also weakens its legal
position should any issues arise, because it has pre-emptively
documented its acceptance of certain scenarios. This has the effect
of directing risk away from contributors and back onto the project.
We want to be very clear that the burden / requirement for determining
legal / license compliance of contributions rests on the contributor,
not the project, whether AI is involved or not.
In terms of historical practice, when contributors have come to us
with legal questions about whether they can contribute something or
the legality of a certain change, as a general rule we will avoid
giving any clear legal guidance from the project's POV.
Especially with any corporate contributor the rule is to refer that
person back to their own organization's legal department. This makes
it clear where the responsibility is and avoids the QEMU project
pre-emptively setting out its legal interpretation.
TL;DR: I don't think we should attempt to define where the boundary
lies between copyrightable and non-copyrightable code changes.
With regards,
Daniel
--
|: https://berrange.com -o- https://www.flickr.com/photos/dberrange :|
|: https://libvirt.org -o- https://fstop138.berrange.com :|
|: https://entangle-photo.org -o- https://www.instagram.com/dberrange :|
^ permalink raw reply [flat|nested] 20+ messages in thread
* Re: [RFC PATCH 2/4] docs/code-provenance: make the exception process more prominent
2025-09-22 11:32 ` [RFC PATCH 2/4] docs/code-provenance: make the exception process more prominent Paolo Bonzini
@ 2025-09-22 13:24 ` Daniel P. Berrangé
2025-09-22 13:56 ` Paolo Bonzini
0 siblings, 1 reply; 20+ messages in thread
From: Daniel P. Berrangé @ 2025-09-22 13:24 UTC (permalink / raw)
To: Paolo Bonzini
Cc: qemu-devel, Alex Bennée, Markus Armbruster, Peter Maydell,
Stefan Hajnoczi
On Mon, Sep 22, 2025 at 01:32:17PM +0200, Paolo Bonzini wrote:
> The exception process is an afterthought in QEMU's policy for AI-generated
> content. It is not really possible to understand how people want to use
> these tools without formalizing it a bit more and encouraging people to
> request exceptions if they see a good use for AI-generated content.
>
> Note that right now, in my opinion, the exception process remains
> infeasible, because there is no agreement on how to "demonstrate
> clarity of the license and copyright status for the tool's output".
> This will be sorted out separately.
FWIW, I considered that the "exception process" would end up
being something like...
* someone wants to use a particular tool for something they
believe is compelling
* they complain on qemu-devel that our policy blocks their
valid use
* we debate it
* if agreed, we add a note to this code-provenance.rst doc to
allow it
I would imagine that exceptions might fall into two buckets
* Descriptions of techniques/scenarios for using tools
that limit the licensing risk
* Details of specific tools (or more likely models) that
are judged to have limited licensing risk
it is hard to predict the future though, so this might be
too simplistic. Time will tell when someone starts the
debate...
IOW, my suggestion would be that the document simply tells
people to raise a thread on qemu-devel if they would like
to discuss the need for a particular exception, and mention
that any exceptions will be documented in this doc if they
are agreed upon.
> What is missing: do we want a formal way to identify commits for which an
> exception to the AI policy was granted? The common way to do so seems to
> be "Generated-by" or "Assisted-by" but I don't want to turn commit message
> into an ad space. I would lean more towards something like
>
> AI-exception-granted-by: Mary Maintainer <mary.maintainer@mycorp.test>
IMHO the code-provenance.rst doc is what grants the exception, not
any individual person, nor any individual commit.
Whether we want to reference that a given commit is relying on an
exception or not is hard to say at this point as we don't know what
any exception would be like.
Ideally the applicability of an exception could be self-evident
from the commit. Reality might be fuzzier. So if it is not self-evident,
it likely warrants a sentence or two of English text in the
commit to justify its applicability.
IOW, a tag like AI-exception-granted-by doesn't feel like it is
particularly useful.
>
> but at the same time I don't want to invent something just for QEMU.
>
> Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
> ---
> docs/devel/code-provenance.rst | 22 ++++++++++++----------
> 1 file changed, 12 insertions(+), 10 deletions(-)
>
> diff --git a/docs/devel/code-provenance.rst b/docs/devel/code-provenance.rst
> index dba99a26f64..d435ab145cf 100644
> --- a/docs/devel/code-provenance.rst
> +++ b/docs/devel/code-provenance.rst
> @@ -292,7 +292,8 @@ TL;DR:
>
> **Current QEMU project policy is to DECLINE any contributions which are
> believed to include or derive from AI generated content. This includes
> - ChatGPT, Claude, Copilot, Llama and similar tools.**
> + ChatGPT, Claude, Copilot, Llama and similar tools. Exceptions may be
> + requested on a case-by-case basis.**
I'm not sure what you mean by 'case-by-case basis'? I certainly don't
think we should entertain debating use of AI in individual patch series,
as that'll be a never-ending burden on reviewer/maintainer resources.
Exceptions should be things that can be applied somewhat generically to
tools, or models or usage scenarios IMHO.
>
> **This policy does not apply to other uses of AI, such as researching APIs
> or algorithms, static analysis, or debugging, provided their output is not
> @@ -322,18 +323,19 @@ How contributors could comply with DCO terms (b) or (c) for the output of AI
> content generators commonly available today is unclear. The QEMU project is
> not willing or able to accept the legal risks of non-compliance.
>
> -The QEMU project thus requires that contributors refrain from using AI content
> -generators on patches intended to be submitted to the project, and will
> -decline any contribution if use of AI is either known or suspected.
> +The QEMU project requires contributors to refrain from using AI content
> +generators without going through an exception request process.
> +AI-generated code will only be included in the project after the
> +exception request has been evaluated by the QEMU project. To be
> +granted an exception, a contributor will need to demonstrate clarity of
> +the license and copyright status for the tool's output in relation to its
> +training model and code, to the satisfaction of the project maintainers.
>
> +Maintainers are not allowed to grant an exception on their own patch
> +submissions.
>
> Examples of tools impacted by this policy includes GitHub's CoPilot, OpenAI's
> ChatGPT, Anthropic's Claude, and Meta's Code Llama, and code/content
> generation agents which are built on top of such tools.
> -
> This policy may evolve as AI tools mature and the legal situation is
> -clarifed. In the meanwhile, requests for exceptions to this policy will be
> -evaluated by the QEMU project on a case by case basis. To be granted an
> -exception, a contributor will need to demonstrate clarity of the license and
> -copyright status for the tool's output in relation to its training model and
> -code, to the satisfaction of the project maintainers.
> +clarified.
I would suggest only this last paragraph be changed
This policy may evolve as AI tools mature and the legal situation is
clarified.
Exceptions
----------
The QEMU project welcomes discussion on any exceptions to this policy,
or more general revisions. This can be done by contacting the qemu-devel
mailing list with details of a proposed tool / model / usage scenario /
etc that is beneficial to QEMU, while still mitigating the legal risks
to the project.
After discussion, any exceptions that can be relied upon in contributions
will be listed below. The listing of an exception does not remove the
need for contributors to comply with all other pre-existing contribution
requirements, including DCO signoff.
With regards,
Daniel
--
|: https://berrange.com -o- https://www.flickr.com/photos/dberrange :|
|: https://libvirt.org -o- https://fstop138.berrange.com :|
|: https://entangle-photo.org -o- https://www.instagram.com/dberrange :|
^ permalink raw reply [flat|nested] 20+ messages in thread
* Re: [RFC PATCH 4/4] docs/code-provenance: make the exception process feasible
2025-09-22 13:04 ` Daniel P. Berrangé
@ 2025-09-22 13:26 ` Peter Maydell
2025-09-22 14:03 ` Daniel P. Berrangé
0 siblings, 1 reply; 20+ messages in thread
From: Peter Maydell @ 2025-09-22 13:26 UTC (permalink / raw)
To: Daniel P. Berrangé
Cc: Paolo Bonzini, qemu-devel, Alex Bennée, Markus Armbruster,
Stefan Hajnoczi
On Mon, 22 Sept 2025 at 14:05, Daniel P. Berrangé <berrange@redhat.com> wrote:
>
> On Mon, Sep 22, 2025 at 12:46:51PM +0100, Peter Maydell wrote:
> > On Mon, 22 Sept 2025 at 12:32, Paolo Bonzini <pbonzini@redhat.com> wrote:
> > >
> > > I do not think that anyone knows how to demonstrate "clarity of the
> > > copyright status in relation to training".
> >
> > Yes; to me this is the whole driving force behind the policy.
> >
> > > On the other hand, AI tools can be used as a natural language refactoring
> > > engine for simple tasks such as modifying all callers of a given function
> > > or even less simple ones such as adding Python type annotations.
> > > These tasks have a very low risk of introducing training material in
> > > the code base, and can provide noticeable time savings because they are
> > > easily tested and reviewed; for the lack of a better term, I will call
> > > these "tasks with limited or non-existing creative content".
> >
> > Does anybody know how to demonstrate "limited or non-existing
> > creative content", which I assume is a standin here for
> > "not copyrightable" ?
>
> That was something we aimed to intentionally avoid specifying in the
> policy. It is very hard to define it in a way that will be clearly
> understood by all contributors.
> > TL;DR: I don't think we should attempt to define where the boundary
> > lies between copyrightable and non-copyrightable code changes.
Well, this is why I think a policy that just says "no" is
more easily understandable and followable. As soon as we
start defining and granting exceptions then we're effectively
in the position of making judgements and defining the boundary.
-- PMM
^ permalink raw reply [flat|nested] 20+ messages in thread
* Re: [RFC PATCH 3/4] docs/code-provenance: clarify the scope of AI exceptions
2025-09-22 13:02 ` Alex Bennée
@ 2025-09-22 13:38 ` Daniel P. Berrangé
0 siblings, 0 replies; 20+ messages in thread
From: Daniel P. Berrangé @ 2025-09-22 13:38 UTC (permalink / raw)
To: Alex Bennée
Cc: Paolo Bonzini, qemu-devel, Markus Armbruster, Peter Maydell,
Stefan Hajnoczi
On Mon, Sep 22, 2025 at 02:02:23PM +0100, Alex Bennée wrote:
> Paolo Bonzini <pbonzini@redhat.com> writes:
>
> > Using phrasing from https://openinfra.org/legal/ai-policy (with just
> > "commit" replaced by "submission", because we do not submit changes
> > as commits but rather emails), clarify that the maintainer who bestows
> > their blessing on the AI-generated contribution is not responsible
> > for its copyright or license status beyond what is required by the
> > Developer's Certificate of Origin.
> >
> > [This is not my preferred phrasing. I would prefer something lighter
> > like "the "Signed-off-by" label in the contribution gives the author
> > responsibility". But for the sake of not reinventing the wheel I am
> > keeping the exact words from the OpenInfra policy.]
> >
> > Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
> > ---
> > docs/devel/code-provenance.rst | 5 +++++
> > 1 file changed, 5 insertions(+)
> >
> > diff --git a/docs/devel/code-provenance.rst b/docs/devel/code-provenance.rst
> > index d435ab145cf..a5838f63649 100644
> > --- a/docs/devel/code-provenance.rst
> > +++ b/docs/devel/code-provenance.rst
> > @@ -334,6 +334,11 @@ training model and code, to the satisfaction of the project maintainers.
> > Maintainers are not allowed to grant an exception on their own patch
> > submissions.
> >
> > +Even after an exception is granted, the "Signed-off-by" label in the
> > +contribution is a statement that the author takes responsibility for the
> > +entire contents of the submission, including any parts that were generated
> > +or assisted by AI tools or other tools.
> > +
>
> I quite like the LLVM wording which makes expectations clear to the
> submitter:
>
> While the LLVM project has a liberal policy on AI tool use, contributors
> are considered responsible for their contributions. We encourage
> contributors to review all generated code before sending it for review
> to verify its correctness and to understand it so that they can answer
> questions during code review. Reviewing and maintaining generated code
> that the original contributor does not understand is not a good use of
> limited project resources.
>
> It could perhaps be even stronger (must rather than encourage). The key
> point to emphasise is we don't want submissions the user of the
> generative AI doesn't understand.
>
> While we don't see them because our github lockdown policy auto-closes
> PRs we are already seeing a growth in submissions where the authors seem
> to have YOLO'd the code generator without really understanding the
> changes.
While I understand where the LLVM maintainers are coming from, IMHO
their proposed policy leaves a lot to be desired. 80% of the material
in the policy has nothing to do with AI content. Rather it is stating
the general contribution norms that the project expects to be followed
regardless of what tools may have been used.
I think perhaps a lot of the contribution norms have been informal,
learnt on the job as you gradually acclimatize to participation in
a specific project, or when first learning about open source in general.
This reliance on informal norms was always somewhat of a problem, but
it is being supercharged by AI. It is now much more likely to see project
interactions from less experienced people, who are relying on AI tools
to provide a quick on-ramp to the project, bypassing the more gradual
learning experience.
As an example of why the distinction between AI policy and general
contribution policy matters, consider the great many bugs / security
reports we've had based off the output of static analysis tools.
Almost none of this was related to AI, but the people submitting
them often failed on basic expectations such as sanity checking
what the tool claimed, or understanding what they were reporting,
or understanding why they changed the code the way they did.
If we don't already have our "contribution norms" sufficiently
clearly documented, we should improve that independently of any
AI related policy. The AI related section in our docs should
merely refer the reader over to our other contribution policies
for anything that isn't directly related to AI.
We do have a gap wrt bug reporting where I think we should document
an expectation that any use of automated tools in the bug report must
be disclosed, whether those tools are AI or not. This should apply to
any static analysis tool.
With regards,
Daniel
--
|: https://berrange.com -o- https://www.flickr.com/photos/dberrange :|
|: https://libvirt.org -o- https://fstop138.berrange.com :|
|: https://entangle-photo.org -o- https://www.instagram.com/dberrange :|
^ permalink raw reply [flat|nested] 20+ messages in thread
* Re: [RFC PATCH 2/4] docs/code-provenance: make the exception process more prominent
2025-09-22 13:24 ` Daniel P. Berrangé
@ 2025-09-22 13:56 ` Paolo Bonzini
2025-09-22 14:51 ` Daniel P. Berrangé
0 siblings, 1 reply; 20+ messages in thread
From: Paolo Bonzini @ 2025-09-22 13:56 UTC (permalink / raw)
To: Daniel P. Berrangé
Cc: qemu-devel, Alex Bennée, Markus Armbruster, Peter Maydell,
Stefan Hajnoczi
On 9/22/25 15:24, Daniel P. Berrangé wrote:
> FWIW, I considered that the "exception process" would end up
> being something like...
>
> * someone wants to use a particular tool for something they
> believe is compelling
> * they complain on qemu-devel that our policy blocks their
> valid use
> * we debate it
I guess we're here, except for hiding the complaint behind a patch. :)
> * if agreed, we add a note to this code-proveance.rst doc to
> allow it
>
>
> I would imagine that exceptions might fall into two buckets
>
> * Descriptions of techniques/scenarios for using tools
> that limit the licensing risk
> * Details of specific tools (or more likely models) that
> are judged to have limited licensing risk
>
> it is hard to predict the future though, so this might be
> too simplistic. Time will tell when someone starts the
> debate...
Yeah, I'm afraid it is; allowing specific tools might not be feasible,
as the scope of "allow Claude Code" or "allow cut and paste for ChatGPT
chats" is obviously way too large. Allowing some usage scenarios seems
more feasible (as done in patch 4).
>> What is missing: do we want a formal way to identify commits for which an
>> exception to the AI policy was granted? The common way to do so seems to
>> be "Generated-by" or "Assisted-by" but I don't want to turn commit message
>> into an ad space. I would lean more towards something like
>>
>> AI-exception-granted-by: Mary Maintainer <mary.maintainer@mycorp.test>
>
> IMHO the code-provenance.rst doc is what grants the exception, not
> any individual person, nor any individual commit.
>
> Whether we want to reference that a given commit is relying on an
> exception or not is hard to say at this point as we don't know what
> any exception would be like.
>
> Ideally the applicability of an exception could be self-evident
> from the commit. Reality might be fuzzier. So if it is not self-evident,
> it likely warrants a sentence or two of English text in the
> commit to justify its applicability.
> IOW, a tag like AI-exception-granted-by doesn't feel like it is
> particularly useful.
I meant it as more of an audit trail, especially for the case where a
new submaintainer would prefer to ask someone else, or for the case of a
maintainer contributing AI-generated code. If we can keep it simple and
avoid this, that's fine (it's not even in the policy, only in the commit
message).
What I do *not* want is Generated-by or Assisted-by. The exact model or
tool shouldn't matter in deciding whether a contribution fits the
exception. Companies tell their employees "you can use this model
because we have an indemnification contract in place", but I don't think
we should care about what contracts they have---we have no way to check
if it's true or if the indemnification extends to QEMU, for example.
>> **Current QEMU project policy is to DECLINE any contributions which are
>> believed to include or derive from AI generated content. This includes
>> - ChatGPT, Claude, Copilot, Llama and similar tools.**
>> + ChatGPT, Claude, Copilot, Llama and similar tools. Exceptions may be
>> + requested on a case-by-case basis.**
>
> I'm not sure what you mean by 'case-by-case basis'? I certainly don't
> think we should entertain debating use of AI in individual patch series,
> as that'll be a never-ending burden on reviewer/maintainer resources.
>
> Exceptions should be things that can be applied somewhat generically to
> tools, or models or usage scenarios IMHO.
I meant that at some point a human will have to agree that it fits the
exception, but yeah it is not the right place to say that.
> I would suggest only this last paragraph be changed
>
>
> This policy may evolve as AI tools mature and the legal situation is
> clarified.
>
> Exceptions
> ----------
>
> The QEMU project welcomes discussion on any exceptions to this policy,
> or more general revisions. This can be done by contacting the qemu-devel
> mailing list with details of a proposed tool / model / usage scenario /
> etc that is beneficial to QEMU, while still mitigating the legal risks
> to the project.
>
> After discussion, any exceptions that can be relied upon in contributions
> will be listed below. The listing of an exception does not remove the
> need for contributors to comply with all other pre-existing contribution
> requirements, including DCO signoff.
This sounds good (I'd like to keep the requirement that maintainers ask
for a second opinion when contributing AI-generated code, but that can
be woven into your proposal). Another benefit is that this phrasing is
independent of the existence of any exceptions.
I'll split the first three patches into their own non-RFC series, and we
can keep discussing the "refactoring scenario" in this thread.
Paolo
^ permalink raw reply [flat|nested] 20+ messages in thread
* Re: [RFC PATCH 4/4] docs/code-provenance: make the exception process feasible
2025-09-22 13:26 ` Peter Maydell
@ 2025-09-22 14:03 ` Daniel P. Berrangé
2025-09-22 15:10 ` Paolo Bonzini
0 siblings, 1 reply; 20+ messages in thread
From: Daniel P. Berrangé @ 2025-09-22 14:03 UTC (permalink / raw)
To: Peter Maydell
Cc: Paolo Bonzini, qemu-devel, Alex Bennée, Markus Armbruster,
Stefan Hajnoczi
On Mon, Sep 22, 2025 at 02:26:00PM +0100, Peter Maydell wrote:
> On Mon, 22 Sept 2025 at 14:05, Daniel P. Berrangé <berrange@redhat.com> wrote:
> >
> > On Mon, Sep 22, 2025 at 12:46:51PM +0100, Peter Maydell wrote:
> > > On Mon, 22 Sept 2025 at 12:32, Paolo Bonzini <pbonzini@redhat.com> wrote:
> > > >
> > > > I do not think that anyone knows how to demonstrate "clarity of the
> > > > copyright status in relation to training".
> > >
> > > Yes; to me this is the whole driving force behind the policy.
> > >
> > > > On the other hand, AI tools can be used as a natural language refactoring
> > > > engine for simple tasks such as modifying all callers of a given function
> > > > or even less simple ones such as adding Python type annotations.
> > > > These tasks have a very low risk of introducing training material in
> > > > the code base, and can provide noticeable time savings because they are
> > > > easily tested and reviewed; for the lack of a better term, I will call
> > > > these "tasks with limited or non-existing creative content".
> > >
> > > Does anybody know how to demonstrate "limited or non-existing
> > > creative content", which I assume is a standin here for
> > > "not copyrightable" ?
> >
> > That was something we aimed to intentionally avoid specifying in the
> > policy. It is very hard to define it in a way that will be clearly
> > understood by all contributors.
>
> > TL;DR: I don't think we should attempt to define where the boundary
> > lies between copyrightable and non-copyrightable code changes.
>
> Well, this is why I think a policy that just says "no" is
more easily understandable and followable. As soon as we
start defining and granting exceptions, we're effectively
> in the position of making judgements and defining the boundary.
Whether we have our AI policy or not, contributors are still required
to abide by the terms of the DCO, which requires them to understand
the legal situation of any contribution.
Our policy is effectively saying that most use of AI is such that we
don't think it is possible for contributions to claim DCO compliance.
If we think there are situations where it might be credible for a
contributor to claim DCO compliance, we can try to find a way to
describe that situation, without having to explicitly state our
legal interpretation of the "copyrightable vs non-copyrightable"
boundary.
At KVM Forum, what was notably raised was the topic of code
refactoring and whether it is practical to allow some such
usage.
We have historically allowed machine refactoring done by Coccinelle,
for example. Someone could ask an AI agent to write a Coccinelle
script for a given task, and then tell the AI to run that script
across the code base. I think that might be a situation where it
would be reasonable to accept the AI-driven refactoring, as the
substance of the commit is clearly defined by the Coccinelle
script.
Could that be summarized by saying that we'll allow refactoring
if driven via an intermediate script? That is still quite a
strict definition that could frustrate much usage, but it at
least feels like something that should have greatly reduced
risk compared to direct refactoring by an opaque agent.
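To sketch the idea (hypothetical helper names, and Python rather than
Coccinelle purely for illustration): the agent writes a small script,
a human reviews the script, and the script produces the tree-wide
change:

    #!/usr/bin/env python3
    # Hypothetical intermediate script: rename every caller of
    # old_helper() to new_helper().  The transformation is fully
    # determined by the script, so the substance of the commit can
    # be audited independently of whatever tool generated the script.
    import pathlib
    import re
    import sys

    pattern = re.compile(r"\bold_helper\(")

    for path in pathlib.Path(sys.argv[1]).rglob("*.c"):
        text = path.read_text()
        new_text = pattern.sub("new_helper(", text)
        if new_text != text:
            path.write_text(new_text)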
As an example though, we have the scripts/clean-includes.pl script
that Markus wrote for manipulating code into our preferred style
for headers.
Whether the headers change is done manually by a human, automated
with Markus' perl script or automated by an AI agent, the end
result should be identical, as there is only one possible end
point and you can describe what that end point should look like.
That said, there is still a question mark over complexity. Getting
to the end point may be a trivial & mundane exercise in some cases,
while requiring considerable intellectual thought in other cases.
The latter is perhaps especially true if wanting a simple, easily
bisected series of small steps rather than a big bang conversion.
With regards,
Daniel
--
|: https://berrange.com -o- https://www.flickr.com/photos/dberrange :|
|: https://libvirt.org -o- https://fstop138.berrange.com :|
|: https://entangle-photo.org -o- https://www.instagram.com/dberrange :|
^ permalink raw reply [flat|nested] 20+ messages in thread
* Re: [RFC PATCH 2/4] docs/code-provenance: make the exception process more prominent
2025-09-22 13:56 ` Paolo Bonzini
@ 2025-09-22 14:51 ` Daniel P. Berrangé
0 siblings, 0 replies; 20+ messages in thread
From: Daniel P. Berrangé @ 2025-09-22 14:51 UTC (permalink / raw)
To: Paolo Bonzini
Cc: qemu-devel, Alex Bennée, Markus Armbruster, Peter Maydell,
Stefan Hajnoczi
On Mon, Sep 22, 2025 at 03:56:51PM +0200, Paolo Bonzini wrote:
> On 9/22/25 15:24, Daniel P. Berrangé wrote:
> > FWIW, I considered that the "exception process" would end up
> > being something like...
> >
> > * someone wants to use a particular tool for something they
> > believe is compelling
> > * they complain on qemu-devel that our policy blocks their
> > valid use
> > * we debate it
>
> I guess we're here, except for hiding the complaint behind a patch. :)
>
> > * if agreed, we add a note to this code-provenance.rst doc to
> > allow it
> >
> >
> > I would imagine that exceptions might fall into two buckets
> >
> > * Descriptions of techniques/scenarios for using tools
> > that limit the licensing risk
> > * Details of specific tools (or more likely models) that
> > are judged to have limited licensing risk
> >
> > it is hard to predict the future though, so this might be
> > too simplistic. Time will tell when someone starts the
> > debate...
>
> Yeah, I'm afraid it is; allowing specific tools might not be feasible, as
> the scope of "allow Claude Code" or "allow cut and paste for ChatGPT chats"
> is obviously way too large. Allowing some usage scenarios seems more
> feasible (as done in patch 4).
Agreed; when I say an exception for a tool, I would find it highly
unlikely we would grant one for such a highly generic tool as
Claude/ChatGPT. That would effectively be removing all policy
limitations.
Rather I was thinking about the possibility that certain very
specialized tools might appear.
The usage scenarios exception seems the much more likely one
in the near future.
> > > What is missing: do we want a formal way to identify commits for which an
> > > exception to the AI policy was granted? The common way to do so seems to
> > > be "Generated-by" or "Assisted-by" but I don't want to turn commit message
> > > into an ad space. I would lean more towards something like
> > >
> > > AI-exception-granted-by: Mary Maintainer <mary.maintainer@mycorp.test>
> >
> > IMHO the code-provenance.rst doc is what grants the exception, not
> > any individual person, nor any individual commit.
> >
> > Whether we want to reference that a given commit is relying on an
> > exception or not is hard to say at this point as we don't know what
> > any exception would be like.
> >
> > Ideally the applicability of an exception could be self-evident
> > from the commit. Reality might be fuzzier. So if it is not self-evident,
> > it likely warrants a sentence or two of English text in the
> > commit to justify its applicability.
> > IOW, a tag like AI-exception-granted-by doesn't feel like it is
> > particularly useful.
>
> I meant it as more of an audit trail, especially for the case where a new
> submaintainer would prefer to ask someone else, or for the case of a
> maintainer contributing AI-generated code. If we can keep it simple and
> avoid this, that's fine (it's not even in the policy, only in the commit
> message).
When a maintainer gives an Acked-by or Signed-off-by tag, they
are stating that the contribution complies with our policies, and
that includes this AI policy.
If a maintainer isn't comfortable with the AI exception
applicability they should not give Acked-by/Signed-off-by,
and/or ask another maintainer to give their own NNN-by tag
as a second opinion.
> What I do *not* want is Generated-by or Assisted-by.
Yes, I don't want to see us advertising commercial products in
git history.
> The exact model or
> tool shouldn't matter in deciding whether a contribution fits the exception.
> Companies tell their employees "you can use this model because we have an
> indemnification contract in place", but I don't think we should care about
> what contracts they have---we have no way to check if it's true or if the
> indemnification extends to QEMU, for example.
Employees likely don't have any way to check that either. They'll
just be blindly trusting what little information their employer
provides, if any. We don't want to put our contributors into an
impossible situation wrt determining compliance. It needs to be
practical for them to make a judgement call.
With regards,
Daniel
--
|: https://berrange.com -o- https://www.flickr.com/photos/dberrange :|
|: https://libvirt.org -o- https://fstop138.berrange.com :|
|: https://entangle-photo.org -o- https://www.instagram.com/dberrange :|
^ permalink raw reply [flat|nested] 20+ messages in thread
* Re: [RFC PATCH 4/4] docs/code-provenance: make the exception process feasible
2025-09-22 14:03 ` Daniel P. Berrangé
@ 2025-09-22 15:10 ` Paolo Bonzini
2025-09-22 16:36 ` Daniel P. Berrangé
0 siblings, 1 reply; 20+ messages in thread
From: Paolo Bonzini @ 2025-09-22 15:10 UTC (permalink / raw)
To: Daniel P. Berrangé, Peter Maydell
Cc: qemu-devel, Alex Bennée, Markus Armbruster, Stefan Hajnoczi
On 9/22/25 16:03, Daniel P. Berrangé wrote:
> Whether we have our AI policy or not, contributors are still required
> to abide by the terms of the DCO, which requires them to understand
> the legal situation of any contribution.
>
> Our policy is effectively saying that most use of AI is such that we
> don't think it is possible for contributions to claim DCO compliance.
>
> If we think there are situations where it might be credible for a
> contributor to claim DCO compliance, we can try to find a way to
> describe that situation, without having to explicitly state our
> legal interpretation of the "copyrightable vs non-copyrightable"
> boundary.
Right. I am sure that a lawyer would find some overlap between my
definition of "where the creativity lies" and the law's definition of
"copyrightability", but that's not where I am coming from and I am not
even pretending to be dispensing legal advice.
The point is more that the tool shouldn't have any bearing on DCO
compliance if the same contributor can reasonably make the same change
with different tools or with just an editor. And we have dozens of
mechanical changes contributed every year, written either by hand or
with a wide variety of tools.
I have no QEMU example at hand, but let's look at a commit like
https://github.com/bonzini/meson/commit/09765594d. Something like this
could be plausibly created with AI. What I care about is:
* to what degree can I automate what I could do by hand. An AI tool
moves the break-even point more towards automation. I would not bring
up Coccinelle for a 10-line change; in fact, I looked by hand at every
occurrence of ".cfg" and relied on mypy to check if I missed something.
Maybe an experienced AI user would have reached for AI as the first step?[1]
* keeping people honest. Between the two cases of "they don't tell and
I don't realize it is AI-generated" and "they split the commit clearly
into AI-generated and human-generated parts", an exception makes the
latter more likely to happen.
> That said there is still a question mark over complexity. Getting
> to the end point may be a trivial & mundane exercise in some cases,
> while requiring considerable intellectual thought in other cases.
> The latter is perhaps especially true if one wants a simple, easily
> bisected series of small steps rather than a big bang conversion.
In any case, we encourage people to isolate the mundane parts, so they
could use AI for those if they see fit. Independent of whether the
contributor has worked on QEMU before, the more complex parts are also
signed off on (and we'd be much more likely to spot signs of AI usage
when reviewing them), which makes me more willing to trust their good faith.
Paolo
[1] I tried "I want to track the PackageConfiguration object per machine
in mesonbuild/cargo/interpreter.py. Make PackageState.cfg a PerMachine
object. Initialize PackageState.cfg when the PackageState is created.
The old pkg.cfg becomes pkg.cfg[MachineChoice.HOST]" and it did pretty
much the same changes in a bit more than 2 minutes. Including the time
to write the prompt it's almost certainly more than it took me to do it
by hand, but this time I was doing something else in the meantime. :)
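For readers who haven't seen the commit, this is roughly the shape of
the change that prompt describes (a minimal sketch with simplified
stand-ins, not the actual meson classes):

  from enum import Enum

  class MachineChoice(Enum):
      BUILD = 0
      HOST = 1

  class PerMachine:
      # simplified stand-in: one value per machine, indexed by MachineChoice
      def __init__(self, build, host):
          self._values = {MachineChoice.BUILD: build,
                          MachineChoice.HOST: host}
      def __getitem__(self, machine):
          return self._values[machine]

  class PackageState:
      def __init__(self):
          # before: self.cfg = {}; after: one cfg per machine,
          # initialized when the PackageState is created
          self.cfg = PerMachine({}, {})

  pkg = PackageState()
  pkg.cfg[MachineChoice.HOST]["feature"] = True  # old: pkg.cfg["feature"]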
^ permalink raw reply [flat|nested] 20+ messages in thread
* Re: [RFC PATCH 4/4] docs/code-provenance: make the exception process feasible
2025-09-22 15:10 ` Paolo Bonzini
@ 2025-09-22 16:36 ` Daniel P. Berrangé
2025-09-22 16:55 ` Paolo Bonzini
0 siblings, 1 reply; 20+ messages in thread
From: Daniel P. Berrangé @ 2025-09-22 16:36 UTC (permalink / raw)
To: Paolo Bonzini
Cc: Peter Maydell, qemu-devel, Alex Bennée, Markus Armbruster,
Stefan Hajnoczi
On Mon, Sep 22, 2025 at 05:10:24PM +0200, Paolo Bonzini wrote:
>
> I have no QEMU example at hand, but let's look at a commit like
> https://github.com/bonzini/meson/commit/09765594d. Something like this
> could be plausibly created with AI. What I care about is:
I'd agree it is something AI could likely come up with, given the
right prompt, but in terms of defining policy it conceptually
feels more like new functionality, mixed in with refactoring.
> * to what degree can I automate what I could do by hand. An AI tool moves
> the break-even point more towards automation. I would not bring up
> Coccinelle for a 10 line change, in fact I looked by hand at every
> occurrence of ".cfg" and relied on mypy to check if I missed something.
> Maybe an experienced AI user would have reached to AI as the first step?[1]
What matters is not whether Coccinelle was practical to use,
nor whether it was possible to express the
concept in its particular language.
Rather, I'm thinking of it as a conceptual guide for whether
a change might be expressible as a plain transformation or not.
I don't think the meson change satisfies that, because you
wouldn't express the new class-level properties or the new
get_or_create_cfg code as an algorithmic refactoring. Those
are a case of creative coding.
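To make the distinction concrete (a hypothetical, simplified sketch,
nothing taken from the actual commit): the algorithmic half of such a
conversion could in principle be expressed as a pattern rewrite, while
the creative half could not:

  import re

  def mechanical_rewrite(source: str) -> str:
      # Purely algorithmic: rewrite every not-yet-converted use site.
      # (Simplified; a real tool would work on the AST, not raw text.)
      return re.sub(r"\bpkg\.cfg\b(?!\[MachineChoice)",
                    "pkg.cfg[MachineChoice.HOST]", source)

  print(mechanical_rewrite('pkg.cfg["name"] = value'))
  # -> pkg.cfg[MachineChoice.HOST]["name"] = value

  # A new helper like get_or_create_cfg has no such mechanical
  # derivation; someone has to decide that it should exist at all.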
> * keeping people honest. Between the two cases of "they don't tell and I
> don't realize it is AI-generated" and "they split the commit clearly into
> AI-generated and human-generated parts", an exception makes the latter more
> likely to happen.
> [1] I tried "I want to track the PackageConfiguration object per machine in
> mesonbuild/cargo/interpreter.py. Make PackageState.cfg a PerMachine object.
> Initialize PackageState.cfg when the PackageState is created. The old
> pkg.cfg becomes pkg.cfg[MachineChoice.HOST]" and it did pretty much the same
> changes in a bit more than 2 minutes. Including the time to write the
> prompt it's almost certainly more than it took me to do it by hand, but this
> time I was doing something else in the meanwhile. :)
When we talk about "limited / non-creative refactoring", my interpretation
would be that it conceptually applies to changes which could be described as
an algorithmic transformation. This prompt and the resulting code feel like
more than that. The prompt is expressing a creative change, and while the
result includes some algorithmic refactoring, it includes other stuff too.
Describing a policy that allows your meson example, in a way that will be
interpreted in a reasonably consistent way by contributors, looks like a
challenge to me.
On the flip side, you might have written the new property / getter method
manually and asked the agent to finish the conversion, and that would
have been acceptable. This is a can of worms to express in a policy.
With regards,
Daniel
--
|: https://berrange.com -o- https://www.flickr.com/photos/dberrange :|
|: https://libvirt.org -o- https://fstop138.berrange.com :|
|: https://entangle-photo.org -o- https://www.instagram.com/dberrange :|
^ permalink raw reply [flat|nested] 20+ messages in thread
* Re: [RFC PATCH 4/4] docs/code-provenance: make the exception process feasible
2025-09-22 16:36 ` Daniel P. Berrangé
@ 2025-09-22 16:55 ` Paolo Bonzini
0 siblings, 0 replies; 20+ messages in thread
From: Paolo Bonzini @ 2025-09-22 16:55 UTC (permalink / raw)
To: Daniel P. Berrangé
Cc: Peter Maydell, qemu-devel, Alex Bennée, Markus Armbruster,
Stefan Hajnoczi
On Mon, Sep 22, 2025 at 6:37 PM Daniel P. Berrangé <berrange@redhat.com> wrote:
> On Mon, Sep 22, 2025 at 05:10:24PM +0200, Paolo Bonzini wrote:
> > I have no QEMU example at hand, but let's look at a commit like
> > https://github.com/bonzini/meson/commit/09765594d. Something like this
> > could be plausibly created with AI. What I care about is:
>
> I'd agree it is something AI could likely come up with, given the
> right prompt, but in terms of defining policy it conceptually
> feels more like new functionality, mixed in with refactoring.
> [...]
> you wouldn't express the new class level properties, or the new
> get_or_create_cfg code as an algorithmic refactoring. Those
> are a case of creative coding.
Yes, I agree. Those are creative, and obviously not part of what the
LLM can produce with a pure "refactoring prompt". In that commit,
clearly, I hadn't made a strong attempt at splitting the new
functionality from the refactoring; I might even do that now. :)
> When we talk about "limited / non-creative refactoring", my interpretation
> would be that it conceptually applies to changes which could be described as
> an algorithmic transformation. This prompt and the resulting code feel like
> more than that. The prompt is expressing a creative change, and while the
> result includes some algorithmic refactoring, it includes other stuff too.
>
> Describing a policy that allows your meson example, in a way that will be
> interpreted in a reasonably consistent way by contributors looks like a
> challenge to me.
I agree with your reasoning that the commit goes beyond the "no
creative change" line, or at least parts of it do.
Inadvertently, this is also an example of how the policy helps AI
users follow our existing contribution standards.
> On the flip side, you might have written the new property / getter method
> manually and asked the agent to finish the conversion, and that would
> have been acceptable. This is a can of worms to express in a policy.
Yes, a better approach would have been to change the initializer and
ask AI to do the mechanical parts. Something like, in a commit
message:
Note: after changing the initializer, the bulk of the changes were
done with the following prompt: "finish this conversion - i want to
track the PackageConfiguration object per machine, with pkg.cfg
becoming pkg.cfg[MachineChoice.HOST]".
Still, putting the two together would also follow the exception text,
which encourages "to not mix AI- and human-written code in the same
commit, *as much as possible*". Again, this is just an example, and
in practice the amount of non-creative refactoring would be much
larger than the rest.
Paolo
^ permalink raw reply [flat|nested] 20+ messages in thread
end of thread, other threads:[~2025-09-22 16:56 UTC | newest]
Thread overview: 20+ messages
2025-09-22 11:32 [RFC PATCH 0/4] docs/code-provenance: make AI policy clearer and more practical Paolo Bonzini
2025-09-22 11:32 ` [RFC PATCH 1/4] docs/code-provenance: clarify scope very early Paolo Bonzini
2025-09-22 11:34 ` Daniel P. Berrangé
2025-09-22 12:52 ` Alex Bennée
2025-09-22 11:32 ` [RFC PATCH 2/4] docs/code-provenance: make the exception process more prominent Paolo Bonzini
2025-09-22 13:24 ` Daniel P. Berrangé
2025-09-22 13:56 ` Paolo Bonzini
2025-09-22 14:51 ` Daniel P. Berrangé
2025-09-22 11:32 ` [RFC PATCH 3/4] docs/code-provenance: clarify the scope of AI exceptions Paolo Bonzini
2025-09-22 13:02 ` Alex Bennée
2025-09-22 13:38 ` Daniel P. Berrangé
2025-09-22 11:32 ` [RFC PATCH 4/4] docs/code-provenance: make the exception process feasible Paolo Bonzini
2025-09-22 11:46 ` Peter Maydell
2025-09-22 12:06 ` Paolo Bonzini
2025-09-22 13:04 ` Daniel P. Berrangé
2025-09-22 13:26 ` Peter Maydell
2025-09-22 14:03 ` Daniel P. Berrangé
2025-09-22 15:10 ` Paolo Bonzini
2025-09-22 16:36 ` Daniel P. Berrangé
2025-09-22 16:55 ` Paolo Bonzini