on ai generated and code provenance

All of lore.kernel.org
 help / color / mirror / Atom feed

* on ai generated and code provenance
@ 2026-05-24 12:42 Michael S. Tsirkin
  2026-05-24 17:06 ` Alex Bennée
                   ` (2 more replies)
  0 siblings, 3 replies; 59+ messages in thread
From: Michael S. Tsirkin @ 2026-05-24 12:42 UTC (permalink / raw)
  To: qemu-devel; +Cc: stefanha

So, I had to reject a perfectly reasonable patch:
https://lore.kernel.org/qemu-devel/20260320193746.242704-1-jinpu.wang@ionos.com/
just because of a tool used to make it.


	How contributors could comply with DCO terms (b) or (c) for the output of AI
	content generators commonly available today is unclear.  The QEMU project is
	not willing or able to accept the legal risks of non-compliance.


But, since this was written, Red Hat's Richard Fontana and Chris Wright
published this piece:
https://www.redhat.com/en/blog/ai-assisted-development-and-open-source-navigating-legal-issues


Saying, in particular "
	We understand this concern, but the DCO has never
	been interpreted to require that every line of a contribution must be
	the personal creative expression of the contributor or another human
	developer. 
"

I propose adopting linux's rules instead:
https://docs.kernel.org/process/coding-assistants.html

which boils down to attribution.


-- 
MST



^ permalink raw reply	[flat|nested] 59+ messages in thread

* Re: on ai generated and code provenance
  2026-05-24 12:42 on ai generated and code provenance Michael S. Tsirkin
@ 2026-05-24 17:06 ` Alex Bennée
  2026-05-24 17:42   ` Michael S. Tsirkin
                     ` (2 more replies)
  2026-05-25 16:32 ` Paolo Bonzini
  2026-05-26 17:43 ` Kevin Wolf
  2 siblings, 3 replies; 59+ messages in thread
From: Alex Bennée @ 2026-05-24 17:06 UTC (permalink / raw)
  To: Michael S. Tsirkin; +Cc: qemu-devel, stefanha

"Michael S. Tsirkin" <mst@redhat.com> writes:

> So, I had to reject a perfectly reasonable patch:
> https://lore.kernel.org/qemu-devel/20260320193746.242704-1-jinpu.wang@ionos.com/
> just because of a tool used to make it.
>
>
> 	How contributors could comply with DCO terms (b) or (c) for the output of AI
> 	content generators commonly available today is unclear.  The QEMU project is
> 	not willing or able to accept the legal risks of non-compliance.

In the linked case the LLM is basically doing a glorified search and
replace. There seems to be no danger of accidentally regurgitating any
training data which is where the worry about inadvertent copyright
infringement comes from.

That said in my experience generally any code that does come out from
these tools tends to match the local code style and patterns pretty
well. As a general purpose boilerplate generator they are probably
better than a lot of people at this point.

There has been some case law now that says LLM output could be
un-copyrightable depending on how involved the user was in the iteration
of the code. I suspect there is still more to come.

>
>
> But, since this was written, Red Hat's Richard Fontana and Chris Wright
> published this piece:
> https://www.redhat.com/en/blog/ai-assisted-development-and-open-source-navigating-legal-issues
>
>
> Saying, in particular "
> 	We understand this concern, but the DCO has never
> 	been interpreted to require that every line of a contribution must be
> 	the personal creative expression of the contributor or another human
> 	developer. 
> "
>
> I propose adopting linux's rules instead:
> https://docs.kernel.org/process/coding-assistants.html
>
> which boils down to attribution.

attribution and *ownership*. I think the key point of the policy is to
make the actual engineer signing the DCO the responsible one for
generating, testing and validating the code. It is strongly trying to
suggest that vibe-coded slop isn't wanted.

I still have concerns about the quality of the code and the
"understanding" these models have. They can generate very convincing
rationales for their decisions but they also are prone to being
over-verbose and over-complicating the solutions. They have a tendency
to chase down rabbit holes in the code and get lost while making wilder
and more invasive changes to try and get things working.

That said for personal scripts or random experiments the ability to
quickly get to a PoC is pretty great.

I think there is also scope for using LLMs for things that aren't
directly writing code:

  - code review
  - investigation
  - generating test cases
  - polishing documentation

and I wonder if we should spend some more time investigating the
performance and pitfalls of LLMs before we open the flood gates to the
code.

-- 
Alex Bennée
Virtualisation Tech Lead @ Linaro

^ permalink raw reply	[flat|nested] 59+ messages in thread

* Re: on ai generated and code provenance
  2026-05-24 17:06 ` Alex Bennée
@ 2026-05-24 17:42   ` Michael S. Tsirkin
  2026-05-24 18:26   ` Warner Losh
  2026-05-24 20:11   ` Michael S. Tsirkin
  2 siblings, 0 replies; 59+ messages in thread
From: Michael S. Tsirkin @ 2026-05-24 17:42 UTC (permalink / raw)
  To: Alex Bennée; +Cc: qemu-devel, stefanha

On Sun, May 24, 2026 at 06:06:46PM +0100, Alex Bennée wrote:
> "Michael S. Tsirkin" <mst@redhat.com> writes:
> 
> > So, I had to reject a perfectly reasonable patch:
> > https://lore.kernel.org/qemu-devel/20260320193746.242704-1-jinpu.wang@ionos.com/
> > just because of a tool used to make it.
> >
> >
> > 	How contributors could comply with DCO terms (b) or (c) for the output of AI
> > 	content generators commonly available today is unclear.  The QEMU project is
> > 	not willing or able to accept the legal risks of non-compliance.
> 
> In the linked case the LLM is basically doing a glorified search and
> replace. There seems to be no danger of accidentally regurgitating any
> training data which is where the worry about inadvertent copyright
> infringement comes from.
> 
> That said in my experience generally any code that does come out from
> these tools tends to match the local code style and patterns pretty
> well.

Making the code original, too.

> As a general purpose boilerplate generator they are probably
> better than a lot of people at this point.
> 
> There has been some case law now that says LLM output could be
> un-copyrightable depending on how involved the user was in the iteration
> of the code. I suspect there is still more to come.

Waiting for courts to settle anything means waiting years, while
the industry has mostly moved on.

> >
> >
> > But, since this was written, Red Hat's Richard Fontana and Chris Wright
> > published this piece:
> > https://www.redhat.com/en/blog/ai-assisted-development-and-open-source-navigating-legal-issues
> >
> >
> > Saying, in particular "
> > 	We understand this concern, but the DCO has never
> > 	been interpreted to require that every line of a contribution must be
> > 	the personal creative expression of the contributor or another human
> > 	developer. 
> > "
> >
> > I propose adopting linux's rules instead:
> > https://docs.kernel.org/process/coding-assistants.html
> >
> > which boils down to attribution.
> 
> attribution and *ownership*. I think the key point of the policy is to
> make the actual engineer signing the DCO the responsible one for
> generating, testing and validating the code. It is strongly trying to
> suggest that vibe-coded slop isn't wanted.
> 
> I still have concerns about the quality of the code and the
> "understanding" these models have. They can generate very convincing
> rationales for their decisions but they also are prone to being
> over-verbose and over-complicating the solutions. They have a tendency
> to chase down rabbit holes in the code and get lost while making wilder
> and more invasive changes to try and get things working.


That's up to maintainers though.

> That said for personal scripts or random experiments the ability to
> quickly get to a PoC is pretty great.

Patch above is beyond that.

> I think there is also scope for using LLMs for things that aren't
> directly writing code:
> 
>   - code review
>   - investigation
>   - generating test cases
>   - polishing documentation
> 
> and I wonder if we should spend some more time investigating the
> performance and pitfalls of LLMs before we open the flood gates to the
> code.

Who would do the investigating?

> -- 
> Alex Bennée
> Virtualisation Tech Lead @ Linaro



^ permalink raw reply	[flat|nested] 59+ messages in thread

* Re: on ai generated and code provenance
  2026-05-24 17:06 ` Alex Bennée
  2026-05-24 17:42   ` Michael S. Tsirkin
@ 2026-05-24 18:26   ` Warner Losh
  2026-05-24 20:04     ` Michael S. Tsirkin
  2026-05-24 20:11   ` Michael S. Tsirkin
  2 siblings, 1 reply; 59+ messages in thread
From: Warner Losh @ 2026-05-24 18:26 UTC (permalink / raw)
  To: Alex Bennée; +Cc: Michael S. Tsirkin, qemu-devel, stefanha

[-- Attachment #1: Type: text/plain, Size: 8247 bytes --]

On Sun, May 24, 2026 at 11:08 AM Alex Bennée <alex.bennee@linaro.org> wrote:

> "Michael S. Tsirkin" <mst@redhat.com> writes:
>
> > So, I had to reject a perfectly reasonable patch:
> >
> https://lore.kernel.org/qemu-devel/20260320193746.242704-1-jinpu.wang@ionos.com/
> > just because of a tool used to make it.
> >
> >
> >       How contributors could comply with DCO terms (b) or (c) for the
> output of AI
> >       content generators commonly available today is unclear.  The QEMU
> project is
> >       not willing or able to accept the legal risks of non-compliance.
>
> In the linked case the LLM is basically doing a glorified search and
> replace. There seems to be no danger of accidentally regurgitating any
> training data which is where the worry about inadvertent copyright
> infringement comes from.
>

Yes. The LLM copying code thing is so two years ago. LLMs don't do
this anymore. They are just glorified pattern matchers, and generate
based on the patterns they know. While there may be a tiny risk here,
there's a greater risk today from humans doing this w/o attribution.

> That said in my experience generally any code that does come out from
> these tools tends to match the local code style and patterns pretty
> well. As a general purpose boilerplate generator they are probably
> better than a lot of people at this point.
>
> There has been some case law now that says LLM output could be
> un-copyrightable depending on how involved the user was in the iteration
> of the code. I suspect there is still more to come.
>

So let's be clear here, because it matters. The output of LLMs is in the
public domain because there's not a human author. Why would that matter?
I ask because there's large parts of the linux kernel that cannot enjoy
copyright
protection because they are mere facts (like tables of register writes to
initialize
a device). That doesn't stop the author from including the public domain
code into
the linux kernel (or FreeBSD or whatever). There are elements that can be
protected
by copyright and elements that can't. However, it's perfect acceptable to
include public domain material in your copyrighted works. Adding LLM
generated
code, assuming it's unmodified, would be just that. Just like Disney did
with a
zillion movies. And most of the time when I use LLM output, I modify it a
bit
to be better. The LLM generation is close, but not quite right. It really
is a so-so
junior engineer that's a bit too keen on following rules.

But anyway, the public domain aspect doesn't matter for us. Either there's
no
copyright, in which case people can copy it w/o a license. Or there is, and
we grant
one that's very permissive in what it allows. Folding the public domain
code into
projects is a time-honored tradition. Why would LLMs change this dynamic?

> >
> >
> > But, since this was written, Red Hat's Richard Fontana and Chris Wright
> > published this piece:
> >
> https://www.redhat.com/en/blog/ai-assisted-development-and-open-source-navigating-legal-issues
> >
> >
> > Saying, in particular "
> >       We understand this concern, but the DCO has never
> >       been interpreted to require that every line of a contribution must
> be
> >       the personal creative expression of the contributor or another
> human
> >       developer.
> > "
> >
> > I propose adopting linux's rules instead:
> > https://docs.kernel.org/process/coding-assistants.html
> >
> > which boils down to attribution.
>
> attribution and *ownership*. I think the key point of the policy is to
> make the actual engineer signing the DCO the responsible one for
> generating, testing and validating the code. It is strongly trying to
> suggest that vibe-coded slop isn't wanted.
>

But the DCO is correct here. If I take public domain code, and hack
it I can still legitimately do a SOB.

(a) The contribution was created in whole or in part by me and I
    have the right to submit it under the open source license
    indicated in the file; or

So if I use LLM, and change it even a little, it's created in part by me.
And
if it's public domain, I have the right to submit it under any license I
like.
And the parts I created, I absolutely have the right to copyright and
contribute
under any terms I like. There's no new ground here.

> I still have concerns about the quality of the code and the
> "understanding" these models have. They can generate very convincing
> rationales for their decisions but they also are prone to being
> over-verbose and over-complicating the solutions. They have a tendency
> to chase down rabbit holes in the code and get lost while making wilder
> and more invasive changes to try and get things working.
>

Yes. One of the reasons that submitters need to explain (or be able to)
every line and justify it in a debate in the context of the larger project.
Though that's no different than today: we get submissions of varying
quality from people that have varing degrees of competence. The code
review process is supposed to set a minimum floor for code quality.
LLMs are no different: the originator has to be able to justify and explain
things here.

> That said for personal scripts or random experiments the ability to
> quickly get to a PoC is pretty great.
>
> I think there is also scope for using LLMs for things that aren't
> directly writing code:
>
>   - code review
>   - investigation
>   - generating test cases
>   - polishing documentation
>
> and I wonder if we should spend some more time investigating the
> performance and pitfalls of LLMs before we open the flood gates to the
> code.
>

I wouldn't open the floodgates. I would however expect the policy to
understand
that llm assist in generating code produces results that meet the minimum
quality expectations. But also understand that these tools can be a firehose
of information that's hard to filter. The problem with LLMs has always been
one
of verification. It takes a lot of time to know if they are right. Often
times a lot more
time than the traditional submission because LLM generated pull requests
that I've
seen in FreeBSD tend to be super verbose, with all kinds of irrelevant
detail
thrown in. And yet, the underlying changes are at least "close enough to
review".
We're struggling in that sister open source project on how to cope,
honestly,
and caution is likely called for, but bans when there's a sliding scale of
LLM
use likely aren't.

In my bsd-user upstreaming, Claude has been great at code review, and at
suggesting changes. I often do a change and then ask claude how it would
fix the issue. Quite often they are the same thing. And claude is good about
reviewing my fix for the issue. I'm sure, though, it's missing a lot of
bigger
picture things, but that's what I'm for.

So maybe a good middle ground might be to allow claude for things that
are low risk:
- Things sed or coccinelle  can do
- Minor bug fixes with human written commit messages
- Minor feature tweaks (say < 200 lines)
- All things test an CI (well, maybe not that wide, but much wider in the
CI space)
- Generation of tools that build the system, though with extra vetting
- other grunt tasks (like my upstreaming stuff, but I'm sure there's other
areas that don't involve generation of large amounts of creative works).
Coupled with a strong requirement for quality and standing behind the
patch. Maybe with extra scrutiny in the reviews (though, the reviews
I've gotten for bsd-user, while quite useful, have been tougher than
I've seen in many other places).

I like Linux's rules, generally, though we can have the door less open.
It's one reason I added the Assisted-by: claude lines in my bsd-user
reivews. Claude did the grunt work of git blame and slicing and dicing
the patches (which it got mostly right, after feedback, I have some
work to do to re-slice a few things). It also reviewed and I fixed
several real issues, as well as a bunch of "logical" issues where
I used host instaed of target things or vice versa.

Warner

> --
> Alex Bennée
> Virtualisation Tech Lead @ Linaro
>
>

[-- Attachment #2: Type: text/html, Size: 10948 bytes --]

^ permalink raw reply	[flat|nested] 59+ messages in thread

* Re: on ai generated and code provenance
  2026-05-24 18:26   ` Warner Losh
@ 2026-05-24 20:04     ` Michael S. Tsirkin
  0 siblings, 0 replies; 59+ messages in thread
From: Michael S. Tsirkin @ 2026-05-24 20:04 UTC (permalink / raw)
  To: Warner Losh; +Cc: Alex Bennée, qemu-devel, stefanha

On Sun, May 24, 2026 at 12:26:43PM -0600, Warner Losh wrote:
> So maybe a good middle ground might be to allow claude for things that
> are low risk:
> - Things sed or coccinelle  can do
> - Minor bug fixes with human written commit messages
> - Minor feature tweaks (say < 200 lines)

As far as I am concerned, if it's a reasonably split patchset of
multiple patches < 200 lines each, it the same.

-- 
MST



^ permalink raw reply	[flat|nested] 59+ messages in thread

* Re: on ai generated and code provenance
  2026-05-24 17:06 ` Alex Bennée
  2026-05-24 17:42   ` Michael S. Tsirkin
  2026-05-24 18:26   ` Warner Losh
@ 2026-05-24 20:11   ` Michael S. Tsirkin
  2026-05-24 20:44     ` Stefan Hajnoczi
  2 siblings, 1 reply; 59+ messages in thread
From: Michael S. Tsirkin @ 2026-05-24 20:11 UTC (permalink / raw)
  To: Alex Bennée; +Cc: qemu-devel, stefanha

On Sun, May 24, 2026 at 06:06:46PM +0100, Alex Bennée wrote:
> "Michael S. Tsirkin" <mst@redhat.com> writes:
> 
> > So, I had to reject a perfectly reasonable patch:
> > https://lore.kernel.org/qemu-devel/20260320193746.242704-1-jinpu.wang@ionos.com/
> > just because of a tool used to make it.
> >
> >
> > 	How contributors could comply with DCO terms (b) or (c) for the output of AI
> > 	content generators commonly available today is unclear.  The QEMU project is
> > 	not willing or able to accept the legal risks of non-compliance.
> 
> In the linked case the LLM is basically doing a glorified search and
> replace. There seems to be no danger of accidentally regurgitating any
> training data which is where the worry about inadvertent copyright
> infringement comes from.

Does this mean I can merge it, in your view?

-- 
MST



^ permalink raw reply	[flat|nested] 59+ messages in thread

* Re: on ai generated and code provenance
  2026-05-24 20:11   ` Michael S. Tsirkin
@ 2026-05-24 20:44     ` Stefan Hajnoczi
  2026-05-25 15:27       ` Stefan Hajnoczi
  0 siblings, 1 reply; 59+ messages in thread
From: Stefan Hajnoczi @ 2026-05-24 20:44 UTC (permalink / raw)
  To: Michael S. Tsirkin; +Cc: Alex Bennée, qemu-devel, stefanha

On Sun, May 24, 2026 at 4:12 PM Michael S. Tsirkin <mst@redhat.com> wrote:
>
> On Sun, May 24, 2026 at 06:06:46PM +0100, Alex Bennée wrote:
> > "Michael S. Tsirkin" <mst@redhat.com> writes:
> >
> > > So, I had to reject a perfectly reasonable patch:
> > > https://lore.kernel.org/qemu-devel/20260320193746.242704-1-jinpu.wang@ionos.com/
> > > just because of a tool used to make it.
> > >
> > >
> > >     How contributors could comply with DCO terms (b) or (c) for the output of AI
> > >     content generators commonly available today is unclear.  The QEMU project is
> > >     not willing or able to accept the legal risks of non-compliance.
> >
> > In the linked case the LLM is basically doing a glorified search and
> > replace. There seems to be no danger of accidentally regurgitating any
> > training data which is where the worry about inadvertent copyright
> > infringement comes from.
>
> Does this mean I can merge it, in your view?

It would be a good time to revisit the AI policy. From the QEMU Summit
2026 minutes:

"- We plan to solicit feedback in spring next year on how the policy has
  worked out in practice."
(https://lore.kernel.org/qemu-devel/CAFEAcA-OmqRTqwYZ2WCeqFu=zxG65t6WSfKR=NthfpazrjzpzA@mail.gmail.com/)

That hasn't happened yet and it's almost summer, so now is a good time
to have that discussion.

The policy was written with the option of adding exceptions (see the
Exceptions section at the bottom of docs/devel/code-provenance.rst).
That is one place where it could be extended.

Another option is to say that the situation has changed since the
policy was written and to replace it with something that allows a
broader range of AI-generated content instead of just specific
exceptions.

Here is Software Freedom Conservancy's most recent blog post about
AI-generated content:
https://sfconservancy.org/blog/2026/apr/15/eternal-november-generative-ai-llm/

Stefan


^ permalink raw reply	[flat|nested] 59+ messages in thread

* Re: on ai generated and code provenance
  2026-05-24 20:44     ` Stefan Hajnoczi
@ 2026-05-25 15:27       ` Stefan Hajnoczi
  0 siblings, 0 replies; 59+ messages in thread
From: Stefan Hajnoczi @ 2026-05-25 15:27 UTC (permalink / raw)
  To: Stefan Hajnoczi; +Cc: Michael S. Tsirkin, Alex Bennée, qemu-devel

[-- Attachment #1: Type: text/plain, Size: 1303 bytes --]

On Sun, May 24, 2026 at 04:44:41PM -0400, Stefan Hajnoczi wrote:
> On Sun, May 24, 2026 at 4:12 PM Michael S. Tsirkin <mst@redhat.com> wrote:
> >
> > On Sun, May 24, 2026 at 06:06:46PM +0100, Alex Bennée wrote:
> > > "Michael S. Tsirkin" <mst@redhat.com> writes:
> > >
> > > > So, I had to reject a perfectly reasonable patch:
> > > > https://lore.kernel.org/qemu-devel/20260320193746.242704-1-jinpu.wang@ionos.com/
> > > > just because of a tool used to make it.
> > > >
> > > >
> > > >     How contributors could comply with DCO terms (b) or (c) for the output of AI
> > > >     content generators commonly available today is unclear.  The QEMU project is
> > > >     not willing or able to accept the legal risks of non-compliance.
> > >
> > > In the linked case the LLM is basically doing a glorified search and
> > > replace. There seems to be no danger of accidentally regurgitating any
> > > training data which is where the worry about inadvertent copyright
> > > infringement comes from.
> >
> > Does this mean I can merge it, in your view?
> 
> It would be a good time to revisit the AI policy. From the QEMU Summit
> 2026 minutes:

Oops, "2026" should have been "2025".

I think the policy should be updated if we're going to depart from the
policy.

Stefan

[-- Attachment #2: signature.asc --]
[-- Type: application/pgp-signature, Size: 488 bytes --]

^ permalink raw reply	[flat|nested] 59+ messages in thread

* Re: on ai generated and code provenance
  2026-05-24 12:42 on ai generated and code provenance Michael S. Tsirkin
  2026-05-24 17:06 ` Alex Bennée
@ 2026-05-25 16:32 ` Paolo Bonzini
  2026-05-25 17:15   ` Warner Losh
  2026-05-26  8:23   ` Peter Maydell
  2026-05-26 17:43 ` Kevin Wolf
  2 siblings, 2 replies; 59+ messages in thread
From: Paolo Bonzini @ 2026-05-25 16:32 UTC (permalink / raw)
  To: Michael S. Tsirkin, qemu-devel; +Cc: stefanha

On 5/24/26 14:42, Michael S. Tsirkin wrote:
> 	How contributors could comply with DCO terms (b) or (c) for the output of AI
> 	content generators commonly available today is unclear.  The QEMU project is
> 	not willing or able to accept the legal risks of non-compliance.
> 
> But, since this was written, Red Hat's Richard Fontana and Chris Wright
> published this piece:
> https://www.redhat.com/en/blog/ai-assisted-development-and-open-source-navigating-legal-issues
> 
> Saying, in particular
> 	We understand this concern, but the DCO has never
> 	been interpreted to require that every line of a contribution must be
> 	the personal creative expression of the contributor or another human
> 	developer.
This is not the objection or the worry; rather the question is, what if 
the contribution is a creative expression of someone that could claim 
copyright in it.  In fact, looking at the Linux policy...

   Signed-off-by and Developer Certificate of Origin
   =================================================

   AI agents MUST NOT add Signed-off-by tags. Only humans can legally
   certify the Developer Certificate of Origin (DCO). The human submitter
   is responsible for:

   * Reviewing all AI-generated code
   * Ensuring compliance with licensing requirements
   * Adding their own Signed-off-by tag to certify the DCO
   * Taking full responsibility for the contribution

... the question is how humans can actually do the second step.  The 
piece you posted above says: "with disclosure and human attentiveness – 
and oversight – aided where possible by tools that check for code 
similarity, AI-assisted contributions can be entirely compatible with 
the spirit of the DCO".

This is not encouraging, in my opinion, because it leaves a lot of the 
mechanics undefined.  A while ago I suggested that in some scenarios 
this could actually be done[1][2]; another possible case is localized 
bugfixes (say, below 20 lines of code).  For more general contributions 
however, the role of maintainers is not clear.  Would we require to 
"check for code similarity"?  I sure don't want to open that can of worms.

> I propose adopting linux's rules instead:
> https://docs.kernel.org/process/coding-assistants.html

Replacing QEMU's policy with Linux's would be orthogonal to the topic of 
the DCO.  Maintainers would still have the option of rejecting 
AI-assisted patches if they don't believe they can apply their own sign-off.

Other projects have taken similar "no AI" policies for different 
reasons.  Zig has one because they believe AI code would make it harder 
to retain contributors[3][4]; Rust is working on one that is fairly 
restrictive[5] (discussion at [6]) and requires previous communications 
with reviewers about *any* generated PRs[7].  Personally I think QEMU's 
policy is fine but we should start introducing exceptions, possibly 
including large contributions with pre-authorization (but not 
pre-approval) from the maintainer.

Paolo

[1] 
https://lore.kernel.org/qemu-devel/20250925075630.352720-1-pbonzini@redhat.com
[2] 
https://lore.kernel.org/qemu-devel/20251008063546.376603-1-pbonzini@redhat.com/raw
[3] https://ziglang.org/code-of-conduct/
[4] 
https://ziggit.dev/t/bun-s-zig-fork-got-4x-faster-compilation-times/15183/19
[5] 
https://github.com/jyn514/rust-forge/blob/llm-policy/src/policies/llm-usage.md
[6] https://github.com/rust-lang/rust-forge/pull/1040
[7] 
https://github.com/jyn514/rust-forge/blob/llm-policy/src/policies/llm-usage.md#experiment-llm-created-code-changes

^ permalink raw reply	[flat|nested] 59+ messages in thread

* Re: on ai generated and code provenance
  2026-05-25 16:32 ` Paolo Bonzini
@ 2026-05-25 17:15   ` Warner Losh
  2026-05-25 19:44     ` Stefan Hajnoczi
                       ` (2 more replies)
  2026-05-26  8:23   ` Peter Maydell
  1 sibling, 3 replies; 59+ messages in thread
From: Warner Losh @ 2026-05-25 17:15 UTC (permalink / raw)
  To: Paolo Bonzini; +Cc: Michael S. Tsirkin, qemu-devel, stefanha

[-- Attachment #1: Type: text/plain, Size: 5635 bytes --]

On Mon, May 25, 2026 at 10:34 AM Paolo Bonzini <pbonzini@redhat.com> wrote:

> On 5/24/26 14:42, Michael S. Tsirkin wrote:
> >       How contributors could comply with DCO terms (b) or (c) for the
> output of AI
> >       content generators commonly available today is unclear.  The QEMU
> project is
> >       not willing or able to accept the legal risks of non-compliance.
> >
> > But, since this was written, Red Hat's Richard Fontana and Chris Wright
> > published this piece:
> >
> https://www.redhat.com/en/blog/ai-assisted-development-and-open-source-navigating-legal-issues
> >
> > Saying, in particular
> >       We understand this concern, but the DCO has never
> >       been interpreted to require that every line of a contribution must
> be
> >       the personal creative expression of the contributor or another
> human
> >       developer.
> This is not the objection or the worry; rather the question is, what if
> the contribution is a creative expression of someone that could claim
> copyright in it.  In fact, looking at the Linux policy...
>
>    Signed-off-by and Developer Certificate of Origin
>    =================================================
>
>    AI agents MUST NOT add Signed-off-by tags. Only humans can legally
>    certify the Developer Certificate of Origin (DCO). The human submitter
>    is responsible for:
>
>    * Reviewing all AI-generated code
>    * Ensuring compliance with licensing requirements
>    * Adding their own Signed-off-by tag to certify the DCO
>    * Taking full responsibility for the contribution
>
> ... the question is how humans can actually do the second step.  The
> piece you posted above says: "with disclosure and human attentiveness –
> and oversight – aided where possible by tools that check for code
> similarity, AI-assisted contributions can be entirely compatible with
> the spirit of the DCO".
>

The code produced by AI agents has no copyright. You can incorporate
public domain code into your work and have the absolute right to license
it (see all the Diseny movies). The notion that LLMs wholesale copy
originates
from the earliest days of Copilot and turned out were contrived. No recent
evidence shows that plagiarism is a concern. To the extent that I modify
public domain code, I have a copyright that I can choose to license
however I want (and the SOB says it's compatible).

So I'm struggling to understand the hesitation here. Is it the uncertainty
around the copyright? Around the copying issue? Something else?
We already have some level of risk around these issues with human
coders: We have to take their word for it that they didn't copy, and if
they did, the project is still on the hook to remedy the situation if the
real rights holders show up.... There's always risk when submissions
are accepted from the general public. Also, I've softened this paragraph
several times, and it still comes across as more confrontational than I
intend.
I'm trying to understand.


> This is not encouraging, in my opinion, because it leaves a lot of the
> mechanics undefined.  A while ago I suggested that in some scenarios
> this could actually be done[1][2]; another possible case is localized
> bugfixes (say, below 20 lines of code).  For more general contributions
> however, the role of maintainers is not clear.  Would we require to
> "check for code similarity"?  I sure don't want to open that can of worms.
>
> > I propose adopting linux's rules instead:
> > https://docs.kernel.org/process/coding-assistants.html
>
> Replacing QEMU's policy with Linux's would be orthogonal to the topic of
> the DCO.  Maintainers would still have the option of rejecting
> AI-assisted patches if they don't believe they can apply their own
> sign-off.
>
> Other projects have taken similar "no AI" policies for different
> reasons.  Zig has one because they believe AI code would make it harder
> to retain contributors[3][4]; Rust is working on one that is fairly
> restrictive[5] (discussion at [6]) and requires previous communications
> with reviewers about *any* generated PRs[7].  Personally I think QEMU's
> policy is fine but we should start introducing exceptions, possibly
> including large contributions with pre-authorization (but not
> pre-approval) from the maintainer.
>

I agree with the thrust of the proposals you've submitted: AI is a tool, and
there are many ways to use it safely.

AI's primary issue is verification. The folks being flooded can't verify the
good from the bad in the flood and tend to have a knee jerk reaction to
protect themselves: ban it.

Unstated issue: How do we help people that want to contribute grow their
skills using AI so they make submissions whose quality if good enough to
be worth our time to verify and review. It's an industry wide problem, along
with how do junior engineers become senior in a world of AI doing the grunt
work they used to learn from.

Warner


> Paolo
>
> [1]
>
> https://lore.kernel.org/qemu-devel/20250925075630.352720-1-pbonzini@redhat.com
> [2]
>
> https://lore.kernel.org/qemu-devel/20251008063546.376603-1-pbonzini@redhat.com/raw
> [3] https://ziglang.org/code-of-conduct/
> [4]
>
> https://ziggit.dev/t/bun-s-zig-fork-got-4x-faster-compilation-times/15183/19
> [5]
>
> https://github.com/jyn514/rust-forge/blob/llm-policy/src/policies/llm-usage.md
> [6] https://github.com/rust-lang/rust-forge/pull/1040
> [7]
>
> https://github.com/jyn514/rust-forge/blob/llm-policy/src/policies/llm-usage.md#experiment-llm-created-code-changes
>
>
>

[-- Attachment #2: Type: text/html, Size: 8000 bytes --]

^ permalink raw reply	[flat|nested] 59+ messages in thread

* Re: on ai generated and code provenance
  2026-05-25 17:15   ` Warner Losh
@ 2026-05-25 19:44     ` Stefan Hajnoczi
  2026-05-25 22:36       ` Michael S. Tsirkin
  2026-05-25 19:56     ` Paolo Bonzini
  2026-05-26 21:48     ` Philippe Mathieu-Daudé
  2 siblings, 1 reply; 59+ messages in thread
From: Stefan Hajnoczi @ 2026-05-25 19:44 UTC (permalink / raw)
  To: Warner Losh; +Cc: Paolo Bonzini, Michael S. Tsirkin, qemu-devel, stefanha

On Mon, May 25, 2026 at 1:17 PM Warner Losh <imp@bsdimp.com> wrote:
> On Mon, May 25, 2026 at 10:34 AM Paolo Bonzini <pbonzini@redhat.com> wrote:
>> On 5/24/26 14:42, Michael S. Tsirkin wrote:
>> >       How contributors could comply with DCO terms (b) or (c) for the output of AI
>> >       content generators commonly available today is unclear.  The QEMU project is
>> >       not willing or able to accept the legal risks of non-compliance.
>> >
>> > But, since this was written, Red Hat's Richard Fontana and Chris Wright
>> > published this piece:
>> > https://www.redhat.com/en/blog/ai-assisted-development-and-open-source-navigating-legal-issues
>> >
>> > Saying, in particular
>> >       We understand this concern, but the DCO has never
>> >       been interpreted to require that every line of a contribution must be
>> >       the personal creative expression of the contributor or another human
>> >       developer.
>> This is not the objection or the worry; rather the question is, what if
>> the contribution is a creative expression of someone that could claim
>> copyright in it.  In fact, looking at the Linux policy...
>>
>>    Signed-off-by and Developer Certificate of Origin
>>    =================================================
>>
>>    AI agents MUST NOT add Signed-off-by tags. Only humans can legally
>>    certify the Developer Certificate of Origin (DCO). The human submitter
>>    is responsible for:
>>
>>    * Reviewing all AI-generated code
>>    * Ensuring compliance with licensing requirements
>>    * Adding their own Signed-off-by tag to certify the DCO
>>    * Taking full responsibility for the contribution
>>
>> ... the question is how humans can actually do the second step.  The
>> piece you posted above says: "with disclosure and human attentiveness –
>> and oversight – aided where possible by tools that check for code
>> similarity, AI-assisted contributions can be entirely compatible with
>> the spirit of the DCO".
>
>
> The code produced by AI agents has no copyright. You can incorporate
> public domain code into your work and have the absolute right to license
> it (see all the Diseny movies). The notion that LLMs wholesale copy originates
> from the earliest days of Copilot and turned out were contrived. No recent
> evidence shows that plagiarism is a concern. To the extent that I modify
> public domain code, I have a copyright that I can choose to license
> however I want (and the SOB says it's compatible).

There is an active field of research on memorization and the status is
that LLMs do memorize. A paper from 2026
(https://arxiv.org/pdf/2601.02671) shows that production models can
output significant chunks of Harry Potter, although the research
deliberately extracts training inputs rather than doing so
accidentally. I am sharing this because I don't think it's correct to
say that concerns about models outputting copyrighted code are
outdated.

I do think that the risk for coding use cases is low as long as LLMs
are used sensibly. If not, legal cases would have popped up by now.

The example of ext4 for OpenBSD (https://lwn.net/Articles/1064541/)
comes to mind as a case where LLMs were used in a risky way and
maintainers decided to reject the code. Even though the output of AI
has no copyright, when there is no suitably-licensed information to
generate the code from, then it is risky to assume AI generated code
is free from copyright, license, patent, etc effects.

As long as we keep the usual practices around intellectual property in
mind when merging code, then I think the risk of copyright issues is
low and not a blocker for accepting AI generated contributions.

Stefan


^ permalink raw reply	[flat|nested] 59+ messages in thread

* Re: on ai generated and code provenance
  2026-05-25 19:44     ` Stefan Hajnoczi
@ 2026-05-25 22:36       ` Michael S. Tsirkin
  2026-05-26 13:16         ` Stefan Hajnoczi
  0 siblings, 1 reply; 59+ messages in thread
From: Michael S. Tsirkin @ 2026-05-25 22:36 UTC (permalink / raw)
  To: Stefan Hajnoczi; +Cc: Warner Losh, Paolo Bonzini, qemu-devel, stefanha

On Mon, May 25, 2026 at 03:44:02PM -0400, Stefan Hajnoczi wrote:
> On Mon, May 25, 2026 at 1:17 PM Warner Losh <imp@bsdimp.com> wrote:
> > On Mon, May 25, 2026 at 10:34 AM Paolo Bonzini <pbonzini@redhat.com> wrote:
> >> On 5/24/26 14:42, Michael S. Tsirkin wrote:
> >> >       How contributors could comply with DCO terms (b) or (c) for the output of AI
> >> >       content generators commonly available today is unclear.  The QEMU project is
> >> >       not willing or able to accept the legal risks of non-compliance.
> >> >
> >> > But, since this was written, Red Hat's Richard Fontana and Chris Wright
> >> > published this piece:
> >> > https://www.redhat.com/en/blog/ai-assisted-development-and-open-source-navigating-legal-issues
> >> >
> >> > Saying, in particular
> >> >       We understand this concern, but the DCO has never
> >> >       been interpreted to require that every line of a contribution must be
> >> >       the personal creative expression of the contributor or another human
> >> >       developer.
> >> This is not the objection or the worry; rather the question is, what if
> >> the contribution is a creative expression of someone that could claim
> >> copyright in it.  In fact, looking at the Linux policy...
> >>
> >>    Signed-off-by and Developer Certificate of Origin
> >>    =================================================
> >>
> >>    AI agents MUST NOT add Signed-off-by tags. Only humans can legally
> >>    certify the Developer Certificate of Origin (DCO). The human submitter
> >>    is responsible for:
> >>
> >>    * Reviewing all AI-generated code
> >>    * Ensuring compliance with licensing requirements
> >>    * Adding their own Signed-off-by tag to certify the DCO
> >>    * Taking full responsibility for the contribution
> >>
> >> ... the question is how humans can actually do the second step.  The
> >> piece you posted above says: "with disclosure and human attentiveness –
> >> and oversight – aided where possible by tools that check for code
> >> similarity, AI-assisted contributions can be entirely compatible with
> >> the spirit of the DCO".
> >
> >
> > The code produced by AI agents has no copyright. You can incorporate
> > public domain code into your work and have the absolute right to license
> > it (see all the Diseny movies). The notion that LLMs wholesale copy originates
> > from the earliest days of Copilot and turned out were contrived. No recent
> > evidence shows that plagiarism is a concern. To the extent that I modify
> > public domain code, I have a copyright that I can choose to license
> > however I want (and the SOB says it's compatible).
> 
> There is an active field of research on memorization and the status is
> that LLMs do memorize. A paper from 2026
> (https://arxiv.org/pdf/2601.02671) shows that production models can
> output significant chunks of Harry Potter, although the research
> deliberately extracts training inputs rather than doing so
> accidentally. I am sharing this because I don't think it's correct to
> say that concerns about models outputting copyrighted code are
> outdated.

But the concern is with them doing it *accidentally*.
Because willful infringement was always possible.
And that does not seem to be happening.


> I do think that the risk for coding use cases is low as long as LLMs
> are used sensibly. If not, legal cases would have popped up by now.
> 
> The example of ext4 for OpenBSD (https://lwn.net/Articles/1064541/)
> comes to mind as a case where LLMs were used in a risky way and
> maintainers decided to reject the code. Even though the output of AI
> has no copyright, when there is no suitably-licensed information to
> generate the code from, then it is risky to assume AI generated code
> is free from copyright, license, patent, etc effects.
> 
> As long as we keep the usual practices around intellectual property in
> mind when merging code, then I think the risk of copyright issues is
> low and not a blocker for accepting AI generated contributions.
> 
> Stefan



^ permalink raw reply	[flat|nested] 59+ messages in thread

* Re: on ai generated and code provenance
  2026-05-25 22:36       ` Michael S. Tsirkin
@ 2026-05-26 13:16         ` Stefan Hajnoczi
  0 siblings, 0 replies; 59+ messages in thread
From: Stefan Hajnoczi @ 2026-05-26 13:16 UTC (permalink / raw)
  To: Michael S. Tsirkin; +Cc: Warner Losh, Paolo Bonzini, qemu-devel, stefanha

On Mon, May 25, 2026 at 6:36 PM Michael S. Tsirkin <mst@redhat.com> wrote:
> On Mon, May 25, 2026 at 03:44:02PM -0400, Stefan Hajnoczi wrote:
> > On Mon, May 25, 2026 at 1:17 PM Warner Losh <imp@bsdimp.com> wrote:
> > > On Mon, May 25, 2026 at 10:34 AM Paolo Bonzini <pbonzini@redhat.com> wrote:
> > >> On 5/24/26 14:42, Michael S. Tsirkin wrote:
> > >> >       How contributors could comply with DCO terms (b) or (c) for the output of AI
> > >> >       content generators commonly available today is unclear.  The QEMU project is
> > >> >       not willing or able to accept the legal risks of non-compliance.
> > >> >
> > >> > But, since this was written, Red Hat's Richard Fontana and Chris Wright
> > >> > published this piece:
> > >> > https://www.redhat.com/en/blog/ai-assisted-development-and-open-source-navigating-legal-issues
> > >> >
> > >> > Saying, in particular
> > >> >       We understand this concern, but the DCO has never
> > >> >       been interpreted to require that every line of a contribution must be
> > >> >       the personal creative expression of the contributor or another human
> > >> >       developer.
> > >> This is not the objection or the worry; rather the question is, what if
> > >> the contribution is a creative expression of someone that could claim
> > >> copyright in it.  In fact, looking at the Linux policy...
> > >>
> > >>    Signed-off-by and Developer Certificate of Origin
> > >>    =================================================
> > >>
> > >>    AI agents MUST NOT add Signed-off-by tags. Only humans can legally
> > >>    certify the Developer Certificate of Origin (DCO). The human submitter
> > >>    is responsible for:
> > >>
> > >>    * Reviewing all AI-generated code
> > >>    * Ensuring compliance with licensing requirements
> > >>    * Adding their own Signed-off-by tag to certify the DCO
> > >>    * Taking full responsibility for the contribution
> > >>
> > >> ... the question is how humans can actually do the second step.  The
> > >> piece you posted above says: "with disclosure and human attentiveness –
> > >> and oversight – aided where possible by tools that check for code
> > >> similarity, AI-assisted contributions can be entirely compatible with
> > >> the spirit of the DCO".
> > >
> > >
> > > The code produced by AI agents has no copyright. You can incorporate
> > > public domain code into your work and have the absolute right to license
> > > it (see all the Diseny movies). The notion that LLMs wholesale copy originates
> > > from the earliest days of Copilot and turned out were contrived. No recent
> > > evidence shows that plagiarism is a concern. To the extent that I modify
> > > public domain code, I have a copyright that I can choose to license
> > > however I want (and the SOB says it's compatible).
> >
> > There is an active field of research on memorization and the status is
> > that LLMs do memorize. A paper from 2026
> > (https://arxiv.org/pdf/2601.02671) shows that production models can
> > output significant chunks of Harry Potter, although the research
> > deliberately extracts training inputs rather than doing so
> > accidentally. I am sharing this because I don't think it's correct to
> > say that concerns about models outputting copyrighted code are
> > outdated.
>
> But the concern is with them doing it *accidentally*.
> Because willful infringement was always possible.
> And that does not seem to be happening.

I agree. The chance of accidental copyright violations is too small to
ban AI usage in my opinion...

> > I do think that the risk for coding use cases is low as long as LLMs
> > are used sensibly. If not, legal cases would have popped up by now.
> >
> > The example of ext4 for OpenBSD (https://lwn.net/Articles/1064541/)
> > comes to mind as a case where LLMs were used in a risky way and
> > maintainers decided to reject the code. Even though the output of AI
> > has no copyright, when there is no suitably-licensed information to
> > generate the code from, then it is risky to assume AI generated code
> > is free from copyright, license, patent, etc effects.

...but here is a realistic example of where it might make sense to
reject an AI-generated contribution.

My point is that maintainers still need to consider whether
contributions are risky and in some cases it's easier to do something
reckless with AI because it may not feel like you are exposing
yourself to licensing issues when the AI generates the code for you.

Stefan


^ permalink raw reply	[flat|nested] 59+ messages in thread

* Re: on ai generated and code provenance
  2026-05-25 17:15   ` Warner Losh
  2026-05-25 19:44     ` Stefan Hajnoczi
@ 2026-05-25 19:56     ` Paolo Bonzini
  2026-05-26 21:48     ` Philippe Mathieu-Daudé
  2 siblings, 0 replies; 59+ messages in thread
From: Paolo Bonzini @ 2026-05-25 19:56 UTC (permalink / raw)
  To: Warner Losh; +Cc: Michael S. Tsirkin, qemu-devel, Hajnoczi, Stefan

[-- Attachment #1: Type: text/plain, Size: 1515 bytes --]

Il lun 25 mag 2026, 19:15 Warner Losh <imp@bsdimp.com> ha scritto:

> The code produced by AI agents has no copyright.
>

This is not entirely true. As models improve their capability to generate,
they also improve their ability to recall exactly. Stefan gave more
information.

The ability to search and reuse code found on the internet could also be a
problem. In that case the code is not produced by AI.

While this is *generally speaking* not an issue, it can be in specific
cases.
https://www.devclass.com/ai-ml/2025/11/27/ocaml-maintainers-reject-massive-ai-generated-pull-request/1728083
is only about six months old.

Also, I've softened this paragraph
> several times, and it still comes across as more confrontational than I
> intend.
>

No problem at all!

AI's primary issue is verification. The folks being flooded can't verify
> the good from the bad in the flood and tend to have a knee jerk reaction to
> protect themselves: ban it.
>

Being cautious and open minded at the same time is a good way to react, IMO.

Unstated issue: How do we help people that want to contribute grow their
> skills using AI so they make submissions whose quality if good enough to
> be worth our time to verify and review. It's an industry wide problem,
> along
> with how do junior engineers become senior in a world of AI doing the grunt
> work they used to learn from.
>

This is not our problem to solve. What we can do is participate to outreach
activities for students, such as Google Summer of Code.

Paolo

>

[-- Attachment #2: Type: text/html, Size: 3435 bytes --]

^ permalink raw reply	[flat|nested] 59+ messages in thread

* Re: on ai generated and code provenance
  2026-05-25 17:15   ` Warner Losh
  2026-05-25 19:44     ` Stefan Hajnoczi
  2026-05-25 19:56     ` Paolo Bonzini
@ 2026-05-26 21:48     ` Philippe Mathieu-Daudé
  2 siblings, 0 replies; 59+ messages in thread
From: Philippe Mathieu-Daudé @ 2026-05-26 21:48 UTC (permalink / raw)
  To: Warner Losh, Paolo Bonzini; +Cc: Michael S. Tsirkin, qemu-devel, stefanha

On 25/5/26 19:15, Warner Losh wrote:

> AI's primary issue is verification. The folks being flooded can't verify the
> good from the bad in the flood and tend to have a knee jerk reaction to
> protect themselves: ban it.
> 
> Unstated issue: How do we help people that want to contribute grow their
> skills using AI so they make submissions whose quality if good enough to
> be worth our time to verify and review. It's an industry wide problem, along
> with how do junior engineers become senior in a world of AI doing the grunt
> work they used to learn from.

While it seems easier to start contributing with new code rather than
contributing reviewing code, I strongly suggest junier engineers to
start reviewing before posting patches. That would help to unnarrow
the maintainer funnel problem. AI could help them there too. But
maybe I'm opening another can of worms by suggesting that direction.

> Warner


^ permalink raw reply	[flat|nested] 59+ messages in thread

* Re: on ai generated and code provenance
  2026-05-25 16:32 ` Paolo Bonzini
  2026-05-25 17:15   ` Warner Losh
@ 2026-05-26  8:23   ` Peter Maydell
  2026-05-26  9:28     ` Alex Bennée
                       ` (2 more replies)
  1 sibling, 3 replies; 59+ messages in thread
From: Peter Maydell @ 2026-05-26  8:23 UTC (permalink / raw)
  To: Paolo Bonzini; +Cc: Michael S. Tsirkin, qemu-devel, stefanha

On Mon, 25 May 2026 at 17:33, Paolo Bonzini <pbonzini@redhat.com> wrote:
> On 5/24/26 14:42, Michael S. Tsirkin wrote:
> > I propose adopting linux's rules instead:
> > https://docs.kernel.org/process/coding-assistants.html
>
> Replacing QEMU's policy with Linux's would be orthogonal to the topic of
> the DCO.  Maintainers would still have the option of rejecting
> AI-assisted patches if they don't believe they can apply their own sign-off.
>
> Other projects have taken similar "no AI" policies for different
> reasons.  Zig has one because they believe AI code would make it harder
> to retain contributors[3][4]; Rust is working on one that is fairly
> restrictive[5] (discussion at [6]) and requires previous communications
> with reviewers about *any* generated PRs[7].  Personally I think QEMU's
> policy is fine but we should start introducing exceptions, possibly
> including large contributions with pre-authorization (but not
> pre-approval) from the maintainer.

If we revisit our AI policy (which we should, I think, in the sense
that it's been a while and the situation has changed), I want to
note that although our current policy essentially says "no, because
we don't want the legal risks", that doesn't imply that "if we
judge now that the legal risks are acceptable, that was the only
blocker and so we are now open to AI contributions of all sorts".
While we were essentially in the "blanket ban" state anyway, there was
no particular need to have the discussion about other reasons we might
also want to be restrictive or cautious about AI contributions, but
those other reasons and viewpoints don't go away automatically with
the legal one.

I have quite a lot of sympathy with the rationale behind the
Zig policy, for instance:
 https://kristoff.it/blog/contributor-poker-and-ai/
I spend quite a lot of time reviewing patches for things which
are features I don't necessarily personally care about. I'm
happy with doing that for other people who are hopefully
learning and gaining something from the process; I'm much
less interested in reviewing a mountain of LLM-generated patches.

thanks
-- PMM

^ permalink raw reply	[flat|nested] 59+ messages in thread

* Re: on ai generated and code provenance
  2026-05-26  8:23   ` Peter Maydell
@ 2026-05-26  9:28     ` Alex Bennée
  2026-05-26  9:57     ` Paolo Bonzini
  2026-05-27  7:11     ` Philippe Mathieu-Daudé
  2 siblings, 0 replies; 59+ messages in thread
From: Alex Bennée @ 2026-05-26  9:28 UTC (permalink / raw)
  To: Peter Maydell; +Cc: Paolo Bonzini, Michael S. Tsirkin, qemu-devel, stefanha

Peter Maydell <peter.maydell@linaro.org> writes:

> On Mon, 25 May 2026 at 17:33, Paolo Bonzini <pbonzini@redhat.com> wrote:
>> On 5/24/26 14:42, Michael S. Tsirkin wrote:
>> > I propose adopting linux's rules instead:
>> > https://docs.kernel.org/process/coding-assistants.html
>>
>> Replacing QEMU's policy with Linux's would be orthogonal to the topic of
>> the DCO.  Maintainers would still have the option of rejecting
>> AI-assisted patches if they don't believe they can apply their own sign-off.
>>
>> Other projects have taken similar "no AI" policies for different
>> reasons.  Zig has one because they believe AI code would make it harder
>> to retain contributors[3][4]; Rust is working on one that is fairly
>> restrictive[5] (discussion at [6]) and requires previous communications
>> with reviewers about *any* generated PRs[7].  Personally I think QEMU's
>> policy is fine but we should start introducing exceptions, possibly
>> including large contributions with pre-authorization (but not
>> pre-approval) from the maintainer.
>
> If we revisit our AI policy (which we should, I think, in the sense
> that it's been a while and the situation has changed), I want to
> note that although our current policy essentially says "no, because
> we don't want the legal risks", that doesn't imply that "if we
> judge now that the legal risks are acceptable, that was the only
> blocker and so we are now open to AI contributions of all sorts".

I think there are still potential legal risks but in the normal use case
they are pretty small. Prompts to re-factor QEMU code will likely be
fine because the LLM is acting as a fungible editor - if anyone prompted
"implement Rosetta's target code optimisation pass" we should be very
wary of accidental infringement.

> While we were essentially in the "blanket ban" state anyway, there was
> no particular need to have the discussion about other reasons we might
> also want to be restrictive or cautious about AI contributions, but
> those other reasons and viewpoints don't go away automatically with
> the legal one.
>
> I have quite a lot of sympathy with the rationale behind the
> Zig policy, for instance:
>  https://kristoff.it/blog/contributor-poker-and-ai/
> I spend quite a lot of time reviewing patches for things which
> are features I don't necessarily personally care about. I'm
> happy with doing that for other people who are hopefully
> learning and gaining something from the process; I'm much
> less interested in reviewing a mountain of LLM-generated patches.

I agree - I think we need to address the quality expectations expected
of series authored with the help of AI before we open the doors to even
a limited subset of exceptions. Otherwise I think we could see a similar
deluge of patches overloading reviewers the same way the issue tracker
has been recently. 

>
> thanks
> -- PMM

-- 
Alex Bennée
Virtualisation Tech Lead @ Linaro


^ permalink raw reply	[flat|nested] 59+ messages in thread

* Re: on ai generated and code provenance
  2026-05-26  8:23   ` Peter Maydell
  2026-05-26  9:28     ` Alex Bennée
@ 2026-05-26  9:57     ` Paolo Bonzini
  2026-05-26 11:27       ` BALATON Zoltan
  2026-05-27  7:11     ` Philippe Mathieu-Daudé
  2 siblings, 1 reply; 59+ messages in thread
From: Paolo Bonzini @ 2026-05-26  9:57 UTC (permalink / raw)
  To: Peter Maydell; +Cc: Michael S. Tsirkin, qemu-devel, Hajnoczi, Stefan

[-- Attachment #1: Type: text/plain, Size: 1347 bytes --]

Il mar 26 mag 2026, 10:23 Peter Maydell <peter.maydell@linaro.org> ha
scritto:

> > Personally I think QEMU's
> > policy is fine but we should start introducing exceptions, possibly
> > including large contributions with pre-authorization (but not
> > pre-approval) from the maintainer.
>
> I want to note that [...] while we were essentially in the "blanket ban"
> state anyway, there was no particular need to have the discussion about
> other reasons we might also want to be restrictive or cautious about AI
> contributions, but those other reasons and viewpoints don't go away
> automatically with the legal one.
>
> I have quite a lot of sympathy with the rationale behind the Zig policy,
> for instance: https://kristoff.it/blog/contributor-poker-and-ai/ I spend
> quite a lot of time reviewing patches for things which are features I don't
> necessarily personally care about. I'm happy with doing that for other
> people who are hopefully learning and gaining something from the process;
> I'm much less interested in reviewing a mountain of LLM-generated patches.
>

I agree and that's a good argument for pre-discussion with the maintainers.
It would anyway be the right thing to do for large contributions, but it's
even more important with AI given the different balance between contributor
and reviewer.

Paolo


thanks
> -- PMM
>
>

[-- Attachment #2: Type: text/html, Size: 2197 bytes --]

^ permalink raw reply	[flat|nested] 59+ messages in thread

* Re: on ai generated and code provenance
  2026-05-26  9:57     ` Paolo Bonzini
@ 2026-05-26 11:27       ` BALATON Zoltan
  2026-05-26 12:30         ` Michael S. Tsirkin
  2026-05-26 13:22         ` Stefan Hajnoczi
  0 siblings, 2 replies; 59+ messages in thread
From: BALATON Zoltan @ 2026-05-26 11:27 UTC (permalink / raw)
  To: Paolo Bonzini
  Cc: Peter Maydell, Michael S. Tsirkin, qemu-devel, Hajnoczi, Stefan

On Tue, 26 May 2026, Paolo Bonzini wrote:
> Il mar 26 mag 2026, 10:23 Peter Maydell <peter.maydell@linaro.org> ha
> scritto:
>
>>> Personally I think QEMU's
>>> policy is fine but we should start introducing exceptions, possibly
>>> including large contributions with pre-authorization (but not
>>> pre-approval) from the maintainer.
>>
>> I want to note that [...] while we were essentially in the "blanket ban"
>> state anyway, there was no particular need to have the discussion about
>> other reasons we might also want to be restrictive or cautious about AI
>> contributions, but those other reasons and viewpoints don't go away
>> automatically with the legal one.
>>
>> I have quite a lot of sympathy with the rationale behind the Zig policy,
>> for instance: https://kristoff.it/blog/contributor-poker-and-ai/ I spend
>> quite a lot of time reviewing patches for things which are features I don't
>> necessarily personally care about. I'm happy with doing that for other
>> people who are hopefully learning and gaining something from the process;
>> I'm much less interested in reviewing a mountain of LLM-generated patches.
>>
>
> I agree and that's a good argument for pre-discussion with the maintainers.
> It would anyway be the right thing to do for large contributions, but it's
> even more important with AI given the different balance between contributor
> and reviewer.

I think the real problem is people who don't know what they are doing yet 
use an AI to generate a patch and submit it anyway. Reviewers are then 
flooded with nonsense that they have to look at to find out if there's 
anything useful in it which takes their time from doing more useful 
things. So the policy should make clear that we don't accept patches 
generated by AI that no human has read and understood before submission 
and adding a S-o-b should also mean (besides that the submitter made sure 
there's no copyright infringement) that that person has knowledge about 
the patch and is willing to correct it. Then reviewers can just bounce AI 
nonsense back to the conrtibutor or ignore it if they don't reply (or 
reply with more AI nonsense suggesting they don't know what the patch does 
so can't correct it). I think that's the real fear that has led to the AI 
ban and the copyright issues were just a convenient excuse. Maybe 
clarifying this in the policy could be done although there will always be 
people who ignore documents. So maybe what we want is no direct submission 
of AI generated patches without at least a human inbetween who has already 
reviewed tha patch before sending it to the list.

Regards,
BALATON Zoltan

^ permalink raw reply	[flat|nested] 59+ messages in thread

* Re: on ai generated and code provenance
  2026-05-26 11:27       ` BALATON Zoltan
@ 2026-05-26 12:30         ` Michael S. Tsirkin
  2026-05-26 12:37           ` Manos Pitsidianakis
  2026-05-26 13:22         ` Stefan Hajnoczi
  1 sibling, 1 reply; 59+ messages in thread
From: Michael S. Tsirkin @ 2026-05-26 12:30 UTC (permalink / raw)
  To: BALATON Zoltan; +Cc: Paolo Bonzini, Peter Maydell, qemu-devel, Hajnoczi, Stefan

On Tue, May 26, 2026 at 01:27:40PM +0200, BALATON Zoltan wrote:
> Maybe clarifying this in the policy could be
> done although there will always be people who ignore documents.

One advantage of the linux style tags is that at least they differ
from whatever ai's put in by default. So whoever does it can get
banned pretty quickly.



^ permalink raw reply	[flat|nested] 59+ messages in thread

* Re: on ai generated and code provenance
  2026-05-26 12:30         ` Michael S. Tsirkin
@ 2026-05-26 12:37           ` Manos Pitsidianakis
  2026-05-26 13:00             ` Michael S. Tsirkin
  0 siblings, 1 reply; 59+ messages in thread
From: Manos Pitsidianakis @ 2026-05-26 12:37 UTC (permalink / raw)
  To: Michael S. Tsirkin
  Cc: BALATON Zoltan, Paolo Bonzini, Peter Maydell, qemu-devel,
	Hajnoczi, Stefan

On Tue, May 26, 2026 at 3:31 PM Michael S. Tsirkin <mst@redhat.com> wrote:
>
> On Tue, May 26, 2026 at 01:27:40PM +0200, BALATON Zoltan wrote:
> > Maybe clarifying this in the policy could be
> > done although there will always be people who ignore documents.
>
> One advantage of the linux style tags is that at least they differ
> from whatever ai's put in by default. So whoever does it can get
> banned pretty quickly.
>

What would be the mechanism for that though? Getting the list
administrators involved to ban email addresses from the list?

If banning is to be a deterrent, the process and rules should be
codified in the docs so that it exists as a warning and there is
little room for abuse and ambiguity in both sides.

>

-- 
Manos Pitsidianakis
Emulation and Virtualization Engineer at Linaro Ltd


^ permalink raw reply	[flat|nested] 59+ messages in thread

* Re: on ai generated and code provenance
  2026-05-26 12:37           ` Manos Pitsidianakis
@ 2026-05-26 13:00             ` Michael S. Tsirkin
  0 siblings, 0 replies; 59+ messages in thread
From: Michael S. Tsirkin @ 2026-05-26 13:00 UTC (permalink / raw)
  To: Manos Pitsidianakis
  Cc: BALATON Zoltan, Paolo Bonzini, Peter Maydell, qemu-devel,
	Hajnoczi, Stefan

On Tue, May 26, 2026 at 03:37:50PM +0300, Manos Pitsidianakis wrote:
> On Tue, May 26, 2026 at 3:31 PM Michael S. Tsirkin <mst@redhat.com> wrote:
> >
> > On Tue, May 26, 2026 at 01:27:40PM +0200, BALATON Zoltan wrote:
> > > Maybe clarifying this in the policy could be
> > > done although there will always be people who ignore documents.
> >
> > One advantage of the linux style tags is that at least they differ
> > from whatever ai's put in by default. So whoever does it can get
> > banned pretty quickly.
> >
> 
> What would be the mechanism for that though? Getting the list
> administrators involved to ban email addresses from the list?

maintainers learning to ignore patches from bad actors works well enough.

> If banning is to be a deterrent, the process and rules should be
> codified in the docs so that it exists as a warning and there is
> little room for abuse and ambiguity in both sides.
> 
> >
> 
> -- 
> Manos Pitsidianakis
> Emulation and Virtualization Engineer at Linaro Ltd



^ permalink raw reply	[flat|nested] 59+ messages in thread

* Re: on ai generated and code provenance
  2026-05-26 11:27       ` BALATON Zoltan
  2026-05-26 12:30         ` Michael S. Tsirkin
@ 2026-05-26 13:22         ` Stefan Hajnoczi
  2026-05-26 14:01           ` Warner Losh
  1 sibling, 1 reply; 59+ messages in thread
From: Stefan Hajnoczi @ 2026-05-26 13:22 UTC (permalink / raw)
  To: BALATON Zoltan
  Cc: Paolo Bonzini, Peter Maydell, Michael S. Tsirkin, qemu-devel,
	Hajnoczi, Stefan

On Tue, May 26, 2026 at 7:28 AM BALATON Zoltan <balaton@eik.bme.hu> wrote:
> On Tue, 26 May 2026, Paolo Bonzini wrote:
> > Il mar 26 mag 2026, 10:23 Peter Maydell <peter.maydell@linaro.org> ha
> > scritto:
> >
> >>> Personally I think QEMU's
> >>> policy is fine but we should start introducing exceptions, possibly
> >>> including large contributions with pre-authorization (but not
> >>> pre-approval) from the maintainer.
> >>
> >> I want to note that [...] while we were essentially in the "blanket ban"
> >> state anyway, there was no particular need to have the discussion about
> >> other reasons we might also want to be restrictive or cautious about AI
> >> contributions, but those other reasons and viewpoints don't go away
> >> automatically with the legal one.
> >>
> >> I have quite a lot of sympathy with the rationale behind the Zig policy,
> >> for instance: https://kristoff.it/blog/contributor-poker-and-ai/ I spend
> >> quite a lot of time reviewing patches for things which are features I don't
> >> necessarily personally care about. I'm happy with doing that for other
> >> people who are hopefully learning and gaining something from the process;
> >> I'm much less interested in reviewing a mountain of LLM-generated patches.
> >>
> >
> > I agree and that's a good argument for pre-discussion with the maintainers.
> > It would anyway be the right thing to do for large contributions, but it's
> > even more important with AI given the different balance between contributor
> > and reviewer.
>
> I think the real problem is people who don't know what they are doing yet
> use an AI to generate a patch and submit it anyway. Reviewers are then
> flooded with nonsense that they have to look at to find out if there's
> anything useful in it which takes their time from doing more useful
> things. So the policy should make clear that we don't accept patches
> generated by AI that no human has read and understood before submission
> and adding a S-o-b should also mean (besides that the submitter made sure
> there's no copyright infringement) that that person has knowledge about
> the patch and is willing to correct it. Then reviewers can just bounce AI
> nonsense back to the conrtibutor or ignore it if they don't reply (or
> reply with more AI nonsense suggesting they don't know what the patch does
> so can't correct it). I think that's the real fear that has led to the AI
> ban and the copyright issues were just a convenient excuse. Maybe
> clarifying this in the policy could be done although there will always be
> people who ignore documents. So maybe what we want is no direct submission
> of AI generated patches without at least a human inbetween who has already
> reviewed tha patch before sending it to the list.

That sounds reasonable.

Stefan


^ permalink raw reply	[flat|nested] 59+ messages in thread

* Re: on ai generated and code provenance
  2026-05-26 13:22         ` Stefan Hajnoczi
@ 2026-05-26 14:01           ` Warner Losh
  0 siblings, 0 replies; 59+ messages in thread
From: Warner Losh @ 2026-05-26 14:01 UTC (permalink / raw)
  To: Stefan Hajnoczi
  Cc: BALATON Zoltan, Paolo Bonzini, Peter Maydell, Michael S. Tsirkin,
	qemu-devel, Hajnoczi, Stefan

[-- Attachment #1: Type: text/plain, Size: 3174 bytes --]

On Tue, May 26, 2026, 7:24 AM Stefan Hajnoczi <stefanha@gmail.com> wrote:

> On Tue, May 26, 2026 at 7:28 AM BALATON Zoltan <balaton@eik.bme.hu> wrote:
> > On Tue, 26 May 2026, Paolo Bonzini wrote:
> > > Il mar 26 mag 2026, 10:23 Peter Maydell <peter.maydell@linaro.org> ha
> > > scritto:
> > >
> > >>> Personally I think QEMU's
> > >>> policy is fine but we should start introducing exceptions, possibly
> > >>> including large contributions with pre-authorization (but not
> > >>> pre-approval) from the maintainer.
> > >>
> > >> I want to note that [...] while we were essentially in the "blanket
> ban"
> > >> state anyway, there was no particular need to have the discussion
> about
> > >> other reasons we might also want to be restrictive or cautious about
> AI
> > >> contributions, but those other reasons and viewpoints don't go away
> > >> automatically with the legal one.
> > >>
> > >> I have quite a lot of sympathy with the rationale behind the Zig
> policy,
> > >> for instance: https://kristoff.it/blog/contributor-poker-and-ai/ I
> spend
> > >> quite a lot of time reviewing patches for things which are features I
> don't
> > >> necessarily personally care about. I'm happy with doing that for other
> > >> people who are hopefully learning and gaining something from the
> process;
> > >> I'm much less interested in reviewing a mountain of LLM-generated
> patches.
> > >>
> > >
> > > I agree and that's a good argument for pre-discussion with the
> maintainers.
> > > It would anyway be the right thing to do for large contributions, but
> it's
> > > even more important with AI given the different balance between
> contributor
> > > and reviewer.
> >
> > I think the real problem is people who don't know what they are doing yet
> > use an AI to generate a patch and submit it anyway. Reviewers are then
> > flooded with nonsense that they have to look at to find out if there's
> > anything useful in it which takes their time from doing more useful
> > things. So the policy should make clear that we don't accept patches
> > generated by AI that no human has read and understood before submission
> > and adding a S-o-b should also mean (besides that the submitter made sure
> > there's no copyright infringement) that that person has knowledge about
> > the patch and is willing to correct it. Then reviewers can just bounce AI
> > nonsense back to the conrtibutor or ignore it if they don't reply (or
> > reply with more AI nonsense suggesting they don't know what the patch
> does
> > so can't correct it). I think that's the real fear that has led to the AI
> > ban and the copyright issues were just a convenient excuse. Maybe
> > clarifying this in the policy could be done although there will always be
> > people who ignore documents. So maybe what we want is no direct
> submission
> > of AI generated patches without at least a human inbetween who has
> already
> > reviewed tha patch before sending it to the list.
>
> That sounds reasonable.
>

I agree.

And for large submission we can have a smaller limit for AI or other poorly
explained code.

Warner

>

[-- Attachment #2: Type: text/html, Size: 4651 bytes --]

^ permalink raw reply	[flat|nested] 59+ messages in thread

* Re: on ai generated and code provenance
  2026-05-26  8:23   ` Peter Maydell
  2026-05-26  9:28     ` Alex Bennée
  2026-05-26  9:57     ` Paolo Bonzini
@ 2026-05-27  7:11     ` Philippe Mathieu-Daudé
  2 siblings, 0 replies; 59+ messages in thread
From: Philippe Mathieu-Daudé @ 2026-05-27  7:11 UTC (permalink / raw)
  To: Peter Maydell, Paolo Bonzini; +Cc: Michael S. Tsirkin, qemu-devel, stefanha

On 26/5/26 10:23, Peter Maydell wrote:

> I have quite a lot of sympathy with the rationale behind the
> Zig policy, for instance:
>   https://kristoff.it/blog/contributor-poker-and-ai/

Thanks for sharing this link!

> I spend quite a lot of time reviewing patches for things which
> are features I don't necessarily personally care about. I'm
> happy with doing that for other people who are hopefully
> learning and gaining something from the process; I'm much
> less interested in reviewing a mountain of LLM-generated patches.
> 
> thanks
> -- PMM
> 



^ permalink raw reply	[flat|nested] 59+ messages in thread

* Re: on ai generated and code provenance
  2026-05-24 12:42 on ai generated and code provenance Michael S. Tsirkin
  2026-05-24 17:06 ` Alex Bennée
  2026-05-25 16:32 ` Paolo Bonzini
@ 2026-05-26 17:43 ` Kevin Wolf
  2026-05-26 18:03   ` Michael S. Tsirkin
  2 siblings, 1 reply; 59+ messages in thread
From: Kevin Wolf @ 2026-05-26 17:43 UTC (permalink / raw)
  To: Michael S. Tsirkin; +Cc: qemu-devel, stefanha

Am 24.05.2026 um 14:42 hat Michael S. Tsirkin geschrieben:
> So, I had to reject a perfectly reasonable patch:
> https://lore.kernel.org/qemu-devel/20260320193746.242704-1-jinpu.wang@ionos.com/
> just because of a tool used to make it.
> 
> 
> 	How contributors could comply with DCO terms (b) or (c) for the output of AI
> 	content generators commonly available today is unclear.  The QEMU project is
> 	not willing or able to accept the legal risks of non-compliance.
> 
> 
> But, since this was written, Red Hat's Richard Fontana and Chris Wright
> published this piece:
> https://www.redhat.com/en/blog/ai-assisted-development-and-open-source-navigating-legal-issues
> 
> 
> Saying, in particular "
> 	We understand this concern, but the DCO has never
> 	been interpreted to require that every line of a contribution must be
> 	the personal creative expression of the contributor or another human
> 	developer. 
> "

I never found that blog post particularly convincing, especially because
they acknowledge a concern:

    There are two versions of this concern. The first is practical: that
    an AI tool could covertly insert excerpts of proprietary (or
    license-incompatible) code into an open source project, potentially
    creating legal risk for maintainers and users. The second is broader
    and more philosophical: that large language models, trained on vast
    amounts of open source software, are essentially misappropriating
    the community’s work, producing outputs stripped of the obligations
    that open source licenses require.

    We think these concerns deserve to be taken seriously.

The second one is essentially what I understood the QEMU policy to be
about. Unfortunately, the blog post then goes on to only ever deal with
the first one and ignore the second one that seems more relevant for us.

So yes, the DCO isn't about "personal creative expression" or whatever
(and nobody suggested it is, this is a strawman), but it's about whether
the submitter has the legal rights to submit the code. And that's
exactly the question we decided we don't want to take a risk on.

So if that part isn't helpful, what has changed since we introduced the
AI policy? It's a few points:

1. While AI has been in use for a while now, we haven't seen projects
   accepting AI generated code/content get into big trouble. While it
   could still happen in the future, it might be an indication that the
   probability of the risk hitting us is not that high.

2. The useful part of the blog post is that it tells us that Red Hat
   considers the risk acceptable. This can inform our assessment of the
   risks, though of course there might be a significant difference in
   the impact of the risk for a company with a legal department and an
   open source community consisting mainly of developers acting as
   individuals.

   I think it's obvious that if the QEMU project gets involved in a
   legal case, we have a problem (at the very least long lasting
   distraction from actual work on QEMU), even if we didn't do anything
   wrong and a good lawyer would easily win the case.

3. It was easy to just outright ban AI while its results were usually
   not really usable anyway. This has changed meanwhile, so it's much
   harder to maintain an absolute ban.

   It's not really the best use of my time to look at the idea in
   AI-generated test cases and then rewrite them from scratch so I can
   actually submit them. (On the other hand, I think my rewritten
   submissions were always better and more maintainable than what AI
   produced initially, so there's that.)

So while my perspective is a lot more nuanced than yours, I do see a
shift in the balance and was actually thinking of suggesting a change of
the policy myself.

What I was thinking of was allowing AI-generated content in places where
it's at least easy to revert if there is ever a problem with it: Tests,
documentation etc., but not core code that lots of other things depend
on and that will have evolved a lot when we notice a problem and for
which throwing away is simply not an option.

> I propose adopting linux's rules instead:
> https://docs.kernel.org/process/coding-assistants.html
> 
> which boils down to attribution.

What would we actually do with the detailed information? Why do we care
which model was used? Is this helpful commit metadata or is it just free
advertising for a handful of companies?

I think I would see more use in a tag like (better name welcome):

    AI-used-for: [code|tests|docs|commit message]...

Kevin

^ permalink raw reply	[flat|nested] 59+ messages in thread

* Re: on ai generated and code provenance
  2026-05-26 17:43 ` Kevin Wolf
@ 2026-05-26 18:03   ` Michael S. Tsirkin
  2026-05-26 18:59     ` Kevin Wolf
  0 siblings, 1 reply; 59+ messages in thread
From: Michael S. Tsirkin @ 2026-05-26 18:03 UTC (permalink / raw)
  To: Kevin Wolf; +Cc: qemu-devel, stefanha

On Tue, May 26, 2026 at 07:43:35PM +0200, Kevin Wolf wrote:
> Am 24.05.2026 um 14:42 hat Michael S. Tsirkin geschrieben:
> > So, I had to reject a perfectly reasonable patch:
> > https://lore.kernel.org/qemu-devel/20260320193746.242704-1-jinpu.wang@ionos.com/
> > just because of a tool used to make it.
> > 
> > 
> > 	How contributors could comply with DCO terms (b) or (c) for the output of AI
> > 	content generators commonly available today is unclear.  The QEMU project is
> > 	not willing or able to accept the legal risks of non-compliance.
> > 
> > 
> > But, since this was written, Red Hat's Richard Fontana and Chris Wright
> > published this piece:
> > https://www.redhat.com/en/blog/ai-assisted-development-and-open-source-navigating-legal-issues
> > 
> > 
> > Saying, in particular "
> > 	We understand this concern, but the DCO has never
> > 	been interpreted to require that every line of a contribution must be
> > 	the personal creative expression of the contributor or another human
> > 	developer. 
> > "
> 
> I never found that blog post particularly convincing, especially because
> they acknowledge a concern:
> 
>     There are two versions of this concern. The first is practical: that
>     an AI tool could covertly insert excerpts of proprietary (or
>     license-incompatible) code into an open source project, potentially
>     creating legal risk for maintainers and users. The second is broader
>     and more philosophical: that large language models, trained on vast
>     amounts of open source software, are essentially misappropriating
>     the community’s work, producing outputs stripped of the obligations
>     that open source licenses require.
> 
>     We think these concerns deserve to be taken seriously.
> 
> The second one is essentially what I understood the QEMU policy to be
> about. Unfortunately, the blog post then goes on to only ever deal with
> the first one and ignore the second one that seems more relevant for us.
> 
> So yes, the DCO isn't about "personal creative expression" or whatever
> (and nobody suggested it is, this is a strawman), but it's about whether
> the submitter has the legal rights to submit the code. And that's
> exactly the question we decided we don't want to take a risk on.
> 
> 
> So if that part isn't helpful, what has changed since we introduced the
> AI policy? It's a few points:
> 
> 1. While AI has been in use for a while now, we haven't seen projects
>    accepting AI generated code/content get into big trouble. While it
>    could still happen in the future, it might be an indication that the
>    probability of the risk hitting us is not that high.
> 
> 2. The useful part of the blog post is that it tells us that Red Hat
>    considers the risk acceptable. This can inform our assessment of the
>    risks, though of course there might be a significant difference in
>    the impact of the risk for a company with a legal department and an
>    open source community consisting mainly of developers acting as
>    individuals.
> 
>    I think it's obvious that if the QEMU project gets involved in a
>    legal case, we have a problem (at the very least long lasting
>    distraction from actual work on QEMU), even if we didn't do anything
>    wrong and a good lawyer would easily win the case.
> 
> 3. It was easy to just outright ban AI while its results were usually
>    not really usable anyway. This has changed meanwhile, so it's much
>    harder to maintain an absolute ban.
> 
>    It's not really the best use of my time to look at the idea in
>    AI-generated test cases and then rewrite them from scratch so I can
>    actually submit them. (On the other hand, I think my rewritten
>    submissions were always better and more maintainable than what AI
>    produced initially, so there's that.)
> 
> So while my perspective is a lot more nuanced than yours, I do see a
> shift in the balance and was actually thinking of suggesting a change of
> the policy myself.
> 
> What I was thinking of was allowing AI-generated content in places where
> it's at least easy to revert if there is ever a problem with it: Tests,
> documentation etc., but not core code that lots of other things depend
> on and that will have evolved a lot when we notice a problem and for
> which throwing away is simply not an option.

OK. what about trivial changes? Using AI as a better sed?

> > I propose adopting linux's rules instead:
> > https://docs.kernel.org/process/coding-assistants.html
> > 
> > which boils down to attribution.
> 
> What would we actually do with the detailed information? Why do we care
> which model was used? Is this helpful commit metadata or is it just free
> advertising for a handful of companies?

I presume, if a specific model is somehow declared "contaminated" so we
can locate its output?

> I think I would see more use in a tag like (better name welcome):
> 
>     AI-used-for: [code|tests|docs|commit message]...
> 
> Kevin

I surely don't mind.

-- 
MST



^ permalink raw reply	[flat|nested] 59+ messages in thread

* Re: on ai generated and code provenance
  2026-05-26 18:03   ` Michael S. Tsirkin
@ 2026-05-26 18:59     ` Kevin Wolf
  2026-05-26 19:30       ` Michael S. Tsirkin
  2026-05-26 19:50       ` Michael S. Tsirkin
  0 siblings, 2 replies; 59+ messages in thread
From: Kevin Wolf @ 2026-05-26 18:59 UTC (permalink / raw)
  To: Michael S. Tsirkin; +Cc: qemu-devel, stefanha

Am 26.05.2026 um 20:03 hat Michael S. Tsirkin geschrieben:
> On Tue, May 26, 2026 at 07:43:35PM +0200, Kevin Wolf wrote:
> > Am 24.05.2026 um 14:42 hat Michael S. Tsirkin geschrieben:
> > > So, I had to reject a perfectly reasonable patch:
> > > https://lore.kernel.org/qemu-devel/20260320193746.242704-1-jinpu.wang@ionos.com/
> > > just because of a tool used to make it.
> > > 
> > > 
> > > 	How contributors could comply with DCO terms (b) or (c) for the output of AI
> > > 	content generators commonly available today is unclear.  The QEMU project is
> > > 	not willing or able to accept the legal risks of non-compliance.
> > > 
> > > 
> > > But, since this was written, Red Hat's Richard Fontana and Chris Wright
> > > published this piece:
> > > https://www.redhat.com/en/blog/ai-assisted-development-and-open-source-navigating-legal-issues
> > > 
> > > 
> > > Saying, in particular "
> > > 	We understand this concern, but the DCO has never
> > > 	been interpreted to require that every line of a contribution must be
> > > 	the personal creative expression of the contributor or another human
> > > 	developer. 
> > > "
> > 
> > I never found that blog post particularly convincing, especially because
> > they acknowledge a concern:
> > 
> >     There are two versions of this concern. The first is practical: that
> >     an AI tool could covertly insert excerpts of proprietary (or
> >     license-incompatible) code into an open source project, potentially
> >     creating legal risk for maintainers and users. The second is broader
> >     and more philosophical: that large language models, trained on vast
> >     amounts of open source software, are essentially misappropriating
> >     the community’s work, producing outputs stripped of the obligations
> >     that open source licenses require.
> > 
> >     We think these concerns deserve to be taken seriously.
> > 
> > The second one is essentially what I understood the QEMU policy to be
> > about. Unfortunately, the blog post then goes on to only ever deal with
> > the first one and ignore the second one that seems more relevant for us.
> > 
> > So yes, the DCO isn't about "personal creative expression" or whatever
> > (and nobody suggested it is, this is a strawman), but it's about whether
> > the submitter has the legal rights to submit the code. And that's
> > exactly the question we decided we don't want to take a risk on.
> > 
> > 
> > So if that part isn't helpful, what has changed since we introduced the
> > AI policy? It's a few points:
> > 
> > 1. While AI has been in use for a while now, we haven't seen projects
> >    accepting AI generated code/content get into big trouble. While it
> >    could still happen in the future, it might be an indication that the
> >    probability of the risk hitting us is not that high.
> > 
> > 2. The useful part of the blog post is that it tells us that Red Hat
> >    considers the risk acceptable. This can inform our assessment of the
> >    risks, though of course there might be a significant difference in
> >    the impact of the risk for a company with a legal department and an
> >    open source community consisting mainly of developers acting as
> >    individuals.
> > 
> >    I think it's obvious that if the QEMU project gets involved in a
> >    legal case, we have a problem (at the very least long lasting
> >    distraction from actual work on QEMU), even if we didn't do anything
> >    wrong and a good lawyer would easily win the case.
> > 
> > 3. It was easy to just outright ban AI while its results were usually
> >    not really usable anyway. This has changed meanwhile, so it's much
> >    harder to maintain an absolute ban.
> > 
> >    It's not really the best use of my time to look at the idea in
> >    AI-generated test cases and then rewrite them from scratch so I can
> >    actually submit them. (On the other hand, I think my rewritten
> >    submissions were always better and more maintainable than what AI
> >    produced initially, so there's that.)
> > 
> > So while my perspective is a lot more nuanced than yours, I do see a
> > shift in the balance and was actually thinking of suggesting a change of
> > the policy myself.
> > 
> > What I was thinking of was allowing AI-generated content in places where
> > it's at least easy to revert if there is ever a problem with it: Tests,
> > documentation etc., but not core code that lots of other things depend
> > on and that will have evolved a lot when we notice a problem and for
> > which throwing away is simply not an option.
> 
> OK. what about trivial changes? Using AI as a better sed?

The above is just what I was thinking of suggesting myself. I didn't
mean to imply that I'm opposed to anything else, but just thought I'd
post it as an example of fairly obvious things we could allow.

Of course, it also shows my own pain points. I don't see that much use
in it for generating code for QEMU proper, because these changes tend to
be few lines and I have an opinion on each of the lines - tests are the
opposite, lots of boilerplate and I don't care much how elegant they
are because nothing else will build on them anyway.

So yes, trivial patches is another obvious starting point. The challenge
there is defining the line where a patch stops being trivial. So I'm not
completely sure if making this distinction in a policy is a good idea;
maybe practically speaking it has to be all or nothing in terms of
creativity (for lack of a better word).

As an aside, personally, I'm not convinced that AI can be a "better
sed". If it's really about mechanical changes, I think the resulting
patch is much more reviewable if the agent doesn't modify the code, but
just generate the sed command line or the Coccinelle patch and that is
included in the commit message. Reviewers can then just review that and
then reproduce the result themselves for comparison. This is impossible
with AI prompts and agents do tend to forget an instance of something to
replace here and there, so you do have to review the result carefully.

But none of these "better sed" problems need to handled in an AI policy.
If a patch is hard to review, the maintainer will already reject it on
those grounds.

> > > I propose adopting linux's rules instead:
> > > https://docs.kernel.org/process/coding-assistants.html
> > > 
> > > which boils down to attribution.
> > 
> > What would we actually do with the detailed information? Why do we care
> > which model was used? Is this helpful commit metadata or is it just free
> > advertising for a handful of companies?
> 
> I presume, if a specific model is somehow declared "contaminated" so we
> can locate its output?

Contaminated in what respect?

Quality? Might be because of malicious intentions or just because the
model happens to be bad at a specific question. Review and testing must
be able to catch quality problems. I don't think this is different from
any other contributions.

Copyright? If so, then we're back to "can you really sign the DCO?"

Something completely different?

> > I think I would see more use in a tag like (better name welcome):
> > 
> >     AI-used-for: [code|tests|docs|commit message]...
> > 
> > Kevin
> 
> I surely don't mind.

Great. Let's see what others think.

Kevin



^ permalink raw reply	[flat|nested] 59+ messages in thread

* Re: on ai generated and code provenance
  2026-05-26 18:59     ` Kevin Wolf
@ 2026-05-26 19:30       ` Michael S. Tsirkin
  2026-05-26 19:52         ` Warner Losh
  2026-05-26 19:50       ` Michael S. Tsirkin
  1 sibling, 1 reply; 59+ messages in thread
From: Michael S. Tsirkin @ 2026-05-26 19:30 UTC (permalink / raw)
  To: Kevin Wolf; +Cc: qemu-devel, stefanha

On Tue, May 26, 2026 at 08:59:55PM +0200, Kevin Wolf wrote:
> Am 26.05.2026 um 20:03 hat Michael S. Tsirkin geschrieben:
> > On Tue, May 26, 2026 at 07:43:35PM +0200, Kevin Wolf wrote:
> > > Am 24.05.2026 um 14:42 hat Michael S. Tsirkin geschrieben:
> > > > So, I had to reject a perfectly reasonable patch:
> > > > https://lore.kernel.org/qemu-devel/20260320193746.242704-1-jinpu.wang@ionos.com/
> > > > just because of a tool used to make it.
> > > > 
> > > > 
> > > > 	How contributors could comply with DCO terms (b) or (c) for the output of AI
> > > > 	content generators commonly available today is unclear.  The QEMU project is
> > > > 	not willing or able to accept the legal risks of non-compliance.
> > > > 
> > > > 
> > > > But, since this was written, Red Hat's Richard Fontana and Chris Wright
> > > > published this piece:
> > > > https://www.redhat.com/en/blog/ai-assisted-development-and-open-source-navigating-legal-issues
> > > > 
> > > > 
> > > > Saying, in particular "
> > > > 	We understand this concern, but the DCO has never
> > > > 	been interpreted to require that every line of a contribution must be
> > > > 	the personal creative expression of the contributor or another human
> > > > 	developer. 
> > > > "
> > > 
> > > I never found that blog post particularly convincing, especially because
> > > they acknowledge a concern:
> > > 
> > >     There are two versions of this concern. The first is practical: that
> > >     an AI tool could covertly insert excerpts of proprietary (or
> > >     license-incompatible) code into an open source project, potentially
> > >     creating legal risk for maintainers and users. The second is broader
> > >     and more philosophical: that large language models, trained on vast
> > >     amounts of open source software, are essentially misappropriating
> > >     the community’s work, producing outputs stripped of the obligations
> > >     that open source licenses require.
> > > 
> > >     We think these concerns deserve to be taken seriously.
> > > 
> > > The second one is essentially what I understood the QEMU policy to be
> > > about. Unfortunately, the blog post then goes on to only ever deal with
> > > the first one and ignore the second one that seems more relevant for us.
> > > 
> > > So yes, the DCO isn't about "personal creative expression" or whatever
> > > (and nobody suggested it is, this is a strawman), but it's about whether
> > > the submitter has the legal rights to submit the code. And that's
> > > exactly the question we decided we don't want to take a risk on.
> > > 
> > > 
> > > So if that part isn't helpful, what has changed since we introduced the
> > > AI policy? It's a few points:
> > > 
> > > 1. While AI has been in use for a while now, we haven't seen projects
> > >    accepting AI generated code/content get into big trouble. While it
> > >    could still happen in the future, it might be an indication that the
> > >    probability of the risk hitting us is not that high.
> > > 
> > > 2. The useful part of the blog post is that it tells us that Red Hat
> > >    considers the risk acceptable. This can inform our assessment of the
> > >    risks, though of course there might be a significant difference in
> > >    the impact of the risk for a company with a legal department and an
> > >    open source community consisting mainly of developers acting as
> > >    individuals.
> > > 
> > >    I think it's obvious that if the QEMU project gets involved in a
> > >    legal case, we have a problem (at the very least long lasting
> > >    distraction from actual work on QEMU), even if we didn't do anything
> > >    wrong and a good lawyer would easily win the case.
> > > 
> > > 3. It was easy to just outright ban AI while its results were usually
> > >    not really usable anyway. This has changed meanwhile, so it's much
> > >    harder to maintain an absolute ban.
> > > 
> > >    It's not really the best use of my time to look at the idea in
> > >    AI-generated test cases and then rewrite them from scratch so I can
> > >    actually submit them. (On the other hand, I think my rewritten
> > >    submissions were always better and more maintainable than what AI
> > >    produced initially, so there's that.)
> > > 
> > > So while my perspective is a lot more nuanced than yours, I do see a
> > > shift in the balance and was actually thinking of suggesting a change of
> > > the policy myself.
> > > 
> > > What I was thinking of was allowing AI-generated content in places where
> > > it's at least easy to revert if there is ever a problem with it: Tests,
> > > documentation etc., but not core code that lots of other things depend
> > > on and that will have evolved a lot when we notice a problem and for
> > > which throwing away is simply not an option.
> > 
> > OK. what about trivial changes? Using AI as a better sed?
> 
> The above is just what I was thinking of suggesting myself. I didn't
> mean to imply that I'm opposed to anything else, but just thought I'd
> post it as an example of fairly obvious things we could allow.
> 
> Of course, it also shows my own pain points. I don't see that much use
> in it for generating code for QEMU proper, because these changes tend to
> be few lines and I have an opinion on each of the lines - tests are the
> opposite, lots of boilerplate and I don't care much how elegant they
> are because nothing else will build on them anyway.
> 
> So yes, trivial patches is another obvious starting point. The challenge
> there is defining the line where a patch stops being trivial. So I'm not
> completely sure if making this distinction in a policy is a good idea;
> maybe practically speaking it has to be all or nothing in terms of
> creativity (for lack of a better word).

Let the maintainers decide?

Or we can enumerate things:
- fixing tool (compiler/checkpatch/smatch) errors/warnings in obvious ways (e.g. suggested by the
  tools itself, such as initializing an uninitialized variable)
- propagating API changes (e.g. rebasing a patch after an API change)
- anything that could be done by a perl/sed/coccinelle script
- adding or fixing code comments



> As an aside, personally, I'm not convinced that AI can be a "better
> sed". If it's really about mechanical changes, I think the resulting
> patch is much more reviewable if the agent doesn't modify the code, but
> just generate the sed command line or the Coccinelle patch and that is
> included in the commit message. Reviewers can then just review that and
> then reproduce the result themselves for comparison. This is impossible
> with AI prompts and agents do tend to forget an instance of something to
> replace here and there, so you do have to review the result carefully.
> 
> But none of these "better sed" problems need to handled in an AI policy.
> If a patch is hard to review, the maintainer will already reject it on
> those grounds.

Absolutely.

> > > > I propose adopting linux's rules instead:
> > > > https://docs.kernel.org/process/coding-assistants.html
> > > > 
> > > > which boils down to attribution.
> > > 
> > > What would we actually do with the detailed information? Why do we care
> > > which model was used? Is this helpful commit metadata or is it just free
> > > advertising for a handful of companies?
> > 
> > I presume, if a specific model is somehow declared "contaminated" so we
> > can locate its output?
> 
> Contaminated in what respect?
> 
> Quality? Might be because of malicious intentions or just because the
> model happens to be bad at a specific question. Review and testing must
> be able to catch quality problems. I don't think this is different from
> any other contributions.
> 
> Copyright? If so, then we're back to "can you really sign the DCO?"
> 
> Something completely different?
> 
> > > I think I would see more use in a tag like (better name welcome):
> > > 
> > >     AI-used-for: [code|tests|docs|commit message]...
> > > 
> > > Kevin
> > 
> > I surely don't mind.
> 
> Great. Let's see what others think.
> 
> Kevin



^ permalink raw reply	[flat|nested] 59+ messages in thread

* Re: on ai generated and code provenance
  2026-05-26 19:30       ` Michael S. Tsirkin
@ 2026-05-26 19:52         ` Warner Losh
  2026-05-27  8:41           ` Kevin Wolf
  0 siblings, 1 reply; 59+ messages in thread
From: Warner Losh @ 2026-05-26 19:52 UTC (permalink / raw)
  To: Michael S. Tsirkin; +Cc: Kevin Wolf, qemu-devel, stefanha

[-- Attachment #1: Type: text/plain, Size: 10033 bytes --]

On Tue, May 26, 2026 at 1:32 PM Michael S. Tsirkin <mst@redhat.com> wrote:

> On Tue, May 26, 2026 at 08:59:55PM +0200, Kevin Wolf wrote:
> > Am 26.05.2026 um 20:03 hat Michael S. Tsirkin geschrieben:
> > > On Tue, May 26, 2026 at 07:43:35PM +0200, Kevin Wolf wrote:
> > > > Am 24.05.2026 um 14:42 hat Michael S. Tsirkin geschrieben:
> > > > > So, I had to reject a perfectly reasonable patch:
> > > > >
> https://lore.kernel.org/qemu-devel/20260320193746.242704-1-jinpu.wang@ionos.com/
> > > > > just because of a tool used to make it.
> > > > >
> > > > >
> > > > >         How contributors could comply with DCO terms (b) or (c)
> for the output of AI
> > > > >         content generators commonly available today is unclear.
> The QEMU project is
> > > > >         not willing or able to accept the legal risks of
> non-compliance.
> > > > >
> > > > >
> > > > > But, since this was written, Red Hat's Richard Fontana and Chris
> Wright
> > > > > published this piece:
> > > > >
> https://www.redhat.com/en/blog/ai-assisted-development-and-open-source-navigating-legal-issues
> > > > >
> > > > >
> > > > > Saying, in particular "
> > > > >         We understand this concern, but the DCO has never
> > > > >         been interpreted to require that every line of a
> contribution must be
> > > > >         the personal creative expression of the contributor or
> another human
> > > > >         developer.
> > > > > "
> > > >
> > > > I never found that blog post particularly convincing, especially
> because
> > > > they acknowledge a concern:
> > > >
> > > >     There are two versions of this concern. The first is practical:
> that
> > > >     an AI tool could covertly insert excerpts of proprietary (or
> > > >     license-incompatible) code into an open source project,
> potentially
> > > >     creating legal risk for maintainers and users. The second is
> broader
> > > >     and more philosophical: that large language models, trained on
> vast
> > > >     amounts of open source software, are essentially misappropriating
> > > >     the community’s work, producing outputs stripped of the
> obligations
> > > >     that open source licenses require.
> > > >
> > > >     We think these concerns deserve to be taken seriously.
> > > >
> > > > The second one is essentially what I understood the QEMU policy to be
> > > > about. Unfortunately, the blog post then goes on to only ever deal
> with
> > > > the first one and ignore the second one that seems more relevant for
> us.
> > > >
> > > > So yes, the DCO isn't about "personal creative expression" or
> whatever
> > > > (and nobody suggested it is, this is a strawman), but it's about
> whether
> > > > the submitter has the legal rights to submit the code. And that's
> > > > exactly the question we decided we don't want to take a risk on.
> > > >
> > > >
> > > > So if that part isn't helpful, what has changed since we introduced
> the
> > > > AI policy? It's a few points:
> > > >
> > > > 1. While AI has been in use for a while now, we haven't seen projects
> > > >    accepting AI generated code/content get into big trouble. While it
> > > >    could still happen in the future, it might be an indication that
> the
> > > >    probability of the risk hitting us is not that high.
> > > >
> > > > 2. The useful part of the blog post is that it tells us that Red Hat
> > > >    considers the risk acceptable. This can inform our assessment of
> the
> > > >    risks, though of course there might be a significant difference in
> > > >    the impact of the risk for a company with a legal department and
> an
> > > >    open source community consisting mainly of developers acting as
> > > >    individuals.
> > > >
> > > >    I think it's obvious that if the QEMU project gets involved in a
> > > >    legal case, we have a problem (at the very least long lasting
> > > >    distraction from actual work on QEMU), even if we didn't do
> anything
> > > >    wrong and a good lawyer would easily win the case.
> > > >
> > > > 3. It was easy to just outright ban AI while its results were usually
> > > >    not really usable anyway. This has changed meanwhile, so it's much
> > > >    harder to maintain an absolute ban.
> > > >
> > > >    It's not really the best use of my time to look at the idea in
> > > >    AI-generated test cases and then rewrite them from scratch so I
> can
> > > >    actually submit them. (On the other hand, I think my rewritten
> > > >    submissions were always better and more maintainable than what AI
> > > >    produced initially, so there's that.)
> > > >
> > > > So while my perspective is a lot more nuanced than yours, I do see a
> > > > shift in the balance and was actually thinking of suggesting a
> change of
> > > > the policy myself.
> > > >
> > > > What I was thinking of was allowing AI-generated content in places
> where
> > > > it's at least easy to revert if there is ever a problem with it:
> Tests,
> > > > documentation etc., but not core code that lots of other things
> depend
> > > > on and that will have evolved a lot when we notice a problem and for
> > > > which throwing away is simply not an option.
> > >
> > > OK. what about trivial changes? Using AI as a better sed?
> >
> > The above is just what I was thinking of suggesting myself. I didn't
> > mean to imply that I'm opposed to anything else, but just thought I'd
> > post it as an example of fairly obvious things we could allow.
> >
> > Of course, it also shows my own pain points. I don't see that much use
> > in it for generating code for QEMU proper, because these changes tend to
> > be few lines and I have an opinion on each of the lines - tests are the
> > opposite, lots of boilerplate and I don't care much how elegant they
> > are because nothing else will build on them anyway.
> >
> > So yes, trivial patches is another obvious starting point. The challenge
> > there is defining the line where a patch stops being trivial. So I'm not
> > completely sure if making this distinction in a policy is a good idea;
> > maybe practically speaking it has to be all or nothing in terms of
> > creativity (for lack of a better word).
>
> Let the maintainers decide?
>
> Or we can enumerate things:
> - fixing tool (compiler/checkpatch/smatch) errors/warnings in obvious ways
> (e.g. suggested by the
>   tools itself, such as initializing an uninitialized variable)
> - propagating API changes (e.g. rebasing a patch after an API change)
> - anything that could be done by a perl/sed/coccinelle script
> - adding or fixing code comments
>

Those are good examples. Perhaps the following words are good place to start
to frame what I've seen expressed here:

The QEMU Project currently may accept limited uses of AI that produce high
quality patches that are limited in the creative content added. While
maintainers
will ultimately decide, changes like the following fall within this policy
1. Fixing obvious warnings in the obvious ways suggested by the tool
2. Tree wide API changes, and other similar mechanical changes done today
with
     perl/python/sed/coccinelle
3. Limited, small changes to fix bugs or add a small new feature whose
scope is
    less than about 100 lines and the originator can explain them all or
the meta
    issues about the patch.
Maintainers are free to accept or reject changes outside these guidelines,
but
please check with the maintainers before sending to keep the load from AI
content
to something they can manage. Large and Very Large patches, especailly
ones that have not been deeply analyised and tested by humans, should be
avoided.

Though maybe the list of 'exceptions' needs work. But the basic framing is
that we will accept some, high quality patches. Maintainers have some
discression
for larger pieces to a point, and we still don't want to drown in AI slop.

Warner


>
> > As an aside, personally, I'm not convinced that AI can be a "better
> > sed". If it's really about mechanical changes, I think the resulting
> > patch is much more reviewable if the agent doesn't modify the code, but
> > just generate the sed command line or the Coccinelle patch and that is
> > included in the commit message. Reviewers can then just review that and
> > then reproduce the result themselves for comparison. This is impossible
> > with AI prompts and agents do tend to forget an instance of something to
> > replace here and there, so you do have to review the result carefully.
> >
> > But none of these "better sed" problems need to handled in an AI policy.
> > If a patch is hard to review, the maintainer will already reject it on
> > those grounds.
>
> Absolutely.
>
> > > > > I propose adopting linux's rules instead:
> > > > > https://docs.kernel.org/process/coding-assistants.html
> > > > >
> > > > > which boils down to attribution.
> > > >
> > > > What would we actually do with the detailed information? Why do we
> care
> > > > which model was used? Is this helpful commit metadata or is it just
> free
> > > > advertising for a handful of companies?
> > >
> > > I presume, if a specific model is somehow declared "contaminated" so we
> > > can locate its output?
> >
> > Contaminated in what respect?
> >
> > Quality? Might be because of malicious intentions or just because the
> > model happens to be bad at a specific question. Review and testing must
> > be able to catch quality problems. I don't think this is different from
> > any other contributions.
> >
> > Copyright? If so, then we're back to "can you really sign the DCO?"
> >
> > Something completely different?
> >
> > > > I think I would see more use in a tag like (better name welcome):
> > > >
> > > >     AI-used-for: [code|tests|docs|commit message]...
> > > >
> > > > Kevin
> > >
> > > I surely don't mind.
> >
> > Great. Let's see what others think.
> >
> > Kevin
>
>
>

[-- Attachment #2: Type: text/html, Size: 12827 bytes --]

^ permalink raw reply	[flat|nested] 59+ messages in thread

* Re: on ai generated and code provenance
  2026-05-26 19:52         ` Warner Losh
@ 2026-05-27  8:41           ` Kevin Wolf
  2026-05-27 10:01             ` Paolo Bonzini
  0 siblings, 1 reply; 59+ messages in thread
From: Kevin Wolf @ 2026-05-27  8:41 UTC (permalink / raw)
  To: Warner Losh; +Cc: Michael S. Tsirkin, qemu-devel, stefanha

Am 26.05.2026 um 21:52 hat Warner Losh geschrieben:
> On Tue, May 26, 2026 at 1:32 PM Michael S. Tsirkin <mst@redhat.com> wrote:
> 
> > On Tue, May 26, 2026 at 08:59:55PM +0200, Kevin Wolf wrote:
> > > So yes, trivial patches is another obvious starting point. The challenge
> > > there is defining the line where a patch stops being trivial. So I'm not
> > > completely sure if making this distinction in a policy is a good idea;
> > > maybe practically speaking it has to be all or nothing in terms of
> > > creativity (for lack of a better word).
> >
> > Let the maintainers decide?
> >
> > Or we can enumerate things:
> > - fixing tool (compiler/checkpatch/smatch) errors/warnings in obvious ways
> > (e.g. suggested by the
> >   tools itself, such as initializing an uninitialized variable)
> > - propagating API changes (e.g. rebasing a patch after an API change)
> > - anything that could be done by a perl/sed/coccinelle script
> > - adding or fixing code comments
> >
> 
> Those are good examples. Perhaps the following words are good place to start
> to frame what I've seen expressed here:
> 
> The QEMU Project currently may accept limited uses of AI that produce
> high quality patches that are limited in the creative content added.
> While maintainers will ultimately decide, changes like the following
> fall within this policy
> 1. Fixing obvious warnings in the obvious ways suggested by the tool
> 2. Tree wide API changes, and other similar mechanical changes done
>    today with perl/python/sed/coccinelle

As I said in the paragraph you quoted below, I don't think we should
encourage using AI for tasks that a deterministic tool could do. If you
can use a deterministic tool like sed or Coccinelle for the job, you
should. I know that writing Coccinelle spatches can be challenging; that
is the part that you can ask AI to help with. (Perl and Python follow
the same logic as long as the script is simple, but obviously you have
to stop when the helper script becomes almost as complex as the change
itself.)

Letting AI perform the change directly instead may be an acceptable
shortcut for a one-man hobby project that nobody else will ever look at,
but in the context of a community project like QEMU in which your
changes have to be reviewed and understood by others, it matters a lot
that the output of the tool is reproducible. Otherwise, you're creating
unnecessary work for others, and that isn't acceptable.

So maybe we should even explicitly mention a recommendation like the
following:

    If you can use a deterministic tool, don't use AI instead. If you
    don't know how to use the deterministic tool, use the AI to tell you
    how to use it instead of trying to replace it.

> 3. Limited, small changes to fix bugs or add a small new feature whose
>    scope is less than about 100 lines and the originator can explain
>    them all or the meta issues about the patch.

Not sure if mentioning a number of lines is wise. 100 lines can be
mostly boilerplate and simple sequential code or they can be a deeply
nested complex algorithm.

> Maintainers are free to accept or reject changes outside these
> guidelines, but please check with the maintainers before sending to
> keep the load from AI content to something they can manage. Large and
> Very Large patches, especailly ones that have not been deeply
> analyised and tested by humans, should be avoided.
> 
> Though maybe the list of 'exceptions' needs work. But the basic
> framing is that we will accept some, high quality patches. Maintainers
> have some discression for larger pieces to a point, and we still don't
> want to drown in AI slop.

Yes, if we decide that we do want to make patch complexity/creative
expression/whatever you may call it part of the criteria, then having a
list like this looks like a possible approach. The details of what
exactly should be in it would certainly lead to more discussion, though.

Kevin

> Warner
> 
> 
> >
> > > As an aside, personally, I'm not convinced that AI can be a "better
> > > sed". If it's really about mechanical changes, I think the resulting
> > > patch is much more reviewable if the agent doesn't modify the code, but
> > > just generate the sed command line or the Coccinelle patch and that is
> > > included in the commit message. Reviewers can then just review that and
> > > then reproduce the result themselves for comparison. This is impossible
> > > with AI prompts and agents do tend to forget an instance of something to
> > > replace here and there, so you do have to review the result carefully.
> > >
> > > But none of these "better sed" problems need to handled in an AI policy.
> > > If a patch is hard to review, the maintainer will already reject it on
> > > those grounds.
> >
> > Absolutely.



^ permalink raw reply	[flat|nested] 59+ messages in thread

* Re: on ai generated and code provenance
  2026-05-27  8:41           ` Kevin Wolf
@ 2026-05-27 10:01             ` Paolo Bonzini
  2026-05-27 10:43               ` Alex Bennée
                                 ` (5 more replies)
  0 siblings, 6 replies; 59+ messages in thread
From: Paolo Bonzini @ 2026-05-27 10:01 UTC (permalink / raw)
  To: Kevin Wolf, Warner Losh; +Cc: Michael S. Tsirkin, qemu-devel, stefanha

On 5/27/26 10:41, Kevin Wolf wrote:
> Am 26.05.2026 um 21:52 hat Warner Losh geschrieben:
>> The QEMU Project currently may accept limited uses of AI that produce
>> high quality patches that are limited in the creative content added.
>> While maintainers will ultimately decide, changes like the following
>> fall within this policy
>> 1. Fixing obvious warnings in the obvious ways suggested by the tool
>> 2. Tree wide API changes, and other similar mechanical changes done
>>     today with perl/python/sed/coccinelle
> 
> As I said in the paragraph you quoted below, I don't think we should
> encourage using AI for tasks that a deterministic tool could do.

In some cases such a tool does not exist.  Much to my surprise, there is 
no tool to do static type inference on Python code, but AI is very good 
at doing it.

> Letting AI perform the change directly instead may be an acceptable
> shortcut for a one-man hobby project that nobody else will ever look at,
> but in the context of a community project like QEMU in which your
> changes have to be reviewed and understood by others, it matters a lot
> that the output of the tool is reproducible. Otherwise, you're creating
> unnecessary work for others, and that isn't acceptable.

When applicable, going through coccinelle (with the aid of AI if needed! 
is indeed a good middle ground as it helps reviewers for large changes. 
If you have many slightly different but easily separated changes (e.g. 
you can split the patch by struct field), it may make things worse.

Its also worth noting that in other cases even sed or coccinelle, while 
deterministic, cannot produce 100% of the patch.

> So maybe we should even explicitly mention a recommendation like the
> following:
> 
>      If you can use a deterministic tool, don't use AI instead. If you
>      don't know how to use the deterministic tool, use the AI to tell you
>      how to use it instead of trying to replace it.

I like it.

>> 3. Limited, small changes to fix bugs or add a small new feature whose
>>     scope is less than about 100 lines and the originator can explain
>>     them all or the meta issues about the patch.
> 
> Not sure if mentioning a number of lines is wise. 100 lines can be
> mostly boilerplate and simple sequential code or they can be a deeply
> nested complex algorithm.

I'd put the threshold at 20-50 at most.

> I think I would see more use in a tag like (better name welcome):
> 
>     AI-used-for: [code|tests|docs|commit message]...

I like this *a lot*.  No need for free advertisement, but some 
traceability is useful.

For tools such as sed or coccinelle, having the exact script in the 
patch or commit message useful.  Plus, the execution of the script more 
or lesss delimits the commit by itself (or 90%+ of it).  For LLMs it's a 
bit less clear cut because separating docs makes little sense.  And the 
exact model is pointless, it will be obsolete in 6 months and provide no 
useful information.

So, something like:

------------------- 8< -------------------
Use of AI-generated content
~~~~~~~~~~~~~~~~~~~~~~~~~~~

The QEMU project currently allows using AI/LLM tools to produce patches 
in scenarios with limited creative content:

Mechanical changes
   If you can use a deterministic tool or a script, don't use AI instead.
   If you don't know how to do the change deterministically, you may
   ask the AI for help, rather than having it stand in for the tools.

Small bug fixes
   These should be limited to 20 lines of code or less, not including
   tests.  You are still expected to understand and explain your changes
   and the rationale behind them.

These boundaries do not apply to other uses of AI, such as researching
APIs or algorithms, static analysis, or debugging, provided their output
is not included in contributions.  Larger uses of AI are allowed as an 
experiment, but they should be agreed upon with the maintainer prior to 
submission.

Use of AI does not remove the need for authors to comply with all other
requirements for contribution.  In particular, the "Signed-off-by"
label in a patch submission is a statement that the author takes
responsibility for the entire contents of the patch, certifying that
their patch submission is made in accordance with the rules of the
`Developer's Certificate of Origin (DCO) <dco>`.

Commit messages for AI-assisted changes
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^

When AI/LLM tools produce or substantively shape your patch, add an 
``AI-used-for:`` trailer.  The text of the trailer could be one or more 
of ``code``, ``tests``, ``docs``, ``research``, possibly followed by an 
explanation in parentheses::

     AI-used-for: tests, docs
     AI-used-for: code
     AI-used-for: code (refactoring)
     AI-used-for: code (prototype)
     AI-used-for: research

The trailer is intended as a clarification of your DCO obligations as 
well as to guide reviewers.  It is not intended for minimal presence 
such as autocomplete or asking for a pre-review of the patch, and it 
does not remove your responsibility to understand the changes that you 
are submitting.

Include the prompt in the commit message if it helps a reviewer judge 
the result:

* yes: "move field ``foo`` from ``struct aa`` to ``struct bb``.  If a 
function already has a local variable or parameter of type ``struct 
bb``, use it instead of accessing ``aa.bb``."

* yes: "add an implementation of the trait for ``Mutex<T: MyTrait>``, 
forwarding the member functions to ``T`` while taking the lock around 
the calls".

* no: "write user-facing documentation for the new tool"

* no: "write testcases for the new functions"

Deterministic tooling (sed, coccinelle, formatters) is out of scope for 
the trailer, but should be mentioned in the commit message.

^ permalink raw reply	[flat|nested] 59+ messages in thread

* Re: on ai generated and code provenance
  2026-05-27 10:01             ` Paolo Bonzini
@ 2026-05-27 10:43               ` Alex Bennée
  2026-05-27 12:49                 ` Kevin Wolf
  2026-05-27 10:53               ` Kevin Wolf
                                 ` (4 subsequent siblings)
  5 siblings, 1 reply; 59+ messages in thread
From: Alex Bennée @ 2026-05-27 10:43 UTC (permalink / raw)
  To: Paolo Bonzini
  Cc: Kevin Wolf, Warner Losh, Michael S. Tsirkin, qemu-devel, stefanha

Paolo Bonzini <pbonzini@redhat.com> writes:

> On 5/27/26 10:41, Kevin Wolf wrote:
>> Am 26.05.2026 um 21:52 hat Warner Losh geschrieben:
>>> The QEMU Project currently may accept limited uses of AI that produce
>>> high quality patches that are limited in the creative content added.
>>> While maintainers will ultimately decide, changes like the following
>>> fall within this policy
>>> 1. Fixing obvious warnings in the obvious ways suggested by the tool
>>> 2. Tree wide API changes, and other similar mechanical changes done
>>>     today with perl/python/sed/coccinelle
>> As I said in the paragraph you quoted below, I don't think we should
>> encourage using AI for tasks that a deterministic tool could do.
>
> In some cases such a tool does not exist.  Much to my surprise, there
> is no tool to do static type inference on Python code, but AI is very
> good at doing it.
>
>> Letting AI perform the change directly instead may be an acceptable
>> shortcut for a one-man hobby project that nobody else will ever look at,
>> but in the context of a community project like QEMU in which your
>> changes have to be reviewed and understood by others, it matters a lot
>> that the output of the tool is reproducible. Otherwise, you're creating
>> unnecessary work for others, and that isn't acceptable.
>
> When applicable, going through coccinelle (with the aid of AI if
> needed! is indeed a good middle ground as it helps reviewers for large
> changes. If you have many slightly different but easily separated
> changes (e.g. you can split the patch by struct field), it may make
> things worse.
>
> Its also worth noting that in other cases even sed or coccinelle,
> while deterministic, cannot produce 100% of the patch.
>
>> So maybe we should even explicitly mention a recommendation like the
>> following:
>>      If you can use a deterministic tool, don't use AI instead. If
>> you
>>      don't know how to use the deterministic tool, use the AI to tell you
>>      how to use it instead of trying to replace it.
>
> I like it.
>
>>> 3. Limited, small changes to fix bugs or add a small new feature whose
>>>     scope is less than about 100 lines and the originator can explain
>>>     them all or the meta issues about the patch.
>> Not sure if mentioning a number of lines is wise. 100 lines can be
>> mostly boilerplate and simple sequential code or they can be a deeply
>> nested complex algorithm.
>
> I'd put the threshold at 20-50 at most.
>
>> I think I would see more use in a tag like (better name welcome):
>>     AI-used-for: [code|tests|docs|commit message]...
>
> I like this *a lot*.  No need for free advertisement, but some
> traceability is useful.
>
> For tools such as sed or coccinelle, having the exact script in the
> patch or commit message useful.  Plus, the execution of the script
> more or lesss delimits the commit by itself (or 90%+ of it).  For LLMs
> it's a bit less clear cut because separating docs makes little sense.
> And the exact model is pointless, it will be obsolete in 6 months and
> provide no useful information.
>
> So, something like:
>
> ------------------- 8< -------------------
> Use of AI-generated content
> ~~~~~~~~~~~~~~~~~~~~~~~~~~~
>
> The QEMU project currently allows using AI/LLM tools to produce
> patches in scenarios with limited creative content:
>
> Mechanical changes
>   If you can use a deterministic tool or a script, don't use AI instead.
>   If you don't know how to do the change deterministically, you may
>   ask the AI for help, rather than having it stand in for the tools.

I like the idea of pointing people towards tools but I wouldn't be quite
so prescriptive. The series MST referred to was easily eyeball-able and
I suspect the extra steps would generate friction for contributions.
That said the wider the change to the code base the more likely a random
hallucination can get lost in the noise.

Maybe:

  Mechanical changes
    Using AI tools to make simple mechanical changes is allowed. For larger
    tree-wide changes it is strongly recommended to use a deterministic
    tool like `sed` or `coccinelle`. You can use AI to help you craft the
    invocation for you.

?

> Small bug fixes
>   These should be limited to 20 lines of code or less, not including
>   tests.  You are still expected to understand and explain your changes
>   and the rationale behind them.
>
> These boundaries do not apply to other uses of AI, such as researching
> APIs or algorithms, static analysis, or debugging, provided their output
> is not included in contributions.  Larger uses of AI are allowed as an
> experiment, but they should be agreed upon with the maintainer prior
> to submission.
>
> Use of AI does not remove the need for authors to comply with all other
> requirements for contribution.  In particular, the "Signed-off-by"
> label in a patch submission is a statement that the author takes
> responsibility for the entire contents of the patch, certifying that
> their patch submission is made in accordance with the rules of the
> `Developer's Certificate of Origin (DCO) <dco>`.
>
> Commit messages for AI-assisted changes
> ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
>
> When AI/LLM tools produce or substantively shape your patch, add an
> ``AI-used-for:`` trailer.  The text of the trailer could be one or
> more of ``code``, ``tests``, ``docs``, ``research``, possibly followed
> by an explanation in parentheses::
>
>     AI-used-for: tests, docs
>     AI-used-for: code
>     AI-used-for: code (refactoring)
>     AI-used-for: code (prototype)
>     AI-used-for: research
>
> The trailer is intended as a clarification of your DCO obligations as
> well as to guide reviewers.  It is not intended for minimal presence
> such as autocomplete or asking for a pre-review of the patch, and it
> does not remove your responsibility to understand the changes that you
> are submitting.
>
> Include the prompt in the commit message if it helps a reviewer judge
> the result:
>
> * yes: "move field ``foo`` from ``struct aa`` to ``struct bb``.  If a
>   function already has a local variable or parameter of type ``struct
>   bb``, use it instead of accessing ``aa.bb``."
>
> * yes: "add an implementation of the trait for ``Mutex<T: MyTrait>``,
>   forwarding the member functions to ``T`` while taking the lock
>   around the calls".
>
> * no: "write user-facing documentation for the new tool"
>
> * no: "write testcases for the new functions"
>
> Deterministic tooling (sed, coccinelle, formatters) is out of scope
> for the trailer, but should be mentioned in the commit message.

-- 
Alex Bennée
Virtualisation Tech Lead @ Linaro


^ permalink raw reply	[flat|nested] 59+ messages in thread

* Re: on ai generated and code provenance
  2026-05-27 10:43               ` Alex Bennée
@ 2026-05-27 12:49                 ` Kevin Wolf
  0 siblings, 0 replies; 59+ messages in thread
From: Kevin Wolf @ 2026-05-27 12:49 UTC (permalink / raw)
  To: Alex Bennée
  Cc: Paolo Bonzini, Warner Losh, Michael S. Tsirkin, qemu-devel,
	stefanha

Am 27.05.2026 um 12:43 hat Alex Bennée geschrieben:
> Paolo Bonzini <pbonzini@redhat.com> writes:
> 
> > On 5/27/26 10:41, Kevin Wolf wrote:
> >> Am 26.05.2026 um 21:52 hat Warner Losh geschrieben:
> >>> The QEMU Project currently may accept limited uses of AI that produce
> >>> high quality patches that are limited in the creative content added.
> >>> While maintainers will ultimately decide, changes like the following
> >>> fall within this policy
> >>> 1. Fixing obvious warnings in the obvious ways suggested by the tool
> >>> 2. Tree wide API changes, and other similar mechanical changes done
> >>>     today with perl/python/sed/coccinelle
> >> As I said in the paragraph you quoted below, I don't think we should
> >> encourage using AI for tasks that a deterministic tool could do.
> >
> > In some cases such a tool does not exist.  Much to my surprise, there
> > is no tool to do static type inference on Python code, but AI is very
> > good at doing it.
> >
> >> Letting AI perform the change directly instead may be an acceptable
> >> shortcut for a one-man hobby project that nobody else will ever look at,
> >> but in the context of a community project like QEMU in which your
> >> changes have to be reviewed and understood by others, it matters a lot
> >> that the output of the tool is reproducible. Otherwise, you're creating
> >> unnecessary work for others, and that isn't acceptable.
> >
> > When applicable, going through coccinelle (with the aid of AI if
> > needed! is indeed a good middle ground as it helps reviewers for large
> > changes. If you have many slightly different but easily separated
> > changes (e.g. you can split the patch by struct field), it may make
> > things worse.
> >
> > Its also worth noting that in other cases even sed or coccinelle,
> > while deterministic, cannot produce 100% of the patch.
> >
> >> So maybe we should even explicitly mention a recommendation like the
> >> following:
> >>      If you can use a deterministic tool, don't use AI instead. If
> >> you
> >>      don't know how to use the deterministic tool, use the AI to tell you
> >>      how to use it instead of trying to replace it.
> >
> > I like it.
> >
> >>> 3. Limited, small changes to fix bugs or add a small new feature whose
> >>>     scope is less than about 100 lines and the originator can explain
> >>>     them all or the meta issues about the patch.
> >> Not sure if mentioning a number of lines is wise. 100 lines can be
> >> mostly boilerplate and simple sequential code or they can be a deeply
> >> nested complex algorithm.
> >
> > I'd put the threshold at 20-50 at most.
> >
> >> I think I would see more use in a tag like (better name welcome):
> >>     AI-used-for: [code|tests|docs|commit message]...
> >
> > I like this *a lot*.  No need for free advertisement, but some
> > traceability is useful.
> >
> > For tools such as sed or coccinelle, having the exact script in the
> > patch or commit message useful.  Plus, the execution of the script
> > more or lesss delimits the commit by itself (or 90%+ of it).  For LLMs
> > it's a bit less clear cut because separating docs makes little sense.
> > And the exact model is pointless, it will be obsolete in 6 months and
> > provide no useful information.
> >
> > So, something like:
> >
> > ------------------- 8< -------------------
> > Use of AI-generated content
> > ~~~~~~~~~~~~~~~~~~~~~~~~~~~
> >
> > The QEMU project currently allows using AI/LLM tools to produce
> > patches in scenarios with limited creative content:
> >
> > Mechanical changes
> >   If you can use a deterministic tool or a script, don't use AI instead.
> >   If you don't know how to do the change deterministically, you may
> >   ask the AI for help, rather than having it stand in for the tools.
> 
> I like the idea of pointing people towards tools but I wouldn't be quite
> so prescriptive. The series MST referred to was easily eyeball-able and
> I suspect the extra steps would generate friction for contributions.
> That said the wider the change to the code base the more likely a random
> hallucination can get lost in the noise.
> 
> Maybe:
> 
>   Mechanical changes
>     Using AI tools to make simple mechanical changes is allowed. For larger
>     tree-wide changes it is strongly recommended to use a deterministic
>     tool like `sed` or `coccinelle`. You can use AI to help you craft the
>     invocation for you.

I think we do want to discourage the direct use of AI in such cases,
while not outright banning it. So maybe just a minor tweak to Paolo's
wording?

    Mechanical changes
      If you can use a deterministic tool or a script, it is preferred
      that you use it and not replace it with AI. If you don't know how
      to do the change deterministically, you can ask the AI for help,
      rather than having it stand in for the tools.

Kevin



^ permalink raw reply	[flat|nested] 59+ messages in thread

* Re: on ai generated and code provenance
  2026-05-27 10:01             ` Paolo Bonzini
  2026-05-27 10:43               ` Alex Bennée
@ 2026-05-27 10:53               ` Kevin Wolf
  2026-05-27 12:33                 ` Paolo Bonzini
  2026-05-27 10:54               ` Alistair Francis
                                 ` (3 subsequent siblings)
  5 siblings, 1 reply; 59+ messages in thread
From: Kevin Wolf @ 2026-05-27 10:53 UTC (permalink / raw)
  To: Paolo Bonzini; +Cc: Warner Losh, Michael S. Tsirkin, qemu-devel, stefanha

Am 27.05.2026 um 12:01 hat Paolo Bonzini geschrieben:
> On 5/27/26 10:41, Kevin Wolf wrote:
> > Am 26.05.2026 um 21:52 hat Warner Losh geschrieben:
> > > The QEMU Project currently may accept limited uses of AI that produce
> > > high quality patches that are limited in the creative content added.
> > > While maintainers will ultimately decide, changes like the following
> > > fall within this policy
> > > 1. Fixing obvious warnings in the obvious ways suggested by the tool
> > > 2. Tree wide API changes, and other similar mechanical changes done
> > >     today with perl/python/sed/coccinelle
> > 
> > As I said in the paragraph you quoted below, I don't think we should
> > encourage using AI for tasks that a deterministic tool could do.
> 
> In some cases such a tool does not exist.

Then it's not a task that a deterministic tool could do.

Of course, you can always write a new tool that does the exact thing you
want to change. But that's not what I was talking about here, I was
really talking about existing common tools.

> Much to my surprise, there is no tool to do static type inference on
> Python code, but AI is very good at doing it.

I think this is a special case that has a different balance anyway. When
reviewing such a patch, I would skim the change for the general approach
and if I like it, but checking for consistency and completeness is
something I would use mypy for - that is, a deterministic tool that can
verify the change. So I'd still use one, just at a different time.

(It actually also might be a rare instance where someone (TM) should
actually write the tool because it would be generally useful.)

> > Letting AI perform the change directly instead may be an acceptable
> > shortcut for a one-man hobby project that nobody else will ever look at,
> > but in the context of a community project like QEMU in which your
> > changes have to be reviewed and understood by others, it matters a lot
> > that the output of the tool is reproducible. Otherwise, you're creating
> > unnecessary work for others, and that isn't acceptable.
> 
> When applicable, going through coccinelle (with the aid of AI if needed! is
> indeed a good middle ground as it helps reviewers for large changes. If you
> have many slightly different but easily separated changes (e.g. you can
> split the patch by struct field), it may make things worse.
> 
> Its also worth noting that in other cases even sed or coccinelle, while
> deterministic, cannot produce 100% of the patch.

Agreed, it's all a case of "if possible, prefer this", not "you have to
do this 100% of the time".

> > So maybe we should even explicitly mention a recommendation like the
> > following:
> > 
> >      If you can use a deterministic tool, don't use AI instead. If you
> >      don't know how to use the deterministic tool, use the AI to tell you
> >      how to use it instead of trying to replace it.
> 
> I like it.
> 
> > > 3. Limited, small changes to fix bugs or add a small new feature whose
> > >     scope is less than about 100 lines and the originator can explain
> > >     them all or the meta issues about the patch.
> > 
> > Not sure if mentioning a number of lines is wise. 100 lines can be
> > mostly boilerplate and simple sequential code or they can be a deeply
> > nested complex algorithm.
> 
> I'd put the threshold at 20-50 at most.
> 
> > I think I would see more use in a tag like (better name welcome):
> > 
> >     AI-used-for: [code|tests|docs|commit message]...
> 
> I like this *a lot*.  No need for free advertisement, but some traceability
> is useful.
> 
> For tools such as sed or coccinelle, having the exact script in the patch or
> commit message useful.  Plus, the execution of the script more or lesss
> delimits the commit by itself (or 90%+ of it).  For LLMs it's a bit less
> clear cut because separating docs makes little sense.  And the exact model
> is pointless, it will be obsolete in 6 months and provide no useful
> information.
> 
> So, something like:
> 
> ------------------- 8< -------------------
> Use of AI-generated content
> ~~~~~~~~~~~~~~~~~~~~~~~~~~~
> 
> The QEMU project currently allows using AI/LLM tools to produce patches in
> scenarios with limited creative content:
> 
> Mechanical changes
>   If you can use a deterministic tool or a script, don't use AI instead.
>   If you don't know how to do the change deterministically, you may
>   ask the AI for help, rather than having it stand in for the tools.
> 
> Small bug fixes
>   These should be limited to 20 lines of code or less, not including
>   tests.  You are still expected to understand and explain your changes
>   and the rationale behind them.

I agree with "not including tests". But I think this would be more
consistent if we also add new tests (that come without a small bug fix
at the same time; either because the problem is already fixed or because
the fix is too complex to qualify) as another allowed category.

(To be honest, I'm a bit biased here because allowing tests is my single
biggest wish from an AI policy update.)

> These boundaries do not apply to other uses of AI, such as researching
> APIs or algorithms, static analysis, or debugging, provided their output
> is not included in contributions.  Larger uses of AI are allowed as an
> experiment, but they should be agreed upon with the maintainer prior to
> submission.
> 
> Use of AI does not remove the need for authors to comply with all other
> requirements for contribution.  In particular, the "Signed-off-by"
> label in a patch submission is a statement that the author takes
> responsibility for the entire contents of the patch, certifying that
> their patch submission is made in accordance with the rules of the
> `Developer's Certificate of Origin (DCO) <dco>`.
> 
> Commit messages for AI-assisted changes
> ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
> 
> When AI/LLM tools produce or substantively shape your patch, add an
> ``AI-used-for:`` trailer.  The text of the trailer could be one or more of
> ``code``, ``tests``, ``docs``, ``research``, possibly followed by an
> explanation in parentheses::

Include a category for commit messages, or are we expecting that commit
messages are always written by a human? If so, that should be explicit.

>     AI-used-for: tests, docs
>     AI-used-for: code
>     AI-used-for: code (refactoring)
>     AI-used-for: code (prototype)
>     AI-used-for: research
> 
> The trailer is intended as a clarification of your DCO obligations as well
> as to guide reviewers.  It is not intended for minimal presence such as
> autocomplete or asking for a pre-review of the patch, and it does not remove
> your responsibility to understand the changes that you are submitting.
> 
> Include the prompt in the commit message if it helps a reviewer judge the
> result:
> 
> * yes: "move field ``foo`` from ``struct aa`` to ``struct bb``.  If a
> function already has a local variable or parameter of type ``struct bb``,
> use it instead of accessing ``aa.bb``."
> 
> * yes: "add an implementation of the trait for ``Mutex<T: MyTrait>``,
> forwarding the member functions to ``T`` while taking the lock around the
> calls".
> 
> * no: "write user-facing documentation for the new tool"
> 
> * no: "write testcases for the new functions"
> 
> Deterministic tooling (sed, coccinelle, formatters) is out of scope for the
> trailer, but should be mentioned in the commit message.

Apart from the above comments, this looks good to me.

Kevin



^ permalink raw reply	[flat|nested] 59+ messages in thread

* Re: on ai generated and code provenance
  2026-05-27 10:53               ` Kevin Wolf
@ 2026-05-27 12:33                 ` Paolo Bonzini
  2026-05-27 12:43                   ` Michael S. Tsirkin
  0 siblings, 1 reply; 59+ messages in thread
From: Paolo Bonzini @ 2026-05-27 12:33 UTC (permalink / raw)
  To: Kevin Wolf; +Cc: Warner Losh, Michael S. Tsirkin, qemu-devel, stefanha

On 5/27/26 12:53, Kevin Wolf wrote:
> Am 27.05.2026 um 12:01 hat Paolo Bonzini geschrieben:
>> On 5/27/26 10:41, Kevin Wolf wrote:
>>> Am 26.05.2026 um 21:52 hat Warner Losh geschrieben:
>>>> The QEMU Project currently may accept limited uses of AI that produce
>>>> high quality patches that are limited in the creative content added.
>>>> While maintainers will ultimately decide, changes like the following
>>>> fall within this policy
>>>> 1. Fixing obvious warnings in the obvious ways suggested by the tool
>>>> 2. Tree wide API changes, and other similar mechanical changes done
>>>>      today with perl/python/sed/coccinelle
>>>
>>> As I said in the paragraph you quoted below, I don't think we should
>>> encourage using AI for tasks that a deterministic tool could do.
>>
>> In some cases such a tool does not exist.
> 
> Then it's not a task that a deterministic tool could do.

You have a point. :)

> [type annotations] might be a rare instance where someone (TM) should
> actually write the tool because it would be generally useful.

Agreed, especially the "someone" part.

>> Small bug fixes
>>    These should be limited to 20 lines of code or less, not including
>>    tests.  You are still expected to understand and explain your changes
>>    and the rationale behind them.
> 
> I agree with "not including tests". But I think this would be more
> consistent if we also add new tests (that come without a small bug fix
> at the same time; either because the problem is already fixed or because
> the fix is too complex to qualify) as another allowed category.

Yes, absolutely.  Can you propose a wording?

>> These boundaries do not apply to other uses of AI, such as researching
>> APIs or algorithms, static analysis, or debugging, provided their output
>> is not included in contributions.  Larger uses of AI are allowed as an
>> experiment, but they should be agreed upon with the maintainer prior
>> to submission.

Taking into account Alistair's input I'd rephrase as

The intention of these boundaries is to reduce the risk of maintainer 
burnout from AI contributions, as well as the risk to the project from 
unintentional copyright violations.  They do not apply to other uses of 
AI, such as researching APIs or algorithms, static analysis, or 
debugging, provided the model's output is not included in contributions.

If you wish to send large amounts of AI-generated changes, or any other 
contribution not in the above categories, please get in touch with the 
maintainer beforehand.

>> When AI/LLM tools produce or substantively shape your patch, add an
>> ``AI-used-for:`` trailer.  The text of the trailer could be one or more of
>> ``code``, ``tests``, ``docs``, ``research``, possibly followed by an
>> explanation in parentheses::
> 
> Include a category for commit messages, or are we expecting that commit
> messages are always written by a human? If so, that should be explicit.

Mostly, I don't think it matters.  A commit message written purely by an 
LLM is usually very bad.  A commit message edited with an LLM falls 
under this:

>> It is not intended for minimal presence such as
>> autocomplete or asking for a pre-review of the patch, and it does not remove
>> your responsibility to understand the changes that you are submitting.

Technically "research" shouldn't matter for the policy either, but it 
may be interesting to write it out, if AI usage was important enough to 
mention in the commit message.  Perhaps coccinelle scripts would fall 
under that as well.

Paolo



^ permalink raw reply	[flat|nested] 59+ messages in thread

* Re: on ai generated and code provenance
  2026-05-27 12:33                 ` Paolo Bonzini
@ 2026-05-27 12:43                   ` Michael S. Tsirkin
  0 siblings, 0 replies; 59+ messages in thread
From: Michael S. Tsirkin @ 2026-05-27 12:43 UTC (permalink / raw)
  To: Paolo Bonzini; +Cc: Kevin Wolf, Warner Losh, qemu-devel, stefanha

On Wed, May 27, 2026 at 02:33:03PM +0200, Paolo Bonzini wrote:
> The intention of these boundaries is to reduce the risk of maintainer
> burnout from AI contributions, as well as the risk to the project from
> unintentional copyright violations.  They do not apply to other uses of AI,
> such as researching APIs or algorithms, static analysis, or debugging,
> provided the model's output is not included in contributions.

Although I will be frank, "static analysis" can induce maintainer
burnout just as easily) But I don't see what we can do about that.




^ permalink raw reply	[flat|nested] 59+ messages in thread

* Re: on ai generated and code provenance
  2026-05-27 10:01             ` Paolo Bonzini
  2026-05-27 10:43               ` Alex Bennée
  2026-05-27 10:53               ` Kevin Wolf
@ 2026-05-27 10:54               ` Alistair Francis
  2026-05-27 14:21                 ` Warner Losh
  2026-05-27 14:11               ` Michael S. Tsirkin
                                 ` (2 subsequent siblings)
  5 siblings, 1 reply; 59+ messages in thread
From: Alistair Francis @ 2026-05-27 10:54 UTC (permalink / raw)
  To: Paolo Bonzini
  Cc: Kevin Wolf, Warner Losh, Michael S. Tsirkin, qemu-devel, stefanha

On Wed, May 27, 2026 at 8:02 PM Paolo Bonzini <pbonzini@redhat.com> wrote:
>
> On 5/27/26 10:41, Kevin Wolf wrote:
> > Am 26.05.2026 um 21:52 hat Warner Losh geschrieben:
> >> The QEMU Project currently may accept limited uses of AI that produce
> >> high quality patches that are limited in the creative content added.
> >> While maintainers will ultimately decide, changes like the following
> >> fall within this policy
> >> 1. Fixing obvious warnings in the obvious ways suggested by the tool
> >> 2. Tree wide API changes, and other similar mechanical changes done
> >>     today with perl/python/sed/coccinelle
> >
> > As I said in the paragraph you quoted below, I don't think we should
> > encourage using AI for tasks that a deterministic tool could do.
>
> In some cases such a tool does not exist.  Much to my surprise, there is
> no tool to do static type inference on Python code, but AI is very good
> at doing it.
>
> > Letting AI perform the change directly instead may be an acceptable
> > shortcut for a one-man hobby project that nobody else will ever look at,
> > but in the context of a community project like QEMU in which your
> > changes have to be reviewed and understood by others, it matters a lot
> > that the output of the tool is reproducible. Otherwise, you're creating
> > unnecessary work for others, and that isn't acceptable.
>
> When applicable, going through coccinelle (with the aid of AI if needed!
> is indeed a good middle ground as it helps reviewers for large changes.
> If you have many slightly different but easily separated changes (e.g.
> you can split the patch by struct field), it may make things worse.
>
> Its also worth noting that in other cases even sed or coccinelle, while
> deterministic, cannot produce 100% of the patch.
>
> > So maybe we should even explicitly mention a recommendation like the
> > following:
> >
> >      If you can use a deterministic tool, don't use AI instead. If you
> >      don't know how to use the deterministic tool, use the AI to tell you
> >      how to use it instead of trying to replace it.
>
> I like it.
>
> >> 3. Limited, small changes to fix bugs or add a small new feature whose
> >>     scope is less than about 100 lines and the originator can explain
> >>     them all or the meta issues about the patch.
> >
> > Not sure if mentioning a number of lines is wise. 100 lines can be
> > mostly boilerplate and simple sequential code or they can be a deeply
> > nested complex algorithm.
>
> I'd put the threshold at 20-50 at most.
>
> > I think I would see more use in a tag like (better name welcome):
> >
> >     AI-used-for: [code|tests|docs|commit message]...
>
> I like this *a lot*.  No need for free advertisement, but some
> traceability is useful.
>
> For tools such as sed or coccinelle, having the exact script in the
> patch or commit message useful.  Plus, the execution of the script more
> or lesss delimits the commit by itself (or 90%+ of it).  For LLMs it's a
> bit less clear cut because separating docs makes little sense.  And the
> exact model is pointless, it will be obsolete in 6 months and provide no
> useful information.
>
> So, something like:
>
> ------------------- 8< -------------------
> Use of AI-generated content
> ~~~~~~~~~~~~~~~~~~~~~~~~~~~
>
> The QEMU project currently allows using AI/LLM tools to produce patches
> in scenarios with limited creative content:
>
> Mechanical changes
>    If you can use a deterministic tool or a script, don't use AI instead.
>    If you don't know how to do the change deterministically, you may
>    ask the AI for help, rather than having it stand in for the tools.
>
> Small bug fixes
>    These should be limited to 20 lines of code or less, not including
>    tests.  You are still expected to understand and explain your changes
>    and the rationale behind them.

Coming back to Peter's earlier comments and the Zig policy, one thing
we have in RISC-V is people are running AI tools against QEMU and the
RISC-V spec to identify places where we don't meet the spec. They then
write patches and submit them upstream. The patches appear human
written, so have been accepted.

A lot of the bugs found are corner cases that people aren't actually
hitting. From my use of AI review systems in the past, they do tend to
be very nit-picky. So it's not too hard to catch issues that users
won't actually hit and fix them. It's a valid fix, but easy to
inundate reviewers. If this process was entirely run by an LLM it
could be way too much.

So maybe we should add something here about don't send large numbers
of "small bug fix" patches. So someone doesn't point an AI at QEMU and
a spec and generate huge numbers of patches, all of which are just
small bug fixes.

Alistair

>
> These boundaries do not apply to other uses of AI, such as researching
> APIs or algorithms, static analysis, or debugging, provided their output
> is not included in contributions.  Larger uses of AI are allowed as an
> experiment, but they should be agreed upon with the maintainer prior to
> submission.
>
> Use of AI does not remove the need for authors to comply with all other
> requirements for contribution.  In particular, the "Signed-off-by"
> label in a patch submission is a statement that the author takes
> responsibility for the entire contents of the patch, certifying that
> their patch submission is made in accordance with the rules of the
> `Developer's Certificate of Origin (DCO) <dco>`.
>
> Commit messages for AI-assisted changes
> ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
>
> When AI/LLM tools produce or substantively shape your patch, add an
> ``AI-used-for:`` trailer.  The text of the trailer could be one or more
> of ``code``, ``tests``, ``docs``, ``research``, possibly followed by an
> explanation in parentheses::
>
>      AI-used-for: tests, docs
>      AI-used-for: code
>      AI-used-for: code (refactoring)
>      AI-used-for: code (prototype)
>      AI-used-for: research
>
> The trailer is intended as a clarification of your DCO obligations as
> well as to guide reviewers.  It is not intended for minimal presence
> such as autocomplete or asking for a pre-review of the patch, and it
> does not remove your responsibility to understand the changes that you
> are submitting.
>
> Include the prompt in the commit message if it helps a reviewer judge
> the result:
>
> * yes: "move field ``foo`` from ``struct aa`` to ``struct bb``.  If a
> function already has a local variable or parameter of type ``struct
> bb``, use it instead of accessing ``aa.bb``."
>
> * yes: "add an implementation of the trait for ``Mutex<T: MyTrait>``,
> forwarding the member functions to ``T`` while taking the lock around
> the calls".
>
> * no: "write user-facing documentation for the new tool"
>
> * no: "write testcases for the new functions"
>
> Deterministic tooling (sed, coccinelle, formatters) is out of scope for
> the trailer, but should be mentioned in the commit message.
>
>


^ permalink raw reply	[flat|nested] 59+ messages in thread

* Re: on ai generated and code provenance
  2026-05-27 10:54               ` Alistair Francis
@ 2026-05-27 14:21                 ` Warner Losh
  2026-05-28  1:59                   ` Alistair Francis
  0 siblings, 1 reply; 59+ messages in thread
From: Warner Losh @ 2026-05-27 14:21 UTC (permalink / raw)
  To: Alistair Francis
  Cc: Paolo Bonzini, Kevin Wolf, Michael S. Tsirkin, qemu-devel,
	stefanha

[-- Attachment #1: Type: text/plain, Size: 8329 bytes --]

On Wed, May 27, 2026 at 4:54 AM Alistair Francis <alistair23@gmail.com>
wrote:

> On Wed, May 27, 2026 at 8:02 PM Paolo Bonzini <pbonzini@redhat.com> wrote:
> >
> > On 5/27/26 10:41, Kevin Wolf wrote:
> > > Am 26.05.2026 um 21:52 hat Warner Losh geschrieben:
> > >> The QEMU Project currently may accept limited uses of AI that produce
> > >> high quality patches that are limited in the creative content added.
> > >> While maintainers will ultimately decide, changes like the following
> > >> fall within this policy
> > >> 1. Fixing obvious warnings in the obvious ways suggested by the tool
> > >> 2. Tree wide API changes, and other similar mechanical changes done
> > >>     today with perl/python/sed/coccinelle
> > >
> > > As I said in the paragraph you quoted below, I don't think we should
> > > encourage using AI for tasks that a deterministic tool could do.
> >
> > In some cases such a tool does not exist.  Much to my surprise, there is
> > no tool to do static type inference on Python code, but AI is very good
> > at doing it.
> >
> > > Letting AI perform the change directly instead may be an acceptable
> > > shortcut for a one-man hobby project that nobody else will ever look
> at,
> > > but in the context of a community project like QEMU in which your
> > > changes have to be reviewed and understood by others, it matters a lot
> > > that the output of the tool is reproducible. Otherwise, you're creating
> > > unnecessary work for others, and that isn't acceptable.
> >
> > When applicable, going through coccinelle (with the aid of AI if needed!
> > is indeed a good middle ground as it helps reviewers for large changes.
> > If you have many slightly different but easily separated changes (e.g.
> > you can split the patch by struct field), it may make things worse.
> >
> > Its also worth noting that in other cases even sed or coccinelle, while
> > deterministic, cannot produce 100% of the patch.
> >
> > > So maybe we should even explicitly mention a recommendation like the
> > > following:
> > >
> > >      If you can use a deterministic tool, don't use AI instead. If you
> > >      don't know how to use the deterministic tool, use the AI to tell
> you
> > >      how to use it instead of trying to replace it.
> >
> > I like it.
> >
> > >> 3. Limited, small changes to fix bugs or add a small new feature whose
> > >>     scope is less than about 100 lines and the originator can explain
> > >>     them all or the meta issues about the patch.
> > >
> > > Not sure if mentioning a number of lines is wise. 100 lines can be
> > > mostly boilerplate and simple sequential code or they can be a deeply
> > > nested complex algorithm.
> >
> > I'd put the threshold at 20-50 at most.
> >
> > > I think I would see more use in a tag like (better name welcome):
> > >
> > >     AI-used-for: [code|tests|docs|commit message]...
> >
> > I like this *a lot*.  No need for free advertisement, but some
> > traceability is useful.
> >
> > For tools such as sed or coccinelle, having the exact script in the
> > patch or commit message useful.  Plus, the execution of the script more
> > or lesss delimits the commit by itself (or 90%+ of it).  For LLMs it's a
> > bit less clear cut because separating docs makes little sense.  And the
> > exact model is pointless, it will be obsolete in 6 months and provide no
> > useful information.
> >
> > So, something like:
> >
> > ------------------- 8< -------------------
> > Use of AI-generated content
> > ~~~~~~~~~~~~~~~~~~~~~~~~~~~
> >
> > The QEMU project currently allows using AI/LLM tools to produce patches
> > in scenarios with limited creative content:
> >
> > Mechanical changes
> >    If you can use a deterministic tool or a script, don't use AI instead.
> >    If you don't know how to do the change deterministically, you may
> >    ask the AI for help, rather than having it stand in for the tools.
> >
> > Small bug fixes
> >    These should be limited to 20 lines of code or less, not including
> >    tests.  You are still expected to understand and explain your changes
> >    and the rationale behind them.
>
> Coming back to Peter's earlier comments and the Zig policy, one thing
> we have in RISC-V is people are running AI tools against QEMU and the
> RISC-V spec to identify places where we don't meet the spec. They then
> write patches and submit them upstream. The patches appear human
> written, so have been accepted.
>

I have a checklist for bsd-user changes that checks the common mistakes
around the lock_user family of interfaces. I've generated patches from
what claude found, but claude's patches were generally identical to
what I produced.


> A lot of the bugs found are corner cases that people aren't actually
> hitting. From my use of AI review systems in the past, they do tend to
> be very nit-picky. So it's not too hard to catch issues that users
> won't actually hit and fix them. It's a valid fix, but easy to
> inundate reviewers. If this process was entirely run by an LLM it
> could be way too much.
>

My experience is that claude found 150 or so "bugs". All were "legit" in
the sense the APIs were used wrong, but maybe 10 were actual critical
bugs that explained some, but not all, of the mysterious hangs we see.
My worry has been one of testing: how do I test it all? Or do I continue
to use the 'just build thousands of packages' as the acid test?


> So maybe we should add something here about don't send large numbers
> of "small bug fix" patches. So someone doesn't point an AI at QEMU and
> a spec and generate huge numbers of patches, all of which are just
> small bug fixes.
>

Wouldn't this concern fall under the general requirement to not send more
than a manageable number of patches at a time (like 50)? Or do you think
a lower number is warranted?

Warner


> Alistair
>
> >
> > These boundaries do not apply to other uses of AI, such as researching
> > APIs or algorithms, static analysis, or debugging, provided their output
> > is not included in contributions.  Larger uses of AI are allowed as an
> > experiment, but they should be agreed upon with the maintainer prior to
> > submission.
> >
> > Use of AI does not remove the need for authors to comply with all other
> > requirements for contribution.  In particular, the "Signed-off-by"
> > label in a patch submission is a statement that the author takes
> > responsibility for the entire contents of the patch, certifying that
> > their patch submission is made in accordance with the rules of the
> > `Developer's Certificate of Origin (DCO) <dco>`.
> >
> > Commit messages for AI-assisted changes
> > ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
> >
> > When AI/LLM tools produce or substantively shape your patch, add an
> > ``AI-used-for:`` trailer.  The text of the trailer could be one or more
> > of ``code``, ``tests``, ``docs``, ``research``, possibly followed by an
> > explanation in parentheses::
> >
> >      AI-used-for: tests, docs
> >      AI-used-for: code
> >      AI-used-for: code (refactoring)
> >      AI-used-for: code (prototype)
> >      AI-used-for: research
> >
> > The trailer is intended as a clarification of your DCO obligations as
> > well as to guide reviewers.  It is not intended for minimal presence
> > such as autocomplete or asking for a pre-review of the patch, and it
> > does not remove your responsibility to understand the changes that you
> > are submitting.
> >
> > Include the prompt in the commit message if it helps a reviewer judge
> > the result:
> >
> > * yes: "move field ``foo`` from ``struct aa`` to ``struct bb``.  If a
> > function already has a local variable or parameter of type ``struct
> > bb``, use it instead of accessing ``aa.bb``."
> >
> > * yes: "add an implementation of the trait for ``Mutex<T: MyTrait>``,
> > forwarding the member functions to ``T`` while taking the lock around
> > the calls".
> >
> > * no: "write user-facing documentation for the new tool"
> >
> > * no: "write testcases for the new functions"
> >
> > Deterministic tooling (sed, coccinelle, formatters) is out of scope for
> > the trailer, but should be mentioned in the commit message.
> >
> >
>

[-- Attachment #2: Type: text/html, Size: 10458 bytes --]

^ permalink raw reply	[flat|nested] 59+ messages in thread

* Re: on ai generated and code provenance
  2026-05-27 14:21                 ` Warner Losh
@ 2026-05-28  1:59                   ` Alistair Francis
  2026-05-28  5:06                     ` Michael S. Tsirkin
  0 siblings, 1 reply; 59+ messages in thread
From: Alistair Francis @ 2026-05-28  1:59 UTC (permalink / raw)
  To: Warner Losh
  Cc: Paolo Bonzini, Kevin Wolf, Michael S. Tsirkin, qemu-devel,
	stefanha

On Thu, May 28, 2026 at 12:21 AM Warner Losh <imp@bsdimp.com> wrote:
>
>
>
> On Wed, May 27, 2026 at 4:54 AM Alistair Francis <alistair23@gmail.com> wrote:
>>
>> On Wed, May 27, 2026 at 8:02 PM Paolo Bonzini <pbonzini@redhat.com> wrote:
>> >
>> > On 5/27/26 10:41, Kevin Wolf wrote:
>> > > Am 26.05.2026 um 21:52 hat Warner Losh geschrieben:
>> > >> The QEMU Project currently may accept limited uses of AI that produce
>> > >> high quality patches that are limited in the creative content added.
>> > >> While maintainers will ultimately decide, changes like the following
>> > >> fall within this policy
>> > >> 1. Fixing obvious warnings in the obvious ways suggested by the tool
>> > >> 2. Tree wide API changes, and other similar mechanical changes done
>> > >>     today with perl/python/sed/coccinelle
>> > >
>> > > As I said in the paragraph you quoted below, I don't think we should
>> > > encourage using AI for tasks that a deterministic tool could do.
>> >
>> > In some cases such a tool does not exist.  Much to my surprise, there is
>> > no tool to do static type inference on Python code, but AI is very good
>> > at doing it.
>> >
>> > > Letting AI perform the change directly instead may be an acceptable
>> > > shortcut for a one-man hobby project that nobody else will ever look at,
>> > > but in the context of a community project like QEMU in which your
>> > > changes have to be reviewed and understood by others, it matters a lot
>> > > that the output of the tool is reproducible. Otherwise, you're creating
>> > > unnecessary work for others, and that isn't acceptable.
>> >
>> > When applicable, going through coccinelle (with the aid of AI if needed!
>> > is indeed a good middle ground as it helps reviewers for large changes.
>> > If you have many slightly different but easily separated changes (e.g.
>> > you can split the patch by struct field), it may make things worse.
>> >
>> > Its also worth noting that in other cases even sed or coccinelle, while
>> > deterministic, cannot produce 100% of the patch.
>> >
>> > > So maybe we should even explicitly mention a recommendation like the
>> > > following:
>> > >
>> > >      If you can use a deterministic tool, don't use AI instead. If you
>> > >      don't know how to use the deterministic tool, use the AI to tell you
>> > >      how to use it instead of trying to replace it.
>> >
>> > I like it.
>> >
>> > >> 3. Limited, small changes to fix bugs or add a small new feature whose
>> > >>     scope is less than about 100 lines and the originator can explain
>> > >>     them all or the meta issues about the patch.
>> > >
>> > > Not sure if mentioning a number of lines is wise. 100 lines can be
>> > > mostly boilerplate and simple sequential code or they can be a deeply
>> > > nested complex algorithm.
>> >
>> > I'd put the threshold at 20-50 at most.
>> >
>> > > I think I would see more use in a tag like (better name welcome):
>> > >
>> > >     AI-used-for: [code|tests|docs|commit message]...
>> >
>> > I like this *a lot*.  No need for free advertisement, but some
>> > traceability is useful.
>> >
>> > For tools such as sed or coccinelle, having the exact script in the
>> > patch or commit message useful.  Plus, the execution of the script more
>> > or lesss delimits the commit by itself (or 90%+ of it).  For LLMs it's a
>> > bit less clear cut because separating docs makes little sense.  And the
>> > exact model is pointless, it will be obsolete in 6 months and provide no
>> > useful information.
>> >
>> > So, something like:
>> >
>> > ------------------- 8< -------------------
>> > Use of AI-generated content
>> > ~~~~~~~~~~~~~~~~~~~~~~~~~~~
>> >
>> > The QEMU project currently allows using AI/LLM tools to produce patches
>> > in scenarios with limited creative content:
>> >
>> > Mechanical changes
>> >    If you can use a deterministic tool or a script, don't use AI instead.
>> >    If you don't know how to do the change deterministically, you may
>> >    ask the AI for help, rather than having it stand in for the tools.
>> >
>> > Small bug fixes
>> >    These should be limited to 20 lines of code or less, not including
>> >    tests.  You are still expected to understand and explain your changes
>> >    and the rationale behind them.
>>
>> Coming back to Peter's earlier comments and the Zig policy, one thing
>> we have in RISC-V is people are running AI tools against QEMU and the
>> RISC-V spec to identify places where we don't meet the spec. They then
>> write patches and submit them upstream. The patches appear human
>> written, so have been accepted.
>
>
> I have a checklist for bsd-user changes that checks the common mistakes
> around the lock_user family of interfaces. I've generated patches from
> what claude found, but claude's patches were generally identical to
> what I produced.
>
>>
>> A lot of the bugs found are corner cases that people aren't actually
>> hitting. From my use of AI review systems in the past, they do tend to
>> be very nit-picky. So it's not too hard to catch issues that users
>> won't actually hit and fix them. It's a valid fix, but easy to
>> inundate reviewers. If this process was entirely run by an LLM it
>> could be way too much.
>
>
> My experience is that claude found 150 or so "bugs". All were "legit" in
> the sense the APIs were used wrong, but maybe 10 were actual critical
> bugs that explained some, but not all, of the mysterious hangs we see.

Yeah, that's exactly what I don't want, someone drive by sending all
150 "legit" bug fix patches.

> My worry has been one of testing: how do I test it all? Or do I continue
> to use the 'just build thousands of packages' as the acid test?
>
>>
>> So maybe we should add something here about don't send large numbers
>> of "small bug fix" patches. So someone doesn't point an AI at QEMU and
>> a spec and generate huge numbers of patches, all of which are just
>> small bug fixes.
>
>
> Wouldn't this concern fall under the general requirement to not send more
> than a manageable number of patches at a time (like 50)? Or do you think
> a lower number is warranted?

I could see someone reading the proposed wording and then sending one
20 line patch, then another, then another, then another, then another,
then another and then more. All while only sending "small patches",
but the ability to generate a large number.

It's not clear to me at least from the proposed wording that we should
discourage that.

Alistair


^ permalink raw reply	[flat|nested] 59+ messages in thread

* Re: on ai generated and code provenance
  2026-05-28  1:59                   ` Alistair Francis
@ 2026-05-28  5:06                     ` Michael S. Tsirkin
  2026-05-28  7:32                       ` Paolo Bonzini
  0 siblings, 1 reply; 59+ messages in thread
From: Michael S. Tsirkin @ 2026-05-28  5:06 UTC (permalink / raw)
  To: Alistair Francis
  Cc: Warner Losh, Paolo Bonzini, Kevin Wolf, qemu-devel, stefanha

On Thu, May 28, 2026 at 11:59:35AM +1000, Alistair Francis wrote:
> > My worry has been one of testing: how do I test it all? Or do I continue
> > to use the 'just build thousands of packages' as the acid test?
> >
> >>
> >> So maybe we should add something here about don't send large numbers
> >> of "small bug fix" patches. So someone doesn't point an AI at QEMU and
> >> a spec and generate huge numbers of patches, all of which are just
> >> small bug fixes.
> >
> >
> > Wouldn't this concern fall under the general requirement to not send more
> > than a manageable number of patches at a time (like 50)? Or do you think
> > a lower number is warranted?
> 
> I could see someone reading the proposed wording and then sending one
> 20 line patch, then another, then another, then another, then another,
> then another and then more. All while only sending "small patches",
> but the ability to generate a large number.
> 
> It's not clear to me at least from the proposed wording that we should
> discourage that.
> 
> Alistair


Maybe we shouldn't? It's far from trivial to split up functionality even
in 100 line self contained chunks, let alone 20. And reviewing
such small patches is *easy*.


-- 
MST



^ permalink raw reply	[flat|nested] 59+ messages in thread

* Re: on ai generated and code provenance
  2026-05-28  5:06                     ` Michael S. Tsirkin
@ 2026-05-28  7:32                       ` Paolo Bonzini
  0 siblings, 0 replies; 59+ messages in thread
From: Paolo Bonzini @ 2026-05-28  7:32 UTC (permalink / raw)
  To: Michael S. Tsirkin, Alistair Francis
  Cc: Warner Losh, Kevin Wolf, qemu-devel, stefanha

On 5/28/26 07:06, Michael S. Tsirkin wrote:
>> I could see someone reading the proposed wording and then sending one
>> 20 line patch, then another, then another, then another, then another,
>> then another and then more. All while only sending "small patches",
>> but the ability to generate a large number.
>>
>> It's not clear to me at least from the proposed wording that we should
>> discourage that.
> 
> Maybe we shouldn't? It's far from trivial to split up functionality even
> in 100 line self contained chunks, let alone 20. And reviewing
> such small patches is *easy*.

Bugfixes for something like TCG front-ends are usually well within 20 
lines, or even less:

     target/i386/tcg: fix decoding of MOVBE and CRC32 in 16-bit mode
  1 file changed, 10 insertions(+), 6 deletions(-)

     target/i386/tcg: fix typo in dpps/dppd instructions
  2 files changed, 4 insertions(+), 4 deletions(-)

     target/i386/tcg: fix a few instructions that do not support VEX.L=1
  1 file changed, 4 insertions(+), 4 deletions(-)

     target/i386/tcg: allow VEX in 16-bit protected mode
  1 file changed, 3 insertions(+), 7 deletions(-)

     target/i386/tcg: do not mark all SSE instructions as unaligned
  2 files changed, 9 insertions(+), 4 deletions(-)

     target/i386/tcg: do not leave non-arithmetic flags in CC_SRC after 
PUSHF
  1 file changed, 1 insertion(+), 2 deletions(-)

     target/i386/tcg: mark more instructions that are invalid in 64-bit mode
  1 file changed, 4 insertions(+), 4 deletions(-)

     target/i386/tcg: ignore V3 in 32-bit mode
  1 file changed, 1 insertion(+), 1 deletion(-)

     target/i386: Fix #GP error code for INT instructions
  1 file changed, 1 insertion(+), 1 deletion(-)

     target/i386/tcg: validate segment registers
  1 file changed, 6 insertions(+), 1 deletion(-)

     target/i386: Mark VPERMILPS as not valid with prefix 0
1 file changed, 1 insertion(+), 1 deletion(-)

     target/x86: Correctly handle invalid 0x0f 0xc7 0xxx insns
  1 file changed, 2 insertions(+)

     target/i386: fix x86_64 pushw op
  1 file changed, 1 insertion(+), 1 deletion(-)

     target/i386: fix width of third operand of VINSERTx128
  1 file changed, 2 insertions(+), 2 deletions(-)

     target/i386: fix TB exit logic in gen_movl_seg() when writing to SS
  1 file changed, 5 insertions(+), 2 deletions(-)

And reviewing them is not necessarily easy if they touch weird corner 
cases of the architecture.  The code change might match the intention 
but you still need to check the manual.  Or on the contrary, the error 
is apparent but the fix may be obscure, as in commit 5a2faa0a0a's 
single-line change:

-        [0x0e] = X86_OP_ENTRYr(PUSH,    E,f64),
+        [0x0e] = X86_OP_ENTRYr(PUSH,    E,d64),


These changes could still be welcome, but I suppose the maintainer would 
also prefer a heads-up about them.

Paolo



^ permalink raw reply	[flat|nested] 59+ messages in thread

* Re: on ai generated and code provenance
  2026-05-27 10:01             ` Paolo Bonzini
                                 ` (2 preceding siblings ...)
  2026-05-27 10:54               ` Alistair Francis
@ 2026-05-27 14:11               ` Michael S. Tsirkin
  2026-05-27 14:14               ` Warner Losh
  2026-05-27 16:39               ` Michael S. Tsirkin
  5 siblings, 0 replies; 59+ messages in thread
From: Michael S. Tsirkin @ 2026-05-27 14:11 UTC (permalink / raw)
  To: Paolo Bonzini; +Cc: Kevin Wolf, Warner Losh, qemu-devel, stefanha

On Wed, May 27, 2026 at 12:01:10PM +0200, Paolo Bonzini wrote:
> > > 3. Limited, small changes to fix bugs or add a small new feature whose
> > >     scope is less than about 100 lines and the originator can explain
> > >     them all or the meta issues about the patch.
> > 
> > Not sure if mentioning a number of lines is wise. 100 lines can be
> > mostly boilerplate and simple sequential code or they can be a deeply
> > nested complex algorithm.
> 
> I'd put the threshold at 20-50 at most.

At most 50 lines added, right? OK.

-- 
MST



^ permalink raw reply	[flat|nested] 59+ messages in thread

* Re: on ai generated and code provenance
  2026-05-27 10:01             ` Paolo Bonzini
                                 ` (3 preceding siblings ...)
  2026-05-27 14:11               ` Michael S. Tsirkin
@ 2026-05-27 14:14               ` Warner Losh
  2026-05-27 14:51                 ` Kevin Wolf
  2026-05-27 16:05                 ` Paolo Bonzini
  2026-05-27 16:39               ` Michael S. Tsirkin
  5 siblings, 2 replies; 59+ messages in thread
From: Warner Losh @ 2026-05-27 14:14 UTC (permalink / raw)
  To: Paolo Bonzini; +Cc: Kevin Wolf, Michael S. Tsirkin, qemu-devel, stefanha

[-- Attachment #1: Type: text/plain, Size: 6852 bytes --]

On Wed, May 27, 2026 at 4:01 AM Paolo Bonzini <pbonzini@redhat.com> wrote:

> On 5/27/26 10:41, Kevin Wolf wrote:
> > Am 26.05.2026 um 21:52 hat Warner Losh geschrieben:
> >> The QEMU Project currently may accept limited uses of AI that produce
> >> high quality patches that are limited in the creative content added.
> >> While maintainers will ultimately decide, changes like the following
> >> fall within this policy
> >> 1. Fixing obvious warnings in the obvious ways suggested by the tool
> >> 2. Tree wide API changes, and other similar mechanical changes done
> >>     today with perl/python/sed/coccinelle
> >
> > As I said in the paragraph you quoted below, I don't think we should
> > encourage using AI for tasks that a deterministic tool could do.
>
> In some cases such a tool does not exist.  Much to my surprise, there is
> no tool to do static type inference on Python code, but AI is very good
> at doing it.
>
> > Letting AI perform the change directly instead may be an acceptable
> > shortcut for a one-man hobby project that nobody else will ever look at,
> > but in the context of a community project like QEMU in which your
> > changes have to be reviewed and understood by others, it matters a lot
> > that the output of the tool is reproducible. Otherwise, you're creating
> > unnecessary work for others, and that isn't acceptable.
>
> When applicable, going through coccinelle (with the aid of AI if needed!
> is indeed a good middle ground as it helps reviewers for large changes.
> If you have many slightly different but easily separated changes (e.g.
> you can split the patch by struct field), it may make things worse.
>
> Its also worth noting that in other cases even sed or coccinelle, while
> deterministic, cannot produce 100% of the patch.
>
> > So maybe we should even explicitly mention a recommendation like the
> > following:
> >
> >      If you can use a deterministic tool, don't use AI instead. If you
> >      don't know how to use the deterministic tool, use the AI to tell you
> >      how to use it instead of trying to replace it.
>
> I like it.
>
> >> 3. Limited, small changes to fix bugs or add a small new feature whose
> >>     scope is less than about 100 lines and the originator can explain
> >>     them all or the meta issues about the patch.
> >
> > Not sure if mentioning a number of lines is wise. 100 lines can be
> > mostly boilerplate and simple sequential code or they can be a deeply
> > nested complex algorithm.
>
> I'd put the threshold at 20-50 at most.
>
> > I think I would see more use in a tag like (better name welcome):
> >
> >     AI-used-for: [code|tests|docs|commit message]...
>
> I like this *a lot*.  No need for free advertisement, but some
> traceability is useful.
>
> For tools such as sed or coccinelle, having the exact script in the
> patch or commit message useful.  Plus, the execution of the script more
> or lesss delimits the commit by itself (or 90%+ of it).  For LLMs it's a
> bit less clear cut because separating docs makes little sense.  And the
> exact model is pointless, it will be obsolete in 6 months and provide no
> useful information.
>
> So, something like:
>
> ------------------- 8< -------------------
> Use of AI-generated content
> ~~~~~~~~~~~~~~~~~~~~~~~~~~~
>
> The QEMU project currently allows using AI/LLM tools to produce patches
> in scenarios with limited creative content:
>
> Mechanical changes
>    If you can use a deterministic tool or a script, don't use AI instead.
>    If you don't know how to do the change deterministically, you may
>    ask the AI for help, rather than having it stand in for the tools.
>
> Small bug fixes
>    These should be limited to 20 lines of code or less, not including
>    tests.  You are still expected to understand and explain your changes
>    and the rationale behind them.
>
> These boundaries do not apply to other uses of AI, such as researching
> APIs or algorithms, static analysis, or debugging, provided their output
> is not included in contributions.  Larger uses of AI are allowed as an
> experiment, but they should be agreed upon with the maintainer prior to
> submission.
>
> Use of AI does not remove the need for authors to comply with all other
> requirements for contribution.  In particular, the "Signed-off-by"
> label in a patch submission is a statement that the author takes
> responsibility for the entire contents of the patch, certifying that
> their patch submission is made in accordance with the rules of the
> `Developer's Certificate of Origin (DCO) <dco>`.
>
> Commit messages for AI-assisted changes
> ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
>
> When AI/LLM tools produce or substantively shape your patch, add an
> ``AI-used-for:`` trailer.  The text of the trailer could be one or more
> of ``code``, ``tests``, ``docs``, ``research``, possibly followed by an
> explanation in parentheses::
>
>      AI-used-for: tests, docs
>      AI-used-for: code
>      AI-used-for: code (refactoring)
>      AI-used-for: code (prototype)
>      AI-used-for: research
>
> The trailer is intended as a clarification of your DCO obligations as
> well as to guide reviewers.  It is not intended for minimal presence
> such as autocomplete or asking for a pre-review of the patch, and it
> does not remove your responsibility to understand the changes that you
> are submitting.
>

Why invent something new here when Assisted-by: is used elsewhere
and is likely more familiar to other users.


> Include the prompt in the commit message if it helps a reviewer judge
> the result:
>
> * yes: "move field ``foo`` from ``struct aa`` to ``struct bb``.  If a
> function already has a local variable or parameter of type ``struct
> bb``, use it instead of accessing ``aa.bb``."
>
> * yes: "add an implementation of the trait for ``Mutex<T: MyTrait>``,
> forwarding the member functions to ``T`` while taking the lock around
> the calls".
>
> * no: "write user-facing documentation for the new tool"
>
> * no: "write testcases for the new functions"
>

I think this fundamentally misunderstands how AI tends to be use. It
usually is a long, iterative process that's become impossible to capture
"THE" prompt. The bsd-user changes under review now are the result
of months of memories, hundreds of interactions with claude, including
one argument about how things worked. It's OK for people trying to
"one shot" things, but it's been my experience having worked with
these tools extensively that "one shot" is cool for demos, but not
cool for code you have to use in anger.


> Deterministic tooling (sed, coccinelle, formatters) is out of scope for
> the trailer, but should be mentioned in the commit message.
>
>

[-- Attachment #2: Type: text/html, Size: 8291 bytes --]

^ permalink raw reply	[flat|nested] 59+ messages in thread

* Re: on ai generated and code provenance
  2026-05-27 14:14               ` Warner Losh
@ 2026-05-27 14:51                 ` Kevin Wolf
  2026-05-27 16:41                   ` Michael S. Tsirkin
  2026-05-27 16:05                 ` Paolo Bonzini
  1 sibling, 1 reply; 59+ messages in thread
From: Kevin Wolf @ 2026-05-27 14:51 UTC (permalink / raw)
  To: Warner Losh; +Cc: Paolo Bonzini, Michael S. Tsirkin, qemu-devel, stefanha

Am 27.05.2026 um 16:14 hat Warner Losh geschrieben:
> On Wed, May 27, 2026 at 4:01 AM Paolo Bonzini <pbonzini@redhat.com> wrote:
> > Commit messages for AI-assisted changes
> > ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
> >
> > When AI/LLM tools produce or substantively shape your patch, add an
> > ``AI-used-for:`` trailer.  The text of the trailer could be one or more
> > of ``code``, ``tests``, ``docs``, ``research``, possibly followed by an
> > explanation in parentheses::
> >
> >      AI-used-for: tests, docs
> >      AI-used-for: code
> >      AI-used-for: code (refactoring)
> >      AI-used-for: code (prototype)
> >      AI-used-for: research
> >
> > The trailer is intended as a clarification of your DCO obligations as
> > well as to guide reviewers.  It is not intended for minimal presence
> > such as autocomplete or asking for a pre-review of the patch, and it
> > does not remove your responsibility to understand the changes that you
> > are submitting.
> 
> Why invent something new here when Assisted-by: is used elsewhere
> and is likely more familiar to other users.

Because Assisted-by: gives different information, which at least to me
isn't really interesting at all. It's much more interesting to me if the
code I'm looking at is generated, or if you only generated the tests.

Kevin



^ permalink raw reply	[flat|nested] 59+ messages in thread

* Re: on ai generated and code provenance
  2026-05-27 14:51                 ` Kevin Wolf
@ 2026-05-27 16:41                   ` Michael S. Tsirkin
  2026-05-27 16:50                     ` Kevin Wolf
  0 siblings, 1 reply; 59+ messages in thread
From: Michael S. Tsirkin @ 2026-05-27 16:41 UTC (permalink / raw)
  To: Kevin Wolf; +Cc: Warner Losh, Paolo Bonzini, qemu-devel, stefanha

On Wed, May 27, 2026 at 04:51:38PM +0200, Kevin Wolf wrote:
> Am 27.05.2026 um 16:14 hat Warner Losh geschrieben:
> > On Wed, May 27, 2026 at 4:01 AM Paolo Bonzini <pbonzini@redhat.com> wrote:
> > > Commit messages for AI-assisted changes
> > > ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
> > >
> > > When AI/LLM tools produce or substantively shape your patch, add an
> > > ``AI-used-for:`` trailer.  The text of the trailer could be one or more
> > > of ``code``, ``tests``, ``docs``, ``research``, possibly followed by an
> > > explanation in parentheses::
> > >
> > >      AI-used-for: tests, docs
> > >      AI-used-for: code
> > >      AI-used-for: code (refactoring)
> > >      AI-used-for: code (prototype)
> > >      AI-used-for: research
> > >
> > > The trailer is intended as a clarification of your DCO obligations as
> > > well as to guide reviewers.  It is not intended for minimal presence
> > > such as autocomplete or asking for a pre-review of the patch, and it
> > > does not remove your responsibility to understand the changes that you
> > > are submitting.
> > 
> > Why invent something new here when Assisted-by: is used elsewhere
> > and is likely more familiar to other users.
> 
> Because Assisted-by: gives different information, which at least to me
> isn't really interesting at all. It's much more interesting to me if the
> code I'm looking at is generated, or if you only generated the tests.
> 
> Kevin

I personally am interested to know which models work better than others.
Contributions are about reputation not just code.  I'll learn which
models produce better output, just like I learn to trust specific
contributors better.

-- 
MST



^ permalink raw reply	[flat|nested] 59+ messages in thread

* Re: on ai generated and code provenance
  2026-05-27 16:41                   ` Michael S. Tsirkin
@ 2026-05-27 16:50                     ` Kevin Wolf
  2026-05-27 16:56                       ` Michael S. Tsirkin
                                         ` (2 more replies)
  0 siblings, 3 replies; 59+ messages in thread
From: Kevin Wolf @ 2026-05-27 16:50 UTC (permalink / raw)
  To: Michael S. Tsirkin; +Cc: Warner Losh, Paolo Bonzini, qemu-devel, stefanha

Am 27.05.2026 um 18:41 hat Michael S. Tsirkin geschrieben:
> On Wed, May 27, 2026 at 04:51:38PM +0200, Kevin Wolf wrote:
> > Am 27.05.2026 um 16:14 hat Warner Losh geschrieben:
> > > On Wed, May 27, 2026 at 4:01 AM Paolo Bonzini <pbonzini@redhat.com> wrote:
> > > > Commit messages for AI-assisted changes
> > > > ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
> > > >
> > > > When AI/LLM tools produce or substantively shape your patch, add an
> > > > ``AI-used-for:`` trailer.  The text of the trailer could be one or more
> > > > of ``code``, ``tests``, ``docs``, ``research``, possibly followed by an
> > > > explanation in parentheses::
> > > >
> > > >      AI-used-for: tests, docs
> > > >      AI-used-for: code
> > > >      AI-used-for: code (refactoring)
> > > >      AI-used-for: code (prototype)
> > > >      AI-used-for: research
> > > >
> > > > The trailer is intended as a clarification of your DCO obligations as
> > > > well as to guide reviewers.  It is not intended for minimal presence
> > > > such as autocomplete or asking for a pre-review of the patch, and it
> > > > does not remove your responsibility to understand the changes that you
> > > > are submitting.
> > > 
> > > Why invent something new here when Assisted-by: is used elsewhere
> > > and is likely more familiar to other users.
> > 
> > Because Assisted-by: gives different information, which at least to me
> > isn't really interesting at all. It's much more interesting to me if the
> > code I'm looking at is generated, or if you only generated the tests.
> 
> I personally am interested to know which models work better than others.
> Contributions are about reputation not just code.  I'll learn which
> models produce better output, just like I learn to trust specific
> contributors better.

You don't see how well the model worked. What you see is filtered by the
submitter, and the policy we're discussing is specifically made to make
sure that bad results never reach the list.

Even for things that do reach the list, Assisted-by: doesn't tell you
how much of the submission is AI-generated and it also doesn't tell you
if it's "I used model X and a simple prompt gave me the perfect result
in the first attempt" or "I used model X and it took me two days of back
and forth and eventually I just rewrote most of it, but there are a few
AI-generated lines left".

So what you should trust is the contributor, not an Assisted-by: tag.

Kevin



^ permalink raw reply	[flat|nested] 59+ messages in thread

* Re: on ai generated and code provenance
  2026-05-27 16:50                     ` Kevin Wolf
@ 2026-05-27 16:56                       ` Michael S. Tsirkin
  2026-05-27 17:06                       ` Michael S. Tsirkin
  2026-05-27 17:07                       ` Warner Losh
  2 siblings, 0 replies; 59+ messages in thread
From: Michael S. Tsirkin @ 2026-05-27 16:56 UTC (permalink / raw)
  To: Kevin Wolf; +Cc: Warner Losh, Paolo Bonzini, qemu-devel, stefanha

On Wed, May 27, 2026 at 06:50:14PM +0200, Kevin Wolf wrote:
> Am 27.05.2026 um 18:41 hat Michael S. Tsirkin geschrieben:
> > On Wed, May 27, 2026 at 04:51:38PM +0200, Kevin Wolf wrote:
> > > Am 27.05.2026 um 16:14 hat Warner Losh geschrieben:
> > > > On Wed, May 27, 2026 at 4:01 AM Paolo Bonzini <pbonzini@redhat.com> wrote:
> > > > > Commit messages for AI-assisted changes
> > > > > ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
> > > > >
> > > > > When AI/LLM tools produce or substantively shape your patch, add an
> > > > > ``AI-used-for:`` trailer.  The text of the trailer could be one or more
> > > > > of ``code``, ``tests``, ``docs``, ``research``, possibly followed by an
> > > > > explanation in parentheses::
> > > > >
> > > > >      AI-used-for: tests, docs
> > > > >      AI-used-for: code
> > > > >      AI-used-for: code (refactoring)
> > > > >      AI-used-for: code (prototype)
> > > > >      AI-used-for: research
> > > > >
> > > > > The trailer is intended as a clarification of your DCO obligations as
> > > > > well as to guide reviewers.  It is not intended for minimal presence
> > > > > such as autocomplete or asking for a pre-review of the patch, and it
> > > > > does not remove your responsibility to understand the changes that you
> > > > > are submitting.
> > > > 
> > > > Why invent something new here when Assisted-by: is used elsewhere
> > > > and is likely more familiar to other users.
> > > 
> > > Because Assisted-by: gives different information, which at least to me
> > > isn't really interesting at all. It's much more interesting to me if the
> > > code I'm looking at is generated, or if you only generated the tests.
> > 
> > I personally am interested to know which models work better than others.
> > Contributions are about reputation not just code.  I'll learn which
> > models produce better output, just like I learn to trust specific
> > contributors better.
> 
> You don't see how well the model worked. What you see is filtered by the
> submitter, and the policy we're discussing is specifically made to make
> sure that bad results never reach the list.
> 
> Even for things that do reach the list, Assisted-by: doesn't tell you
> how much of the submission is AI-generated and it also doesn't tell you
> if it's "I used model X and a simple prompt gave me the perfect result
> in the first attempt" or "I used model X and it took me two days of back
> and forth and eventually I just rewrote most of it, but there are a few
> AI-generated lines left".
> 
> So what you should trust is the contributor, not an Assisted-by: tag.
> 
> Kevin

Well, AI-used-for research isn't really useful to me at all then.
Why do I care about research?

-- 
MST



^ permalink raw reply	[flat|nested] 59+ messages in thread

* Re: on ai generated and code provenance
  2026-05-27 16:50                     ` Kevin Wolf
  2026-05-27 16:56                       ` Michael S. Tsirkin
@ 2026-05-27 17:06                       ` Michael S. Tsirkin
  2026-05-27 17:15                         ` Warner Losh
  2026-05-27 17:07                       ` Warner Losh
  2 siblings, 1 reply; 59+ messages in thread
From: Michael S. Tsirkin @ 2026-05-27 17:06 UTC (permalink / raw)
  To: Kevin Wolf; +Cc: Warner Losh, Paolo Bonzini, qemu-devel, stefanha

On Wed, May 27, 2026 at 06:50:14PM +0200, Kevin Wolf wrote:
> Am 27.05.2026 um 18:41 hat Michael S. Tsirkin geschrieben:
> > On Wed, May 27, 2026 at 04:51:38PM +0200, Kevin Wolf wrote:
> > > Am 27.05.2026 um 16:14 hat Warner Losh geschrieben:
> > > > On Wed, May 27, 2026 at 4:01 AM Paolo Bonzini <pbonzini@redhat.com> wrote:
> > > > > Commit messages for AI-assisted changes
> > > > > ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
> > > > >
> > > > > When AI/LLM tools produce or substantively shape your patch, add an
> > > > > ``AI-used-for:`` trailer.  The text of the trailer could be one or more
> > > > > of ``code``, ``tests``, ``docs``, ``research``, possibly followed by an
> > > > > explanation in parentheses::
> > > > >
> > > > >      AI-used-for: tests, docs
> > > > >      AI-used-for: code
> > > > >      AI-used-for: code (refactoring)
> > > > >      AI-used-for: code (prototype)
> > > > >      AI-used-for: research
> > > > >
> > > > > The trailer is intended as a clarification of your DCO obligations as
> > > > > well as to guide reviewers.  It is not intended for minimal presence
> > > > > such as autocomplete or asking for a pre-review of the patch, and it
> > > > > does not remove your responsibility to understand the changes that you
> > > > > are submitting.
> > > > 
> > > > Why invent something new here when Assisted-by: is used elsewhere
> > > > and is likely more familiar to other users.
> > > 
> > > Because Assisted-by: gives different information, which at least to me
> > > isn't really interesting at all. It's much more interesting to me if the
> > > code I'm looking at is generated, or if you only generated the tests.
> > 
> > I personally am interested to know which models work better than others.
> > Contributions are about reputation not just code.  I'll learn which
> > models produce better output, just like I learn to trust specific
> > contributors better.
> 
> You don't see how well the model worked. What you see is filtered by the
> submitter, and the policy we're discussing is specifically made to make
> sure that bad results never reach the list.
> 
> Even for things that do reach the list, Assisted-by: doesn't tell you
> how much of the submission is AI-generated and it also doesn't tell you
> if it's "I used model X and a simple prompt gave me the perfect result
> in the first attempt" or "I used model X and it took me two days of back
> and forth and eventually I just rewrote most of it, but there are a few
> AI-generated lines left".

I am capable of observing trends over multiple contributions from
multiple people.

> So what you should trust is the contributor, not an Assisted-by: tag.
> 
> Kevin

Both.

-- 
MST



^ permalink raw reply	[flat|nested] 59+ messages in thread

* Re: on ai generated and code provenance
  2026-05-27 17:06                       ` Michael S. Tsirkin
@ 2026-05-27 17:15                         ` Warner Losh
  0 siblings, 0 replies; 59+ messages in thread
From: Warner Losh @ 2026-05-27 17:15 UTC (permalink / raw)
  To: Michael S. Tsirkin; +Cc: Kevin Wolf, Paolo Bonzini, qemu-devel, stefanha

[-- Attachment #1: Type: text/plain, Size: 3054 bytes --]

On Wed, May 27, 2026 at 11:06 AM Michael S. Tsirkin <mst@redhat.com> wrote:

> On Wed, May 27, 2026 at 06:50:14PM +0200, Kevin Wolf wrote:
> > Am 27.05.2026 um 18:41 hat Michael S. Tsirkin geschrieben:
> > > On Wed, May 27, 2026 at 04:51:38PM +0200, Kevin Wolf wrote:
> > > > Am 27.05.2026 um 16:14 hat Warner Losh geschrieben:
> > > > > On Wed, May 27, 2026 at 4:01 AM Paolo Bonzini <pbonzini@redhat.com>
> wrote:
> > > > > > Commit messages for AI-assisted changes
> > > > > > ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
> > > > > >
> > > > > > When AI/LLM tools produce or substantively shape your patch, add
> an
> > > > > > ``AI-used-for:`` trailer.  The text of the trailer could be one
> or more
> > > > > > of ``code``, ``tests``, ``docs``, ``research``, possibly
> followed by an
> > > > > > explanation in parentheses::
> > > > > >
> > > > > >      AI-used-for: tests, docs
> > > > > >      AI-used-for: code
> > > > > >      AI-used-for: code (refactoring)
> > > > > >      AI-used-for: code (prototype)
> > > > > >      AI-used-for: research
> > > > > >
> > > > > > The trailer is intended as a clarification of your DCO
> obligations as
> > > > > > well as to guide reviewers.  It is not intended for minimal
> presence
> > > > > > such as autocomplete or asking for a pre-review of the patch,
> and it
> > > > > > does not remove your responsibility to understand the changes
> that you
> > > > > > are submitting.
> > > > >
> > > > > Why invent something new here when Assisted-by: is used elsewhere
> > > > > and is likely more familiar to other users.
> > > >
> > > > Because Assisted-by: gives different information, which at least to
> me
> > > > isn't really interesting at all. It's much more interesting to me if
> the
> > > > code I'm looking at is generated, or if you only generated the tests.
> > >
> > > I personally am interested to know which models work better than
> others.
> > > Contributions are about reputation not just code.  I'll learn which
> > > models produce better output, just like I learn to trust specific
> > > contributors better.
> >
> > You don't see how well the model worked. What you see is filtered by the
> > submitter, and the policy we're discussing is specifically made to make
> > sure that bad results never reach the list.
> >
> > Even for things that do reach the list, Assisted-by: doesn't tell you
> > how much of the submission is AI-generated and it also doesn't tell you
> > if it's "I used model X and a simple prompt gave me the perfect result
> > in the first attempt" or "I used model X and it took me two days of back
> > and forth and eventually I just rewrote most of it, but there are a few
> > AI-generated lines left".
>
> I am capable of observing trends over multiple contributions from
> multiple people.
>

As the primary person landing commits on the FreeBSD github experiment,
I can say that I have observed trends over multiple committer and can
spot the ones using Claude + Opus 4.5 or 4.6.

Warner

[-- Attachment #2: Type: text/html, Size: 4094 bytes --]

^ permalink raw reply	[flat|nested] 59+ messages in thread

* Re: on ai generated and code provenance
  2026-05-27 16:50                     ` Kevin Wolf
  2026-05-27 16:56                       ` Michael S. Tsirkin
  2026-05-27 17:06                       ` Michael S. Tsirkin
@ 2026-05-27 17:07                       ` Warner Losh
  2 siblings, 0 replies; 59+ messages in thread
From: Warner Losh @ 2026-05-27 17:07 UTC (permalink / raw)
  To: Kevin Wolf; +Cc: Michael S. Tsirkin, Paolo Bonzini, qemu-devel, stefanha

[-- Attachment #1: Type: text/plain, Size: 4722 bytes --]

On Wed, May 27, 2026 at 10:50 AM Kevin Wolf <kwolf@redhat.com> wrote:

> Am 27.05.2026 um 18:41 hat Michael S. Tsirkin geschrieben:
> > On Wed, May 27, 2026 at 04:51:38PM +0200, Kevin Wolf wrote:
> > > Am 27.05.2026 um 16:14 hat Warner Losh geschrieben:
> > > > On Wed, May 27, 2026 at 4:01 AM Paolo Bonzini <pbonzini@redhat.com>
> wrote:
> > > > > Commit messages for AI-assisted changes
> > > > > ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
> > > > >
> > > > > When AI/LLM tools produce or substantively shape your patch, add an
> > > > > ``AI-used-for:`` trailer.  The text of the trailer could be one or
> more
> > > > > of ``code``, ``tests``, ``docs``, ``research``, possibly followed
> by an
> > > > > explanation in parentheses::
> > > > >
> > > > >      AI-used-for: tests, docs
> > > > >      AI-used-for: code
> > > > >      AI-used-for: code (refactoring)
> > > > >      AI-used-for: code (prototype)
> > > > >      AI-used-for: research
> > > > >
> > > > > The trailer is intended as a clarification of your DCO obligations
> as
> > > > > well as to guide reviewers.  It is not intended for minimal
> presence
> > > > > such as autocomplete or asking for a pre-review of the patch, and
> it
> > > > > does not remove your responsibility to understand the changes that
> you
> > > > > are submitting.
> > > >
> > > > Why invent something new here when Assisted-by: is used elsewhere
> > > > and is likely more familiar to other users.
> > >
> > > Because Assisted-by: gives different information, which at least to me
> > > isn't really interesting at all. It's much more interesting to me if
> the
> > > code I'm looking at is generated, or if you only generated the tests.
> >
> > I personally am interested to know which models work better than others.
> > Contributions are about reputation not just code.  I'll learn which
> > models produce better output, just like I learn to trust specific
> > contributors better.
>
> You don't see how well the model worked. What you see is filtered by the
> submitter, and the policy we're discussing is specifically made to make
> sure that bad results never reach the list.
>

You actually do. Bad results will 100% hit the list. Guaranteed. No policy
will stop that. Having the right tag, like the model used, will help train
reviewers
which ones work and which ones don't for their subset of the tree.

> Even for things that do reach the list, Assisted-by: doesn't tell you
> how much of the submission is AI-generated and it also doesn't tell you
> if it's "I used model X and a simple prompt gave me the perfect result
> in the first attempt" or "I used model X and it took me two days of back
> and forth and eventually I just rewrote most of it, but there are a few
> AI-generated lines left".
>

I covered that in my bsd-user cover letter. But in that case I used it
mostly
to move the code from upstream with proper attribution (yes, deterministic
tools could do that, but I wasted a week on trying to write them years ago
and AI is just "move this over" for me now). But then, after I started, I
had
Claude reivew them in thee style of reivews I'd gotten in the past, so it
developed a checklist to do the reviews and at first I had those separate
since they were a joint effort (mostly Claude finding the issue which was
obvious to fix, but I had it re-review my fixes and/or suggest its own). I
merged them after advice that said basically "do what you do for human
reviews: fold them back" so I did that in subsequent reviews.

Also, I've started seeing claude generated submissions for FreeBSD
pull requests. I recognize its style since I've interacted with it so much.
I also recognize other, unknown to me, styles that have different quirks
that I know Claude doesn't do, but that other models do do. I wish I
had the raw data to know which AI tools and models were used since
that cues me to look for certain things (with claude it tends to be it's
insane insistance for printf'ing what's going on to an annoying degree).
And there people have been trying to sneak things in since a preliminary
"no ai" policy leaked out that was never formally approved. I have to
guess at it.

> So what you should trust is the contributor, not an Assisted-by: tag.
>

You should use both to form your opinion. It's not an either-or situation.
I've had conversations with the AI contributors in FreeBSD about how
they use it and tend to ask more questions at review than when AI isn't
used. Knowing what AI was used will help me delve into different things
that I might not otherwise poke at since I know the quirks of at least
a couple tools by now.

Warner

[-- Attachment #2: Type: text/html, Size: 6247 bytes --]

^ permalink raw reply	[flat|nested] 59+ messages in thread

* Re: on ai generated and code provenance
  2026-05-27 14:14               ` Warner Losh
  2026-05-27 14:51                 ` Kevin Wolf
@ 2026-05-27 16:05                 ` Paolo Bonzini
  2026-05-27 16:48                   ` Michael S. Tsirkin
  1 sibling, 1 reply; 59+ messages in thread
From: Paolo Bonzini @ 2026-05-27 16:05 UTC (permalink / raw)
  To: Warner Losh; +Cc: Kevin Wolf, Michael S. Tsirkin, qemu-devel, stefanha

On 5/27/26 16:14, Warner Losh wrote:
> Why invent something new here when Assisted-by: is used elsewhere
> and is likely more familiar to other users.

Because Assisted-by was invented by AI companies to get free 
advertisement.  (It's not just me; see 
https://akselmo.dev/posts/stop-advertising-in-your-commits/ for an 
example).  Also, it does not answer any interesting question.

Not using it is also a good way to sieve people who didn't bother to 
read the policy, now that I think about it.

>     * yes: "move field ``foo`` from ``struct aa`` to ``struct bb``.  If a
>     function already has a local variable or parameter of type ``struct
>     bb``, use it instead of accessing ``aa.bb <http://aa.bb>``."
> 
>     * yes: "add an implementation of the trait for ``Mutex<T: MyTrait>``,
>     forwarding the member functions to ``T`` while taking the lock around
>     the calls".
> 
>     * no: "write user-facing documentation for the new tool"
> 
>     * no: "write testcases for the new functions"
> 
> I think this fundamentally misunderstands how AI tends to be use. It
> usually is a long, iterative process that's become impossible to capture
> "THE" prompt.

Sure, but remember that these rules are for the cases listed above: 
tests, mechanical changes, <20 line fixes.  Even for mechanical changes, 
using AI does not remove the need to separate commits, therefore having 
small "one shottable" prompts (50 prompts for planning + 20-ish patches) 
is a plausible way to proceed.

For example large parts of 
https://lore.kernel.org/kvm/20260511150648.685374-1-pbonzini@redhat.com/ 
were done as a ~30 minutes plan mode conversation followed by ~1.5 hours 
of mostly one-shotted prompts.  Here are some real examples of the prompts:

"ok, i think we're done. double check, then prepare for changing 
walk_mmu to struct kvm_pagewalk (initializing it with .w)"

"rename walk_mmu to cpu_walk"

"next step is changing nested_mmu from `struct kvm_mmu` to `struct 
kvm_pagewalk"

"ok! now we have to change cpu_walk to not be a pointer. 
init_kvm_nested_cpu_walk becomes init_kvm_cpu_walk and is called always 
in kvm_init_mmu. init_kvm_tdp_mmu stops initializing context->w.  this 
won't be all of it but it's a start"

"move towards removing explicit access to w when the mmu is known, for 
example when initializing shadow EPT/NPT you want to use tdp_walk 
instead of w. later on we will figure out whether to 1) remove w 2) add 
it back to struct kvm_mmu but this time as a pointer 3) do something 
like mmu == guest_mmu ? tdp_walk : cpu_walk"

"pull all the permissions stuff into a separate struct kvm_page_format"
[here the LLM wanted to clarify what's the "permissions stuff" :)]
"good analysis - leave cpu_role aside, and put the others (which are 
used by e.g. permission_fault()) in the new struct"

"great, merge struct rsvd_bits_validate into kvm_page_format (changing 
`struct rsvd_bits_validate shadow_zero_check` to `struct 
kvm_page_format` while (for now) ignoring the initialization of other 
fields)"

If you don't work like that fine, it's not mandatory.

Paolo

^ permalink raw reply	[flat|nested] 59+ messages in thread

* Re: on ai generated and code provenance
  2026-05-27 16:05                 ` Paolo Bonzini
@ 2026-05-27 16:48                   ` Michael S. Tsirkin
  2026-05-27 16:57                     ` Warner Losh
  0 siblings, 1 reply; 59+ messages in thread
From: Michael S. Tsirkin @ 2026-05-27 16:48 UTC (permalink / raw)
  To: Paolo Bonzini; +Cc: Warner Losh, Kevin Wolf, qemu-devel, stefanha

On Wed, May 27, 2026 at 06:05:47PM +0200, Paolo Bonzini wrote:
> On 5/27/26 16:14, Warner Losh wrote:
> > Why invent something new here when Assisted-by: is used elsewhere
> > and is likely more familiar to other users.
> 
> Because Assisted-by was invented by AI companies to get free advertisement.

Jonathan Corbet is in the pocket of AI companies? Or Sasha Levin?
https://lore.kernel.org/lkml/20251223122110.2496946-1-sashal@kernel.org/
https://lore.kernel.org/lkml/877bqtlzug.fsf@trenco.lwn.net/

> (It's not just me; see
> https://akselmo.dev/posts/stop-advertising-in-your-commits/ for an example).

Does not impress me as being either super informed or super
professional.

> Also, it does not answer any interesting question.

It does for me - I will learn which models are more likely to produce
bad slop.

> Not using it is also a good way to sieve people who didn't bother to read
> the policy, now that I think about it.

In my experience, AI tools add "Co-developed-by:" tags by default. One
has to read the policy to add Assisted-by.

^ permalink raw reply	[flat|nested] 59+ messages in thread

* Re: on ai generated and code provenance
  2026-05-27 16:48                   ` Michael S. Tsirkin
@ 2026-05-27 16:57                     ` Warner Losh
  2026-05-27 17:05                       ` Michael S. Tsirkin
  2026-05-27 17:48                       ` Paolo Bonzini
  0 siblings, 2 replies; 59+ messages in thread
From: Warner Losh @ 2026-05-27 16:57 UTC (permalink / raw)
  To: Michael S. Tsirkin; +Cc: Paolo Bonzini, Kevin Wolf, qemu-devel, stefanha

[-- Attachment #1: Type: text/plain, Size: 1737 bytes --]

On Wed, May 27, 2026 at 10:48 AM Michael S. Tsirkin <mst@redhat.com> wrote:

> On Wed, May 27, 2026 at 06:05:47PM +0200, Paolo Bonzini wrote:
> > On 5/27/26 16:14, Warner Losh wrote:
> > > Why invent something new here when Assisted-by: is used elsewhere
> > > and is likely more familiar to other users.
> >
> > Because Assisted-by was invented by AI companies to get free
> advertisement.
>
> Jonathan Corbet is in the pocket of AI companies? Or Sasha Levin?
> https://lore.kernel.org/lkml/20251223122110.2496946-1-sashal@kernel.org/
> https://lore.kernel.org/lkml/877bqtlzug.fsf@trenco.lwn.net/


Yea. Assisted-by was not invented by AI companies. That's just rubbish.

See https://github.com/anthropics/claude-code/issues/36105 for the open
issue with Claude.

> (It's not just me; see
> > https://akselmo.dev/posts/stop-advertising-in-your-commits/ for an
> example).
>
> Does not impress me as being either super informed or super
> professional.
>

Yea, seems crazy.


> > Also, it does not answer any interesting question.
>
> It does for me - I will learn which models are more likely to produce
> bad slop.
>

Same. I can tell the difference in the degree of slop between the different
models
and have sometimes selected the older one over the newer one because it does
a better job.


> > Not using it is also a good way to sieve people who didn't bother to read
> > the policy, now that I think about it.
>
> In my experience, AI tools add "Co-developed-by:" tags by default. One
> has to read the policy to add Assisted-by.
>

Co-authored-by: you mean? That's what claude adds for me. I had to take
extra effort to add the new Assisted-by: to my patch review.

Warner

[-- Attachment #2: Type: text/html, Size: 3174 bytes --]

^ permalink raw reply	[flat|nested] 59+ messages in thread

* Re: on ai generated and code provenance
  2026-05-27 16:57                     ` Warner Losh
@ 2026-05-27 17:05                       ` Michael S. Tsirkin
  2026-05-27 17:48                       ` Paolo Bonzini
  1 sibling, 0 replies; 59+ messages in thread
From: Michael S. Tsirkin @ 2026-05-27 17:05 UTC (permalink / raw)
  To: Warner Losh; +Cc: Paolo Bonzini, Kevin Wolf, qemu-devel, stefanha

On Wed, May 27, 2026 at 10:57:09AM -0600, Warner Losh wrote:
> 
> 
> On Wed, May 27, 2026 at 10:48 AM Michael S. Tsirkin <mst@redhat.com> wrote:
> 
>     On Wed, May 27, 2026 at 06:05:47PM +0200, Paolo Bonzini wrote:
>     > On 5/27/26 16:14, Warner Losh wrote:
>     > > Why invent something new here when Assisted-by: is used elsewhere
>     > > and is likely more familiar to other users.
>     >
>     > Because Assisted-by was invented by AI companies to get free
>     advertisement.
> 
>     Jonathan Corbet is in the pocket of AI companies? Or Sasha Levin?
>     https://lore.kernel.org/lkml/20251223122110.2496946-1-sashal@kernel.org/
>     https://lore.kernel.org/lkml/877bqtlzug.fsf@trenco.lwn.net/
> 
> 
> Yea. Assisted-by was not invented by AI companies. That's just rubbish.
> 
> See https://github.com/anthropics/claude-code/issues/36105 for the open
> issue with Claude.
> 
> 
>     > (It's not just me; see
>     > https://akselmo.dev/posts/stop-advertising-in-your-commits/ for an
>     example).
> 
>     Does not impress me as being either super informed or super
>     professional.
> 
> 
> Yea, seems crazy.
>  
> 
>     > Also, it does not answer any interesting question.
> 
>     It does for me - I will learn which models are more likely to produce
>     bad slop.
> 
> 
> Same. I can tell the difference in the degree of slop between the different
> models
> and have sometimes selected the older one over the newer one because it does
> a better job.
>  
> 
>     > Not using it is also a good way to sieve people who didn't bother to read
>     > the policy, now that I think about it.
> 
>     In my experience, AI tools add "Co-developed-by:" tags by default. One
>     has to read the policy to add Assisted-by.
> 
> 
> Co-authored-by: you mean? That's what claude adds for me. I had to take
> extra effort to add the new Assisted-by: to my patch review. 
> 
> Warner

Right.



^ permalink raw reply	[flat|nested] 59+ messages in thread

* Re: on ai generated and code provenance
  2026-05-27 16:57                     ` Warner Losh
  2026-05-27 17:05                       ` Michael S. Tsirkin
@ 2026-05-27 17:48                       ` Paolo Bonzini
  1 sibling, 0 replies; 59+ messages in thread
From: Paolo Bonzini @ 2026-05-27 17:48 UTC (permalink / raw)
  To: Warner Losh; +Cc: Michael S. Tsirkin, Kevin Wolf, qemu-devel, stefanha

On Wed, May 27, 2026 at 6:57 PM Warner Losh <imp@bsdimp.com> wrote:
> On Wed, May 27, 2026 at 10:48 AM Michael S. Tsirkin <mst@redhat.com> wrote:
>> On Wed, May 27, 2026 at 06:05:47PM +0200, Paolo Bonzini wrote:
>> > On 5/27/26 16:14, Warner Losh wrote:
>> > > Why invent something new here when Assisted-by: is used elsewhere
>> > > and is likely more familiar to other users.
>> >
>> > Because Assisted-by was invented by AI companies to get free advertisement.
>>
>> Jonathan Corbet is in the pocket of AI companies? Or Sasha Levin?
>> https://lore.kernel.org/lkml/20251223122110.2496946-1-sashal@kernel.org/
>> https://lore.kernel.org/lkml/877bqtlzug.fsf@trenco.lwn.net/
>
> Yea. Assisted-by was not invented by AI companies. That's just rubbish.

I stand corrected - the *format* of the tag with the model name was
invented by AI companies, for example Anthropic who embedded it in
Claude Code. Likewise for the robot emoji at the end of GitHub PRs.

It does not matter whether it's Co-authored-by, Generated-by,
Assisted-by.  It's the new "Sent from my iPhone" and neither provides
useful information to me as a reviewer, even if it ends up in a Linux
documentation file.

If the submitter uses a "bad model" and is not able to correct that,
it's a submitter problem. I'll watch out for the submitter even after
they switch to a "good model", because they're not applying critical
thinking to the AI's output. On the contrary, if the submitter uses a
"good model" for scaffolding and does awesome work on top, I don't
think I should penalize the submitter.

> See https://github.com/anthropics/claude-code/issues/36105 for the open
> issue with Claude.

Good luck with that - Anthropic uses Co-authored-by so that the
authors show up on GitHub as "foo and claude"[1]. It's a *blatant*
marketing ploy, just like "Co-authored-by: Coke
<nobody@coca-cola.com>" would be.

For what it's worth, Claude agrees. When asked to "commit, but remove
the tag as I don't want to give out free ad space" it says this:
"There is one legitimate non-advertising function, and it's worth
naming precisely so we don't overcorrect: disclosure where a project's
policy requires it. Some projects now want to know whether code was
AI-assisted — for DCO/sign-off validity, copyright-provenance reasons,
license concerns. That's a real maintainer interest. But notice it
doesn't rescue either default:  1. The interest is in the fact of AI
assistance, not which model. The model name and version add nothing to
that. 2. Disclosure should track the project's policy and be the
submitter's conscious act — a line in the cover letter, written
deliberately — not a trailer the tool injects regardless of whether
the project cares. A non-adaptive default isn't disclosure, it's a
watermark".

Thanks,

Paolo

[1] see https://github.com/wiktor-k/ssh-agent-lib/pull/98/changes/a38760b6b4
for an example

>> > Not using it is also a good way to sieve people who didn't bother to read
>> > the policy, now that I think about it.
>>
>> In my experience, AI tools add "Co-developed-by:" tags by default. One
>> has to read the policy to add Assisted-by.
>
> Co-authored-by: you mean? That's what claude adds for me. I had to take
> extra effort to add the new Assisted-by: to my patch review.
>
> Warner

^ permalink raw reply	[flat|nested] 59+ messages in thread

* Re: on ai generated and code provenance
  2026-05-27 10:01             ` Paolo Bonzini
                                 ` (4 preceding siblings ...)
  2026-05-27 14:14               ` Warner Losh
@ 2026-05-27 16:39               ` Michael S. Tsirkin
  5 siblings, 0 replies; 59+ messages in thread
From: Michael S. Tsirkin @ 2026-05-27 16:39 UTC (permalink / raw)
  To: Paolo Bonzini; +Cc: Kevin Wolf, Warner Losh, qemu-devel, stefanha

On Wed, May 27, 2026 at 12:01:10PM +0200, Paolo Bonzini wrote:
> Commit messages for AI-assisted changes
> ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
> 
> When AI/LLM tools produce or substantively shape your patch, add an
> ``AI-used-for:`` trailer.  The text of the trailer could be one or more of
> ``code``, ``tests``, ``docs``, ``research``, possibly followed by an
> explanation in parentheses::
> 
>     AI-used-for: tests, docs
>     AI-used-for: code
>     AI-used-for: code (refactoring)
>     AI-used-for: code (prototype)
>     AI-used-for: research
> 
> The trailer is intended as a clarification of your DCO obligations as well
> as to guide reviewers.  It is not intended for minimal presence such as
> autocomplete or asking for a pre-review of the patch, and it does not remove
> your responsibility to understand the changes that you are submitting.
> 
> Include the prompt in the commit message if it helps a reviewer judge the
> result:
> 
> * yes: "move field ``foo`` from ``struct aa`` to ``struct bb``.  If a
> function already has a local variable or parameter of type ``struct bb``,
> use it instead of accessing ``aa.bb``."
> 
> * yes: "add an implementation of the trait for ``Mutex<T: MyTrait>``,
> forwarding the member functions to ``T`` while taking the lock around the
> calls".
> 
> * no: "write user-facing documentation for the new tool"
> 
> * no: "write testcases for the new functions"

I don't understand what these yes/no examples are trying to show.
AI tools aren't really yet up to the task of generating a reasonable
qemu patchset from a single prompt. As a reviewer, I am not really
interested what kind of magic "think extra ultra hard" invocation was
used to coax the output from the model.

I actually would like to know which model was used, since I expect that
with time I'll learn to trust output from specific models more just
like I trust output from specific contributors more.

-- 
MST



^ permalink raw reply	[flat|nested] 59+ messages in thread

* Re: on ai generated and code provenance
  2026-05-26 18:59     ` Kevin Wolf
  2026-05-26 19:30       ` Michael S. Tsirkin
@ 2026-05-26 19:50       ` Michael S. Tsirkin
  2026-05-27  7:44         ` Kevin Wolf
  1 sibling, 1 reply; 59+ messages in thread
From: Michael S. Tsirkin @ 2026-05-26 19:50 UTC (permalink / raw)
  To: Kevin Wolf; +Cc: qemu-devel, stefanha

On Tue, May 26, 2026 at 08:59:55PM +0200, Kevin Wolf wrote:
> maybe practically speaking it has to be all or nothing in terms of
> creativity (for lack of a better word).

That's exactly what copyright is, right? creative expression.
So e.g. adding an include to make a file compile is not creative.

-- 
MST



^ permalink raw reply	[flat|nested] 59+ messages in thread

* Re: on ai generated and code provenance
  2026-05-26 19:50       ` Michael S. Tsirkin
@ 2026-05-27  7:44         ` Kevin Wolf
  0 siblings, 0 replies; 59+ messages in thread
From: Kevin Wolf @ 2026-05-27  7:44 UTC (permalink / raw)
  To: Michael S. Tsirkin; +Cc: qemu-devel, stefanha

Am 26.05.2026 um 21:50 hat Michael S. Tsirkin geschrieben:
> On Tue, May 26, 2026 at 08:59:55PM +0200, Kevin Wolf wrote:
> > maybe practically speaking it has to be all or nothing in terms of
> > creativity (for lack of a better word).
> 
> That's exactly what copyright is, right? creative expression.
> So e.g. adding an include to make a file compile is not creative.

Yes, it's definitely similar to the question if something is
copyrightable or not. The threshold could be different in a project
specific policy, but that threshold is somewhat unclear anyway - and
that was my point, I'm not sure if it can be made clear.

It's easy to find examples that are clearly below the threshold (your
adding an #include to fix the build) and examples that are clearly above
it (say, a complex new device). But for a specific change somewhere in
the middle of the range, it can still be quite hard to tell.

Kevin

^ permalink raw reply	[flat|nested] 59+ messages in thread

end of thread, other threads:[~2026-05-28  7:33 UTC | newest]

Thread overview: 59+ messages (download: mbox.gz follow: Atom feed
-- links below jump to the message on this page --
2026-05-24 12:42 on ai generated and code provenance Michael S. Tsirkin
2026-05-24 17:06 ` Alex Bennée
2026-05-24 17:42   ` Michael S. Tsirkin
2026-05-24 18:26   ` Warner Losh
2026-05-24 20:04     ` Michael S. Tsirkin
2026-05-24 20:11   ` Michael S. Tsirkin
2026-05-24 20:44     ` Stefan Hajnoczi
2026-05-25 15:27       ` Stefan Hajnoczi
2026-05-25 16:32 ` Paolo Bonzini
2026-05-25 17:15   ` Warner Losh
2026-05-25 19:44     ` Stefan Hajnoczi
2026-05-25 22:36       ` Michael S. Tsirkin
2026-05-26 13:16         ` Stefan Hajnoczi
2026-05-25 19:56     ` Paolo Bonzini
2026-05-26 21:48     ` Philippe Mathieu-Daudé
2026-05-26  8:23   ` Peter Maydell
2026-05-26  9:28     ` Alex Bennée
2026-05-26  9:57     ` Paolo Bonzini
2026-05-26 11:27       ` BALATON Zoltan
2026-05-26 12:30         ` Michael S. Tsirkin
2026-05-26 12:37           ` Manos Pitsidianakis
2026-05-26 13:00             ` Michael S. Tsirkin
2026-05-26 13:22         ` Stefan Hajnoczi
2026-05-26 14:01           ` Warner Losh
2026-05-27  7:11     ` Philippe Mathieu-Daudé
2026-05-26 17:43 ` Kevin Wolf
2026-05-26 18:03   ` Michael S. Tsirkin
2026-05-26 18:59     ` Kevin Wolf
2026-05-26 19:30       ` Michael S. Tsirkin
2026-05-26 19:52         ` Warner Losh
2026-05-27  8:41           ` Kevin Wolf
2026-05-27 10:01             ` Paolo Bonzini
2026-05-27 10:43               ` Alex Bennée
2026-05-27 12:49                 ` Kevin Wolf
2026-05-27 10:53               ` Kevin Wolf
2026-05-27 12:33                 ` Paolo Bonzini
2026-05-27 12:43                   ` Michael S. Tsirkin
2026-05-27 10:54               ` Alistair Francis
2026-05-27 14:21                 ` Warner Losh
2026-05-28  1:59                   ` Alistair Francis
2026-05-28  5:06                     ` Michael S. Tsirkin
2026-05-28  7:32                       ` Paolo Bonzini
2026-05-27 14:11               ` Michael S. Tsirkin
2026-05-27 14:14               ` Warner Losh
2026-05-27 14:51                 ` Kevin Wolf
2026-05-27 16:41                   ` Michael S. Tsirkin
2026-05-27 16:50                     ` Kevin Wolf
2026-05-27 16:56                       ` Michael S. Tsirkin
2026-05-27 17:06                       ` Michael S. Tsirkin
2026-05-27 17:15                         ` Warner Losh
2026-05-27 17:07                       ` Warner Losh
2026-05-27 16:05                 ` Paolo Bonzini
2026-05-27 16:48                   ` Michael S. Tsirkin
2026-05-27 16:57                     ` Warner Losh
2026-05-27 17:05                       ` Michael S. Tsirkin
2026-05-27 17:48                       ` Paolo Bonzini
2026-05-27 16:39               ` Michael S. Tsirkin
2026-05-26 19:50       ` Michael S. Tsirkin
2026-05-27  7:44         ` Kevin Wolf

This is an external index of several public inboxes,
see mirroring instructions on how to clone and mirror
all data and code used by this external index.