* on ai generated and code provenance
From: Michael S. Tsirkin @ 2026-05-24 12:42 UTC (permalink / raw)
To: qemu-devel; +Cc: stefanha
So, I had to reject a perfectly reasonable patch:
https://lore.kernel.org/qemu-devel/20260320193746.242704-1-jinpu.wang@ionos.com/
just because of a tool used to make it.
How contributors could comply with DCO terms (b) or (c) for the output of AI
content generators commonly available today is unclear. The QEMU project is
not willing or able to accept the legal risks of non-compliance.
But, since this was written, Red Hat's Richard Fontana and Chris Wright
published this piece:
https://www.redhat.com/en/blog/ai-assisted-development-and-open-source-navigating-legal-issues
Saying, in particular "
We understand this concern, but the DCO has never
been interpreted to require that every line of a contribution must be
the personal creative expression of the contributor or another human
developer.
"
I propose adopting linux's rules instead:
https://docs.kernel.org/process/coding-assistants.html
which boils down to attribution.
--
MST

* Re: on ai generated and code provenance
From: Alex Bennée @ 2026-05-24 17:06 UTC (permalink / raw)
To: Michael S. Tsirkin; +Cc: qemu-devel, stefanha

"Michael S. Tsirkin" <mst@redhat.com> writes:

> So, I had to reject a perfectly reasonable patch:
> https://lore.kernel.org/qemu-devel/20260320193746.242704-1-jinpu.wang@ionos.com/
> just because of a tool used to make it.
>
> How contributors could comply with DCO terms (b) or (c) for the output of AI
> content generators commonly available today is unclear. The QEMU project is
> not willing or able to accept the legal risks of non-compliance.

In the linked case the LLM is basically doing a glorified search and
replace. There seems to be no danger of accidentally regurgitating any
training data which is where the worry about inadvertent copyright
infringement comes from.

That said in my experience generally any code that does come out from
these tools tends to match the local code style and patterns pretty
well. As a general purpose boilerplate generator they are probably
better than a lot of people at this point.

There has been some case law now that says LLM output could be
un-copyrightable depending on how involved the user was in the iteration
of the code. I suspect there is still more to come.

> But, since this was written, Red Hat's Richard Fontana and Chris Wright
> published this piece:
> https://www.redhat.com/en/blog/ai-assisted-development-and-open-source-navigating-legal-issues
>
> Saying, in particular "
> We understand this concern, but the DCO has never
> been interpreted to require that every line of a contribution must be
> the personal creative expression of the contributor or another human
> developer.
> "
>
> I propose adopting linux's rules instead:
> https://docs.kernel.org/process/coding-assistants.html
>
> which boils down to attribution.

attribution and *ownership*. I think the key point of the policy is to
make the actual engineer signing the DCO the responsible one for
generating, testing and validating the code. It is strongly trying to
suggest that vibe-coded slop isn't wanted.

I still have concerns about the quality of the code and the
"understanding" these models have. They can generate very convincing
rationales for their decisions but they also are prone to being
over-verbose and over-complicating the solutions. They have a tendency
to chase down rabbit holes in the code and get lost while making wilder
and more invasive changes to try and get things working.

That said for personal scripts or random experiments the ability to
quickly get to a PoC is pretty great.

I think there is also scope for using LLMs for things that aren't
directly writing code:

  - code review
  - investigation
  - generating test cases
  - polishing documentation

and I wonder if we should spend some more time investigating the
performance and pitfalls of LLMs before we open the flood gates to the
code.

--
Alex Bennée
Virtualisation Tech Lead @ Linaro

* Re: on ai generated and code provenance
From: Michael S. Tsirkin @ 2026-05-24 17:42 UTC (permalink / raw)
To: Alex Bennée; +Cc: qemu-devel, stefanha

On Sun, May 24, 2026 at 06:06:46PM +0100, Alex Bennée wrote:
> "Michael S. Tsirkin" <mst@redhat.com> writes:
>
> > So, I had to reject a perfectly reasonable patch:
> > https://lore.kernel.org/qemu-devel/20260320193746.242704-1-jinpu.wang@ionos.com/
> > just because of a tool used to make it.
> >
> > How contributors could comply with DCO terms (b) or (c) for the output of AI
> > content generators commonly available today is unclear. The QEMU project is
> > not willing or able to accept the legal risks of non-compliance.
>
> In the linked case the LLM is basically doing a glorified search and
> replace. There seems to be no danger of accidentally regurgitating any
> training data which is where the worry about inadvertent copyright
> infringement comes from.
>
> That said in my experience generally any code that does come out from
> these tools tends to match the local code style and patterns pretty
> well.

Making the code original, too.

> As a general purpose boilerplate generator they are probably
> better than a lot of people at this point.
>
> There has been some case law now that says LLM output could be
> un-copyrightable depending on how involved the user was in the iteration
> of the code. I suspect there is still more to come.

Waiting for courts to settle anything means waiting years, while the
industry has mostly moved on.

> > But, since this was written, Red Hat's Richard Fontana and Chris Wright
> > published this piece:
> > https://www.redhat.com/en/blog/ai-assisted-development-and-open-source-navigating-legal-issues
> >
> > Saying, in particular "
> > We understand this concern, but the DCO has never
> > been interpreted to require that every line of a contribution must be
> > the personal creative expression of the contributor or another human
> > developer.
> > "
> >
> > I propose adopting linux's rules instead:
> > https://docs.kernel.org/process/coding-assistants.html
> >
> > which boils down to attribution.
>
> attribution and *ownership*. I think the key point of the policy is to
> make the actual engineer signing the DCO the responsible one for
> generating, testing and validating the code. It is strongly trying to
> suggest that vibe-coded slop isn't wanted.
>
> I still have concerns about the quality of the code and the
> "understanding" these models have. They can generate very convincing
> rationales for their decisions but they also are prone to being
> over-verbose and over-complicating the solutions. They have a tendency
> to chase down rabbit holes in the code and get lost while making wilder
> and more invasive changes to try and get things working.

That's up to maintainers though.

> That said for personal scripts or random experiments the ability to
> quickly get to a PoC is pretty great.

Patch above is beyond that.

> I think there is also scope for using LLMs for things that aren't
> directly writing code:
>
> - code review
> - investigation
> - generating test cases
> - polishing documentation
>
> and I wonder if we should spend some more time investigating the
> performance and pitfalls of LLMs before we open the flood gates to the
> code.

Who would do the investigating?

> --
> Alex Bennée
> Virtualisation Tech Lead @ Linaro

* Re: on ai generated and code provenance
From: Warner Losh @ 2026-05-24 18:26 UTC (permalink / raw)
To: Alex Bennée; +Cc: Michael S. Tsirkin, qemu-devel, stefanha

On Sun, May 24, 2026 at 11:08 AM Alex Bennée <alex.bennee@linaro.org> wrote:

> "Michael S. Tsirkin" <mst@redhat.com> writes:
>
> > So, I had to reject a perfectly reasonable patch:
> > https://lore.kernel.org/qemu-devel/20260320193746.242704-1-jinpu.wang@ionos.com/
> > just because of a tool used to make it.
> >
> > How contributors could comply with DCO terms (b) or (c) for the output of AI
> > content generators commonly available today is unclear. The QEMU project is
> > not willing or able to accept the legal risks of non-compliance.
>
> In the linked case the LLM is basically doing a glorified search and
> replace. There seems to be no danger of accidentally regurgitating any
> training data which is where the worry about inadvertent copyright
> infringement comes from.

Yes. The LLM copying code thing is so two years ago. LLMs don't do this
anymore. They are just glorified pattern matchers, and generate based on
the patterns they know. While there may be a tiny risk here, there's a
greater risk today from humans doing this w/o attribution.

> That said in my experience generally any code that does come out from
> these tools tends to match the local code style and patterns pretty
> well. As a general purpose boilerplate generator they are probably
> better than a lot of people at this point.
>
> There has been some case law now that says LLM output could be
> un-copyrightable depending on how involved the user was in the iteration
> of the code. I suspect there is still more to come.

So let's be clear here, because it matters. The output of LLMs is in the
public domain because there's not a human author. Why would that matter?
I ask because there are large parts of the Linux kernel that cannot enjoy
copyright protection because they are mere facts (like tables of register
writes to initialize a device). That doesn't stop the author from
including the public domain code into the Linux kernel (or FreeBSD or
whatever). There are elements that can be protected by copyright and
elements that can't. However, it's perfectly acceptable to include public
domain material in your copyrighted works. Adding LLM generated code,
assuming it's unmodified, would be just that. Just like Disney did with a
zillion movies.

And most of the time when I use LLM output, I modify it a bit to be
better. The LLM generation is close, but not quite right. It really is a
so-so junior engineer that's a bit too keen on following rules.

But anyway, the public domain aspect doesn't matter for us. Either
there's no copyright, in which case people can copy it w/o a license. Or
there is, and we grant one that's very permissive in what it allows.
Folding the public domain code into projects is a time-honored tradition.
Why would LLMs change this dynamic?

> > But, since this was written, Red Hat's Richard Fontana and Chris Wright
> > published this piece:
> > https://www.redhat.com/en/blog/ai-assisted-development-and-open-source-navigating-legal-issues
> >
> > Saying, in particular "
> > We understand this concern, but the DCO has never
> > been interpreted to require that every line of a contribution must be
> > the personal creative expression of the contributor or another human
> > developer.
> > "
> >
> > I propose adopting linux's rules instead:
> > https://docs.kernel.org/process/coding-assistants.html
> >
> > which boils down to attribution.
>
> attribution and *ownership*. I think the key point of the policy is to
> make the actual engineer signing the DCO the responsible one for
> generating, testing and validating the code. It is strongly trying to
> suggest that vibe-coded slop isn't wanted.

But the DCO is correct here. If I take public domain code, and hack it I
can still legitimately do a SOB.

    (a) The contribution was created in whole or in part by me and I
        have the right to submit it under the open source license
        indicated in the file; or

So if I use LLM, and change it even a little, it's created in part by me.
And if it's public domain, I have the right to submit it under any
license I like. And the parts I created, I absolutely have the right to
copyright and contribute under any terms I like. There's no new ground
here.

> I still have concerns about the quality of the code and the
> "understanding" these models have. They can generate very convincing
> rationales for their decisions but they also are prone to being
> over-verbose and over-complicating the solutions. They have a tendency
> to chase down rabbit holes in the code and get lost while making wilder
> and more invasive changes to try and get things working.

Yes. One of the reasons that submitters need to explain (or be able to)
every line and justify it in a debate in the context of the larger
project. Though that's no different than today: we get submissions of
varying quality from people that have varying degrees of competence. The
code review process is supposed to set a minimum floor for code quality.
LLMs are no different: the originator has to be able to justify and
explain things here.

> That said for personal scripts or random experiments the ability to
> quickly get to a PoC is pretty great.
>
> I think there is also scope for using LLMs for things that aren't
> directly writing code:
>
> - code review
> - investigation
> - generating test cases
> - polishing documentation
>
> and I wonder if we should spend some more time investigating the
> performance and pitfalls of LLMs before we open the flood gates to the
> code.

I wouldn't open the floodgates. I would however expect the policy to
understand that LLM assistance in generating code produces results that
meet the minimum quality expectations. But also understand that these
tools can be a firehose of information that's hard to filter.

The problem with LLMs has always been one of verification. It takes a lot
of time to know if they are right. Oftentimes a lot more time than the
traditional submission because LLM generated pull requests that I've seen
in FreeBSD tend to be super verbose, with all kinds of irrelevant detail
thrown in. And yet, the underlying changes are at least "close enough to
review". We're struggling in that sister open source project on how to
cope, honestly, and caution is likely called for, but bans when there's a
sliding scale of LLM use likely aren't.

In my bsd-user upstreaming, Claude has been great at code review, and at
suggesting changes. I often do a change and then ask Claude how it would
fix the issue. Quite often they are the same thing. And Claude is good
about reviewing my fix for the issue. I'm sure, though, it's missing a
lot of bigger picture things, but that's what I'm for.

So maybe a good middle ground might be to allow Claude for things that
are low risk:
- Things sed or coccinelle can do
- Minor bug fixes with human written commit messages
- Minor feature tweaks (say < 200 lines)
- All things test and CI (well, maybe not that wide, but much wider in
  the CI space)
- Generation of tools that build the system, though with extra vetting
- other grunt tasks (like my upstreaming stuff, but I'm sure there are
  other areas that don't involve generation of large amounts of creative
  works).

Coupled with a strong requirement for quality and standing behind the
patch. Maybe with extra scrutiny in the reviews (though, the reviews I've
gotten for bsd-user, while quite useful, have been tougher than I've seen
in many other places).

I like Linux's rules, generally, though we can have the door less open.
It's one reason I added the Assisted-by: claude lines in my bsd-user
reviews. Claude did the grunt work of git blame and slicing and dicing
the patches (which it got mostly right, after feedback, I have some work
to do to re-slice a few things). It also reviewed and I fixed several
real issues, as well as a bunch of "logical" issues where I used host
instead of target things or vice versa.

Warner

> --
> Alex Bennée
> Virtualisation Tech Lead @ Linaro

* Re: on ai generated and code provenance
From: Michael S. Tsirkin @ 2026-05-24 20:04 UTC (permalink / raw)
To: Warner Losh; +Cc: Alex Bennée, qemu-devel, stefanha

On Sun, May 24, 2026 at 12:26:43PM -0600, Warner Losh wrote:
> So maybe a good middle ground might be to allow claude for things that
> are low risk:
> - Things sed or coccinelle can do
> - Minor bug fixes with human written commit messages
> - Minor feature tweaks (say < 200 lines)

As far as I am concerned, if it's a reasonably split patchset of multiple
patches < 200 lines each, it's the same.

--
MST

* Re: on ai generated and code provenance
From: Michael S. Tsirkin @ 2026-05-24 20:11 UTC (permalink / raw)
To: Alex Bennée; +Cc: qemu-devel, stefanha

On Sun, May 24, 2026 at 06:06:46PM +0100, Alex Bennée wrote:
> "Michael S. Tsirkin" <mst@redhat.com> writes:
>
> > So, I had to reject a perfectly reasonable patch:
> > https://lore.kernel.org/qemu-devel/20260320193746.242704-1-jinpu.wang@ionos.com/
> > just because of a tool used to make it.
> >
> > How contributors could comply with DCO terms (b) or (c) for the output of AI
> > content generators commonly available today is unclear. The QEMU project is
> > not willing or able to accept the legal risks of non-compliance.
>
> In the linked case the LLM is basically doing a glorified search and
> replace. There seems to be no danger of accidentally regurgitating any
> training data which is where the worry about inadvertent copyright
> infringement comes from.

Does this mean I can merge it, in your view?

--
MST

* Re: on ai generated and code provenance
From: Stefan Hajnoczi @ 2026-05-24 20:44 UTC (permalink / raw)
To: Michael S. Tsirkin; +Cc: Alex Bennée, qemu-devel, stefanha

On Sun, May 24, 2026 at 4:12 PM Michael S. Tsirkin <mst@redhat.com> wrote:
>
> On Sun, May 24, 2026 at 06:06:46PM +0100, Alex Bennée wrote:
> > "Michael S. Tsirkin" <mst@redhat.com> writes:
> >
> > > So, I had to reject a perfectly reasonable patch:
> > > https://lore.kernel.org/qemu-devel/20260320193746.242704-1-jinpu.wang@ionos.com/
> > > just because of a tool used to make it.
> > >
> > > How contributors could comply with DCO terms (b) or (c) for the output of AI
> > > content generators commonly available today is unclear. The QEMU project is
> > > not willing or able to accept the legal risks of non-compliance.
> >
> > In the linked case the LLM is basically doing a glorified search and
> > replace. There seems to be no danger of accidentally regurgitating any
> > training data which is where the worry about inadvertent copyright
> > infringement comes from.
>
> Does this mean I can merge it, in your view?

It would be a good time to revisit the AI policy. From the QEMU Summit
2026 minutes:

  "- We plan to solicit feedback in spring next year on how the policy
  has worked out in practice."
  (https://lore.kernel.org/qemu-devel/CAFEAcA-OmqRTqwYZ2WCeqFu=zxG65t6WSfKR=NthfpazrjzpzA@mail.gmail.com/)

That hasn't happened yet and it's almost summer, so now is a good time to
have that discussion.

The policy was written with the option of adding exceptions (see the
Exceptions section at the bottom of docs/devel/code-provenance.rst). That
is one place where it could be extended.

Another option is to say that the situation has changed since the policy
was written and to replace it with something that allows a broader range
of AI-generated content instead of just specific exceptions.

Here is Software Freedom Conservancy's most recent blog post about
AI-generated content:
https://sfconservancy.org/blog/2026/apr/15/eternal-november-generative-ai-llm/

Stefan

* Re: on ai generated and code provenance
From: Stefan Hajnoczi @ 2026-05-25 15:27 UTC (permalink / raw)
To: Stefan Hajnoczi; +Cc: Michael S. Tsirkin, Alex Bennée, qemu-devel

On Sun, May 24, 2026 at 04:44:41PM -0400, Stefan Hajnoczi wrote:
> On Sun, May 24, 2026 at 4:12 PM Michael S. Tsirkin <mst@redhat.com> wrote:
> >
> > On Sun, May 24, 2026 at 06:06:46PM +0100, Alex Bennée wrote:
> > > "Michael S. Tsirkin" <mst@redhat.com> writes:
> > >
> > > > So, I had to reject a perfectly reasonable patch:
> > > > https://lore.kernel.org/qemu-devel/20260320193746.242704-1-jinpu.wang@ionos.com/
> > > > just because of a tool used to make it.
> > > >
> > > > How contributors could comply with DCO terms (b) or (c) for the output of AI
> > > > content generators commonly available today is unclear. The QEMU project is
> > > > not willing or able to accept the legal risks of non-compliance.
> > >
> > > In the linked case the LLM is basically doing a glorified search and
> > > replace. There seems to be no danger of accidentally regurgitating any
> > > training data which is where the worry about inadvertent copyright
> > > infringement comes from.
> >
> > Does this mean I can merge it, in your view?
>
> It would be a good time to revisit the AI policy. From the QEMU Summit
> 2026 minutes:

Oops, "2026" should have been "2025".

I think the policy should be updated if we're going to depart from the
policy.

Stefan

* Re: on ai generated and code provenance
From: Paolo Bonzini @ 2026-05-25 16:32 UTC (permalink / raw)
To: Michael S. Tsirkin, qemu-devel; +Cc: stefanha

On 5/24/26 14:42, Michael S. Tsirkin wrote:
> How contributors could comply with DCO terms (b) or (c) for the output of AI
> content generators commonly available today is unclear. The QEMU project is
> not willing or able to accept the legal risks of non-compliance.
>
> But, since this was written, Red Hat's Richard Fontana and Chris Wright
> published this piece:
> https://www.redhat.com/en/blog/ai-assisted-development-and-open-source-navigating-legal-issues
>
> Saying, in particular
> We understand this concern, but the DCO has never
> been interpreted to require that every line of a contribution must be
> the personal creative expression of the contributor or another human
> developer.

This is not the objection or the worry; rather the question is, what if
the contribution is a creative expression of someone that could claim
copyright in it. In fact, looking at the Linux policy...

    Signed-off-by and Developer Certificate of Origin
    =================================================

    AI agents MUST NOT add Signed-off-by tags. Only humans can legally
    certify the Developer Certificate of Origin (DCO). The human submitter
    is responsible for:

    * Reviewing all AI-generated code
    * Ensuring compliance with licensing requirements
    * Adding their own Signed-off-by tag to certify the DCO
    * Taking full responsibility for the contribution

... the question is how humans can actually do the second step. The
piece you posted above says: "with disclosure and human attentiveness –
and oversight – aided where possible by tools that check for code
similarity, AI-assisted contributions can be entirely compatible with
the spirit of the DCO".

This is not encouraging, in my opinion, because it leaves a lot of the
mechanics undefined. A while ago I suggested that in some scenarios
this could actually be done[1][2]; another possible case is localized
bugfixes (say, below 20 lines of code). For more general contributions
however, the role of maintainers is not clear. Would we require to
"check for code similarity"? I sure don't want to open that can of worms.

> I propose adopting linux's rules instead:
> https://docs.kernel.org/process/coding-assistants.html

Replacing QEMU's policy with Linux's would be orthogonal to the topic of
the DCO. Maintainers would still have the option of rejecting
AI-assisted patches if they don't believe they can apply their own
sign-off.

Other projects have taken similar "no AI" policies for different
reasons. Zig has one because they believe AI code would make it harder
to retain contributors[3][4]; Rust is working on one that is fairly
restrictive[5] (discussion at [6]) and requires previous communications
with reviewers about *any* generated PRs[7]. Personally I think QEMU's
policy is fine but we should start introducing exceptions, possibly
including large contributions with pre-authorization (but not
pre-approval) from the maintainer.

Paolo

[1] https://lore.kernel.org/qemu-devel/20250925075630.352720-1-pbonzini@redhat.com
[2] https://lore.kernel.org/qemu-devel/20251008063546.376603-1-pbonzini@redhat.com/raw
[3] https://ziglang.org/code-of-conduct/
[4] https://ziggit.dev/t/bun-s-zig-fork-got-4x-faster-compilation-times/15183/19
[5] https://github.com/jyn514/rust-forge/blob/llm-policy/src/policies/llm-usage.md
[6] https://github.com/rust-lang/rust-forge/pull/1040
[7] https://github.com/jyn514/rust-forge/blob/llm-policy/src/policies/llm-usage.md#experiment-llm-created-code-changes

* Re: on ai generated and code provenance
From: Warner Losh @ 2026-05-25 17:15 UTC (permalink / raw)
To: Paolo Bonzini; +Cc: Michael S. Tsirkin, qemu-devel, stefanha

On Mon, May 25, 2026 at 10:34 AM Paolo Bonzini <pbonzini@redhat.com> wrote:

> On 5/24/26 14:42, Michael S. Tsirkin wrote:
> > How contributors could comply with DCO terms (b) or (c) for the output of AI
> > content generators commonly available today is unclear. The QEMU project is
> > not willing or able to accept the legal risks of non-compliance.
> >
> > But, since this was written, Red Hat's Richard Fontana and Chris Wright
> > published this piece:
> > https://www.redhat.com/en/blog/ai-assisted-development-and-open-source-navigating-legal-issues
> >
> > Saying, in particular
> > We understand this concern, but the DCO has never
> > been interpreted to require that every line of a contribution must be
> > the personal creative expression of the contributor or another human
> > developer.
>
> This is not the objection or the worry; rather the question is, what if
> the contribution is a creative expression of someone that could claim
> copyright in it. In fact, looking at the Linux policy...
>
>     Signed-off-by and Developer Certificate of Origin
>     =================================================
>
>     AI agents MUST NOT add Signed-off-by tags. Only humans can legally
>     certify the Developer Certificate of Origin (DCO). The human submitter
>     is responsible for:
>
>     * Reviewing all AI-generated code
>     * Ensuring compliance with licensing requirements
>     * Adding their own Signed-off-by tag to certify the DCO
>     * Taking full responsibility for the contribution
>
> ... the question is how humans can actually do the second step. The
> piece you posted above says: "with disclosure and human attentiveness –
> and oversight – aided where possible by tools that check for code
> similarity, AI-assisted contributions can be entirely compatible with
> the spirit of the DCO".

The code produced by AI agents has no copyright. You can incorporate
public domain code into your work and have the absolute right to license
it (see all the Disney movies). The notion that LLMs wholesale copy
originates from the earliest days of Copilot and turned out to be
contrived. No recent evidence shows that plagiarism is a concern. To the
extent that I modify public domain code, I have a copyright that I can
choose to license however I want (and the SOB says it's compatible).

So I'm struggling to understand the hesitation here. Is it the
uncertainty around the copyright? Around the copying issue? Something
else? We already have some level of risk around these issues with human
coders: We have to take their word for it that they didn't copy, and if
they did, the project is still on the hook to remedy the situation if the
real rights holders show up.... There's always risk when submissions are
accepted from the general public.

Also, I've softened this paragraph several times, and it still comes
across as more confrontational than I intend. I'm trying to understand.

> This is not encouraging, in my opinion, because it leaves a lot of the
> mechanics undefined. A while ago I suggested that in some scenarios
> this could actually be done[1][2]; another possible case is localized
> bugfixes (say, below 20 lines of code). For more general contributions
> however, the role of maintainers is not clear. Would we require to
> "check for code similarity"? I sure don't want to open that can of worms.
>
> > I propose adopting linux's rules instead:
> > https://docs.kernel.org/process/coding-assistants.html
>
> Replacing QEMU's policy with Linux's would be orthogonal to the topic of
> the DCO. Maintainers would still have the option of rejecting
> AI-assisted patches if they don't believe they can apply their own
> sign-off.
>
> Other projects have taken similar "no AI" policies for different
> reasons. Zig has one because they believe AI code would make it harder
> to retain contributors[3][4]; Rust is working on one that is fairly
> restrictive[5] (discussion at [6]) and requires previous communications
> with reviewers about *any* generated PRs[7]. Personally I think QEMU's
> policy is fine but we should start introducing exceptions, possibly
> including large contributions with pre-authorization (but not
> pre-approval) from the maintainer.

I agree with the thrust of the proposals you've submitted: AI is a tool,
and there are many ways to use it safely. AI's primary issue is
verification. The folks being flooded can't verify the good from the bad
in the flood and tend to have a knee-jerk reaction to protect themselves:
ban it.

Unstated issue: How do we help people that want to contribute grow their
skills using AI so they make submissions whose quality is good enough to
be worth our time to verify and review? It's an industry-wide problem,
along with how junior engineers become senior in a world of AI doing the
grunt work they used to learn from.

Warner

> Paolo
>
> [1] https://lore.kernel.org/qemu-devel/20250925075630.352720-1-pbonzini@redhat.com
> [2] https://lore.kernel.org/qemu-devel/20251008063546.376603-1-pbonzini@redhat.com/raw
> [3] https://ziglang.org/code-of-conduct/
> [4] https://ziggit.dev/t/bun-s-zig-fork-got-4x-faster-compilation-times/15183/19
> [5] https://github.com/jyn514/rust-forge/blob/llm-policy/src/policies/llm-usage.md
> [6] https://github.com/rust-lang/rust-forge/pull/1040
> [7] https://github.com/jyn514/rust-forge/blob/llm-policy/src/policies/llm-usage.md#experiment-llm-created-code-changes
* Re: on ai generated and code provenance 2026-05-25 17:15 ` Warner Losh @ 2026-05-25 19:44 ` Stefan Hajnoczi 2026-05-25 22:36 ` Michael S. Tsirkin 2026-05-25 19:56 ` Paolo Bonzini 2026-05-26 21:48 ` Philippe Mathieu-Daudé 2 siblings, 1 reply; 59+ messages in thread From: Stefan Hajnoczi @ 2026-05-25 19:44 UTC (permalink / raw) To: Warner Losh; +Cc: Paolo Bonzini, Michael S. Tsirkin, qemu-devel, stefanha On Mon, May 25, 2026 at 1:17 PM Warner Losh <imp@bsdimp.com> wrote: > On Mon, May 25, 2026 at 10:34 AM Paolo Bonzini <pbonzini@redhat.com> wrote: >> On 5/24/26 14:42, Michael S. Tsirkin wrote: >> > How contributors could comply with DCO terms (b) or (c) for the output of AI >> > content generators commonly available today is unclear. The QEMU project is >> > not willing or able to accept the legal risks of non-compliance. >> > >> > But, since this was written, Red Hat's Richard Fontana and Chris Wright >> > published this piece: >> > https://www.redhat.com/en/blog/ai-assisted-development-and-open-source-navigating-legal-issues >> > >> > Saying, in particular >> > We understand this concern, but the DCO has never >> > been interpreted to require that every line of a contribution must be >> > the personal creative expression of the contributor or another human >> > developer. >> This is not the objection or the worry; rather the question is, what if >> the contribution is a creative expression of someone that could claim >> copyright in it. In fact, looking at the Linux policy... >> >> Signed-off-by and Developer Certificate of Origin >> ================================================= >> >> AI agents MUST NOT add Signed-off-by tags. Only humans can legally >> certify the Developer Certificate of Origin (DCO). The human submitter >> is responsible for: >> >> * Reviewing all AI-generated code >> * Ensuring compliance with licensing requirements >> * Adding their own Signed-off-by tag to certify the DCO >> * Taking full responsibility for the contribution >> >> ... 
the question is how humans can actually do the second step. The
>> piece you posted above says: "with disclosure and human attentiveness –
>> and oversight – aided where possible by tools that check for code
>> similarity, AI-assisted contributions can be entirely compatible with
>> the spirit of the DCO".
>
>
> The code produced by AI agents has no copyright. You can incorporate
> public domain code into your work and have the absolute right to license
> it (see all the Disney movies). The notion that LLMs wholesale copy originates
> from the earliest days of Copilot and turned out to be contrived. No recent
> evidence shows that plagiarism is a concern. To the extent that I modify
> public domain code, I have a copyright that I can choose to license
> however I want (and the SOB says it's compatible).

There is an active field of research on memorization and the status is
that LLMs do memorize. A paper from 2026
(https://arxiv.org/pdf/2601.02671) shows that production models can
output significant chunks of Harry Potter, although the research
deliberately extracts training inputs rather than doing so
accidentally. I am sharing this because I don't think it's correct to
say that concerns about models outputting copyrighted code are
outdated.

I do think that the risk for coding use cases is low as long as LLMs
are used sensibly. If not, legal cases would have popped up by now.

The example of ext4 for OpenBSD (https://lwn.net/Articles/1064541/)
comes to mind as a case where LLMs were used in a risky way and
maintainers decided to reject the code. Even though the output of AI
has no copyright, when there is no suitably-licensed information to
generate the code from, then it is risky to assume AI generated code
is free from copyright, license, patent, etc effects.

As long as we keep the usual practices around intellectual property in
mind when merging code, then I think the risk of copyright issues is
low and not a blocker for accepting AI generated contributions.

Stefan

* Re: on ai generated and code provenance 2026-05-25 19:44 ` Stefan Hajnoczi @ 2026-05-25 22:36 ` Michael S. Tsirkin 2026-05-26 13:16 ` Stefan Hajnoczi 0 siblings, 1 reply; 59+ messages in thread From: Michael S. Tsirkin @ 2026-05-25 22:36 UTC (permalink / raw) To: Stefan Hajnoczi; +Cc: Warner Losh, Paolo Bonzini, qemu-devel, stefanha On Mon, May 25, 2026 at 03:44:02PM -0400, Stefan Hajnoczi wrote: > On Mon, May 25, 2026 at 1:17 PM Warner Losh <imp@bsdimp.com> wrote: > > On Mon, May 25, 2026 at 10:34 AM Paolo Bonzini <pbonzini@redhat.com> wrote: > >> On 5/24/26 14:42, Michael S. Tsirkin wrote: > >> > How contributors could comply with DCO terms (b) or (c) for the output of AI > >> > content generators commonly available today is unclear. The QEMU project is > >> > not willing or able to accept the legal risks of non-compliance. > >> > > >> > But, since this was written, Red Hat's Richard Fontana and Chris Wright > >> > published this piece: > >> > https://www.redhat.com/en/blog/ai-assisted-development-and-open-source-navigating-legal-issues > >> > > >> > Saying, in particular > >> > We understand this concern, but the DCO has never > >> > been interpreted to require that every line of a contribution must be > >> > the personal creative expression of the contributor or another human > >> > developer. > >> This is not the objection or the worry; rather the question is, what if > >> the contribution is a creative expression of someone that could claim > >> copyright in it. In fact, looking at the Linux policy... > >> > >> Signed-off-by and Developer Certificate of Origin > >> ================================================= > >> > >> AI agents MUST NOT add Signed-off-by tags. Only humans can legally > >> certify the Developer Certificate of Origin (DCO). 
The human submitter > >> is responsible for: > >> > >> * Reviewing all AI-generated code > >> * Ensuring compliance with licensing requirements > >> * Adding their own Signed-off-by tag to certify the DCO > >> * Taking full responsibility for the contribution > >> > >> ... the question is how humans can actually do the second step. The > >> piece you posted above says: "with disclosure and human attentiveness – > >> and oversight – aided where possible by tools that check for code > >> similarity, AI-assisted contributions can be entirely compatible with > >> the spirit of the DCO". > > > > > > The code produced by AI agents has no copyright. You can incorporate > > public domain code into your work and have the absolute right to license > > it (see all the Diseny movies). The notion that LLMs wholesale copy originates > > from the earliest days of Copilot and turned out were contrived. No recent > > evidence shows that plagiarism is a concern. To the extent that I modify > > public domain code, I have a copyright that I can choose to license > > however I want (and the SOB says it's compatible). > > There is an active field of research on memorization and the status is > that LLMs do memorize. A paper from 2026 > (https://arxiv.org/pdf/2601.02671) shows that production models can > output significant chunks of Harry Potter, although the research > deliberately extracts training inputs rather than doing so > accidentally. I am sharing this because I don't think it's correct to > say that concerns about models outputting copyrighted code are > outdated. But the concern is with them doing it *accidentally*. Because willful infringement was always possible. And that does not seem to be happening. > I do think that the risk for coding use cases is low as long as LLMs > are used sensibly. If not, legal cases would have popped up by now. 
>
> The example of ext4 for OpenBSD (https://lwn.net/Articles/1064541/)
> comes to mind as a case where LLMs were used in a risky way and
> maintainers decided to reject the code. Even though the output of AI
> has no copyright, when there is no suitably-licensed information to
> generate the code from, then it is risky to assume AI generated code
> is free from copyright, license, patent, etc effects.
>
> As long as we keep the usual practices around intellectual property in
> mind when merging code, then I think the risk of copyright issues is
> low and not a blocker for accepting AI generated contributions.
>
> Stefan

* Re: on ai generated and code provenance 2026-05-25 22:36 ` Michael S. Tsirkin @ 2026-05-26 13:16 ` Stefan Hajnoczi 0 siblings, 0 replies; 59+ messages in thread From: Stefan Hajnoczi @ 2026-05-26 13:16 UTC (permalink / raw) To: Michael S. Tsirkin; +Cc: Warner Losh, Paolo Bonzini, qemu-devel, stefanha On Mon, May 25, 2026 at 6:36 PM Michael S. Tsirkin <mst@redhat.com> wrote: > On Mon, May 25, 2026 at 03:44:02PM -0400, Stefan Hajnoczi wrote: > > On Mon, May 25, 2026 at 1:17 PM Warner Losh <imp@bsdimp.com> wrote: > > > On Mon, May 25, 2026 at 10:34 AM Paolo Bonzini <pbonzini@redhat.com> wrote: > > >> On 5/24/26 14:42, Michael S. Tsirkin wrote: > > >> > How contributors could comply with DCO terms (b) or (c) for the output of AI > > >> > content generators commonly available today is unclear. The QEMU project is > > >> > not willing or able to accept the legal risks of non-compliance. > > >> > > > >> > But, since this was written, Red Hat's Richard Fontana and Chris Wright > > >> > published this piece: > > >> > https://www.redhat.com/en/blog/ai-assisted-development-and-open-source-navigating-legal-issues > > >> > > > >> > Saying, in particular > > >> > We understand this concern, but the DCO has never > > >> > been interpreted to require that every line of a contribution must be > > >> > the personal creative expression of the contributor or another human > > >> > developer. > > >> This is not the objection or the worry; rather the question is, what if > > >> the contribution is a creative expression of someone that could claim > > >> copyright in it. In fact, looking at the Linux policy... > > >> > > >> Signed-off-by and Developer Certificate of Origin > > >> ================================================= > > >> > > >> AI agents MUST NOT add Signed-off-by tags. Only humans can legally > > >> certify the Developer Certificate of Origin (DCO). 
The human submitter > > >> is responsible for: > > >> > > >> * Reviewing all AI-generated code > > >> * Ensuring compliance with licensing requirements > > >> * Adding their own Signed-off-by tag to certify the DCO > > >> * Taking full responsibility for the contribution > > >> > > >> ... the question is how humans can actually do the second step. The > > >> piece you posted above says: "with disclosure and human attentiveness – > > >> and oversight – aided where possible by tools that check for code > > >> similarity, AI-assisted contributions can be entirely compatible with > > >> the spirit of the DCO". > > > > > > > > > The code produced by AI agents has no copyright. You can incorporate > > > public domain code into your work and have the absolute right to license > > > it (see all the Diseny movies). The notion that LLMs wholesale copy originates > > > from the earliest days of Copilot and turned out were contrived. No recent > > > evidence shows that plagiarism is a concern. To the extent that I modify > > > public domain code, I have a copyright that I can choose to license > > > however I want (and the SOB says it's compatible). > > > > There is an active field of research on memorization and the status is > > that LLMs do memorize. A paper from 2026 > > (https://arxiv.org/pdf/2601.02671) shows that production models can > > output significant chunks of Harry Potter, although the research > > deliberately extracts training inputs rather than doing so > > accidentally. I am sharing this because I don't think it's correct to > > say that concerns about models outputting copyrighted code are > > outdated. > > But the concern is with them doing it *accidentally*. > Because willful infringement was always possible. > And that does not seem to be happening. I agree. The chance of accidental copyright violations is too small to ban AI usage in my opinion... > > I do think that the risk for coding use cases is low as long as LLMs > > are used sensibly. 
If not, legal cases would have popped up by now.
> >
> > The example of ext4 for OpenBSD (https://lwn.net/Articles/1064541/)
> > comes to mind as a case where LLMs were used in a risky way and
> > maintainers decided to reject the code. Even though the output of AI
> > has no copyright, when there is no suitably-licensed information to
> > generate the code from, then it is risky to assume AI generated code
> > is free from copyright, license, patent, etc effects.

...but here is a realistic example of where it might make sense to
reject an AI-generated contribution. My point is that maintainers still
need to consider whether contributions are risky, and in some cases it's
easier to do something reckless with AI because it may not feel like you
are exposing yourself to licensing issues when the AI generates the code
for you.

Stefan

* Re: on ai generated and code provenance
2026-05-25 17:15 ` Warner Losh
2026-05-25 19:44 ` Stefan Hajnoczi
@ 2026-05-25 19:56 ` Paolo Bonzini
2026-05-26 21:48 ` Philippe Mathieu-Daudé
2 siblings, 0 replies; 59+ messages in thread
From: Paolo Bonzini @ 2026-05-25 19:56 UTC (permalink / raw)
To: Warner Losh; +Cc: Michael S. Tsirkin, qemu-devel, Hajnoczi, Stefan

On Mon, 25 May 2026, 19:15 Warner Losh <imp@bsdimp.com> wrote:

> The code produced by AI agents has no copyright.

This is not entirely true. As models improve their capability to
generate, they also improve their ability to recall exactly. Stefan gave
more information.

The ability to search and reuse code found on the internet could also be
a problem. In that case the code is not produced by AI.

While this is *generally speaking* not an issue, it can be in specific
cases.
https://www.devclass.com/ai-ml/2025/11/27/ocaml-maintainers-reject-massive-ai-generated-pull-request/1728083
is only about six months old.

> Also, I've softened this paragraph
> several times, and it still comes across as more confrontational than I
> intend.

No problem at all!

> AI's primary issue is verification. The folks being flooded can't verify
> the good from the bad in the flood and tend to have a knee-jerk reaction to
> protect themselves: ban it.

Being cautious and open-minded at the same time is a good way to react,
IMO.

> Unstated issue: How do we help people that want to contribute grow their
> skills using AI so they make submissions whose quality is good enough to
> be worth our time to verify and review. It's an industry-wide problem,
> along with how do junior engineers become senior in a world of AI doing
> the grunt work they used to learn from.

This is not our problem to solve. What we can do is participate in
outreach activities for students, such as Google Summer of Code.

Paolo

* Re: on ai generated and code provenance
2026-05-25 17:15 ` Warner Losh
2026-05-25 19:44 ` Stefan Hajnoczi
2026-05-25 19:56 ` Paolo Bonzini
@ 2026-05-26 21:48 ` Philippe Mathieu-Daudé
2 siblings, 0 replies; 59+ messages in thread
From: Philippe Mathieu-Daudé @ 2026-05-26 21:48 UTC (permalink / raw)
To: Warner Losh, Paolo Bonzini; +Cc: Michael S. Tsirkin, qemu-devel, stefanha

On 25/5/26 19:15, Warner Losh wrote:

> AI's primary issue is verification. The folks being flooded can't verify the
> good from the bad in the flood and tend to have a knee-jerk reaction to
> protect themselves: ban it.
>
> Unstated issue: How do we help people that want to contribute grow their
> skills using AI so they make submissions whose quality is good enough to
> be worth our time to verify and review. It's an industry-wide problem, along
> with how do junior engineers become senior in a world of AI doing the grunt
> work they used to learn from.

While it seems easier to start contributing by writing new code than by
reviewing code, I strongly suggest that junior engineers start reviewing
before posting patches. That would help widen the maintainer funnel.
AI could help them there too. But maybe I'm opening another can of worms
by suggesting that direction.

> Warner

* Re: on ai generated and code provenance 2026-05-25 16:32 ` Paolo Bonzini 2026-05-25 17:15 ` Warner Losh @ 2026-05-26 8:23 ` Peter Maydell 2026-05-26 9:28 ` Alex Bennée ` (2 more replies) 1 sibling, 3 replies; 59+ messages in thread From: Peter Maydell @ 2026-05-26 8:23 UTC (permalink / raw) To: Paolo Bonzini; +Cc: Michael S. Tsirkin, qemu-devel, stefanha On Mon, 25 May 2026 at 17:33, Paolo Bonzini <pbonzini@redhat.com> wrote: > On 5/24/26 14:42, Michael S. Tsirkin wrote: > > I propose adopting linux's rules instead: > > https://docs.kernel.org/process/coding-assistants.html > > Replacing QEMU's policy with Linux's would be orthogonal to the topic of > the DCO. Maintainers would still have the option of rejecting > AI-assisted patches if they don't believe they can apply their own sign-off. > > Other projects have taken similar "no AI" policies for different > reasons. Zig has one because they believe AI code would make it harder > to retain contributors[3][4]; Rust is working on one that is fairly > restrictive[5] (discussion at [6]) and requires previous communications > with reviewers about *any* generated PRs[7]. Personally I think QEMU's > policy is fine but we should start introducing exceptions, possibly > including large contributions with pre-authorization (but not > pre-approval) from the maintainer. If we revisit our AI policy (which we should, I think, in the sense that it's been a while and the situation has changed), I want to note that although our current policy essentially says "no, because we don't want the legal risks", that doesn't imply that "if we judge now that the legal risks are acceptable, that was the only blocker and so we are now open to AI contributions of all sorts". 
While we were essentially in the "blanket ban" state anyway, there was
no particular need to have the discussion about other reasons we might
also want to be restrictive or cautious about AI contributions, but
those other reasons and viewpoints don't go away automatically with
the legal one.

I have quite a lot of sympathy with the rationale behind the
Zig policy, for instance:
https://kristoff.it/blog/contributor-poker-and-ai/
I spend quite a lot of time reviewing patches for things which
are features I don't necessarily personally care about. I'm
happy with doing that for other people who are hopefully
learning and gaining something from the process; I'm much
less interested in reviewing a mountain of LLM-generated patches.

thanks
-- PMM

* Re: on ai generated and code provenance 2026-05-26 8:23 ` Peter Maydell @ 2026-05-26 9:28 ` Alex Bennée 2026-05-26 9:57 ` Paolo Bonzini 2026-05-27 7:11 ` Philippe Mathieu-Daudé 2 siblings, 0 replies; 59+ messages in thread From: Alex Bennée @ 2026-05-26 9:28 UTC (permalink / raw) To: Peter Maydell; +Cc: Paolo Bonzini, Michael S. Tsirkin, qemu-devel, stefanha Peter Maydell <peter.maydell@linaro.org> writes: > On Mon, 25 May 2026 at 17:33, Paolo Bonzini <pbonzini@redhat.com> wrote: >> On 5/24/26 14:42, Michael S. Tsirkin wrote: >> > I propose adopting linux's rules instead: >> > https://docs.kernel.org/process/coding-assistants.html >> >> Replacing QEMU's policy with Linux's would be orthogonal to the topic of >> the DCO. Maintainers would still have the option of rejecting >> AI-assisted patches if they don't believe they can apply their own sign-off. >> >> Other projects have taken similar "no AI" policies for different >> reasons. Zig has one because they believe AI code would make it harder >> to retain contributors[3][4]; Rust is working on one that is fairly >> restrictive[5] (discussion at [6]) and requires previous communications >> with reviewers about *any* generated PRs[7]. Personally I think QEMU's >> policy is fine but we should start introducing exceptions, possibly >> including large contributions with pre-authorization (but not >> pre-approval) from the maintainer. > > If we revisit our AI policy (which we should, I think, in the sense > that it's been a while and the situation has changed), I want to > note that although our current policy essentially says "no, because > we don't want the legal risks", that doesn't imply that "if we > judge now that the legal risks are acceptable, that was the only > blocker and so we are now open to AI contributions of all sorts". I think there are still potential legal risks but in the normal use case they are pretty small. 
Prompts to refactor QEMU code will likely be fine because the LLM is
acting as a fungible editor. If, however, anyone prompted "implement
Rosetta's target code optimisation pass", we should be very wary of
accidental infringement.

> While we were essentially in the "blanket ban" state anyway, there was
> no particular need to have the discussion about other reasons we might
> also want to be restrictive or cautious about AI contributions, but
> those other reasons and viewpoints don't go away automatically with
> the legal one.
>
> I have quite a lot of sympathy with the rationale behind the
> Zig policy, for instance:
> https://kristoff.it/blog/contributor-poker-and-ai/
> I spend quite a lot of time reviewing patches for things which
> are features I don't necessarily personally care about. I'm
> happy with doing that for other people who are hopefully
> learning and gaining something from the process; I'm much
> less interested in reviewing a mountain of LLM-generated patches.

I agree. I think we need to address the quality expected of series
authored with the help of AI before we open the doors to even a limited
subset of exceptions. Otherwise I think we could see a deluge of patches
overloading reviewers, the same way the issue tracker has been
overloaded recently.

>
> thanks
> -- PMM

--
Alex Bennée
Virtualisation Tech Lead @ Linaro

* Re: on ai generated and code provenance
2026-05-26 8:23 ` Peter Maydell
2026-05-26 9:28 ` Alex Bennée
@ 2026-05-26 9:57 ` Paolo Bonzini
2026-05-26 11:27 ` BALATON Zoltan
2026-05-27 7:11 ` Philippe Mathieu-Daudé
2 siblings, 1 reply; 59+ messages in thread
From: Paolo Bonzini @ 2026-05-26 9:57 UTC (permalink / raw)
To: Peter Maydell; +Cc: Michael S. Tsirkin, qemu-devel, Hajnoczi, Stefan

On Tue, 26 May 2026, 10:23 Peter Maydell <peter.maydell@linaro.org> wrote:

> > Personally I think QEMU's
> > policy is fine but we should start introducing exceptions, possibly
> > including large contributions with pre-authorization (but not
> > pre-approval) from the maintainer.
>
> I want to note that [...] while we were essentially in the "blanket ban"
> state anyway, there was no particular need to have the discussion about
> other reasons we might also want to be restrictive or cautious about AI
> contributions, but those other reasons and viewpoints don't go away
> automatically with the legal one.
>
> I have quite a lot of sympathy with the rationale behind the Zig policy,
> for instance: https://kristoff.it/blog/contributor-poker-and-ai/ I spend
> quite a lot of time reviewing patches for things which are features I don't
> necessarily personally care about. I'm happy with doing that for other
> people who are hopefully learning and gaining something from the process;
> I'm much less interested in reviewing a mountain of LLM-generated patches.

I agree, and that's a good argument for pre-discussion with the
maintainers. It would be the right thing to do for large contributions
anyway, but it's even more important with AI given the different balance
between contributor and reviewer.

Paolo

> thanks
> -- PMM

* Re: on ai generated and code provenance 2026-05-26 9:57 ` Paolo Bonzini @ 2026-05-26 11:27 ` BALATON Zoltan 2026-05-26 12:30 ` Michael S. Tsirkin 2026-05-26 13:22 ` Stefan Hajnoczi 0 siblings, 2 replies; 59+ messages in thread From: BALATON Zoltan @ 2026-05-26 11:27 UTC (permalink / raw) To: Paolo Bonzini Cc: Peter Maydell, Michael S. Tsirkin, qemu-devel, Hajnoczi, Stefan On Tue, 26 May 2026, Paolo Bonzini wrote: > Il mar 26 mag 2026, 10:23 Peter Maydell <peter.maydell@linaro.org> ha > scritto: > >>> Personally I think QEMU's >>> policy is fine but we should start introducing exceptions, possibly >>> including large contributions with pre-authorization (but not >>> pre-approval) from the maintainer. >> >> I want to note that [...] while we were essentially in the "blanket ban" >> state anyway, there was no particular need to have the discussion about >> other reasons we might also want to be restrictive or cautious about AI >> contributions, but those other reasons and viewpoints don't go away >> automatically with the legal one. >> >> I have quite a lot of sympathy with the rationale behind the Zig policy, >> for instance: https://kristoff.it/blog/contributor-poker-and-ai/ I spend >> quite a lot of time reviewing patches for things which are features I don't >> necessarily personally care about. I'm happy with doing that for other >> people who are hopefully learning and gaining something from the process; >> I'm much less interested in reviewing a mountain of LLM-generated patches. >> > > I agree and that's a good argument for pre-discussion with the maintainers. > It would anyway be the right thing to do for large contributions, but it's > even more important with AI given the different balance between contributor > and reviewer. I think the real problem is people who don't know what they are doing yet use an AI to generate a patch and submit it anyway. 
Reviewers are then flooded with nonsense that they have to look at to
find out if there's anything useful in it, which takes their time from
doing more useful things. So the policy should make clear that we don't
accept patches generated by AI that no human has read and understood
before submission, and adding a S-o-b should also mean (besides that the
submitter made sure there's no copyright infringement) that that person
has knowledge about the patch and is willing to correct it. Then
reviewers can just bounce AI nonsense back to the contributor or ignore
it if they don't reply (or reply with more AI nonsense suggesting they
don't know what the patch does so can't correct it). I think that's the
real fear that has led to the AI ban and the copyright issues were just
a convenient excuse. Maybe clarifying this in the policy could be done
although there will always be people who ignore documents. So maybe what
we want is no direct submission of AI generated patches without at least
a human in between who has already reviewed the patch before sending it
to the list.

Regards,
BALATON Zoltan

* Re: on ai generated and code provenance
2026-05-26 11:27 ` BALATON Zoltan
@ 2026-05-26 12:30 ` Michael S. Tsirkin
2026-05-26 12:37 ` Manos Pitsidianakis
2026-05-26 13:22 ` Stefan Hajnoczi
1 sibling, 1 reply; 59+ messages in thread
From: Michael S. Tsirkin @ 2026-05-26 12:30 UTC (permalink / raw)
To: BALATON Zoltan; +Cc: Paolo Bonzini, Peter Maydell, qemu-devel, Hajnoczi, Stefan

On Tue, May 26, 2026 at 01:27:40PM +0200, BALATON Zoltan wrote:
> Maybe clarifying this in the policy could be
> done although there will always be people who ignore documents.

One advantage of the Linux-style tags is that at least they differ
from whatever AIs put in by default. So whoever does it can get
banned pretty quickly.

* Re: on ai generated and code provenance
2026-05-26 12:30 ` Michael S. Tsirkin
@ 2026-05-26 12:37 ` Manos Pitsidianakis
2026-05-26 13:00 ` Michael S. Tsirkin
0 siblings, 1 reply; 59+ messages in thread
From: Manos Pitsidianakis @ 2026-05-26 12:37 UTC (permalink / raw)
To: Michael S. Tsirkin
Cc: BALATON Zoltan, Paolo Bonzini, Peter Maydell, qemu-devel, Hajnoczi, Stefan

On Tue, May 26, 2026 at 3:31 PM Michael S. Tsirkin <mst@redhat.com> wrote:
>
> On Tue, May 26, 2026 at 01:27:40PM +0200, BALATON Zoltan wrote:
> > Maybe clarifying this in the policy could be
> > done although there will always be people who ignore documents.
>
> One advantage of the Linux-style tags is that at least they differ
> from whatever AIs put in by default. So whoever does it can get
> banned pretty quickly.
>

What would be the mechanism for that though? Getting the list
administrators involved to ban email addresses from the list?

If banning is to be a deterrent, the process and rules should be
codified in the docs so that they exist as a warning and there is
little room for abuse and ambiguity on both sides.

--
Manos Pitsidianakis
Emulation and Virtualization Engineer at Linaro Ltd

* Re: on ai generated and code provenance
2026-05-26 12:37 ` Manos Pitsidianakis
@ 2026-05-26 13:00 ` Michael S. Tsirkin
0 siblings, 0 replies; 59+ messages in thread
From: Michael S. Tsirkin @ 2026-05-26 13:00 UTC (permalink / raw)
To: Manos Pitsidianakis
Cc: BALATON Zoltan, Paolo Bonzini, Peter Maydell, qemu-devel, Hajnoczi, Stefan

On Tue, May 26, 2026 at 03:37:50PM +0300, Manos Pitsidianakis wrote:
> On Tue, May 26, 2026 at 3:31 PM Michael S. Tsirkin <mst@redhat.com> wrote:
> >
> > On Tue, May 26, 2026 at 01:27:40PM +0200, BALATON Zoltan wrote:
> > > Maybe clarifying this in the policy could be
> > > done although there will always be people who ignore documents.
> >
> > One advantage of the Linux-style tags is that at least they differ
> > from whatever AIs put in by default. So whoever does it can get
> > banned pretty quickly.
> >
>
> What would be the mechanism for that though? Getting the list
> administrators involved to ban email addresses from the list?

Maintainers learning to ignore patches from bad actors works well
enough.

> If banning is to be a deterrent, the process and rules should be
> codified in the docs so that they exist as a warning and there is
> little room for abuse and ambiguity on both sides.
>
> --
> Manos Pitsidianakis
> Emulation and Virtualization Engineer at Linaro Ltd

* Re: on ai generated and code provenance 2026-05-26 11:27 ` BALATON Zoltan 2026-05-26 12:30 ` Michael S. Tsirkin @ 2026-05-26 13:22 ` Stefan Hajnoczi 2026-05-26 14:01 ` Warner Losh 1 sibling, 1 reply; 59+ messages in thread From: Stefan Hajnoczi @ 2026-05-26 13:22 UTC (permalink / raw) To: BALATON Zoltan Cc: Paolo Bonzini, Peter Maydell, Michael S. Tsirkin, qemu-devel, Hajnoczi, Stefan On Tue, May 26, 2026 at 7:28 AM BALATON Zoltan <balaton@eik.bme.hu> wrote: > On Tue, 26 May 2026, Paolo Bonzini wrote: > > Il mar 26 mag 2026, 10:23 Peter Maydell <peter.maydell@linaro.org> ha > > scritto: > > > >>> Personally I think QEMU's > >>> policy is fine but we should start introducing exceptions, possibly > >>> including large contributions with pre-authorization (but not > >>> pre-approval) from the maintainer. > >> > >> I want to note that [...] while we were essentially in the "blanket ban" > >> state anyway, there was no particular need to have the discussion about > >> other reasons we might also want to be restrictive or cautious about AI > >> contributions, but those other reasons and viewpoints don't go away > >> automatically with the legal one. > >> > >> I have quite a lot of sympathy with the rationale behind the Zig policy, > >> for instance: https://kristoff.it/blog/contributor-poker-and-ai/ I spend > >> quite a lot of time reviewing patches for things which are features I don't > >> necessarily personally care about. I'm happy with doing that for other > >> people who are hopefully learning and gaining something from the process; > >> I'm much less interested in reviewing a mountain of LLM-generated patches. > >> > > > > I agree and that's a good argument for pre-discussion with the maintainers. > > It would anyway be the right thing to do for large contributions, but it's > > even more important with AI given the different balance between contributor > > and reviewer. 
> > I think the real problem is people who don't know what they are doing yet > use an AI to generate a patch and submit it anyway. Reviewers are then > flooded with nonsense that they have to look at to find out if there's > anything useful in it which takes their time from doing more useful > things. So the policy should make clear that we don't accept patches > generated by AI that no human has read and understood before submission > and adding a S-o-b should also mean (besides that the submitter made sure > there's no copyright infringement) that that person has knowledge about > the patch and is willing to correct it. Then reviewers can just bounce AI > nonsense back to the contributor or ignore it if they don't reply (or > reply with more AI nonsense suggesting they don't know what the patch does > so can't correct it). I think that's the real fear that has led to the AI > ban and the copyright issues were just a convenient excuse. Maybe > clarifying this in the policy could be done although there will always be > people who ignore documents. So maybe what we want is no direct submission > of AI generated patches without at least a human in between who has already > reviewed the patch before sending it to the list. That sounds reasonable. Stefan
* Re: on ai generated and code provenance 2026-05-26 13:22 ` Stefan Hajnoczi @ 2026-05-26 14:01 ` Warner Losh 0 siblings, 0 replies; 59+ messages in thread From: Warner Losh @ 2026-05-26 14:01 UTC (permalink / raw) To: Stefan Hajnoczi Cc: BALATON Zoltan, Paolo Bonzini, Peter Maydell, Michael S. Tsirkin, qemu-devel, Hajnoczi, Stefan [-- Attachment #1: Type: text/plain, Size: 3174 bytes --] On Tue, May 26, 2026, 7:24 AM Stefan Hajnoczi <stefanha@gmail.com> wrote: > On Tue, May 26, 2026 at 7:28 AM BALATON Zoltan <balaton@eik.bme.hu> wrote: > > On Tue, 26 May 2026, Paolo Bonzini wrote: > > > Il mar 26 mag 2026, 10:23 Peter Maydell <peter.maydell@linaro.org> ha > > > scritto: > > > > > >>> Personally I think QEMU's > > >>> policy is fine but we should start introducing exceptions, possibly > > >>> including large contributions with pre-authorization (but not > > >>> pre-approval) from the maintainer. > > >> > > >> I want to note that [...] while we were essentially in the "blanket > ban" > > >> state anyway, there was no particular need to have the discussion > about > > >> other reasons we might also want to be restrictive or cautious about > AI > > >> contributions, but those other reasons and viewpoints don't go away > > >> automatically with the legal one. > > >> > > >> I have quite a lot of sympathy with the rationale behind the Zig > policy, > > >> for instance: https://kristoff.it/blog/contributor-poker-and-ai/ I > spend > > >> quite a lot of time reviewing patches for things which are features I > don't > > >> necessarily personally care about. I'm happy with doing that for other > > >> people who are hopefully learning and gaining something from the > process; > > >> I'm much less interested in reviewing a mountain of LLM-generated > patches. > > >> > > > > > > I agree and that's a good argument for pre-discussion with the > maintainers. 
> > > It would anyway be the right thing to do for large contributions, but it's > > > even more important with AI given the different balance between contributor > > > and reviewer. > > > > I think the real problem is people who don't know what they are doing yet > > use an AI to generate a patch and submit it anyway. Reviewers are then > > flooded with nonsense that they have to look at to find out if there's > > anything useful in it which takes their time from doing more useful > > things. So the policy should make clear that we don't accept patches > > generated by AI that no human has read and understood before submission > > and adding a S-o-b should also mean (besides that the submitter made sure > > there's no copyright infringement) that that person has knowledge about > > the patch and is willing to correct it. Then reviewers can just bounce AI > > nonsense back to the contributor or ignore it if they don't reply (or > > reply with more AI nonsense suggesting they don't know what the patch does > > so can't correct it). I think that's the real fear that has led to the AI > > ban and the copyright issues were just a convenient excuse. Maybe > > clarifying this in the policy could be done although there will always be > > people who ignore documents. So maybe what we want is no direct submission > > of AI generated patches without at least a human in between who has already > > reviewed the patch before sending it to the list. > > That sounds reasonable. > I agree. And for large submissions we can have a smaller limit for AI or other poorly explained code. Warner
* Re: on ai generated and code provenance 2026-05-26 8:23 ` Peter Maydell 2026-05-26 9:28 ` Alex Bennée 2026-05-26 9:57 ` Paolo Bonzini @ 2026-05-27 7:11 ` Philippe Mathieu-Daudé 2 siblings, 0 replies; 59+ messages in thread From: Philippe Mathieu-Daudé @ 2026-05-27 7:11 UTC (permalink / raw) To: Peter Maydell, Paolo Bonzini; +Cc: Michael S. Tsirkin, qemu-devel, stefanha On 26/5/26 10:23, Peter Maydell wrote: > I have quite a lot of sympathy with the rationale behind the > Zig policy, for instance: > https://kristoff.it/blog/contributor-poker-and-ai/ Thanks for sharing this link! > I spend quite a lot of time reviewing patches for things which > are features I don't necessarily personally care about. I'm > happy with doing that for other people who are hopefully > learning and gaining something from the process; I'm much > less interested in reviewing a mountain of LLM-generated patches. > > thanks > -- PMM
* Re: on ai generated and code provenance 2026-05-24 12:42 on ai generated and code provenance Michael S. Tsirkin 2026-05-24 17:06 ` Alex Bennée 2026-05-25 16:32 ` Paolo Bonzini @ 2026-05-26 17:43 ` Kevin Wolf 2026-05-26 18:03 ` Michael S. Tsirkin 2 siblings, 1 reply; 59+ messages in thread From: Kevin Wolf @ 2026-05-26 17:43 UTC (permalink / raw) To: Michael S. Tsirkin; +Cc: qemu-devel, stefanha Am 24.05.2026 um 14:42 hat Michael S. Tsirkin geschrieben: > So, I had to reject a perfectly reasonable patch: > https://lore.kernel.org/qemu-devel/20260320193746.242704-1-jinpu.wang@ionos.com/ > just because of a tool used to make it. > > > How contributors could comply with DCO terms (b) or (c) for the output of AI > content generators commonly available today is unclear. The QEMU project is > not willing or able to accept the legal risks of non-compliance. > > > But, since this was written, Red Hat's Richard Fontana and Chris Wright > published this piece: > https://www.redhat.com/en/blog/ai-assisted-development-and-open-source-navigating-legal-issues > > > Saying, in particular " > We understand this concern, but the DCO has never > been interpreted to require that every line of a contribution must be > the personal creative expression of the contributor or another human > developer. > " I never found that blog post particularly convincing, especially because they acknowledge a concern: There are two versions of this concern. The first is practical: that an AI tool could covertly insert excerpts of proprietary (or license-incompatible) code into an open source project, potentially creating legal risk for maintainers and users. The second is broader and more philosophical: that large language models, trained on vast amounts of open source software, are essentially misappropriating the community’s work, producing outputs stripped of the obligations that open source licenses require. We think these concerns deserve to be taken seriously. 
The second one is essentially what I understood the QEMU policy to be about. Unfortunately, the blog post then goes on to only ever deal with the first one and ignore the second one that seems more relevant for us. So yes, the DCO isn't about "personal creative expression" or whatever (and nobody suggested it is, this is a strawman), but it's about whether the submitter has the legal rights to submit the code. And that's exactly the question we decided we don't want to take a risk on. So if that part isn't helpful, what has changed since we introduced the AI policy? It's a few points: 1. While AI has been in use for a while now, we haven't seen projects accepting AI generated code/content get into big trouble. While it could still happen in the future, it might be an indication that the probability of the risk hitting us is not that high. 2. The useful part of the blog post is that it tells us that Red Hat considers the risk acceptable. This can inform our assessment of the risks, though of course there might be a significant difference in the impact of the risk for a company with a legal department and an open source community consisting mainly of developers acting as individuals. I think it's obvious that if the QEMU project gets involved in a legal case, we have a problem (at the very least long lasting distraction from actual work on QEMU), even if we didn't do anything wrong and a good lawyer would easily win the case. 3. It was easy to just outright ban AI while its results were usually not really usable anyway. This has changed meanwhile, so it's much harder to maintain an absolute ban. It's not really the best use of my time to look at the idea in AI-generated test cases and then rewrite them from scratch so I can actually submit them. (On the other hand, I think my rewritten submissions were always better and more maintainable than what AI produced initially, so there's that.) 
So while my perspective is a lot more nuanced than yours, I do see a shift in the balance and was actually thinking of suggesting a change of the policy myself. What I was thinking of was allowing AI-generated content in places where it's at least easy to revert if there is ever a problem with it: Tests, documentation etc., but not core code that lots of other things depend on and that will have evolved a lot when we notice a problem and for which throwing away is simply not an option. > I propose adopting linux's rules instead: > https://docs.kernel.org/process/coding-assistants.html > > which boils down to attribution. What would we actually do with the detailed information? Why do we care which model was used? Is this helpful commit metadata or is it just free advertising for a handful of companies? I think I would see more use in a tag like (better name welcome): AI-used-for: [code|tests|docs|commit message]... Kevin
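Editorial sketch of how tooling might read the tag proposed above. The thread only floats the name `AI-used-for:` and the four values; the comma separator, the case-insensitive matching, and the function name `parse_ai_used_for` are assumptions made for illustration, not anything the project has agreed on.

```python
# Hypothetical trailer syntax, assumed for illustration only:
#   AI-used-for: tests, commit message
# The value names come from the proposal in the thread; the comma
# separator and everything else here are assumptions.
ALLOWED_AREAS = {"code", "tests", "docs", "commit message"}
TRAILER_KEY = "ai-used-for"


def parse_ai_used_for(commit_message: str) -> set[str]:
    """Return the areas named in AI-used-for trailers of a commit message."""
    areas: set[str] = set()
    for line in commit_message.splitlines():
        key, sep, value = line.partition(":")
        # Skip lines without a colon and lines with a different key.
        if not sep or key.strip().lower() != TRAILER_KEY:
            continue
        for item in value.split(","):
            area = item.strip().lower()
            if not area:
                continue
            if area not in ALLOWED_AREAS:
                raise ValueError(f"unknown AI-used-for area: {area!r}")
            areas.add(area)
    return areas
```

A checker along these lines could let a maintainer filter for, say, commits where AI touched `code` as opposed to only `tests` or `docs`, which is the distinction the proposal cares about.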
* Re: on ai generated and code provenance 2026-05-26 17:43 ` Kevin Wolf @ 2026-05-26 18:03 ` Michael S. Tsirkin 2026-05-26 18:59 ` Kevin Wolf 0 siblings, 1 reply; 59+ messages in thread From: Michael S. Tsirkin @ 2026-05-26 18:03 UTC (permalink / raw) To: Kevin Wolf; +Cc: qemu-devel, stefanha On Tue, May 26, 2026 at 07:43:35PM +0200, Kevin Wolf wrote: > Am 24.05.2026 um 14:42 hat Michael S. Tsirkin geschrieben: > > So, I had to reject a perfectly reasonable patch: > > https://lore.kernel.org/qemu-devel/20260320193746.242704-1-jinpu.wang@ionos.com/ > > just because of a tool used to make it. > > > > > > How contributors could comply with DCO terms (b) or (c) for the output of AI > > content generators commonly available today is unclear. The QEMU project is > > not willing or able to accept the legal risks of non-compliance. > > > > > > But, since this was written, Red Hat's Richard Fontana and Chris Wright > > published this piece: > > https://www.redhat.com/en/blog/ai-assisted-development-and-open-source-navigating-legal-issues > > > > > > Saying, in particular " > > We understand this concern, but the DCO has never > > been interpreted to require that every line of a contribution must be > > the personal creative expression of the contributor or another human > > developer. > > " > > I never found that blog post particularly convincing, especially because > they acknowledge a concern: > > There are two versions of this concern. The first is practical: that > an AI tool could covertly insert excerpts of proprietary (or > license-incompatible) code into an open source project, potentially > creating legal risk for maintainers and users. The second is broader > and more philosophical: that large language models, trained on vast > amounts of open source software, are essentially misappropriating > the community’s work, producing outputs stripped of the obligations > that open source licenses require. > > We think these concerns deserve to be taken seriously. 
> > The second one is essentially what I understood the QEMU policy to be > about. Unfortunately, the blog post then goes on to only ever deal with > the first one and ignore the second one that seems more relevant for us. > > So yes, the DCO isn't about "personal creative expression" or whatever > (and nobody suggested it is, this is a strawman), but it's about whether > the submitter has the legal rights to submit the code. And that's > exactly the question we decided we don't want to take a risk on. > > > So if that part isn't helpful, what has changed since we introduced the > AI policy? It's a few points: > > 1. While AI has been in use for a while now, we haven't seen projects > accepting AI generated code/content get into big trouble. While it > could still happen in the future, it might be an indication that the > probability of the risk hitting us is not that high. > > 2. The useful part of the blog post is that it tells us that Red Hat > considers the risk acceptable. This can inform our assessment of the > risks, though of course there might be a significant difference in > the impact of the risk for a company with a legal department and an > open source community consisting mainly of developers acting as > individuals. > > I think it's obvious that if the QEMU project gets involved in a > legal case, we have a problem (at the very least long lasting > distraction from actual work on QEMU), even if we didn't do anything > wrong and a good lawyer would easily win the case. > > 3. It was easy to just outright ban AI while its results were usually > not really usable anyway. This has changed meanwhile, so it's much > harder to maintain an absolute ban. > > It's not really the best use of my time to look at the idea in > AI-generated test cases and then rewrite them from scratch so I can > actually submit them. (On the other hand, I think my rewritten > submissions were always better and more maintainable than what AI > produced initially, so there's that.) 
> > So while my perspective is a lot more nuanced than yours, I do see a > shift in the balance and was actually thinking of suggesting a change of > the policy myself. > > What I was thinking of was allowing AI-generated content in places where > it's at least easy to revert if there is ever a problem with it: Tests, > documentation etc., but not core code that lots of other things depend > on and that will have evolved a lot when we notice a problem and for > which throwing away is simply not an option. OK. what about trivial changes? Using AI as a better sed? > > I propose adopting linux's rules instead: > > https://docs.kernel.org/process/coding-assistants.html > > > > which boils down to attribution. > > What would we actually do with the detailed information? Why do we care > which model was used? Is this helpful commit metadata or is it just free > advertising for a handful of companies? I presume, if a specific model is somehow declared "contaminated" so we can locate its output? > I think I would see more use in a tag like (better name welcome): > > AI-used-for: [code|tests|docs|commit message]... > > Kevin I surely don't mind. -- MST ^ permalink raw reply [flat|nested] 59+ messages in thread
* Re: on ai generated and code provenance 2026-05-26 18:03 ` Michael S. Tsirkin @ 2026-05-26 18:59 ` Kevin Wolf 2026-05-26 19:30 ` Michael S. Tsirkin 2026-05-26 19:50 ` Michael S. Tsirkin 0 siblings, 2 replies; 59+ messages in thread From: Kevin Wolf @ 2026-05-26 18:59 UTC (permalink / raw) To: Michael S. Tsirkin; +Cc: qemu-devel, stefanha Am 26.05.2026 um 20:03 hat Michael S. Tsirkin geschrieben: > On Tue, May 26, 2026 at 07:43:35PM +0200, Kevin Wolf wrote: > > Am 24.05.2026 um 14:42 hat Michael S. Tsirkin geschrieben: > > > So, I had to reject a perfectly reasonable patch: > > > https://lore.kernel.org/qemu-devel/20260320193746.242704-1-jinpu.wang@ionos.com/ > > > just because of a tool used to make it. > > > > > > > > > How contributors could comply with DCO terms (b) or (c) for the output of AI > > > content generators commonly available today is unclear. The QEMU project is > > > not willing or able to accept the legal risks of non-compliance. > > > > > > > > > But, since this was written, Red Hat's Richard Fontana and Chris Wright > > > published this piece: > > > https://www.redhat.com/en/blog/ai-assisted-development-and-open-source-navigating-legal-issues > > > > > > > > > Saying, in particular " > > > We understand this concern, but the DCO has never > > > been interpreted to require that every line of a contribution must be > > > the personal creative expression of the contributor or another human > > > developer. > > > " > > > > I never found that blog post particularly convincing, especially because > > they acknowledge a concern: > > > > There are two versions of this concern. The first is practical: that > > an AI tool could covertly insert excerpts of proprietary (or > > license-incompatible) code into an open source project, potentially > > creating legal risk for maintainers and users. 
The second is broader > > and more philosophical: that large language models, trained on vast > > amounts of open source software, are essentially misappropriating > > the community’s work, producing outputs stripped of the obligations > > that open source licenses require. > > > > We think these concerns deserve to be taken seriously. > > > > The second one is essentially what I understood the QEMU policy to be > > about. Unfortunately, the blog post then goes on to only ever deal with > > the first one and ignore the second one that seems more relevant for us. > > > > So yes, the DCO isn't about "personal creative expression" or whatever > > (and nobody suggested it is, this is a strawman), but it's about whether > > the submitter has the legal rights to submit the code. And that's > > exactly the question we decided we don't want to take a risk on. > > > > > > So if that part isn't helpful, what has changed since we introduced the > > AI policy? It's a few points: > > > > 1. While AI has been in use for a while now, we haven't seen projects > > accepting AI generated code/content get into big trouble. While it > > could still happen in the future, it might be an indication that the > > probability of the risk hitting us is not that high. > > > > 2. The useful part of the blog post is that it tells us that Red Hat > > considers the risk acceptable. This can inform our assessment of the > > risks, though of course there might be a significant difference in > > the impact of the risk for a company with a legal department and an > > open source community consisting mainly of developers acting as > > individuals. > > > > I think it's obvious that if the QEMU project gets involved in a > > legal case, we have a problem (at the very least long lasting > > distraction from actual work on QEMU), even if we didn't do anything > > wrong and a good lawyer would easily win the case. > > > > 3. 
It was easy to just outright ban AI while its results were usually > > not really usable anyway. This has changed meanwhile, so it's much > > harder to maintain an absolute ban. > > > > It's not really the best use of my time to look at the idea in > > AI-generated test cases and then rewrite them from scratch so I can > > actually submit them. (On the other hand, I think my rewritten > > submissions were always better and more maintainable than what AI > > produced initially, so there's that.) > > > > So while my perspective is a lot more nuanced than yours, I do see a > > shift in the balance and was actually thinking of suggesting a change of > > the policy myself. > > > > What I was thinking of was allowing AI-generated content in places where > > it's at least easy to revert if there is ever a problem with it: Tests, > > documentation etc., but not core code that lots of other things depend > > on and that will have evolved a lot when we notice a problem and for > > which throwing away is simply not an option. > > OK. what about trivial changes? Using AI as a better sed? The above is just what I was thinking of suggesting myself. I didn't mean to imply that I'm opposed to anything else, but just thought I'd post it as an example of fairly obvious things we could allow. Of course, it also shows my own pain points. I don't see that much use in it for generating code for QEMU proper, because these changes tend to be few lines and I have an opinion on each of the lines - tests are the opposite, lots of boilerplate and I don't care much how elegant they are because nothing else will build on them anyway. So yes, trivial patches is another obvious starting point. The challenge there is defining the line where a patch stops being trivial. So I'm not completely sure if making this distinction in a policy is a good idea; maybe practically speaking it has to be all or nothing in terms of creativity (for lack of a better word). 
As an aside, personally, I'm not convinced that AI can be a "better sed". If it's really about mechanical changes, I think the resulting patch is much more reviewable if the agent doesn't modify the code, but just generates the sed command line or the Coccinelle patch and that is included in the commit message. Reviewers can then just review that and then reproduce the result themselves for comparison. This is impossible with AI prompts, and agents do tend to forget an instance of something to replace here and there, so you do have to review the result carefully. But none of these "better sed" problems need to be handled in an AI policy. If a patch is hard to review, the maintainer will already reject it on those grounds. > > > I propose adopting linux's rules instead: > > > https://docs.kernel.org/process/coding-assistants.html > > > > > > which boils down to attribution. > > > > What would we actually do with the detailed information? Why do we care > > which model was used? Is this helpful commit metadata or is it just free > > advertising for a handful of companies? > > I presume, if a specific model is somehow declared "contaminated" so we > can locate its output? Contaminated in what respect? Quality? Might be because of malicious intentions or just because the model happens to be bad at a specific question. Review and testing must be able to catch quality problems. I don't think this is different from any other contribution. Copyright? If so, then we're back to "can you really sign the DCO?" Something completely different? > > I think I would see more use in a tag like (better name welcome): > > > > AI-used-for: [code|tests|docs|commit message]... > > > > Kevin > > I surely don't mind. Great. Let's see what others think. Kevin
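Editorial sketch of the review workflow described above: the commit message carries the substitution, and the reviewer re-runs it on the pre-patch files and compares against what was submitted. This is only an illustration under stated assumptions. It uses Python's `re` module as a stand-in for sed (the regex dialects differ), works on in-memory strings rather than a real tree, and the names `apply_substitution` and `files_that_differ` are hypothetical.

```python
import re


def apply_substitution(files: dict[str, str], pattern: str, replacement: str) -> dict[str, str]:
    """Apply one regex substitution to every file's text.

    Roughly what a global sed substitution would do, but with Python
    regex syntax; this is a stand-in, not a sed emulator.
    """
    regex = re.compile(pattern)
    return {path: regex.sub(replacement, text) for path, text in files.items()}


def files_that_differ(reproduced: dict[str, str], submitted: dict[str, str]) -> list[str]:
    """List paths whose submitted contents differ from the reproduced ones."""
    paths = sorted(set(reproduced) | set(submitted))
    return [p for p in paths if reproduced.get(p) != submitted.get(p)]


# Example: the submitted patch missed one call site in b.c.
before = {"a.c": "old_api(x);\nold_api(y);\n", "b.c": "old_api(z);\n"}
submitted = {"a.c": "new_api(x);\nnew_api(y);\n", "b.c": "old_api(z);\n"}
reproduced = apply_substitution(before, r"\bold_api\(", "new_api(")
print(files_that_differ(reproduced, submitted))
```

The point of the design is that the reviewer only has to trust the one-line substitution; any file the comparison reports is a place where the submitted patch deviates from the purely mechanical change.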
* Re: on ai generated and code provenance 2026-05-26 18:59 ` Kevin Wolf @ 2026-05-26 19:30 ` Michael S. Tsirkin 2026-05-26 19:52 ` Warner Losh 2026-05-26 19:50 ` Michael S. Tsirkin 1 sibling, 1 reply; 59+ messages in thread From: Michael S. Tsirkin @ 2026-05-26 19:30 UTC (permalink / raw) To: Kevin Wolf; +Cc: qemu-devel, stefanha On Tue, May 26, 2026 at 08:59:55PM +0200, Kevin Wolf wrote: > Am 26.05.2026 um 20:03 hat Michael S. Tsirkin geschrieben: > > On Tue, May 26, 2026 at 07:43:35PM +0200, Kevin Wolf wrote: > > > Am 24.05.2026 um 14:42 hat Michael S. Tsirkin geschrieben: > > > > So, I had to reject a perfectly reasonable patch: > > > > https://lore.kernel.org/qemu-devel/20260320193746.242704-1-jinpu.wang@ionos.com/ > > > > just because of a tool used to make it. > > > > > > > > > > > > How contributors could comply with DCO terms (b) or (c) for the output of AI > > > > content generators commonly available today is unclear. The QEMU project is > > > > not willing or able to accept the legal risks of non-compliance. > > > > > > > > > > > > But, since this was written, Red Hat's Richard Fontana and Chris Wright > > > > published this piece: > > > > https://www.redhat.com/en/blog/ai-assisted-development-and-open-source-navigating-legal-issues > > > > > > > > > > > > Saying, in particular " > > > > We understand this concern, but the DCO has never > > > > been interpreted to require that every line of a contribution must be > > > > the personal creative expression of the contributor or another human > > > > developer. > > > > " > > > > > > I never found that blog post particularly convincing, especially because > > > they acknowledge a concern: > > > > > > There are two versions of this concern. The first is practical: that > > > an AI tool could covertly insert excerpts of proprietary (or > > > license-incompatible) code into an open source project, potentially > > > creating legal risk for maintainers and users. 
The second is broader > > > and more philosophical: that large language models, trained on vast > > > amounts of open source software, are essentially misappropriating > > > the community’s work, producing outputs stripped of the obligations > > > that open source licenses require. > > > > > > We think these concerns deserve to be taken seriously. > > > > > > The second one is essentially what I understood the QEMU policy to be > > > about. Unfortunately, the blog post then goes on to only ever deal with > > > the first one and ignore the second one that seems more relevant for us. > > > > > > So yes, the DCO isn't about "personal creative expression" or whatever > > > (and nobody suggested it is, this is a strawman), but it's about whether > > > the submitter has the legal rights to submit the code. And that's > > > exactly the question we decided we don't want to take a risk on. > > > > > > > > > So if that part isn't helpful, what has changed since we introduced the > > > AI policy? It's a few points: > > > > > > 1. While AI has been in use for a while now, we haven't seen projects > > > accepting AI generated code/content get into big trouble. While it > > > could still happen in the future, it might be an indication that the > > > probability of the risk hitting us is not that high. > > > > > > 2. The useful part of the blog post is that it tells us that Red Hat > > > considers the risk acceptable. This can inform our assessment of the > > > risks, though of course there might be a significant difference in > > > the impact of the risk for a company with a legal department and an > > > open source community consisting mainly of developers acting as > > > individuals. > > > > > > I think it's obvious that if the QEMU project gets involved in a > > > legal case, we have a problem (at the very least long lasting > > > distraction from actual work on QEMU), even if we didn't do anything > > > wrong and a good lawyer would easily win the case. > > > > > > 3. 
It was easy to just outright ban AI while its results were usually > > > not really usable anyway. This has changed meanwhile, so it's much > > > harder to maintain an absolute ban. > > > > > > It's not really the best use of my time to look at the idea in > > > AI-generated test cases and then rewrite them from scratch so I can > > > actually submit them. (On the other hand, I think my rewritten > > > submissions were always better and more maintainable than what AI > > > produced initially, so there's that.) > > > > > > So while my perspective is a lot more nuanced than yours, I do see a > > > shift in the balance and was actually thinking of suggesting a change of > > > the policy myself. > > > > > > What I was thinking of was allowing AI-generated content in places where > > > it's at least easy to revert if there is ever a problem with it: Tests, > > > documentation etc., but not core code that lots of other things depend > > > on and that will have evolved a lot when we notice a problem and for > > > which throwing away is simply not an option. > > > > OK. what about trivial changes? Using AI as a better sed? > > The above is just what I was thinking of suggesting myself. I didn't > mean to imply that I'm opposed to anything else, but just thought I'd > post it as an example of fairly obvious things we could allow. > > Of course, it also shows my own pain points. I don't see that much use > in it for generating code for QEMU proper, because these changes tend to > be few lines and I have an opinion on each of the lines - tests are the > opposite, lots of boilerplate and I don't care much how elegant they > are because nothing else will build on them anyway. > > So yes, trivial patches is another obvious starting point. The challenge > there is defining the line where a patch stops being trivial. 
So I'm not > completely sure if making this distinction in a policy is a good idea; > maybe practically speaking it has to be all or nothing in terms of > creativity (for lack of a better word).
Let the maintainers decide? Or we can enumerate things:
- fixing tool (compiler/checkpatch/smatch) errors/warnings in obvious ways (e.g. suggested by the tool itself, such as initializing an uninitialized variable)
- propagating API changes (e.g. rebasing a patch after an API change)
- anything that could be done by a perl/sed/coccinelle script
- adding or fixing code comments
> As an aside, personally, I'm not convinced that AI can be a "better > sed". If it's really about mechanical changes, I think the resulting > patch is much more reviewable if the agent doesn't modify the code, but > just generates the sed command line or the Coccinelle patch and that is > included in the commit message. Reviewers can then just review that and > then reproduce the result themselves for comparison. This is impossible > with AI prompts, and agents do tend to forget an instance of something to > replace here and there, so you do have to review the result carefully. > > But none of these "better sed" problems need to be handled in an AI policy. > If a patch is hard to review, the maintainer will already reject it on > those grounds. Absolutely. > > > > I propose adopting linux's rules instead: > > > > https://docs.kernel.org/process/coding-assistants.html > > > > > > > > which boils down to attribution. > > > > > > What would we actually do with the detailed information? Why do we care > > > which model was used? Is this helpful commit metadata or is it just free > > > advertising for a handful of companies? > > > > I presume, if a specific model is somehow declared "contaminated" so we > > can locate its output? > > Contaminated in what respect? > > Quality? Might be because of malicious intentions or just because the > model happens to be bad at a specific question.
Review and testing must > be able to catch quality problems. I don't think this is different from > any other contribution. > > Copyright? If so, then we're back to "can you really sign the DCO?" > > Something completely different? > > > > I think I would see more use in a tag like (better name welcome): > > > > > > AI-used-for: [code|tests|docs|commit message]... > > > > > > Kevin > > > > I surely don't mind. > > Great. Let's see what others think. > > Kevin
* Re: on ai generated and code provenance 2026-05-26 19:30 ` Michael S. Tsirkin @ 2026-05-26 19:52 ` Warner Losh 2026-05-27 8:41 ` Kevin Wolf 0 siblings, 1 reply; 59+ messages in thread From: Warner Losh @ 2026-05-26 19:52 UTC (permalink / raw) To: Michael S. Tsirkin; +Cc: Kevin Wolf, qemu-devel, stefanha [-- Attachment #1: Type: text/plain, Size: 10033 bytes --] On Tue, May 26, 2026 at 1:32 PM Michael S. Tsirkin <mst@redhat.com> wrote: > On Tue, May 26, 2026 at 08:59:55PM +0200, Kevin Wolf wrote: > > Am 26.05.2026 um 20:03 hat Michael S. Tsirkin geschrieben: > > > On Tue, May 26, 2026 at 07:43:35PM +0200, Kevin Wolf wrote: > > > > Am 24.05.2026 um 14:42 hat Michael S. Tsirkin geschrieben: > > > > > So, I had to reject a perfectly reasonable patch: > > > > > > https://lore.kernel.org/qemu-devel/20260320193746.242704-1-jinpu.wang@ionos.com/ > > > > > just because of a tool used to make it. > > > > > > > > > > > > > > > How contributors could comply with DCO terms (b) or (c) > for the output of AI > > > > > content generators commonly available today is unclear. > The QEMU project is > > > > > not willing or able to accept the legal risks of > non-compliance. > > > > > > > > > > > > > > > But, since this was written, Red Hat's Richard Fontana and Chris > Wright > > > > > published this piece: > > > > > > https://www.redhat.com/en/blog/ai-assisted-development-and-open-source-navigating-legal-issues > > > > > > > > > > > > > > > Saying, in particular " > > > > > We understand this concern, but the DCO has never > > > > > been interpreted to require that every line of a > contribution must be > > > > > the personal creative expression of the contributor or > another human > > > > > developer. > > > > > " > > > > > > > > I never found that blog post particularly convincing, especially > because > > > > they acknowledge a concern: > > > > > > > > There are two versions of this concern. 
The first is practical: > that > > > > an AI tool could covertly insert excerpts of proprietary (or > > > > license-incompatible) code into an open source project, > potentially > > > > creating legal risk for maintainers and users. The second is > broader > > > > and more philosophical: that large language models, trained on > vast > > > > amounts of open source software, are essentially misappropriating > > > > the community’s work, producing outputs stripped of the > obligations > > > > that open source licenses require. > > > > > > > > We think these concerns deserve to be taken seriously. > > > > > > > > The second one is essentially what I understood the QEMU policy to be > > > > about. Unfortunately, the blog post then goes on to only ever deal > with > > > > the first one and ignore the second one that seems more relevant for > us. > > > > > > > > So yes, the DCO isn't about "personal creative expression" or > whatever > > > > (and nobody suggested it is, this is a strawman), but it's about > whether > > > > the submitter has the legal rights to submit the code. And that's > > > > exactly the question we decided we don't want to take a risk on. > > > > > > > > > > > > So if that part isn't helpful, what has changed since we introduced > the > > > > AI policy? It's a few points: > > > > > > > > 1. While AI has been in use for a while now, we haven't seen projects > > > > accepting AI generated code/content get into big trouble. While it > > > > could still happen in the future, it might be an indication that > the > > > > probability of the risk hitting us is not that high. > > > > > > > > 2. The useful part of the blog post is that it tells us that Red Hat > > > > considers the risk acceptable. 
This can inform our assessment of > the > > > > risks, though of course there might be a significant difference in > > > > the impact of the risk for a company with a legal department and > an > > > > open source community consisting mainly of developers acting as > > > > individuals. > > > > > > > > I think it's obvious that if the QEMU project gets involved in a > > > > legal case, we have a problem (at the very least long lasting > > > > distraction from actual work on QEMU), even if we didn't do > anything > > > > wrong and a good lawyer would easily win the case. > > > > > > > > 3. It was easy to just outright ban AI while its results were usually > > > > not really usable anyway. This has changed meanwhile, so it's much > > > > harder to maintain an absolute ban. > > > > > > > > It's not really the best use of my time to look at the idea in > > > > AI-generated test cases and then rewrite them from scratch so I > can > > > > actually submit them. (On the other hand, I think my rewritten > > > > submissions were always better and more maintainable than what AI > > > > produced initially, so there's that.) > > > > > > > > So while my perspective is a lot more nuanced than yours, I do see a > > > > shift in the balance and was actually thinking of suggesting a > change of > > > > the policy myself. > > > > > > > > What I was thinking of was allowing AI-generated content in places > where > > > > it's at least easy to revert if there is ever a problem with it: > Tests, > > > > documentation etc., but not core code that lots of other things > depend > > > > on and that will have evolved a lot when we notice a problem and for > > > > which throwing away is simply not an option. > > > > > > OK. what about trivial changes? Using AI as a better sed? > > > > The above is just what I was thinking of suggesting myself. I didn't > > mean to imply that I'm opposed to anything else, but just thought I'd > > post it as an example of fairly obvious things we could allow. 
> > > > Of course, it also shows my own pain points. I don't see that much use > > in it for generating code for QEMU proper, because these changes tend to > > be few lines and I have an opinion on each of the lines - tests are the > > opposite, lots of boilerplate and I don't care much how elegant they > > are because nothing else will build on them anyway. > > > > So yes, trivial patches is another obvious starting point. The challenge > > there is defining the line where a patch stops being trivial. So I'm not > > completely sure if making this distinction in a policy is a good idea; > > maybe practically speaking it has to be all or nothing in terms of > > creativity (for lack of a better word). > > Let the maintainers decide? > > Or we can enumerate things: > - fixing tool (compiler/checkpatch/smatch) errors/warnings in obvious ways > (e.g. suggested by the > tools itself, such as initializing an uninitialized variable) > - propagating API changes (e.g. rebasing a patch after an API change) > - anything that could be done by a perl/sed/coccinelle script > - adding or fixing code comments > Those are good examples. Perhaps the following words are a good place to start to frame what I've seen expressed here: The QEMU Project currently may accept limited uses of AI that produce high quality patches that are limited in the creative content added. While maintainers will ultimately decide, changes like the following fall within this policy: 1. Fixing obvious warnings in the obvious ways suggested by the tool 2. Tree wide API changes, and other similar mechanical changes done today with perl/python/sed/coccinelle 3. Limited, small changes to fix bugs or add a small new feature whose scope is less than about 100 lines and the originator can explain them all or the meta issues about the patch.
Maintainers are free to accept or reject changes outside these guidelines, but please check with the maintainers before sending to keep the load from AI content to something they can manage. Large and Very Large patches, especially ones that have not been deeply analyzed and tested by humans, should be avoided. Though maybe the list of 'exceptions' needs work. But the basic framing is that we will accept some high-quality patches. Maintainers have some discretion for larger pieces to a point, and we still don't want to drown in AI slop. Warner > > > As an aside, personally, I'm not convinced that AI can be a "better > > sed". If it's really about mechanical changes, I think the resulting > > patch is much more reviewable if the agent doesn't modify the code, but > > just generate the sed command line or the Coccinelle patch and that is > > included in the commit message. Reviewers can then just review that and > > then reproduce the result themselves for comparison. This is impossible > > with AI prompts and agents do tend to forget an instance of something to > > replace here and there, so you do have to review the result carefully. > > > > But none of these "better sed" problems need to handled in an AI policy. > > If a patch is hard to review, the maintainer will already reject it on > > those grounds. > > Absolutely. > > > > > > I propose adopting linux's rules instead: > > > > > https://docs.kernel.org/process/coding-assistants.html > > > > > > > > > > which boils down to attribution. > > > > > > > > What would we actually do with the detailed information? Why do we > care > > > > which model was used? Is this helpful commit metadata or is it just > free > > > > advertising for a handful of companies? > > > > > > I presume, if a specific model is somehow declared "contaminated" so we > > > can locate its output? > > > > Contaminated in what respect? > > > > Quality?
Might be because of malicious intentions or just because the > > model happens to be bad at a specific question. Review and testing must > > be able to catch quality problems. I don't think this is different from > > any other contributions. > > > > Copyright? If so, then we're back to "can you really sign the DCO?" > > > > Something completely different? > > > > > > I think I would see more use in a tag like (better name welcome): > > > > > > > > AI-used-for: [code|tests|docs|commit message]... > > > > > > > > Kevin > > > > > > I surely don't mind. > > > > Great. Let's see what others think. > > > > Kevin > > > [-- Attachment #2: Type: text/html, Size: 12827 bytes --] ^ permalink raw reply [flat|nested] 59+ messages in thread
* Re: on ai generated and code provenance 2026-05-26 19:52 ` Warner Losh @ 2026-05-27 8:41 ` Kevin Wolf 2026-05-27 10:01 ` Paolo Bonzini 0 siblings, 1 reply; 59+ messages in thread From: Kevin Wolf @ 2026-05-27 8:41 UTC (permalink / raw) To: Warner Losh; +Cc: Michael S. Tsirkin, qemu-devel, stefanha Am 26.05.2026 um 21:52 hat Warner Losh geschrieben: > On Tue, May 26, 2026 at 1:32 PM Michael S. Tsirkin <mst@redhat.com> wrote: > > > On Tue, May 26, 2026 at 08:59:55PM +0200, Kevin Wolf wrote: > > > So yes, trivial patches is another obvious starting point. The challenge > > > there is defining the line where a patch stops being trivial. So I'm not > > > completely sure if making this distinction in a policy is a good idea; > > > maybe practically speaking it has to be all or nothing in terms of > > > creativity (for lack of a better word). > > > > Let the maintainers decide? > > > > Or we can enumerate things: > > - fixing tool (compiler/checkpatch/smatch) errors/warnings in obvious ways > > (e.g. suggested by the > > tools itself, such as initializing an uninitialized variable) > > - propagating API changes (e.g. rebasing a patch after an API change) > > - anything that could be done by a perl/sed/coccinelle script > > - adding or fixing code comments > > > > Those are good examples. Perhaps the following words are good place to start > to frame what I've seen expressed here: > > The QEMU Project currently may accept limited uses of AI that produce > high quality patches that are limited in the creative content added. > While maintainers will ultimately decide, changes like the following > fall within this policy > 1. Fixing obvious warnings in the obvious ways suggested by the tool > 2. Tree wide API changes, and other similar mechanical changes done > today with perl/python/sed/coccinelle As I said in the paragraph you quoted below, I don't think we should encourage using AI for tasks that a deterministic tool could do. 
If you can use a deterministic tool like sed or Coccinelle for the job, you should. I know that writing Coccinelle spatches can be challenging; that is the part that you can ask AI to help with. (Perl and Python follow the same logic as long as the script is simple, but obviously you have to stop when the helper script becomes almost as complex as the change itself.) Letting AI perform the change directly instead may be an acceptable shortcut for a one-man hobby project that nobody else will ever look at, but in the context of a community project like QEMU in which your changes have to be reviewed and understood by others, it matters a lot that the output of the tool is reproducible. Otherwise, you're creating unnecessary work for others, and that isn't acceptable. So maybe we should even explicitly mention a recommendation like the following: If you can use a deterministic tool, don't use AI instead. If you don't know how to use the deterministic tool, use the AI to tell you how to use it instead of trying to replace it. > 3. Limited, small changes to fix bugs or add a small new feature whose > scope is less than about 100 lines and the originator can explain > them all or the meta issues about the patch. Not sure if mentioning a number of lines is wise. 100 lines can be mostly boilerplate and simple sequential code or they can be a deeply nested complex algorithm. > Maintainers are free to accept or reject changes outside these > guidelines, but please check with the maintainers before sending to > keep the load from AI content to something they can manage. Large and > Very Large patches, especailly ones that have not been deeply > analyised and tested by humans, should be avoided. > > Though maybe the list of 'exceptions' needs work. But the basic > framing is that we will accept some, high quality patches. Maintainers > have some discression for larger pieces to a point, and we still don't > want to drown in AI slop. 
Yes, if we decide that we do want to make patch complexity/creative expression/whatever you may call it part of the criteria, then having a list like this looks like a possible approach. The details of what exactly should be in it would certainly lead to more discussion, though. Kevin > Warner > > > > > > > As an aside, personally, I'm not convinced that AI can be a "better > > > sed". If it's really about mechanical changes, I think the resulting > > > patch is much more reviewable if the agent doesn't modify the code, but > > > just generate the sed command line or the Coccinelle patch and that is > > > included in the commit message. Reviewers can then just review that and > > > then reproduce the result themselves for comparison. This is impossible > > > with AI prompts and agents do tend to forget an instance of something to > > > replace here and there, so you do have to review the result carefully. > > > > > > But none of these "better sed" problems need to handled in an AI policy. > > > If a patch is hard to review, the maintainer will already reject it on > > > those grounds. > > > > Absolutely. ^ permalink raw reply [flat|nested] 59+ messages in thread
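To illustrate the reproducibility argument made in this message, here is a minimal sketch of a scripted, deterministic rename. The idea is that the exact script can be pasted into the commit message, so a reviewer can re-run it and compare the result with the patch. The function name `rename_identifier` and the identifiers `old_api` and `new_api` are made up for the example. It is a plain whole-word text substitution (roughly what a sed substitution with word boundaries would do), not a semantic tool like Coccinelle, so it would also touch comments and string literals.

```python
import re


def rename_identifier(source: str, old: str, new: str) -> str:
    """Replace whole-word occurrences of `old` with `new` in `source`.

    The result depends only on the inputs, so anyone who re-runs the
    script on the same tree gets byte-identical output.
    """
    pattern = re.compile(r"\b" + re.escape(old) + r"\b")
    return pattern.sub(new, source)


# Hypothetical input: `old_api_v2` must stay untouched, because `_` is a
# word character and therefore there is no word boundary after `old_api`.
before = "int old_api(int x);\nint y = old_api(1) + old_api_v2(2);\n"
after = rename_identifier(before, "old_api", "new_api")
print(after, end="")
```

Running the function a second time on its own output changes nothing, which is the property a reviewer relies on when reproducing the change for comparison.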
* Re: on ai generated and code provenance 2026-05-27 8:41 ` Kevin Wolf @ 2026-05-27 10:01 ` Paolo Bonzini 2026-05-27 10:43 ` Alex Bennée ` (5 more replies) 0 siblings, 6 replies; 59+ messages in thread From: Paolo Bonzini @ 2026-05-27 10:01 UTC (permalink / raw) To: Kevin Wolf, Warner Losh; +Cc: Michael S. Tsirkin, qemu-devel, stefanha On 5/27/26 10:41, Kevin Wolf wrote: > Am 26.05.2026 um 21:52 hat Warner Losh geschrieben: >> The QEMU Project currently may accept limited uses of AI that produce >> high quality patches that are limited in the creative content added. >> While maintainers will ultimately decide, changes like the following >> fall within this policy >> 1. Fixing obvious warnings in the obvious ways suggested by the tool >> 2. Tree wide API changes, and other similar mechanical changes done >> today with perl/python/sed/coccinelle > > As I said in the paragraph you quoted below, I don't think we should > encourage using AI for tasks that a deterministic tool could do. In some cases such a tool does not exist. Much to my surprise, there is no tool to do static type inference on Python code, but AI is very good at doing it. > Letting AI perform the change directly instead may be an acceptable > shortcut for a one-man hobby project that nobody else will ever look at, > but in the context of a community project like QEMU in which your > changes have to be reviewed and understood by others, it matters a lot > that the output of the tool is reproducible. Otherwise, you're creating > unnecessary work for others, and that isn't acceptable. When applicable, going through coccinelle (with the aid of AI if needed!) is indeed a good middle ground as it helps reviewers for large changes. If you have many slightly different but easily separated changes (e.g. you can split the patch by struct field), it may make things worse. It's also worth noting that in other cases even sed or coccinelle, while deterministic, cannot produce 100% of the patch.
> So maybe we should even explicitly mention a recommendation like the > following: > > If you can use a deterministic tool, don't use AI instead. If you > don't know how to use the deterministic tool, use the AI to tell you > how to use it instead of trying to replace it. I like it. >> 3. Limited, small changes to fix bugs or add a small new feature whose >> scope is less than about 100 lines and the originator can explain >> them all or the meta issues about the patch. > > Not sure if mentioning a number of lines is wise. 100 lines can be > mostly boilerplate and simple sequential code or they can be a deeply > nested complex algorithm. I'd put the threshold at 20-50 at most. > I think I would see more use in a tag like (better name welcome): > > AI-used-for: [code|tests|docs|commit message]... I like this *a lot*. No need for free advertisement, but some traceability is useful. For tools such as sed or coccinelle, having the exact script in the patch or commit message is useful. Plus, the execution of the script more or less delimits the commit by itself (or 90%+ of it). For LLMs it's a bit less clear cut because separating docs makes little sense. And the exact model is pointless; it will be obsolete in 6 months and provide no useful information. So, something like: ------------------- 8< ------------------- Use of AI-generated content ~~~~~~~~~~~~~~~~~~~~~~~~~~~ The QEMU project currently allows using AI/LLM tools to produce patches in scenarios with limited creative content: Mechanical changes If you can use a deterministic tool or a script, don't use AI instead. If you don't know how to do the change deterministically, you may ask the AI for help, rather than having it stand in for the tools. Small bug fixes These should be limited to 20 lines of code or less, not including tests. You are still expected to understand and explain your changes and the rationale behind them.
These boundaries do not apply to other uses of AI, such as researching APIs or algorithms, static analysis, or debugging, provided their output is not included in contributions. Larger uses of AI are allowed as an experiment, but they should be agreed upon with the maintainer prior to submission. Use of AI does not remove the need for authors to comply with all other requirements for contribution. In particular, the "Signed-off-by" label in a patch submission is a statement that the author takes responsibility for the entire contents of the patch, certifying that their patch submission is made in accordance with the rules of the `Developer's Certificate of Origin (DCO) <dco>`. Commit messages for AI-assisted changes ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^ When AI/LLM tools produce or substantively shape your patch, add an ``AI-used-for:`` trailer. The text of the trailer could be one or more of ``code``, ``tests``, ``docs``, ``research``, possibly followed by an explanation in parentheses:: AI-used-for: tests, docs AI-used-for: code AI-used-for: code (refactoring) AI-used-for: code (prototype) AI-used-for: research The trailer is intended as a clarification of your DCO obligations as well as to guide reviewers. It is not intended for minimal presence such as autocomplete or asking for a pre-review of the patch, and it does not remove your responsibility to understand the changes that you are submitting. Include the prompt in the commit message if it helps a reviewer judge the result: * yes: "move field ``foo`` from ``struct aa`` to ``struct bb``. If a function already has a local variable or parameter of type ``struct bb``, use it instead of accessing ``aa.bb``." * yes: "add an implementation of the trait for ``Mutex<T: MyTrait>``, forwarding the member functions to ``T`` while taking the lock around the calls". 
* no: "write user-facing documentation for the new tool" * no: "write testcases for the new functions" Deterministic tooling (sed, coccinelle, formatters) is out of scope for the trailer, but should be mentioned in the commit message. ^ permalink raw reply [flat|nested] 59+ messages in thread
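A rough sketch of how the proposed ``AI-used-for:`` trailer could be checked mechanically, for instance by a commit-message lint. The thread does not mention any such check, so everything here is an assumption: the function name `parse_ai_trailers`, the regexes, and the decision to reject unknown categories. The category list is the one from the draft text above. One known limitation of this sketch: an explanation that itself contains a comma is not handled.

```python
import re

# Categories named in the draft policy text.
ALLOWED = {"code", "tests", "docs", "research"}

# One trailer line, e.g. "AI-used-for: code (refactoring)".
TRAILER = re.compile(r"^AI-used-for:\s*(.+)$", re.MULTILINE)
# One item: a category, optionally followed by "(explanation)".
ITEM = re.compile(r"^(\w+)(?:\s*\(([^)]*)\))?$")


def parse_ai_trailers(message: str):
    """Return (category, explanation) pairs found in a commit message.

    `explanation` is None when no parenthesised text is given.
    Raises ValueError for a category outside ALLOWED.
    """
    result = []
    for value in TRAILER.findall(message):
        for item in value.split(","):
            item = item.strip()
            match = ITEM.match(item)
            if match is None or match.group(1) not in ALLOWED:
                raise ValueError(f"unrecognised AI-used-for value: {item!r}")
            result.append((match.group(1), match.group(2)))
    return result


# Hypothetical commit message used only for illustration.
example = (
    "block: fix foo\n"
    "\n"
    "Some explanation of the change.\n"
    "\n"
    "AI-used-for: tests, docs\n"
    "AI-used-for: code (refactoring)\n"
    "Signed-off-by: A Developer <dev@example.org>\n"
)
trailers = parse_ai_trailers(example)
print(trailers)
```

A commit message without any ``AI-used-for:`` line simply yields an empty list, which matches the draft's intent that the trailer is only needed when AI produced or substantively shaped the patch.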
* Re: on ai generated and code provenance 2026-05-27 10:01 ` Paolo Bonzini @ 2026-05-27 10:43 ` Alex Bennée 2026-05-27 12:49 ` Kevin Wolf 2026-05-27 10:53 ` Kevin Wolf ` (4 subsequent siblings) 5 siblings, 1 reply; 59+ messages in thread From: Alex Bennée @ 2026-05-27 10:43 UTC (permalink / raw) To: Paolo Bonzini Cc: Kevin Wolf, Warner Losh, Michael S. Tsirkin, qemu-devel, stefanha Paolo Bonzini <pbonzini@redhat.com> writes: > On 5/27/26 10:41, Kevin Wolf wrote: >> Am 26.05.2026 um 21:52 hat Warner Losh geschrieben: >>> The QEMU Project currently may accept limited uses of AI that produce >>> high quality patches that are limited in the creative content added. >>> While maintainers will ultimately decide, changes like the following >>> fall within this policy >>> 1. Fixing obvious warnings in the obvious ways suggested by the tool >>> 2. Tree wide API changes, and other similar mechanical changes done >>> today with perl/python/sed/coccinelle >> As I said in the paragraph you quoted below, I don't think we should >> encourage using AI for tasks that a deterministic tool could do. > > In some cases such a tool does not exist. Much to my surprise, there > is no tool to do static type inference on Python code, but AI is very > good at doing it. > >> Letting AI perform the change directly instead may be an acceptable >> shortcut for a one-man hobby project that nobody else will ever look at, >> but in the context of a community project like QEMU in which your >> changes have to be reviewed and understood by others, it matters a lot >> that the output of the tool is reproducible. Otherwise, you're creating >> unnecessary work for others, and that isn't acceptable. > > When applicable, going through coccinelle (with the aid of AI if > needed! is indeed a good middle ground as it helps reviewers for large > changes. If you have many slightly different but easily separated > changes (e.g. you can split the patch by struct field), it may make > things worse. 
> > Its also worth noting that in other cases even sed or coccinelle, > while deterministic, cannot produce 100% of the patch. > >> So maybe we should even explicitly mention a recommendation like the >> following: >> If you can use a deterministic tool, don't use AI instead. If >> you >> don't know how to use the deterministic tool, use the AI to tell you >> how to use it instead of trying to replace it. > > I like it. > >>> 3. Limited, small changes to fix bugs or add a small new feature whose >>> scope is less than about 100 lines and the originator can explain >>> them all or the meta issues about the patch. >> Not sure if mentioning a number of lines is wise. 100 lines can be >> mostly boilerplate and simple sequential code or they can be a deeply >> nested complex algorithm. > > I'd put the threshold at 20-50 at most. > >> I think I would see more use in a tag like (better name welcome): >> AI-used-for: [code|tests|docs|commit message]... > > I like this *a lot*. No need for free advertisement, but some > traceability is useful. > > For tools such as sed or coccinelle, having the exact script in the > patch or commit message useful. Plus, the execution of the script > more or lesss delimits the commit by itself (or 90%+ of it). For LLMs > it's a bit less clear cut because separating docs makes little sense. > And the exact model is pointless, it will be obsolete in 6 months and > provide no useful information. > > So, something like: > > ------------------- 8< ------------------- > Use of AI-generated content > ~~~~~~~~~~~~~~~~~~~~~~~~~~~ > > The QEMU project currently allows using AI/LLM tools to produce > patches in scenarios with limited creative content: > > Mechanical changes > If you can use a deterministic tool or a script, don't use AI instead. > If you don't know how to do the change deterministically, you may > ask the AI for help, rather than having it stand in for the tools. 
I like the idea of pointing people towards tools but I wouldn't be quite so prescriptive. The series MST referred to was easily eyeball-able and I suspect the extra steps would generate friction for contributions. That said, the wider the change to the code base, the more likely it is that a random hallucination gets lost in the noise. Maybe: Mechanical changes Using AI tools to make simple mechanical changes is allowed. For larger tree-wide changes it is strongly recommended to use a deterministic tool like `sed` or `coccinelle`. You can use AI to help you craft the invocation. ? > Small bug fixes > These should be limited to 20 lines of code or less, not including > tests. You are still expected to understand and explain your changes > and the rationale behind them. > > These boundaries do not apply to other uses of AI, such as researching > APIs or algorithms, static analysis, or debugging, provided their output > is not included in contributions. Larger uses of AI are allowed as an > experiment, but they should be agreed upon with the maintainer prior > to submission. > > Use of AI does not remove the need for authors to comply with all other > requirements for contribution. In particular, the "Signed-off-by" > label in a patch submission is a statement that the author takes > responsibility for the entire contents of the patch, certifying that > their patch submission is made in accordance with the rules of the > `Developer's Certificate of Origin (DCO) <dco>`. > > Commit messages for AI-assisted changes > ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^ > > When AI/LLM tools produce or substantively shape your patch, add an > ``AI-used-for:`` trailer.
The text of the trailer could be one or > more of ``code``, ``tests``, ``docs``, ``research``, possibly followed > by an explanation in parentheses:: > > AI-used-for: tests, docs > AI-used-for: code > AI-used-for: code (refactoring) > AI-used-for: code (prototype) > AI-used-for: research > > The trailer is intended as a clarification of your DCO obligations as > well as to guide reviewers. It is not intended for minimal presence > such as autocomplete or asking for a pre-review of the patch, and it > does not remove your responsibility to understand the changes that you > are submitting. > > Include the prompt in the commit message if it helps a reviewer judge > the result: > > * yes: "move field ``foo`` from ``struct aa`` to ``struct bb``. If a > function already has a local variable or parameter of type ``struct > bb``, use it instead of accessing ``aa.bb``." > > * yes: "add an implementation of the trait for ``Mutex<T: MyTrait>``, > forwarding the member functions to ``T`` while taking the lock > around the calls". > > * no: "write user-facing documentation for the new tool" > > * no: "write testcases for the new functions" > > Deterministic tooling (sed, coccinelle, formatters) is out of scope > for the trailer, but should be mentioned in the commit message. -- Alex Bennée Virtualisation Tech Lead @ Linaro ^ permalink raw reply [flat|nested] 59+ messages in thread
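The quoted draft limits small bug fixes to 20 lines of code, not including tests. A minimal sketch of how that number could be measured from a unified diff follows. It assumes that test files live under a `tests/` prefix and that only added lines count; neither is stated in the thread, and the function name `count_added_lines` is made up. As noted elsewhere in the discussion, a line count is at best a crude proxy for complexity.

```python
import re

# Unified diff hunk header, e.g. "@@ -1,3 +1,4 @@"; a missing count means 1.
HUNK = re.compile(r"^@@ -\d+(?:,(\d+))? \+\d+(?:,(\d+))? @@")


def count_added_lines(diff: str, test_prefix: str = "tests/") -> int:
    """Count '+' lines in a unified diff, skipping files under test_prefix."""
    added = 0
    in_tests = False
    old_left = new_left = 0  # lines still expected in the current hunk
    for line in diff.splitlines():
        if old_left > 0 or new_left > 0:
            # Inside a hunk: the first character classifies the line.
            if line.startswith("+"):
                new_left -= 1
                if not in_tests:
                    added += 1
            elif line.startswith("-"):
                old_left -= 1
            elif line.startswith("\\"):
                pass  # "\ No newline at end of file" marker
            else:
                old_left -= 1  # context line counts on both sides
                new_left -= 1
            continue
        if line.startswith("+++ "):
            path = line[4:].split("\t")[0]
            if path.startswith("b/"):
                path = path[2:]
            in_tests = path.startswith(test_prefix)
        else:
            match = HUNK.match(line)
            if match:
                old_left = int(match.group(1) or 1)
                new_left = int(match.group(2) or 1)
    return added


# Hypothetical two-file patch: one code change, one new test file.
sample_diff = (
    "diff --git a/block/foo.c b/block/foo.c\n"
    "--- a/block/foo.c\n"
    "+++ b/block/foo.c\n"
    "@@ -1,3 +1,4 @@\n"
    " int a;\n"
    "-int b;\n"
    "+int b = 0;\n"
    "+int c;\n"
    " int d;\n"
    "diff --git a/tests/unit/test-foo.c b/tests/unit/test-foo.c\n"
    "--- /dev/null\n"
    "+++ b/tests/unit/test-foo.c\n"
    "@@ -0,0 +1,2 @@\n"
    "+void test_foo(void);\n"
    "+void test_bar(void);\n"
)
code_lines = count_added_lines(sample_diff)
print(code_lines)
```

Tracking the hunk sizes, rather than just looking at line prefixes, avoids mistaking an added line whose content starts with `++ ` for a file header.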
* Re: on ai generated and code provenance 2026-05-27 10:43 ` Alex Bennée @ 2026-05-27 12:49 ` Kevin Wolf 0 siblings, 0 replies; 59+ messages in thread From: Kevin Wolf @ 2026-05-27 12:49 UTC (permalink / raw) To: Alex Bennée Cc: Paolo Bonzini, Warner Losh, Michael S. Tsirkin, qemu-devel, stefanha Am 27.05.2026 um 12:43 hat Alex Bennée geschrieben: > Paolo Bonzini <pbonzini@redhat.com> writes: > > > On 5/27/26 10:41, Kevin Wolf wrote: > >> Am 26.05.2026 um 21:52 hat Warner Losh geschrieben: > >>> The QEMU Project currently may accept limited uses of AI that produce > >>> high quality patches that are limited in the creative content added. > >>> While maintainers will ultimately decide, changes like the following > >>> fall within this policy > >>> 1. Fixing obvious warnings in the obvious ways suggested by the tool > >>> 2. Tree wide API changes, and other similar mechanical changes done > >>> today with perl/python/sed/coccinelle > >> As I said in the paragraph you quoted below, I don't think we should > >> encourage using AI for tasks that a deterministic tool could do. > > > > In some cases such a tool does not exist. Much to my surprise, there > > is no tool to do static type inference on Python code, but AI is very > > good at doing it. > > > >> Letting AI perform the change directly instead may be an acceptable > >> shortcut for a one-man hobby project that nobody else will ever look at, > >> but in the context of a community project like QEMU in which your > >> changes have to be reviewed and understood by others, it matters a lot > >> that the output of the tool is reproducible. Otherwise, you're creating > >> unnecessary work for others, and that isn't acceptable. > > > > When applicable, going through coccinelle (with the aid of AI if > > needed! is indeed a good middle ground as it helps reviewers for large > > changes. If you have many slightly different but easily separated > > changes (e.g. 
you can split the patch by struct field), it may make > > things worse. > > > > Its also worth noting that in other cases even sed or coccinelle, > > while deterministic, cannot produce 100% of the patch. > > > >> So maybe we should even explicitly mention a recommendation like the > >> following: > >> If you can use a deterministic tool, don't use AI instead. If > >> you > >> don't know how to use the deterministic tool, use the AI to tell you > >> how to use it instead of trying to replace it. > > > > I like it. > > > >>> 3. Limited, small changes to fix bugs or add a small new feature whose > >>> scope is less than about 100 lines and the originator can explain > >>> them all or the meta issues about the patch. > >> Not sure if mentioning a number of lines is wise. 100 lines can be > >> mostly boilerplate and simple sequential code or they can be a deeply > >> nested complex algorithm. > > > > I'd put the threshold at 20-50 at most. > > > >> I think I would see more use in a tag like (better name welcome): > >> AI-used-for: [code|tests|docs|commit message]... > > > > I like this *a lot*. No need for free advertisement, but some > > traceability is useful. > > > > For tools such as sed or coccinelle, having the exact script in the > > patch or commit message useful. Plus, the execution of the script > > more or lesss delimits the commit by itself (or 90%+ of it). For LLMs > > it's a bit less clear cut because separating docs makes little sense. > > And the exact model is pointless, it will be obsolete in 6 months and > > provide no useful information. > > > > So, something like: > > > > ------------------- 8< ------------------- > > Use of AI-generated content > > ~~~~~~~~~~~~~~~~~~~~~~~~~~~ > > > > The QEMU project currently allows using AI/LLM tools to produce > > patches in scenarios with limited creative content: > > > > Mechanical changes > > If you can use a deterministic tool or a script, don't use AI instead. 
> > If you don't know how to do the change deterministically, you may > > ask the AI for help, rather than having it stand in for the tools. > > I like the idea of pointing people towards tools but I wouldn't be quite > so prescriptive. The series MST referred to was easily eyeball-able and > I suspect the extra steps would generate friction for contributions. > That said the wider the change to the code base the more likely a random > hallucination can get lost in the noise. > > Maybe: > > Mechanical changes > Using AI tools to make simple mechanical changes is allowed. For larger > tree-wide changes it is strongly recommended to use a deterministic > tool like `sed` or `coccinelle`. You can use AI to help you craft the > invocation for you. I think we do want to discourage the direct use of AI in such cases, while not outright banning it. So maybe just a minor tweak to Paolo's wording? Mechanical changes If you can use a deterministic tool or a script, it is preferred that you use it and not replace it with AI. If you don't know how to do the change deterministically, you can ask the AI for help, rather than having it stand in for the tools. Kevin ^ permalink raw reply [flat|nested] 59+ messages in thread
* Re: on ai generated and code provenance 2026-05-27 10:01 ` Paolo Bonzini 2026-05-27 10:43 ` Alex Bennée @ 2026-05-27 10:53 ` Kevin Wolf 2026-05-27 12:33 ` Paolo Bonzini 2026-05-27 10:54 ` Alistair Francis ` (3 subsequent siblings) 5 siblings, 1 reply; 59+ messages in thread From: Kevin Wolf @ 2026-05-27 10:53 UTC (permalink / raw) To: Paolo Bonzini; +Cc: Warner Losh, Michael S. Tsirkin, qemu-devel, stefanha Am 27.05.2026 um 12:01 hat Paolo Bonzini geschrieben: > On 5/27/26 10:41, Kevin Wolf wrote: > > Am 26.05.2026 um 21:52 hat Warner Losh geschrieben: > > > The QEMU Project currently may accept limited uses of AI that produce > > > high quality patches that are limited in the creative content added. > > > While maintainers will ultimately decide, changes like the following > > > fall within this policy > > > 1. Fixing obvious warnings in the obvious ways suggested by the tool > > > 2. Tree wide API changes, and other similar mechanical changes done > > > today with perl/python/sed/coccinelle > > > > As I said in the paragraph you quoted below, I don't think we should > > encourage using AI for tasks that a deterministic tool could do. > > In some cases such a tool does not exist. Then it's not a task that a deterministic tool could do. Of course, you can always write a new tool that does the exact thing you want to change. But that's not what I was talking about here, I was really talking about existing common tools. > Much to my surprise, there is no tool to do static type inference on > Python code, but AI is very good at doing it. I think this is a special case that has a different balance anyway. When reviewing such a patch, I would skim the change for the general approach and if I like it, but checking for consistency and completeness is something I would use mypy for - that is, a deterministic tool that can verify the change. So I'd still use one, just at a different time. 
(It actually also might be a rare instance where someone (TM) should
actually write the tool because it would be generally useful.)

> > Letting AI perform the change directly instead may be an acceptable
> > shortcut for a one-man hobby project that nobody else will ever look at,
> > but in the context of a community project like QEMU in which your
> > changes have to be reviewed and understood by others, it matters a lot
> > that the output of the tool is reproducible. Otherwise, you're creating
> > unnecessary work for others, and that isn't acceptable.
>
> When applicable, going through coccinelle (with the aid of AI if needed!) is
> indeed a good middle ground as it helps reviewers for large changes. If you
> have many slightly different but easily separated changes (e.g. you can
> split the patch by struct field), it may make things worse.
>
> It's also worth noting that in other cases even sed or coccinelle, while
> deterministic, cannot produce 100% of the patch.

Agreed, it's all a case of "if possible, prefer this", not "you have to
do this 100% of the time".

> > So maybe we should even explicitly mention a recommendation like the
> > following:
> >
> >   If you can use a deterministic tool, don't use AI instead. If you
> >   don't know how to use the deterministic tool, use the AI to tell you
> >   how to use it instead of trying to replace it.
>
> I like it.
>
> > > 3. Limited, small changes to fix bugs or add a small new feature whose
> > >    scope is less than about 100 lines and the originator can explain
> > >    them all or the meta issues about the patch.
> >
> > Not sure if mentioning a number of lines is wise. 100 lines can be
> > mostly boilerplate and simple sequential code or they can be a deeply
> > nested complex algorithm.
>
> I'd put the threshold at 20-50 at most.
>
> > I think I would see more use in a tag like (better name welcome):
> >
> >   AI-used-for: [code|tests|docs|commit message]...
>
> I like this *a lot*.
> No need for free advertisement, but some traceability
> is useful.
>
> For tools such as sed or coccinelle, having the exact script in the patch or
> commit message is useful. Plus, the execution of the script more or less
> delimits the commit by itself (or 90%+ of it). For LLMs it's a bit less
> clear cut because separating docs makes little sense. And the exact model
> is pointless, it will be obsolete in 6 months and provide no useful
> information.
>
> So, something like:
>
> ------------------- 8< -------------------
> Use of AI-generated content
> ~~~~~~~~~~~~~~~~~~~~~~~~~~~
>
> The QEMU project currently allows using AI/LLM tools to produce patches in
> scenarios with limited creative content:
>
> Mechanical changes
>   If you can use a deterministic tool or a script, don't use AI instead.
>   If you don't know how to do the change deterministically, you may
>   ask the AI for help, rather than having it stand in for the tools.
>
> Small bug fixes
>   These should be limited to 20 lines of code or less, not including
>   tests. You are still expected to understand and explain your changes
>   and the rationale behind them.

I agree with "not including tests". But I think this would be more
consistent if we also add new tests (that come without a small bug fix
at the same time; either because the problem is already fixed or because
the fix is too complex to qualify) as another allowed category.

(To be honest, I'm a bit biased here because allowing tests is my single
biggest wish from an AI policy update.)

> These boundaries do not apply to other uses of AI, such as researching
> APIs or algorithms, static analysis, or debugging, provided their output
> is not included in contributions. Larger uses of AI are allowed as an
> experiment, but they should be agreed upon with the maintainer prior to
> submission.
>
> Use of AI does not remove the need for authors to comply with all other
> requirements for contribution.
> In particular, the "Signed-off-by"
> label in a patch submission is a statement that the author takes
> responsibility for the entire contents of the patch, certifying that
> their patch submission is made in accordance with the rules of the
> `Developer's Certificate of Origin (DCO) <dco>`.
>
> Commit messages for AI-assisted changes
> ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
>
> When AI/LLM tools produce or substantively shape your patch, add an
> ``AI-used-for:`` trailer. The text of the trailer could be one or more of
> ``code``, ``tests``, ``docs``, ``research``, possibly followed by an
> explanation in parentheses::

Include a category for commit messages, or are we expecting that commit
messages are always written by a human? If so, that should be explicit.

>   AI-used-for: tests, docs
>   AI-used-for: code
>   AI-used-for: code (refactoring)
>   AI-used-for: code (prototype)
>   AI-used-for: research
>
> The trailer is intended as a clarification of your DCO obligations as well
> as to guide reviewers. It is not intended for minimal presence such as
> autocomplete or asking for a pre-review of the patch, and it does not remove
> your responsibility to understand the changes that you are submitting.
>
> Include the prompt in the commit message if it helps a reviewer judge the
> result:
>
> * yes: "move field ``foo`` from ``struct aa`` to ``struct bb``. If a
>   function already has a local variable or parameter of type ``struct bb``,
>   use it instead of accessing ``aa.bb``."
>
> * yes: "add an implementation of the trait for ``Mutex<T: MyTrait>``,
>   forwarding the member functions to ``T`` while taking the lock around the
>   calls".
>
> * no: "write user-facing documentation for the new tool"
>
> * no: "write testcases for the new functions"
>
> Deterministic tooling (sed, coccinelle, formatters) is out of scope for the
> trailer, but should be mentioned in the commit message.

Apart from the above comments, this looks good to me.

Kevin

^ permalink raw reply [flat|nested] 59+ messages in thread
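As a rough illustration of the trailer format in the draft quoted above, the following sketch extracts and validates ``AI-used-for:`` entries from a commit message. It assumes only the four categories named in the draft (``code``, ``tests``, ``docs``, ``research``); the function name `parse_ai_trailers` is hypothetical, this is not part of any existing QEMU tooling, and explanations containing commas are not handled.

```python
import re
from typing import List, Optional, Tuple

# Categories listed in the draft. A "commit message" category was raised in
# the thread but is not part of the draft wording, so it is not accepted here.
ALLOWED = {"code", "tests", "docs", "research"}

# One entry: a category word, optionally followed by "(explanation)".
ITEM = re.compile(r"^(?P<cat>[a-z]+)(?:\s*\((?P<note>[^()]*)\))?$")


def parse_ai_trailers(message: str) -> List[Tuple[str, Optional[str]]]:
    """Return (category, explanation) pairs from all AI-used-for: trailers."""
    found = []
    for line in message.splitlines():
        key, sep, value = line.partition(":")
        if not sep or key.strip().lower() != "ai-used-for":
            continue  # not an AI-used-for trailer
        for raw in value.split(","):
            entry = raw.strip()
            match = ITEM.match(entry)
            if match is None or match["cat"] not in ALLOWED:
                raise ValueError("unrecognised AI-used-for entry: %r" % entry)
            found.append((match["cat"], match["note"]))
    return found


example = (
    "hw/foo: fix bar\n\n"
    "AI-used-for: tests, docs\n"
    "AI-used-for: code (refactoring)\n"
    "Signed-off-by: A Developer <a@example.org>\n"
)
entries = parse_ai_trailers(example)
```

A check of this kind only confirms that the trailer is well formed; it says nothing about whether the declared categories are accurate, which remains the author's responsibility under the DCO.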
* Re: on ai generated and code provenance
2026-05-27 10:53 ` Kevin Wolf
@ 2026-05-27 12:33 ` Paolo Bonzini
2026-05-27 12:43 ` Michael S. Tsirkin
0 siblings, 1 reply; 59+ messages in thread
From: Paolo Bonzini @ 2026-05-27 12:33 UTC (permalink / raw)
To: Kevin Wolf; +Cc: Warner Losh, Michael S. Tsirkin, qemu-devel, stefanha

On 5/27/26 12:53, Kevin Wolf wrote:
> On 27.05.2026 at 12:01, Paolo Bonzini wrote:
>> On 5/27/26 10:41, Kevin Wolf wrote:
>>> On 26.05.2026 at 21:52, Warner Losh wrote:
>>>> The QEMU Project currently may accept limited uses of AI that produce
>>>> high quality patches that are limited in the creative content added.
>>>> While maintainers will ultimately decide, changes like the following
>>>> fall within this policy
>>>> 1. Fixing obvious warnings in the obvious ways suggested by the tool
>>>> 2. Tree wide API changes, and other similar mechanical changes done
>>>>    today with perl/python/sed/coccinelle
>>>
>>> As I said in the paragraph you quoted below, I don't think we should
>>> encourage using AI for tasks that a deterministic tool could do.
>>
>> In some cases such a tool does not exist.
>
> Then it's not a task that a deterministic tool could do.

You have a point. :)

> [type annotations] might be a rare instance where someone (TM) should
> actually write the tool because it would be generally useful.

Agreed, especially the "someone" part.

>> Small bug fixes
>>   These should be limited to 20 lines of code or less, not including
>>   tests. You are still expected to understand and explain your changes
>>   and the rationale behind them.
>
> I agree with "not including tests". But I think this would be more
> consistent if we also add new tests (that come without a small bug fix
> at the same time; either because the problem is already fixed or because
> the fix is too complex to qualify) as another allowed category.

Yes, absolutely. Can you propose a wording?

>> These boundaries do not apply to other uses of AI, such as researching
>> APIs or algorithms, static analysis, or debugging, provided their output
>> is not included in contributions. Larger uses of AI are allowed as an
>> experiment, but they should be agreed upon with the maintainer prior
>> to submission.

Taking into account Alistair's input I'd rephrase as:

  The intention of these boundaries is to reduce the risk of maintainer
  burnout from AI contributions, as well as the risk to the project from
  unintentional copyright violations. They do not apply to other uses of
  AI, such as researching APIs or algorithms, static analysis, or
  debugging, provided the model's output is not included in
  contributions.

  If you wish to send large amounts of AI-generated changes, or any other
  contribution not in the above categories, please get in touch with the
  maintainer beforehand.

>> When AI/LLM tools produce or substantively shape your patch, add an
>> ``AI-used-for:`` trailer. The text of the trailer could be one or more of
>> ``code``, ``tests``, ``docs``, ``research``, possibly followed by an
>> explanation in parentheses::
>
> Include a category for commit messages, or are we expecting that commit
> messages are always written by a human? If so, that should be explicit.

Mostly, I don't think it matters. A commit message written purely by an
LLM is usually very bad. A commit message edited with an LLM falls under
this:

>> It is not intended for minimal presence such as
>> autocomplete or asking for a pre-review of the patch, and it does not remove
>> your responsibility to understand the changes that you are submitting.

Technically "research" shouldn't matter for the policy either, but it may
be interesting to write it out, if AI usage was important enough to
mention in the commit message. Perhaps coccinelle scripts would fall
under that as well.

Paolo

^ permalink raw reply [flat|nested] 59+ messages in thread
* Re: on ai generated and code provenance
2026-05-27 12:33 ` Paolo Bonzini
@ 2026-05-27 12:43 ` Michael S. Tsirkin
0 siblings, 0 replies; 59+ messages in thread
From: Michael S. Tsirkin @ 2026-05-27 12:43 UTC (permalink / raw)
To: Paolo Bonzini; +Cc: Kevin Wolf, Warner Losh, qemu-devel, stefanha

On Wed, May 27, 2026 at 02:33:03PM +0200, Paolo Bonzini wrote:
> The intention of these boundaries is to reduce the risk of maintainer
> burnout from AI contributions, as well as the risk to the project from
> unintentional copyright violations. They do not apply to other uses of AI,
> such as researching APIs or algorithms, static analysis, or debugging,
> provided the model's output is not included in contributions.

Although I will be frank, "static analysis" can induce maintainer
burnout just as easily. But I don't see what we can do about that.

^ permalink raw reply [flat|nested] 59+ messages in thread
* Re: on ai generated and code provenance
2026-05-27 10:01 ` Paolo Bonzini
2026-05-27 10:43 ` Alex Bennée
2026-05-27 10:53 ` Kevin Wolf
@ 2026-05-27 10:54 ` Alistair Francis
2026-05-27 14:21 ` Warner Losh
2026-05-27 14:11 ` Michael S. Tsirkin
` (2 subsequent siblings)
5 siblings, 1 reply; 59+ messages in thread
From: Alistair Francis @ 2026-05-27 10:54 UTC (permalink / raw)
To: Paolo Bonzini
Cc: Kevin Wolf, Warner Losh, Michael S. Tsirkin, qemu-devel, stefanha

On Wed, May 27, 2026 at 8:02 PM Paolo Bonzini <pbonzini@redhat.com> wrote:
>
> On 5/27/26 10:41, Kevin Wolf wrote:
> > On 26.05.2026 at 21:52, Warner Losh wrote:
> >> The QEMU Project currently may accept limited uses of AI that produce
> >> high quality patches that are limited in the creative content added.
> >> While maintainers will ultimately decide, changes like the following
> >> fall within this policy
> >> 1. Fixing obvious warnings in the obvious ways suggested by the tool
> >> 2. Tree wide API changes, and other similar mechanical changes done
> >>    today with perl/python/sed/coccinelle
> >
> > As I said in the paragraph you quoted below, I don't think we should
> > encourage using AI for tasks that a deterministic tool could do.
>
> In some cases such a tool does not exist. Much to my surprise, there is
> no tool to do static type inference on Python code, but AI is very good
> at doing it.
>
> > Letting AI perform the change directly instead may be an acceptable
> > shortcut for a one-man hobby project that nobody else will ever look at,
> > but in the context of a community project like QEMU in which your
> > changes have to be reviewed and understood by others, it matters a lot
> > that the output of the tool is reproducible. Otherwise, you're creating
> > unnecessary work for others, and that isn't acceptable.
>
> When applicable, going through coccinelle (with the aid of AI if needed!)
> is indeed a good middle ground as it helps reviewers for large changes.
> If you have many slightly different but easily separated changes (e.g.
> you can split the patch by struct field), it may make things worse.
>
> It's also worth noting that in other cases even sed or coccinelle, while
> deterministic, cannot produce 100% of the patch.
>
> > So maybe we should even explicitly mention a recommendation like the
> > following:
> >
> >   If you can use a deterministic tool, don't use AI instead. If you
> >   don't know how to use the deterministic tool, use the AI to tell you
> >   how to use it instead of trying to replace it.
>
> I like it.
>
> >> 3. Limited, small changes to fix bugs or add a small new feature whose
> >>    scope is less than about 100 lines and the originator can explain
> >>    them all or the meta issues about the patch.
> >
> > Not sure if mentioning a number of lines is wise. 100 lines can be
> > mostly boilerplate and simple sequential code or they can be a deeply
> > nested complex algorithm.
>
> I'd put the threshold at 20-50 at most.
>
> > I think I would see more use in a tag like (better name welcome):
> >
> >   AI-used-for: [code|tests|docs|commit message]...
>
> I like this *a lot*. No need for free advertisement, but some
> traceability is useful.
>
> For tools such as sed or coccinelle, having the exact script in the
> patch or commit message is useful. Plus, the execution of the script more
> or less delimits the commit by itself (or 90%+ of it). For LLMs it's a
> bit less clear cut because separating docs makes little sense. And the
> exact model is pointless, it will be obsolete in 6 months and provide no
> useful information.
>
> So, something like:
>
> ------------------- 8< -------------------
> Use of AI-generated content
> ~~~~~~~~~~~~~~~~~~~~~~~~~~~
>
> The QEMU project currently allows using AI/LLM tools to produce patches
> in scenarios with limited creative content:
>
> Mechanical changes
>   If you can use a deterministic tool or a script, don't use AI instead.
>   If you don't know how to do the change deterministically, you may
>   ask the AI for help, rather than having it stand in for the tools.
>
> Small bug fixes
>   These should be limited to 20 lines of code or less, not including
>   tests. You are still expected to understand and explain your changes
>   and the rationale behind them.

Coming back to Peter's earlier comments and the Zig policy, one thing
we have in RISC-V is people are running AI tools against QEMU and the
RISC-V spec to identify places where we don't meet the spec. They then
write patches and submit them upstream. The patches appear human
written, so have been accepted.

A lot of the bugs found are corner cases that people aren't actually
hitting. From my use of AI review systems in the past, they do tend to
be very nit-picky. So it's not too hard to catch issues that users
won't actually hit and fix them. It's a valid fix, but easy to
inundate reviewers. If this process was entirely run by an LLM it
could be way too much.

So maybe we should add something here about don't send large numbers
of "small bug fix" patches. So someone doesn't point an AI at QEMU and
a spec and generate huge numbers of patches, all of which are just
small bug fixes.

Alistair

>
> These boundaries do not apply to other uses of AI, such as researching
> APIs or algorithms, static analysis, or debugging, provided their output
> is not included in contributions. Larger uses of AI are allowed as an
> experiment, but they should be agreed upon with the maintainer prior to
> submission.
>
> Use of AI does not remove the need for authors to comply with all other
> requirements for contribution. In particular, the "Signed-off-by"
> label in a patch submission is a statement that the author takes
> responsibility for the entire contents of the patch, certifying that
> their patch submission is made in accordance with the rules of the
> `Developer's Certificate of Origin (DCO) <dco>`.
>
> Commit messages for AI-assisted changes
> ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
>
> When AI/LLM tools produce or substantively shape your patch, add an
> ``AI-used-for:`` trailer. The text of the trailer could be one or more
> of ``code``, ``tests``, ``docs``, ``research``, possibly followed by an
> explanation in parentheses::
>
>   AI-used-for: tests, docs
>   AI-used-for: code
>   AI-used-for: code (refactoring)
>   AI-used-for: code (prototype)
>   AI-used-for: research
>
> The trailer is intended as a clarification of your DCO obligations as
> well as to guide reviewers. It is not intended for minimal presence
> such as autocomplete or asking for a pre-review of the patch, and it
> does not remove your responsibility to understand the changes that you
> are submitting.
>
> Include the prompt in the commit message if it helps a reviewer judge
> the result:
>
> * yes: "move field ``foo`` from ``struct aa`` to ``struct bb``. If a
>   function already has a local variable or parameter of type ``struct
>   bb``, use it instead of accessing ``aa.bb``."
>
> * yes: "add an implementation of the trait for ``Mutex<T: MyTrait>``,
>   forwarding the member functions to ``T`` while taking the lock around
>   the calls".
>
> * no: "write user-facing documentation for the new tool"
>
> * no: "write testcases for the new functions"
>
> Deterministic tooling (sed, coccinelle, formatters) is out of scope for
> the trailer, but should be mentioned in the commit message.
>
>

^ permalink raw reply [flat|nested] 59+ messages in thread
* Re: on ai generated and code provenance
2026-05-27 10:54 ` Alistair Francis
@ 2026-05-27 14:21 ` Warner Losh
2026-05-28 1:59 ` Alistair Francis
0 siblings, 1 reply; 59+ messages in thread
From: Warner Losh @ 2026-05-27 14:21 UTC (permalink / raw)
To: Alistair Francis
Cc: Paolo Bonzini, Kevin Wolf, Michael S. Tsirkin, qemu-devel, stefanha

[-- Attachment #1: Type: text/plain, Size: 8329 bytes --]

On Wed, May 27, 2026 at 4:54 AM Alistair Francis <alistair23@gmail.com>
wrote:

> On Wed, May 27, 2026 at 8:02 PM Paolo Bonzini <pbonzini@redhat.com> wrote:
> >
> > On 5/27/26 10:41, Kevin Wolf wrote:
> > > On 26.05.2026 at 21:52, Warner Losh wrote:
> > >> The QEMU Project currently may accept limited uses of AI that produce
> > >> high quality patches that are limited in the creative content added.
> > >> While maintainers will ultimately decide, changes like the following
> > >> fall within this policy
> > >> 1. Fixing obvious warnings in the obvious ways suggested by the tool
> > >> 2. Tree wide API changes, and other similar mechanical changes done
> > >>    today with perl/python/sed/coccinelle
> > >
> > > As I said in the paragraph you quoted below, I don't think we should
> > > encourage using AI for tasks that a deterministic tool could do.
> >
> > In some cases such a tool does not exist. Much to my surprise, there is
> > no tool to do static type inference on Python code, but AI is very good
> > at doing it.
> >
> > > Letting AI perform the change directly instead may be an acceptable
> > > shortcut for a one-man hobby project that nobody else will ever look at,
> > > but in the context of a community project like QEMU in which your
> > > changes have to be reviewed and understood by others, it matters a lot
> > > that the output of the tool is reproducible. Otherwise, you're creating
> > > unnecessary work for others, and that isn't acceptable.
> >
> > When applicable, going through coccinelle (with the aid of AI if needed!)
> > is indeed a good middle ground as it helps reviewers for large changes.
> > If you have many slightly different but easily separated changes (e.g.
> > you can split the patch by struct field), it may make things worse.
> >
> > It's also worth noting that in other cases even sed or coccinelle, while
> > deterministic, cannot produce 100% of the patch.
> >
> > > So maybe we should even explicitly mention a recommendation like the
> > > following:
> > >
> > >   If you can use a deterministic tool, don't use AI instead. If you
> > >   don't know how to use the deterministic tool, use the AI to tell you
> > >   how to use it instead of trying to replace it.
> >
> > I like it.
> >
> > >> 3. Limited, small changes to fix bugs or add a small new feature whose
> > >>    scope is less than about 100 lines and the originator can explain
> > >>    them all or the meta issues about the patch.
> > >
> > > Not sure if mentioning a number of lines is wise. 100 lines can be
> > > mostly boilerplate and simple sequential code or they can be a deeply
> > > nested complex algorithm.
> >
> > I'd put the threshold at 20-50 at most.
> >
> > > I think I would see more use in a tag like (better name welcome):
> > >
> > >   AI-used-for: [code|tests|docs|commit message]...
> >
> > I like this *a lot*. No need for free advertisement, but some
> > traceability is useful.
> >
> > For tools such as sed or coccinelle, having the exact script in the
> > patch or commit message is useful. Plus, the execution of the script more
> > or less delimits the commit by itself (or 90%+ of it). For LLMs it's a
> > bit less clear cut because separating docs makes little sense. And the
> > exact model is pointless, it will be obsolete in 6 months and provide no
> > useful information.
> >
> > So, something like:
> >
> > ------------------- 8< -------------------
> > Use of AI-generated content
> > ~~~~~~~~~~~~~~~~~~~~~~~~~~~
> >
> > The QEMU project currently allows using AI/LLM tools to produce patches
> > in scenarios with limited creative content:
> >
> > Mechanical changes
> >   If you can use a deterministic tool or a script, don't use AI instead.
> >   If you don't know how to do the change deterministically, you may
> >   ask the AI for help, rather than having it stand in for the tools.
> >
> > Small bug fixes
> >   These should be limited to 20 lines of code or less, not including
> >   tests. You are still expected to understand and explain your changes
> >   and the rationale behind them.
>
> Coming back to Peter's earlier comments and the Zig policy, one thing
> we have in RISC-V is people are running AI tools against QEMU and the
> RISC-V spec to identify places where we don't meet the spec. They then
> write patches and submit them upstream. The patches appear human
> written, so have been accepted.
>

I have a checklist for bsd-user changes that checks the common mistakes
around the lock_user family of interfaces. I've generated patches from
what claude found, but claude's patches were generally identical to
what I produced.

> A lot of the bugs found are corner cases that people aren't actually
> hitting. From my use of AI review systems in the past, they do tend to
> be very nit-picky. So it's not too hard to catch issues that users
> won't actually hit and fix them. It's a valid fix, but easy to
> inundate reviewers. If this process was entirely run by an LLM it
> could be way too much.
>

My experience is that claude found 150 or so "bugs". All were "legit" in
the sense the APIs were used wrong, but maybe 10 were actual critical
bugs that explained some, but not all, of the mysterious hangs we see.

My worry has been one of testing: how do I test it all? Or do I continue
to use the 'just build thousands of packages' as the acid test?

> So maybe we should add something here about don't send large numbers
> of "small bug fix" patches. So someone doesn't point an AI at QEMU and
> a spec and generate huge numbers of patches, all of which are just
> small bug fixes.
>

Wouldn't this concern fall under the general requirement to not send more
than a manageable number of patches at a time (like 50)? Or do you think
a lower number is warranted?

Warner

> Alistair
>
> >
> > These boundaries do not apply to other uses of AI, such as researching
> > APIs or algorithms, static analysis, or debugging, provided their output
> > is not included in contributions. Larger uses of AI are allowed as an
> > experiment, but they should be agreed upon with the maintainer prior to
> > submission.
> >
> > Use of AI does not remove the need for authors to comply with all other
> > requirements for contribution. In particular, the "Signed-off-by"
> > label in a patch submission is a statement that the author takes
> > responsibility for the entire contents of the patch, certifying that
> > their patch submission is made in accordance with the rules of the
> > `Developer's Certificate of Origin (DCO) <dco>`.
> >
> > Commit messages for AI-assisted changes
> > ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
> >
> > When AI/LLM tools produce or substantively shape your patch, add an
> > ``AI-used-for:`` trailer. The text of the trailer could be one or more
> > of ``code``, ``tests``, ``docs``, ``research``, possibly followed by an
> > explanation in parentheses::
> >
> >   AI-used-for: tests, docs
> >   AI-used-for: code
> >   AI-used-for: code (refactoring)
> >   AI-used-for: code (prototype)
> >   AI-used-for: research
> >
> > The trailer is intended as a clarification of your DCO obligations as
> > well as to guide reviewers. It is not intended for minimal presence
> > such as autocomplete or asking for a pre-review of the patch, and it
> > does not remove your responsibility to understand the changes that you
> > are submitting.
> >
> > Include the prompt in the commit message if it helps a reviewer judge
> > the result:
> >
> > * yes: "move field ``foo`` from ``struct aa`` to ``struct bb``. If a
> >   function already has a local variable or parameter of type ``struct
> >   bb``, use it instead of accessing ``aa.bb``."
> >
> > * yes: "add an implementation of the trait for ``Mutex<T: MyTrait>``,
> >   forwarding the member functions to ``T`` while taking the lock around
> >   the calls".
> >
> > * no: "write user-facing documentation for the new tool"
> >
> > * no: "write testcases for the new functions"
> >
> > Deterministic tooling (sed, coccinelle, formatters) is out of scope for
> > the trailer, but should be mentioned in the commit message.
> >
> >
>

[-- Attachment #2: Type: text/html, Size: 10458 bytes --]

^ permalink raw reply [flat|nested] 59+ messages in thread
* Re: on ai generated and code provenance
2026-05-27 14:21 ` Warner Losh
@ 2026-05-28 1:59 ` Alistair Francis
2026-05-28 5:06 ` Michael S. Tsirkin
0 siblings, 1 reply; 59+ messages in thread
From: Alistair Francis @ 2026-05-28 1:59 UTC (permalink / raw)
To: Warner Losh
Cc: Paolo Bonzini, Kevin Wolf, Michael S. Tsirkin, qemu-devel, stefanha

On Thu, May 28, 2026 at 12:21 AM Warner Losh <imp@bsdimp.com> wrote:
>
>
>
> On Wed, May 27, 2026 at 4:54 AM Alistair Francis <alistair23@gmail.com> wrote:
>>
>> On Wed, May 27, 2026 at 8:02 PM Paolo Bonzini <pbonzini@redhat.com> wrote:
>> >
>> > On 5/27/26 10:41, Kevin Wolf wrote:
>> > > On 26.05.2026 at 21:52, Warner Losh wrote:
>> > >> The QEMU Project currently may accept limited uses of AI that produce
>> > >> high quality patches that are limited in the creative content added.
>> > >> While maintainers will ultimately decide, changes like the following
>> > >> fall within this policy
>> > >> 1. Fixing obvious warnings in the obvious ways suggested by the tool
>> > >> 2. Tree wide API changes, and other similar mechanical changes done
>> > >>    today with perl/python/sed/coccinelle
>> > >
>> > > As I said in the paragraph you quoted below, I don't think we should
>> > > encourage using AI for tasks that a deterministic tool could do.
>> >
>> > In some cases such a tool does not exist. Much to my surprise, there is
>> > no tool to do static type inference on Python code, but AI is very good
>> > at doing it.
>> >
>> > > Letting AI perform the change directly instead may be an acceptable
>> > > shortcut for a one-man hobby project that nobody else will ever look at,
>> > > but in the context of a community project like QEMU in which your
>> > > changes have to be reviewed and understood by others, it matters a lot
>> > > that the output of the tool is reproducible. Otherwise, you're creating
>> > > unnecessary work for others, and that isn't acceptable.
>> >
>> > When applicable, going through coccinelle (with the aid of AI if needed!)
>> > is indeed a good middle ground as it helps reviewers for large changes.
>> > If you have many slightly different but easily separated changes (e.g.
>> > you can split the patch by struct field), it may make things worse.
>> >
>> > It's also worth noting that in other cases even sed or coccinelle, while
>> > deterministic, cannot produce 100% of the patch.
>> >
>> > > So maybe we should even explicitly mention a recommendation like the
>> > > following:
>> > >
>> > >   If you can use a deterministic tool, don't use AI instead. If you
>> > >   don't know how to use the deterministic tool, use the AI to tell you
>> > >   how to use it instead of trying to replace it.
>> >
>> > I like it.
>> >
>> > >> 3. Limited, small changes to fix bugs or add a small new feature whose
>> > >>    scope is less than about 100 lines and the originator can explain
>> > >>    them all or the meta issues about the patch.
>> > >
>> > > Not sure if mentioning a number of lines is wise. 100 lines can be
>> > > mostly boilerplate and simple sequential code or they can be a deeply
>> > > nested complex algorithm.
>> >
>> > I'd put the threshold at 20-50 at most.
>> >
>> > > I think I would see more use in a tag like (better name welcome):
>> > >
>> > >   AI-used-for: [code|tests|docs|commit message]...
>> >
>> > I like this *a lot*. No need for free advertisement, but some
>> > traceability is useful.
>> >
>> > For tools such as sed or coccinelle, having the exact script in the
>> > patch or commit message is useful. Plus, the execution of the script more
>> > or less delimits the commit by itself (or 90%+ of it). For LLMs it's a
>> > bit less clear cut because separating docs makes little sense. And the
>> > exact model is pointless, it will be obsolete in 6 months and provide no
>> > useful information.
>> >
>> > So, something like:
>> >
>> > ------------------- 8< -------------------
>> > Use of AI-generated content
>> > ~~~~~~~~~~~~~~~~~~~~~~~~~~~
>> >
>> > The QEMU project currently allows using AI/LLM tools to produce patches
>> > in scenarios with limited creative content:
>> >
>> > Mechanical changes
>> >   If you can use a deterministic tool or a script, don't use AI instead.
>> >   If you don't know how to do the change deterministically, you may
>> >   ask the AI for help, rather than having it stand in for the tools.
>> >
>> > Small bug fixes
>> >   These should be limited to 20 lines of code or less, not including
>> >   tests. You are still expected to understand and explain your changes
>> >   and the rationale behind them.
>>
>> Coming back to Peter's earlier comments and the Zig policy, one thing
>> we have in RISC-V is people are running AI tools against QEMU and the
>> RISC-V spec to identify places where we don't meet the spec. They then
>> write patches and submit them upstream. The patches appear human
>> written, so have been accepted.
>
>
> I have a checklist for bsd-user changes that checks the common mistakes
> around the lock_user family of interfaces. I've generated patches from
> what claude found, but claude's patches were generally identical to
> what I produced.
>
>>
>> A lot of the bugs found are corner cases that people aren't actually
>> hitting. From my use of AI review systems in the past, they do tend to
>> be very nit-picky. So it's not too hard to catch issues that users
>> won't actually hit and fix them. It's a valid fix, but easy to
>> inundate reviewers. If this process was entirely run by an LLM it
>> could be way too much.
>
>
> My experience is that claude found 150 or so "bugs". All were "legit" in
> the sense the APIs were used wrong, but maybe 10 were actual critical
> bugs that explained some, but not all, of the mysterious hangs we see.
Yeah, that's exactly what I don't want, someone drive by sending all 150 "legit" bug fix patches. > My worry has been one of testing: how do I test it all? Or do I continue > to use the 'just build thousands of packages' as the acid test? > >> >> So maybe we should add something here about don't send large numbers >> of "small bug fix" patches. So someone doesn't point an AI at QEMU and >> a spec and generate huge numbers of patches, all of which are just >> small bug fixes. > > > Wouldn't this concern fall under the general requirement to not send more > than a manageable number of patches at a time (like 50)? Or do you think > a lower number is warranted? I could see someone reading the proposed wording and then sending one 20 line patch, then another, then another, then another, then another, then another and then more. All while only sending "small patches", but the ability to generate a large number. It's not clear to me at least from the proposed wording that we should discourage that. Alistair ^ permalink raw reply [flat|nested] 59+ messages in thread
* Re: on ai generated and code provenance
From: Michael S. Tsirkin @ 2026-05-28 5:06 UTC
To: Alistair Francis
Cc: Warner Losh, Paolo Bonzini, Kevin Wolf, qemu-devel, stefanha

On Thu, May 28, 2026 at 11:59:35AM +1000, Alistair Francis wrote:
> > My worry has been one of testing: how do I test it all? Or do I continue
> > to use the 'just build thousands of packages' as the acid test?
> >
> >>
> >> So maybe we should add something here about don't send large numbers
> >> of "small bug fix" patches. So someone doesn't point an AI at QEMU and
> >> a spec and generate huge numbers of patches, all of which are just
> >> small bug fixes.
> >
> >
> > Wouldn't this concern fall under the general requirement to not send more
> > than a manageable number of patches at a time (like 50)? Or do you think
> > a lower number is warranted?
>
> I could see someone reading the proposed wording and then sending one
> 20 line patch, then another, then another, then another, then another,
> then another and then more. All while only sending "small patches",
> but the ability to generate a large number.
>
> It's not clear to me at least from the proposed wording that we should
> discourage that.
>
> Alistair

Maybe we shouldn't? It's far from trivial to split up functionality even
in 100 line self contained chunks, let alone 20. And reviewing
such small patches is *easy*.

--
MST
* Re: on ai generated and code provenance
From: Paolo Bonzini @ 2026-05-28 7:32 UTC
To: Michael S. Tsirkin, Alistair Francis
Cc: Warner Losh, Kevin Wolf, qemu-devel, stefanha

On 5/28/26 07:06, Michael S. Tsirkin wrote:
>> I could see someone reading the proposed wording and then sending one
>> 20 line patch, then another, then another, then another, then another,
>> then another and then more. All while only sending "small patches",
>> but the ability to generate a large number.
>>
>> It's not clear to me at least from the proposed wording that we should
>> discourage that.
>
> Maybe we shouldn't? It's far from trivial to split up functionality even
> in 100 line self contained chunks, let alone 20. And reviewing
> such small patches is *easy*.

Bugfixes for something like TCG front-ends are usually well within 20
lines, or even less:

target/i386/tcg: fix decoding of MOVBE and CRC32 in 16-bit mode
 1 file changed, 10 insertions(+), 6 deletions(-)

target/i386/tcg: fix typo in dpps/dppd instructions
 2 files changed, 4 insertions(+), 4 deletions(-)

target/i386/tcg: fix a few instructions that do not support VEX.L=1
 1 file changed, 4 insertions(+), 4 deletions(-)

target/i386/tcg: allow VEX in 16-bit protected mode
 1 file changed, 3 insertions(+), 7 deletions(-)

target/i386/tcg: do not mark all SSE instructions as unaligned
 2 files changed, 9 insertions(+), 4 deletions(-)

target/i386/tcg: do not leave non-arithmetic flags in CC_SRC after PUSHF
 1 file changed, 1 insertion(+), 2 deletions(-)

target/i386/tcg: mark more instructions that are invalid in 64-bit mode
 1 file changed, 4 insertions(+), 4 deletions(-)

target/i386/tcg: ignore V3 in 32-bit mode
 1 file changed, 1 insertion(+), 1 deletion(-)

target/i386: Fix #GP error code for INT instructions
 1 file changed, 1 insertion(+), 1 deletion(-)

target/i386/tcg: validate segment registers
 1 file changed, 6 insertions(+), 1 deletion(-)

target/i386: Mark VPERMILPS as not valid with prefix 0
 1 file changed, 1 insertion(+), 1 deletion(-)

target/x86: Correctly handle invalid 0x0f 0xc7 0xxx insns
 1 file changed, 2 insertions(+)

target/i386: fix x86_64 pushw op
 1 file changed, 1 insertion(+), 1 deletion(-)

target/i386: fix width of third operand of VINSERTx128
 1 file changed, 2 insertions(+), 2 deletions(-)

target/i386: fix TB exit logic in gen_movl_seg() when writing to SS
 1 file changed, 5 insertions(+), 2 deletions(-)

And reviewing them is not necessarily easy if they touch weird corner
cases of the architecture. The code change might match the intention
but you still need to check the manual. Or on the contrary, the error
is apparent but the fix may be obscure, as in commit 5a2faa0a0a's
single-line change:

-    [0x0e] = X86_OP_ENTRYr(PUSH, E,f64),
+    [0x0e] = X86_OP_ENTRYr(PUSH, E,d64),

These changes could still be welcome, but I suppose the maintainer
would also prefer a heads-up about them.

Paolo
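As an illustration of the size limit discussed in this thread (20 lines
of code or less, not including tests), a submitter-side check could be
sketched in Python as below. This is only a sketch under assumptions:
the function name, the "tests/" prefix and the sample paths are made up
for illustration and are not part of QEMU's tooling or of the proposed
policy text.

```python
import re

# Hunk header of a unified diff: "@@ -start[,count] +start[,count] @@".
# A missing count means 1.
HUNK_RE = re.compile(r"^@@ -\d+(?:,(\d+))? \+\d+(?:,(\d+))? @@")


def added_lines_outside_tests(diff_text, test_prefixes=("tests/",)):
    """Count added lines in a unified diff, skipping files under test_prefixes.

    test_prefixes is a hypothetical convention for where tests live.
    """
    count = 0
    skip = False               # True while inside a file under a test prefix
    old_left = new_left = 0    # lines still expected in the current hunk
    for line in diff_text.splitlines():
        if old_left > 0 or new_left > 0:
            # Inside a hunk: classify the line by its first character.
            if line.startswith("+"):
                new_left -= 1
                if not skip:
                    count += 1
            elif line.startswith("-"):
                old_left -= 1
            elif line.startswith("\\"):
                pass           # "\ No newline at end of file" marker
            else:
                old_left -= 1  # context line counts on both sides
                new_left -= 1
            continue
        if line.startswith("+++ "):
            # "+++ b/path" names the post-image file for the next hunks.
            path = line[4:].split("\t")[0]
            if path.startswith("b/"):
                path = path[2:]
            skip = path.startswith(tuple(test_prefixes))
            continue
        m = HUNK_RE.match(line)
        if m:
            old_left = int(m.group(1)) if m.group(1) is not None else 1
            new_left = int(m.group(2)) if m.group(2) is not None else 1
    return count
```

Tracking the hunk line counts, rather than just looking for a leading
"+", keeps an added line whose content happens to start with "++ " from
being mistaken for a file header. The result could be compared against
whatever threshold the final policy settles on.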
* Re: on ai generated and code provenance
From: Michael S. Tsirkin @ 2026-05-27 14:11 UTC
To: Paolo Bonzini; +Cc: Kevin Wolf, Warner Losh, qemu-devel, stefanha

On Wed, May 27, 2026 at 12:01:10PM +0200, Paolo Bonzini wrote:
> > > 3. Limited, small changes to fix bugs or add a small new feature whose
> > > scope is less than about 100 lines and the originator can explain
> > > them all or the meta issues about the patch.
> >
> > Not sure if mentioning a number of lines is wise. 100 lines can be
> > mostly boilerplate and simple sequential code or they can be a deeply
> > nested complex algorithm.
>
> I'd put the threshold at 20-50 at most.

At most 50 lines added, right? OK.

--
MST
* Re: on ai generated and code provenance
From: Warner Losh @ 2026-05-27 14:14 UTC
To: Paolo Bonzini; +Cc: Kevin Wolf, Michael S. Tsirkin, qemu-devel, stefanha

On Wed, May 27, 2026 at 4:01 AM Paolo Bonzini <pbonzini@redhat.com> wrote:

> On 5/27/26 10:41, Kevin Wolf wrote:
> > On 26.05.2026 at 21:52, Warner Losh wrote:
> >> The QEMU Project currently may accept limited uses of AI that produce
> >> high quality patches that are limited in the creative content added.
> >> While maintainers will ultimately decide, changes like the following
> >> fall within this policy
> >> 1. Fixing obvious warnings in the obvious ways suggested by the tool
> >> 2. Tree wide API changes, and other similar mechanical changes done
> >> today with perl/python/sed/coccinelle
> >
> > As I said in the paragraph you quoted below, I don't think we should
> > encourage using AI for tasks that a deterministic tool could do.
>
> In some cases such a tool does not exist. Much to my surprise, there is
> no tool to do static type inference on Python code, but AI is very good
> at doing it.
>
> > Letting AI perform the change directly instead may be an acceptable
> > shortcut for a one-man hobby project that nobody else will ever look at,
> > but in the context of a community project like QEMU in which your
> > changes have to be reviewed and understood by others, it matters a lot
> > that the output of the tool is reproducible. Otherwise, you're creating
> > unnecessary work for others, and that isn't acceptable.
>
> When applicable, going through coccinelle (with the aid of AI if needed!)
> is indeed a good middle ground as it helps reviewers for large changes.
> If you have many slightly different but easily separated changes (e.g.
> you can split the patch by struct field), it may make things worse.
>
> It's also worth noting that in other cases even sed or coccinelle, while
> deterministic, cannot produce 100% of the patch.
>
> > So maybe we should even explicitly mention a recommendation like the
> > following:
> >
> > If you can use a deterministic tool, don't use AI instead. If you
> > don't know how to use the deterministic tool, use the AI to tell you
> > how to use it instead of trying to replace it.
>
> I like it.
>
> >> 3. Limited, small changes to fix bugs or add a small new feature whose
> >> scope is less than about 100 lines and the originator can explain
> >> them all or the meta issues about the patch.
> >
> > Not sure if mentioning a number of lines is wise. 100 lines can be
> > mostly boilerplate and simple sequential code or they can be a deeply
> > nested complex algorithm.
>
> I'd put the threshold at 20-50 at most.
>
> > I think I would see more use in a tag like (better name welcome):
> >
> > AI-used-for: [code|tests|docs|commit message]...
>
> I like this *a lot*. No need for free advertisement, but some
> traceability is useful.
>
> For tools such as sed or coccinelle, having the exact script in the
> patch or commit message is useful. Plus, the execution of the script more
> or less delimits the commit by itself (or 90%+ of it). For LLMs it's a
> bit less clear cut because separating docs makes little sense. And the
> exact model is pointless, it will be obsolete in 6 months and provide no
> useful information.
>
> So, something like:
>
> ------------------- 8< -------------------
> Use of AI-generated content
> ~~~~~~~~~~~~~~~~~~~~~~~~~~~
>
> The QEMU project currently allows using AI/LLM tools to produce patches
> in scenarios with limited creative content:
>
> Mechanical changes
>   If you can use a deterministic tool or a script, don't use AI instead.
>   If you don't know how to do the change deterministically, you may
>   ask the AI for help, rather than having it stand in for the tools.
>
> Small bug fixes
>   These should be limited to 20 lines of code or less, not including
>   tests. You are still expected to understand and explain your changes
>   and the rationale behind them.
>
> These boundaries do not apply to other uses of AI, such as researching
> APIs or algorithms, static analysis, or debugging, provided their output
> is not included in contributions. Larger uses of AI are allowed as an
> experiment, but they should be agreed upon with the maintainer prior to
> submission.
>
> Use of AI does not remove the need for authors to comply with all other
> requirements for contribution. In particular, the "Signed-off-by"
> label in a patch submission is a statement that the author takes
> responsibility for the entire contents of the patch, certifying that
> their patch submission is made in accordance with the rules of the
> `Developer's Certificate of Origin (DCO) <dco>`.
>
> Commit messages for AI-assisted changes
> ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
>
> When AI/LLM tools produce or substantively shape your patch, add an
> ``AI-used-for:`` trailer. The text of the trailer could be one or more
> of ``code``, ``tests``, ``docs``, ``research``, possibly followed by an
> explanation in parentheses::
>
>   AI-used-for: tests, docs
>   AI-used-for: code
>   AI-used-for: code (refactoring)
>   AI-used-for: code (prototype)
>   AI-used-for: research
>
> The trailer is intended as a clarification of your DCO obligations as
> well as to guide reviewers. It is not intended for minimal presence
> such as autocomplete or asking for a pre-review of the patch, and it
> does not remove your responsibility to understand the changes that you
> are submitting.
>

Why invent something new here when Assisted-by: is used elsewhere
and is likely more familiar to other users?

> Include the prompt in the commit message if it helps a reviewer judge
> the result:
>
> * yes: "move field ``foo`` from ``struct aa`` to ``struct bb``. If a
>   function already has a local variable or parameter of type ``struct
>   bb``, use it instead of accessing ``aa.bb``."
>
> * yes: "add an implementation of the trait for ``Mutex<T: MyTrait>``,
>   forwarding the member functions to ``T`` while taking the lock around
>   the calls".
>
> * no: "write user-facing documentation for the new tool"
>
> * no: "write testcases for the new functions"
>

I think this fundamentally misunderstands how AI tends to be used. It is
usually a long, iterative process, which makes it impossible to capture
"THE" prompt. The bsd-user changes under review now are the result of
months of memories and hundreds of interactions with Claude, including
one argument about how things worked. Including the prompt is OK for
people trying to "one shot" things, but it's been my experience, having
worked with these tools extensively, that "one shot" is cool for demos,
but not cool for code you have to use in anger.

> Deterministic tooling (sed, coccinelle, formatters) is out of scope for
> the trailer, but should be mentioned in the commit message.
>
>
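For illustration, the ``AI-used-for:`` trailer format from the draft
quoted above (one or more of code, tests, docs, research, each
optionally followed by an explanation in parentheses) could be checked
mechanically. The Python sketch below is only that, a sketch: the
function name and the strictness of the validation are assumptions, and
nothing like it is claimed to exist in QEMU's scripts.

```python
import re

# Categories named in the draft policy text.
ALLOWED = {"code", "tests", "docs", "research"}

# One item: a category word, optionally followed by "(explanation)".
ITEM_RE = re.compile(r"^\s*([a-z]+)\s*(?:\(([^()]*)\))?\s*$")

# Split on commas that are not inside a parenthesised explanation.
COMMA_RE = re.compile(r",\s*(?![^()]*\))")


def parse_ai_used_for(commit_message):
    """Return (category, explanation) pairs from AI-used-for: trailers.

    explanation is None when no parenthesised text was given.
    Raises ValueError on an unknown category or a malformed item.
    """
    found = []
    for line in commit_message.splitlines():
        key, sep, value = line.partition(":")
        if not sep or key.strip().lower() != "ai-used-for":
            continue
        for item in COMMA_RE.split(value):
            m = ITEM_RE.match(item)
            if m is None or m.group(1) not in ALLOWED:
                raise ValueError(f"bad AI-used-for item: {item.strip()!r}")
            found.append((m.group(1), m.group(2)))
    return found
```

A commit message with no such trailer simply yields an empty list, so
the helper says nothing about whether a trailer was required; that
judgement stays with the submitter and the reviewer.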
* Re: on ai generated and code provenance
From: Kevin Wolf @ 2026-05-27 14:51 UTC
To: Warner Losh; +Cc: Paolo Bonzini, Michael S. Tsirkin, qemu-devel, stefanha

On 27.05.2026 at 16:14, Warner Losh wrote:
> On Wed, May 27, 2026 at 4:01 AM Paolo Bonzini <pbonzini@redhat.com> wrote:
> > Commit messages for AI-assisted changes
> > ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
> >
> > When AI/LLM tools produce or substantively shape your patch, add an
> > ``AI-used-for:`` trailer. The text of the trailer could be one or more
> > of ``code``, ``tests``, ``docs``, ``research``, possibly followed by an
> > explanation in parentheses::
> >
> >   AI-used-for: tests, docs
> >   AI-used-for: code
> >   AI-used-for: code (refactoring)
> >   AI-used-for: code (prototype)
> >   AI-used-for: research
> >
> > The trailer is intended as a clarification of your DCO obligations as
> > well as to guide reviewers. It is not intended for minimal presence
> > such as autocomplete or asking for a pre-review of the patch, and it
> > does not remove your responsibility to understand the changes that you
> > are submitting.
>
> Why invent something new here when Assisted-by: is used elsewhere
> and is likely more familiar to other users.

Because Assisted-by: gives different information, which at least to me
isn't really interesting at all. It's much more interesting to me if the
code I'm looking at is generated, or if you only generated the tests.

Kevin
* Re: on ai generated and code provenance
From: Michael S. Tsirkin @ 2026-05-27 16:41 UTC
To: Kevin Wolf; +Cc: Warner Losh, Paolo Bonzini, qemu-devel, stefanha

On Wed, May 27, 2026 at 04:51:38PM +0200, Kevin Wolf wrote:
> On 27.05.2026 at 16:14, Warner Losh wrote:
> > On Wed, May 27, 2026 at 4:01 AM Paolo Bonzini <pbonzini@redhat.com> wrote:
> > > Commit messages for AI-assisted changes
> > > ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
> > >
> > > When AI/LLM tools produce or substantively shape your patch, add an
> > > ``AI-used-for:`` trailer. The text of the trailer could be one or more
> > > of ``code``, ``tests``, ``docs``, ``research``, possibly followed by an
> > > explanation in parentheses::
> > >
> > >   AI-used-for: tests, docs
> > >   AI-used-for: code
> > >   AI-used-for: code (refactoring)
> > >   AI-used-for: code (prototype)
> > >   AI-used-for: research
> > >
> > > The trailer is intended as a clarification of your DCO obligations as
> > > well as to guide reviewers. It is not intended for minimal presence
> > > such as autocomplete or asking for a pre-review of the patch, and it
> > > does not remove your responsibility to understand the changes that you
> > > are submitting.
> >
> > Why invent something new here when Assisted-by: is used elsewhere
> > and is likely more familiar to other users.
>
> Because Assisted-by: gives different information, which at least to me
> isn't really interesting at all. It's much more interesting to me if the
> code I'm looking at is generated, or if you only generated the tests.
>
> Kevin

I personally am interested to know which models work better than others.
Contributions are about reputation, not just code. I'll learn which
models produce better output, just like I learn to trust specific
contributors better.

--
MST
* Re: on ai generated and code provenance
From: Kevin Wolf @ 2026-05-27 16:50 UTC
To: Michael S. Tsirkin; +Cc: Warner Losh, Paolo Bonzini, qemu-devel, stefanha

On 27.05.2026 at 18:41, Michael S. Tsirkin wrote:
> On Wed, May 27, 2026 at 04:51:38PM +0200, Kevin Wolf wrote:
> > On 27.05.2026 at 16:14, Warner Losh wrote:
> > > On Wed, May 27, 2026 at 4:01 AM Paolo Bonzini <pbonzini@redhat.com> wrote:
> > > > Commit messages for AI-assisted changes
> > > > ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
> > > >
> > > > When AI/LLM tools produce or substantively shape your patch, add an
> > > > ``AI-used-for:`` trailer. The text of the trailer could be one or more
> > > > of ``code``, ``tests``, ``docs``, ``research``, possibly followed by an
> > > > explanation in parentheses::
> > > >
> > > >   AI-used-for: tests, docs
> > > >   AI-used-for: code
> > > >   AI-used-for: code (refactoring)
> > > >   AI-used-for: code (prototype)
> > > >   AI-used-for: research
> > > >
> > > > The trailer is intended as a clarification of your DCO obligations as
> > > > well as to guide reviewers. It is not intended for minimal presence
> > > > such as autocomplete or asking for a pre-review of the patch, and it
> > > > does not remove your responsibility to understand the changes that you
> > > > are submitting.
> > >
> > > Why invent something new here when Assisted-by: is used elsewhere
> > > and is likely more familiar to other users.
> >
> > Because Assisted-by: gives different information, which at least to me
> > isn't really interesting at all. It's much more interesting to me if the
> > code I'm looking at is generated, or if you only generated the tests.
>
> I personally am interested to know which models work better than others.
> Contributions are about reputation not just code. I'll learn which
> models produce better output, just like I learn to trust specific
> contributors better.

You don't see how well the model worked. What you see is filtered by the
submitter, and the policy we're discussing is specifically made to make
sure that bad results never reach the list.

Even for things that do reach the list, Assisted-by: doesn't tell you
how much of the submission is AI-generated and it also doesn't tell you
if it's "I used model X and a simple prompt gave me the perfect result
in the first attempt" or "I used model X and it took me two days of back
and forth and eventually I just rewrote most of it, but there are a few
AI-generated lines left".

So what you should trust is the contributor, not an Assisted-by: tag.

Kevin
* Re: on ai generated and code provenance
From: Michael S. Tsirkin @ 2026-05-27 16:56 UTC
To: Kevin Wolf; +Cc: Warner Losh, Paolo Bonzini, qemu-devel, stefanha

On Wed, May 27, 2026 at 06:50:14PM +0200, Kevin Wolf wrote:
> On 27.05.2026 at 18:41, Michael S. Tsirkin wrote:
> > On Wed, May 27, 2026 at 04:51:38PM +0200, Kevin Wolf wrote:
> > > On 27.05.2026 at 16:14, Warner Losh wrote:
> > > > On Wed, May 27, 2026 at 4:01 AM Paolo Bonzini <pbonzini@redhat.com> wrote:
> > > > > Commit messages for AI-assisted changes
> > > > > ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
> > > > >
> > > > > When AI/LLM tools produce or substantively shape your patch, add an
> > > > > ``AI-used-for:`` trailer. The text of the trailer could be one or more
> > > > > of ``code``, ``tests``, ``docs``, ``research``, possibly followed by an
> > > > > explanation in parentheses::
> > > > >
> > > > >   AI-used-for: tests, docs
> > > > >   AI-used-for: code
> > > > >   AI-used-for: code (refactoring)
> > > > >   AI-used-for: code (prototype)
> > > > >   AI-used-for: research
> > > > >
> > > > > The trailer is intended as a clarification of your DCO obligations as
> > > > > well as to guide reviewers. It is not intended for minimal presence
> > > > > such as autocomplete or asking for a pre-review of the patch, and it
> > > > > does not remove your responsibility to understand the changes that you
> > > > > are submitting.
> > > >
> > > > Why invent something new here when Assisted-by: is used elsewhere
> > > > and is likely more familiar to other users.
> > >
> > > Because Assisted-by: gives different information, which at least to me
> > > isn't really interesting at all. It's much more interesting to me if the
> > > code I'm looking at is generated, or if you only generated the tests.
> >
> > I personally am interested to know which models work better than others.
> > Contributions are about reputation not just code. I'll learn which
> > models produce better output, just like I learn to trust specific
> > contributors better.
>
> You don't see how well the model worked. What you see is filtered by the
> submitter, and the policy we're discussing is specifically made to make
> sure that bad results never reach the list.
>
> Even for things that do reach the list, Assisted-by: doesn't tell you
> how much of the submission is AI-generated and it also doesn't tell you
> if it's "I used model X and a simple prompt gave me the perfect result
> in the first attempt" or "I used model X and it took me two days of back
> and forth and eventually I just rewrote most of it, but there are a few
> AI-generated lines left".
>
> So what you should trust is the contributor, not an Assisted-by: tag.
>
> Kevin

Well, "AI-used-for: research" isn't really useful to me at all then.
Why do I care about research?

--
MST
* Re: on ai generated and code provenance
From: Michael S. Tsirkin @ 2026-05-27 17:06 UTC
To: Kevin Wolf; +Cc: Warner Losh, Paolo Bonzini, qemu-devel, stefanha

On Wed, May 27, 2026 at 06:50:14PM +0200, Kevin Wolf wrote:
> On 27.05.2026 at 18:41, Michael S. Tsirkin wrote:
> > On Wed, May 27, 2026 at 04:51:38PM +0200, Kevin Wolf wrote:
> > > On 27.05.2026 at 16:14, Warner Losh wrote:
> > > > On Wed, May 27, 2026 at 4:01 AM Paolo Bonzini <pbonzini@redhat.com> wrote:
> > > > > Commit messages for AI-assisted changes
> > > > > ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
> > > > >
> > > > > When AI/LLM tools produce or substantively shape your patch, add an
> > > > > ``AI-used-for:`` trailer. The text of the trailer could be one or more
> > > > > of ``code``, ``tests``, ``docs``, ``research``, possibly followed by an
> > > > > explanation in parentheses::
> > > > >
> > > > >   AI-used-for: tests, docs
> > > > >   AI-used-for: code
> > > > >   AI-used-for: code (refactoring)
> > > > >   AI-used-for: code (prototype)
> > > > >   AI-used-for: research
> > > > >
> > > > > The trailer is intended as a clarification of your DCO obligations as
> > > > > well as to guide reviewers. It is not intended for minimal presence
> > > > > such as autocomplete or asking for a pre-review of the patch, and it
> > > > > does not remove your responsibility to understand the changes that you
> > > > > are submitting.
> > > >
> > > > Why invent something new here when Assisted-by: is used elsewhere
> > > > and is likely more familiar to other users.
> > >
> > > Because Assisted-by: gives different information, which at least to me
> > > isn't really interesting at all. It's much more interesting to me if the
> > > code I'm looking at is generated, or if you only generated the tests.
> >
> > I personally am interested to know which models work better than others.
> > Contributions are about reputation not just code. I'll learn which
> > models produce better output, just like I learn to trust specific
> > contributors better.
>
> You don't see how well the model worked. What you see is filtered by the
> submitter, and the policy we're discussing is specifically made to make
> sure that bad results never reach the list.
>
> Even for things that do reach the list, Assisted-by: doesn't tell you
> how much of the submission is AI-generated and it also doesn't tell you
> if it's "I used model X and a simple prompt gave me the perfect result
> in the first attempt" or "I used model X and it took me two days of back
> and forth and eventually I just rewrote most of it, but there are a few
> AI-generated lines left".

I am capable of observing trends over multiple contributions from
multiple people.

> So what you should trust is the contributor, not an Assisted-by: tag.
>
> Kevin

Both.

--
MST
* Re: on ai generated and code provenance
From: Warner Losh @ 2026-05-27 17:15 UTC
To: Michael S. Tsirkin; +Cc: Kevin Wolf, Paolo Bonzini, qemu-devel, stefanha

On Wed, May 27, 2026 at 11:06 AM Michael S. Tsirkin <mst@redhat.com> wrote:

> On Wed, May 27, 2026 at 06:50:14PM +0200, Kevin Wolf wrote:
> > On 27.05.2026 at 18:41, Michael S. Tsirkin wrote:
> > > On Wed, May 27, 2026 at 04:51:38PM +0200, Kevin Wolf wrote:
> > > > On 27.05.2026 at 16:14, Warner Losh wrote:
> > > > > On Wed, May 27, 2026 at 4:01 AM Paolo Bonzini <pbonzini@redhat.com> wrote:
> > > > > > Commit messages for AI-assisted changes
> > > > > > ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
> > > > > >
> > > > > > When AI/LLM tools produce or substantively shape your patch, add an
> > > > > > ``AI-used-for:`` trailer. The text of the trailer could be one or more
> > > > > > of ``code``, ``tests``, ``docs``, ``research``, possibly followed by an
> > > > > > explanation in parentheses::
> > > > > >
> > > > > >   AI-used-for: tests, docs
> > > > > >   AI-used-for: code
> > > > > >   AI-used-for: code (refactoring)
> > > > > >   AI-used-for: code (prototype)
> > > > > >   AI-used-for: research
> > > > > >
> > > > > > The trailer is intended as a clarification of your DCO obligations as
> > > > > > well as to guide reviewers. It is not intended for minimal presence
> > > > > > such as autocomplete or asking for a pre-review of the patch, and it
> > > > > > does not remove your responsibility to understand the changes that you
> > > > > > are submitting.
> > > > >
> > > > > Why invent something new here when Assisted-by: is used elsewhere
> > > > > and is likely more familiar to other users.
> > > >
> > > > Because Assisted-by: gives different information, which at least to me
> > > > isn't really interesting at all. It's much more interesting to me if the
> > > > code I'm looking at is generated, or if you only generated the tests.
> > >
> > > I personally am interested to know which models work better than others.
> > > Contributions are about reputation not just code. I'll learn which
> > > models produce better output, just like I learn to trust specific
> > > contributors better.
> >
> > You don't see how well the model worked. What you see is filtered by the
> > submitter, and the policy we're discussing is specifically made to make
> > sure that bad results never reach the list.
> >
> > Even for things that do reach the list, Assisted-by: doesn't tell you
> > how much of the submission is AI-generated and it also doesn't tell you
> > if it's "I used model X and a simple prompt gave me the perfect result
> > in the first attempt" or "I used model X and it took me two days of back
> > and forth and eventually I just rewrote most of it, but there are a few
> > AI-generated lines left".
>
> I am capable of observing trends over multiple contributions from
> multiple people.
>

As the primary person landing commits on the FreeBSD GitHub experiment,
I can say that I have observed trends over multiple committers and can
spot the ones using Claude + Opus 4.5 or 4.6.

Warner
* Re: on ai generated and code provenance
2026-05-27 16:50 ` Kevin Wolf
2026-05-27 16:56 ` Michael S. Tsirkin
2026-05-27 17:06 ` Michael S. Tsirkin
@ 2026-05-27 17:07 ` Warner Losh
2 siblings, 0 replies; 59+ messages in thread
From: Warner Losh @ 2026-05-27 17:07 UTC (permalink / raw)
To: Kevin Wolf; +Cc: Michael S. Tsirkin, Paolo Bonzini, qemu-devel, stefanha

On Wed, May 27, 2026 at 10:50 AM Kevin Wolf <kwolf@redhat.com> wrote:

> On 27.05.2026 at 18:41, Michael S. Tsirkin wrote:
> > On Wed, May 27, 2026 at 04:51:38PM +0200, Kevin Wolf wrote:
> > > On 27.05.2026 at 16:14, Warner Losh wrote:
> > > > On Wed, May 27, 2026 at 4:01 AM Paolo Bonzini <pbonzini@redhat.com> wrote:
> > > > > Commit messages for AI-assisted changes
> > > > > ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
> > > > >
> > > > > When AI/LLM tools produce or substantively shape your patch, add an
> > > > > ``AI-used-for:`` trailer. The text of the trailer could be one or more
> > > > > of ``code``, ``tests``, ``docs``, ``research``, possibly followed by an
> > > > > explanation in parentheses::
> > > > >
> > > > >     AI-used-for: tests, docs
> > > > >     AI-used-for: code
> > > > >     AI-used-for: code (refactoring)
> > > > >     AI-used-for: code (prototype)
> > > > >     AI-used-for: research
> > > > >
> > > > > The trailer is intended as a clarification of your DCO obligations as
> > > > > well as to guide reviewers. It is not intended for minimal presence
> > > > > such as autocomplete or asking for a pre-review of the patch, and it
> > > > > does not remove your responsibility to understand the changes that you
> > > > > are submitting.
> > > >
> > > > Why invent something new here when Assisted-by: is used elsewhere
> > > > and is likely more familiar to other users.
> > >
> > > Because Assisted-by: gives different information, which at least to me
> > > isn't really interesting at all. It's much more interesting to me if the
> > > code I'm looking at is generated, or if you only generated the tests.
> >
> > I personally am interested to know which models work better than others.
> > Contributions are about reputation not just code. I'll learn which
> > models produce better output, just like I learn to trust specific
> > contributors better.
>
> You don't see how well the model worked. What you see is filtered by the
> submitter, and the policy we're discussing is specifically made to make
> sure that bad results never reach the list.

You actually do. Bad results will 100% hit the list. Guaranteed. No policy will
stop that. Having the right tag, like the model used, will help train reviewers
which ones work and which ones don't for their subset of the tree.

> Even for things that do reach the list, Assisted-by: doesn't tell you
> how much of the submission is AI-generated and it also doesn't tell you
> if it's "I used model X and a simple prompt gave me the perfect result
> in the first attempt" or "I used model X and it took me two days of back
> and forth and eventually I just rewrote most of it, but there are a few
> AI-generated lines left".

I covered that in my bsd-user cover letter. But in that case I used it mostly
to move the code from upstream with proper attribution (yes, deterministic
tools could do that, but I wasted a week on trying to write them years ago and
AI is just "move this over" for me now). But then, after I started, I had
Claude review them in the style of reviews I'd gotten in the past, so it
developed a checklist to do the reviews and at first I had those separate since
they were a joint effort (mostly Claude finding the issue which was obvious to
fix, but I had it re-review my fixes and/or suggest its own). I merged them
after advice that said basically "do what you do for human reviews: fold them
back" so I did that in subsequent reviews.

Also, I've started seeing Claude-generated submissions for FreeBSD pull
requests. I recognize its style since I've interacted with it so much. I also
recognize other, unknown to me, styles that have different quirks that I know
Claude doesn't do, but that other models do. I wish I had the raw data to know
which AI tools and models were used since that cues me to look for certain
things (with Claude it tends to be its insane insistence on printf'ing what's
going on to an annoying degree). And there people have been trying to sneak
things in since a preliminary "no ai" policy leaked out that was never formally
approved. I have to guess at it.

> So what you should trust is the contributor, not an Assisted-by: tag.

You should use both to form your opinion. It's not an either-or situation. I've
had conversations with the AI contributors in FreeBSD about how they use it and
tend to ask more questions at review than when AI isn't used. Knowing what AI
was used will help me delve into different things that I might not otherwise
poke at since I know the quirks of at least a couple of tools by now.

Warner

^ permalink raw reply [flat|nested] 59+ messages in thread
* Re: on ai generated and code provenance
2026-05-27 14:14 ` Warner Losh
2026-05-27 14:51 ` Kevin Wolf
@ 2026-05-27 16:05 ` Paolo Bonzini
2026-05-27 16:48 ` Michael S. Tsirkin
1 sibling, 1 reply; 59+ messages in thread
From: Paolo Bonzini @ 2026-05-27 16:05 UTC (permalink / raw)
To: Warner Losh; +Cc: Kevin Wolf, Michael S. Tsirkin, qemu-devel, stefanha

On 5/27/26 16:14, Warner Losh wrote:
> Why invent something new here when Assisted-by: is used elsewhere
> and is likely more familiar to other users.

Because Assisted-by was invented by AI companies to get free advertisement.
(It's not just me; see
https://akselmo.dev/posts/stop-advertising-in-your-commits/ for an example).
Also, it does not answer any interesting question.

Not using it is also a good way to sieve people who didn't bother to read the
policy, now that I think about it.

> * yes: "move field ``foo`` from ``struct aa`` to ``struct bb``. If a
> function already has a local variable or parameter of type ``struct
> bb``, use it instead of accessing ``aa.bb``."
>
> * yes: "add an implementation of the trait for ``Mutex<T: MyTrait>``,
> forwarding the member functions to ``T`` while taking the lock around
> the calls".
>
> * no: "write user-facing documentation for the new tool"
>
> * no: "write testcases for the new functions"
>
> I think this fundamentally misunderstands how AI tends to be used. It
> usually is a long, iterative process that's become impossible to capture
> "THE" prompt.

Sure, but remember that these rules are for the cases listed above: tests,
mechanical changes, <20 line fixes.

Even for mechanical changes, using AI does not remove the need to separate
commits, therefore having small "one shottable" prompts (50 prompts for
planning + 20-ish patches) is a plausible way to proceed. For example, large
parts of
https://lore.kernel.org/kvm/20260511150648.685374-1-pbonzini@redhat.com/ were
done as a ~30-minute plan mode conversation followed by ~1.5 hours of mostly
one-shotted prompts.

Here are some real examples of the prompts:

"ok, i think we're done. double check, then prepare for changing walk_mmu to
struct kvm_pagewalk (initializing it with .w)"

"rename walk_mmu to cpu_walk"

"next step is changing nested_mmu from `struct kvm_mmu` to `struct kvm_pagewalk"

"ok! now we have to change cpu_walk to not be a pointer.
init_kvm_nested_cpu_walk becomes init_kvm_cpu_walk and is called always in
kvm_init_mmu. init_kvm_tdp_mmu stops initializing context->w. this won't be all
of it but it's a start"

"move towards removing explicit access to w when the mmu is known, for example
when initializing shadow EPT/NPT you want to use tdp_walk instead of w. later
on we will figure out whether to 1) remove w 2) add it back to struct kvm_mmu
but this time as a pointer 3) do something like mmu == guest_mmu ? tdp_walk :
cpu_walk"

"pull all the permissions stuff into a separate struct kvm_page_format"

[here the LLM wanted to clarify what's the "permissions stuff" :)]

"good analysis - leave cpu_role aside, and put the others (which are used by
e.g. permission_fault()) in the new struct"

"great, merge struct rsvd_bits_validate into kvm_page_format (changing `struct
rsvd_bits_validate shadow_zero_check` to `struct kvm_page_format` while (for
now) ignoring the initialization of other fields)"

If you don't work like that, fine; it's not mandatory.

Paolo

^ permalink raw reply [flat|nested] 59+ messages in thread
* Re: on ai generated and code provenance
2026-05-27 16:05 ` Paolo Bonzini
@ 2026-05-27 16:48 ` Michael S. Tsirkin
2026-05-27 16:57 ` Warner Losh
0 siblings, 1 reply; 59+ messages in thread
From: Michael S. Tsirkin @ 2026-05-27 16:48 UTC (permalink / raw)
To: Paolo Bonzini; +Cc: Warner Losh, Kevin Wolf, qemu-devel, stefanha

On Wed, May 27, 2026 at 06:05:47PM +0200, Paolo Bonzini wrote:
> On 5/27/26 16:14, Warner Losh wrote:
> > Why invent something new here when Assisted-by: is used elsewhere
> > and is likely more familiar to other users.
>
> Because Assisted-by was invented by AI companies to get free advertisement.

Jonathan Corbet is in the pocket of AI companies? Or Sasha Levin?

https://lore.kernel.org/lkml/20251223122110.2496946-1-sashal@kernel.org/
https://lore.kernel.org/lkml/877bqtlzug.fsf@trenco.lwn.net/

> (It's not just me; see
> https://akselmo.dev/posts/stop-advertising-in-your-commits/ for an example).

Does not impress me as being either super informed or super
professional.

> Also, it does not answer any interesting question.

It does for me - I will learn which models are more likely to produce
bad slop.

> Not using it is also a good way to sieve people who didn't bother to read
> the policy, now that I think about it.

In my experience, AI tools add "Co-developed-by:" tags by default. One
has to read the policy to add Assisted-by.

^ permalink raw reply [flat|nested] 59+ messages in thread
* Re: on ai generated and code provenance
2026-05-27 16:48 ` Michael S. Tsirkin
@ 2026-05-27 16:57 ` Warner Losh
2026-05-27 17:05 ` Michael S. Tsirkin
2026-05-27 17:48 ` Paolo Bonzini
0 siblings, 2 replies; 59+ messages in thread
From: Warner Losh @ 2026-05-27 16:57 UTC (permalink / raw)
To: Michael S. Tsirkin; +Cc: Paolo Bonzini, Kevin Wolf, qemu-devel, stefanha

On Wed, May 27, 2026 at 10:48 AM Michael S. Tsirkin <mst@redhat.com> wrote:

> On Wed, May 27, 2026 at 06:05:47PM +0200, Paolo Bonzini wrote:
> > On 5/27/26 16:14, Warner Losh wrote:
> > > Why invent something new here when Assisted-by: is used elsewhere
> > > and is likely more familiar to other users.
> >
> > Because Assisted-by was invented by AI companies to get free advertisement.
>
> Jonathan Corbet is in the pocket of AI companies? Or Sasha Levin?
> https://lore.kernel.org/lkml/20251223122110.2496946-1-sashal@kernel.org/
> https://lore.kernel.org/lkml/877bqtlzug.fsf@trenco.lwn.net/

Yea. Assisted-by was not invented by AI companies. That's just rubbish.

See https://github.com/anthropics/claude-code/issues/36105 for the open
issue with Claude.

> > (It's not just me; see
> > https://akselmo.dev/posts/stop-advertising-in-your-commits/ for an example).
>
> Does not impress me as being either super informed or super
> professional.

Yea, seems crazy.

> > Also, it does not answer any interesting question.
>
> It does for me - I will learn which models are more likely to produce
> bad slop.

Same. I can tell the difference in the degree of slop between the different
models and have sometimes selected the older one over the newer one because it
does a better job.

> > Not using it is also a good way to sieve people who didn't bother to read
> > the policy, now that I think about it.
>
> In my experience, AI tools add "Co-developed-by:" tags by default. One
> has to read the policy to add Assisted-by.

Co-authored-by: you mean? That's what Claude adds for me. I had to take
extra effort to add the new Assisted-by: to my patch review.

Warner

^ permalink raw reply [flat|nested] 59+ messages in thread
* Re: on ai generated and code provenance
2026-05-27 16:57 ` Warner Losh
@ 2026-05-27 17:05 ` Michael S. Tsirkin
2026-05-27 17:48 ` Paolo Bonzini
1 sibling, 0 replies; 59+ messages in thread
From: Michael S. Tsirkin @ 2026-05-27 17:05 UTC (permalink / raw)
To: Warner Losh; +Cc: Paolo Bonzini, Kevin Wolf, qemu-devel, stefanha

On Wed, May 27, 2026 at 10:57:09AM -0600, Warner Losh wrote:
> On Wed, May 27, 2026 at 10:48 AM Michael S. Tsirkin <mst@redhat.com> wrote:
> > On Wed, May 27, 2026 at 06:05:47PM +0200, Paolo Bonzini wrote:
> > > On 5/27/26 16:14, Warner Losh wrote:
> > > > Why invent something new here when Assisted-by: is used elsewhere
> > > > and is likely more familiar to other users.
> > >
> > > Because Assisted-by was invented by AI companies to get free advertisement.
> >
> > Jonathan Corbet is in the pocket of AI companies? Or Sasha Levin?
> > https://lore.kernel.org/lkml/20251223122110.2496946-1-sashal@kernel.org/
> > https://lore.kernel.org/lkml/877bqtlzug.fsf@trenco.lwn.net/
>
> Yea. Assisted-by was not invented by AI companies. That's just rubbish.
>
> See https://github.com/anthropics/claude-code/issues/36105 for the open
> issue with Claude.
>
> > > (It's not just me; see
> > > https://akselmo.dev/posts/stop-advertising-in-your-commits/ for an example).
> >
> > Does not impress me as being either super informed or super
> > professional.
>
> Yea, seems crazy.
>
> > > Also, it does not answer any interesting question.
> >
> > It does for me - I will learn which models are more likely to produce
> > bad slop.
>
> Same. I can tell the difference in the degree of slop between the different
> models and have sometimes selected the older one over the newer one because
> it does a better job.
>
> > > Not using it is also a good way to sieve people who didn't bother to read
> > > the policy, now that I think about it.
> >
> > In my experience, AI tools add "Co-developed-by:" tags by default. One
> > has to read the policy to add Assisted-by.
>
> Co-authored-by: you mean? That's what Claude adds for me. I had to take
> extra effort to add the new Assisted-by: to my patch review.
>
> Warner

Right.

^ permalink raw reply [flat|nested] 59+ messages in thread
* Re: on ai generated and code provenance
2026-05-27 16:57 ` Warner Losh
2026-05-27 17:05 ` Michael S. Tsirkin
@ 2026-05-27 17:48 ` Paolo Bonzini
1 sibling, 0 replies; 59+ messages in thread
From: Paolo Bonzini @ 2026-05-27 17:48 UTC (permalink / raw)
To: Warner Losh; +Cc: Michael S. Tsirkin, Kevin Wolf, qemu-devel, stefanha

On Wed, May 27, 2026 at 6:57 PM Warner Losh <imp@bsdimp.com> wrote:
> On Wed, May 27, 2026 at 10:48 AM Michael S. Tsirkin <mst@redhat.com> wrote:
>> On Wed, May 27, 2026 at 06:05:47PM +0200, Paolo Bonzini wrote:
>> > On 5/27/26 16:14, Warner Losh wrote:
>> > > Why invent something new here when Assisted-by: is used elsewhere
>> > > and is likely more familiar to other users.
>> >
>> > Because Assisted-by was invented by AI companies to get free advertisement.
>>
>> Jonathan Corbet is in the pocket of AI companies? Or Sasha Levin?
>> https://lore.kernel.org/lkml/20251223122110.2496946-1-sashal@kernel.org/
>> https://lore.kernel.org/lkml/877bqtlzug.fsf@trenco.lwn.net/
>
> Yea. Assisted-by was not invented by AI companies. That's just rubbish.

I stand corrected - the *format* of the tag with the model name was
invented by AI companies, for example Anthropic who embedded it in Claude
Code. Likewise for the robot emoji at the end of GitHub PRs.

It does not matter whether it's Co-authored-by, Generated-by, Assisted-by.
It's the new "Sent from my iPhone" and neither provides useful information
to me as a reviewer, even if it ends up in a Linux documentation file.

If the submitter uses a "bad model" and is not able to correct that, it's a
submitter problem. I'll watch out for the submitter even after they switch
to a "good model", because they're not applying critical thinking to the
AI's output. On the contrary, if the submitter uses a "good model" for
scaffolding and does awesome work on top, I don't think I should penalize
the submitter.

> See https://github.com/anthropics/claude-code/issues/36105 for the open
> issue with Claude.

Good luck with that - Anthropic uses Co-authored-by so that the authors show
up on GitHub as "foo and claude"[1]. It's a *blatant* marketing ploy, just
like "Co-authored-by: Coke <nobody@coca-cola.com>" would be.

For what it's worth, Claude agrees. When asked to "commit, but remove the
tag as I don't want to give out free ad space" it says this:

"There is one legitimate non-advertising function, and it's worth naming
precisely so we don't overcorrect: disclosure where a project's policy
requires it. Some projects now want to know whether code was AI-assisted —
for DCO/sign-off validity, copyright-provenance reasons, license concerns.
That's a real maintainer interest. But notice it doesn't rescue either
default:

1. The interest is in the fact of AI assistance, not which model. The model
name and version add nothing to that.

2. Disclosure should track the project's policy and be the submitter's
conscious act — a line in the cover letter, written deliberately — not a
trailer the tool injects regardless of whether the project cares. A
non-adaptive default isn't disclosure, it's a watermark".

Thanks,

Paolo

[1] see https://github.com/wiktor-k/ssh-agent-lib/pull/98/changes/a38760b6b4
for an example

>> > Not using it is also a good way to sieve people who didn't bother to read
>> > the policy, now that I think about it.
>>
>> In my experience, AI tools add "Co-developed-by:" tags by default. One
>> has to read the policy to add Assisted-by.
>
> Co-authored-by: you mean? That's what claude adds for me. I had to take
> extra effort to add the new Assisted-by: to my patch review.
>
> Warner

^ permalink raw reply [flat|nested] 59+ messages in thread
* Re: on ai generated and code provenance
2026-05-27 10:01 ` Paolo Bonzini
` (4 preceding siblings ...)
2026-05-27 14:14 ` Warner Losh
@ 2026-05-27 16:39 ` Michael S. Tsirkin
5 siblings, 0 replies; 59+ messages in thread
From: Michael S. Tsirkin @ 2026-05-27 16:39 UTC (permalink / raw)
To: Paolo Bonzini; +Cc: Kevin Wolf, Warner Losh, qemu-devel, stefanha

On Wed, May 27, 2026 at 12:01:10PM +0200, Paolo Bonzini wrote:
> Commit messages for AI-assisted changes
> ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
>
> When AI/LLM tools produce or substantively shape your patch, add an
> ``AI-used-for:`` trailer. The text of the trailer could be one or more of
> ``code``, ``tests``, ``docs``, ``research``, possibly followed by an
> explanation in parentheses::
>
>     AI-used-for: tests, docs
>     AI-used-for: code
>     AI-used-for: code (refactoring)
>     AI-used-for: code (prototype)
>     AI-used-for: research
>
> The trailer is intended as a clarification of your DCO obligations as well
> as to guide reviewers. It is not intended for minimal presence such as
> autocomplete or asking for a pre-review of the patch, and it does not remove
> your responsibility to understand the changes that you are submitting.
>
> Include the prompt in the commit message if it helps a reviewer judge the
> result:
>
> * yes: "move field ``foo`` from ``struct aa`` to ``struct bb``. If a
> function already has a local variable or parameter of type ``struct bb``,
> use it instead of accessing ``aa.bb``."
>
> * yes: "add an implementation of the trait for ``Mutex<T: MyTrait>``,
> forwarding the member functions to ``T`` while taking the lock around the
> calls".
>
> * no: "write user-facing documentation for the new tool"
>
> * no: "write testcases for the new functions"

I don't understand what these yes/no examples are trying to show.
AI tools aren't really yet up to the task of generating a reasonable
qemu patchset from a single prompt.

As a reviewer, I am not really interested in what kind of magic
"think extra ultra hard" invocation was used to coax the output
from the model.

I actually would like to know which model was used, since I expect that with
time I'll learn to trust output from specific models more just like I trust
output from specific contributors more.

--
MST

^ permalink raw reply [flat|nested] 59+ messages in thread
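For illustration only, and not part of the thread or of the proposed policy
text: the ``AI-used-for:`` trailer quoted above is regular enough to be checked
mechanically. The sketch below is one possible reading of that format. The
function name `parse_ai_used_for`, the case-insensitive match on the trailer
name, and the decision to reject categories outside the four listed ones are
all assumptions made for this example, not rules stated in the proposal.

```python
import re

# Categories named in the proposed policy text quoted above.
ALLOWED = {"code", "tests", "docs", "research"}

# One item: a category word, optionally followed by "(explanation)".
_ITEM = re.compile(r"^(?P<kind>[a-z]+)(?:\s*\((?P<note>[^()]*)\))?$")

# Split on commas that are not inside a parenthesised explanation
# (assumption: explanations are not nested).
_SPLIT = re.compile(r",\s*(?![^()]*\))")


def parse_ai_used_for(commit_message):
    """Return (category, explanation-or-None) pairs from AI-used-for: lines.

    Hypothetical helper: raises ValueError on an item that does not fit
    the quoted format.
    """
    found = []
    for line in commit_message.splitlines():
        name, sep, value = line.partition(":")
        if not sep or name.strip().lower() != "ai-used-for":
            continue  # not an AI-used-for trailer line
        for raw in _SPLIT.split(value.strip()):
            item = raw.strip()
            match = _ITEM.match(item)
            if match is None or match.group("kind") not in ALLOWED:
                raise ValueError(f"unrecognised AI-used-for item: {item!r}")
            found.append((match.group("kind"), match.group("note")))
    return found


if __name__ == "__main__":
    example = (
        "block: tidy up foo handling\n"
        "\n"
        "AI-used-for: tests, docs\n"
        "AI-used-for: code (refactoring)\n"
        "Signed-off-by: A Developer <dev@example.org>\n"
    )
    print(parse_ai_used_for(example))
```

A checker like this only validates the shape of the trailer. It says nothing
about whether the declared categories are accurate, which is the question the
messages above are actually arguing about.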
* Re: on ai generated and code provenance
2026-05-26 18:59 ` Kevin Wolf
2026-05-26 19:30 ` Michael S. Tsirkin
@ 2026-05-26 19:50 ` Michael S. Tsirkin
2026-05-27 7:44 ` Kevin Wolf
1 sibling, 1 reply; 59+ messages in thread
From: Michael S. Tsirkin @ 2026-05-26 19:50 UTC (permalink / raw)
To: Kevin Wolf; +Cc: qemu-devel, stefanha

On Tue, May 26, 2026 at 08:59:55PM +0200, Kevin Wolf wrote:
> maybe practically speaking it has to be all or nothing in terms of
> creativity (for lack of a better word).

That's exactly what copyright is, right? creative expression.
So e.g. adding an include to make a file compile is not creative.

--
MST

^ permalink raw reply [flat|nested] 59+ messages in thread
* Re: on ai generated and code provenance
2026-05-26 19:50 ` Michael S. Tsirkin
@ 2026-05-27 7:44 ` Kevin Wolf
0 siblings, 0 replies; 59+ messages in thread
From: Kevin Wolf @ 2026-05-27 7:44 UTC (permalink / raw)
To: Michael S. Tsirkin; +Cc: qemu-devel, stefanha

On 26.05.2026 at 21:50, Michael S. Tsirkin wrote:
> On Tue, May 26, 2026 at 08:59:55PM +0200, Kevin Wolf wrote:
> > maybe practically speaking it has to be all or nothing in terms of
> > creativity (for lack of a better word).
>
> That's exactly what copyright is, right? creative expression.
> So e.g. adding an include to make a file compile is not creative.

Yes, it's definitely similar to the question of whether something is
copyrightable or not. The threshold could be different in a project-specific
policy, but that threshold is somewhat unclear anyway - and that was my
point, I'm not sure if it can be made clear.

It's easy to find examples that are clearly below the threshold (your
adding an #include to fix the build) and examples that are clearly above
it (say, a complex new device). But for a specific change somewhere in
the middle of the range, it can still be quite hard to tell.

Kevin

^ permalink raw reply [flat|nested] 59+ messages in thread