* [PATCH v5 1/3] docs: introduce dedicated page about code provenance / sign-off
2025-06-16 9:22 [PATCH v5 0/3] docs: define policy forbidding use of "AI" / LLM code generators Markus Armbruster
@ 2025-06-16 9:22 ` Markus Armbruster
2025-06-16 9:22 ` [PATCH v5 2/3] docs: define policy limiting the inclusion of generated files Markus Armbruster
` (3 subsequent siblings)
4 siblings, 0 replies; 21+ messages in thread
From: Markus Armbruster @ 2025-06-16 9:22 UTC (permalink / raw)
To: qemu-devel
Cc: Daniel P . Berrangé, Thomas Huth, Alex Bennée,
Michael S . Tsirkin, Gerd Hoffmann, Mark Cave-Ayland,
Philippe Mathieu-Daudé, Kevin Wolf, Stefan Hajnoczi,
Alexander Graf, Paolo Bonzini, Richard Henderson, Peter Maydell
From: Daniel P. Berrangé <berrange@redhat.com>
Currently we have a short paragraph saying that patches must include
a Signed-off-by line, and merely link to the kernel documentation.
The linked kernel docs have a lot of content beyond the part about
sign-off and thus are misleading/distracting to QEMU contributors.
This introduces a dedicated 'code-provenance' page in QEMU talking
about why we require sign-off, explaining the other tags we commonly
use, and what to do in some edge cases.
Signed-off-by: Daniel P. Berrangé <berrange@redhat.com>
Reviewed-by: Peter Maydell <peter.maydell@linaro.org>
Reviewed-by: Stefan Hajnoczi <stefanha@redhat.com>
Reviewed-by: Alex Bennée <alex.bennee@linaro.org>
Signed-off-by: Markus Armbruster <armbru@redhat.com>
---
docs/devel/code-provenance.rst | 230 ++++++++++++++++++++++++++++++
docs/devel/index-process.rst | 1 +
docs/devel/submitting-a-patch.rst | 23 +--
3 files changed, 233 insertions(+), 21 deletions(-)
create mode 100644 docs/devel/code-provenance.rst
diff --git a/docs/devel/code-provenance.rst b/docs/devel/code-provenance.rst
new file mode 100644
index 0000000000..95b2dd34e2
--- /dev/null
+++ b/docs/devel/code-provenance.rst
@@ -0,0 +1,230 @@
+.. _code-provenance:
+
+Code provenance
+===============
+
+Certifying patch submissions
+~~~~~~~~~~~~~~~~~~~~~~~~~~~~
+
+The QEMU community **mandates** all contributors to certify provenance of
+patch submissions they make to the project. To put it another way,
+contributors must indicate that they are legally permitted to contribute to
+the project.
+
+Certification is achieved with a low overhead by adding a single line to the
+bottom of every git commit::
+
+ Signed-off-by: YOUR NAME <YOUR@EMAIL>
+
+The addition of this line asserts that the author of the patch is contributing
+in accordance with the clauses specified in the
+`Developer's Certificate of Origin <https://developercertificate.org>`__:
+
+.. _dco:
+
+ Developer's Certificate of Origin 1.1
+
+ By making a contribution to this project, I certify that:
+
+ (a) The contribution was created in whole or in part by me and I
+ have the right to submit it under the open source license
+ indicated in the file; or
+
+ (b) The contribution is based upon previous work that, to the best
+ of my knowledge, is covered under an appropriate open source
+ license and I have the right under that license to submit that
+ work with modifications, whether created in whole or in part
+ by me, under the same open source license (unless I am
+ permitted to submit under a different license), as indicated
+ in the file; or
+
+ (c) The contribution was provided directly to me by some other
+ person who certified (a), (b) or (c) and I have not modified
+ it.
+
+ (d) I understand and agree that this project and the contribution
+ are public and that a record of the contribution (including all
+ personal information I submit with it, including my sign-off) is
+ maintained indefinitely and may be redistributed consistent with
+ this project or the open source license(s) involved.
+
+The name used with "Signed-off-by" does not need to be your legal name, nor
+birth name, nor appear on any government ID. It is the identity you choose to
+be known by in the community, but should not be anonymous, nor misrepresent
+who you are.
+
+It is generally expected that the name and email address used in one of the
+``Signed-off-by`` lines match those of the git commit ``Author`` field.
+It's okay if you subscribe or contribute to the list via more than one
+address, but using multiple addresses in one commit just confuses
+things.
+
+If the person sending the mail is not one of the patch authors, they are
+nonetheless expected to add their own ``Signed-off-by`` to comply with the
+DCO clause (c).
+
+Multiple authorship
+~~~~~~~~~~~~~~~~~~~
+
+It is not uncommon for a patch to have contributions from multiple authors. In
+this scenario, git commits will usually be expected to have a ``Signed-off-by``
+line for each contributor involved in creation of the patch. Some edge cases:
+
+ * The non-primary author's contributions were so trivial that they can be
+ considered not subject to copyright. In this case the secondary authors
+ need not include a ``Signed-off-by``.
+
+ This case most commonly applies where QEMU reviewers give short snippets
+ of code as suggested fixes to a patch. The reviewers don't need to have
+ their own ``Signed-off-by`` added unless their code suggestion was
+ unusually large, but it is common to add ``Suggested-by`` as a credit
+ for non-trivial code.
+
+ * Both contributors work for the same employer and the employer requires
+ copyright assignment.
+
+ It can be said that in this case a ``Signed-off-by`` is indicating that
+ the person has permission to contribute from their employer who is the
+ copyright holder. It is nonetheless still preferable to include a
+ ``Signed-off-by`` for each contributor, as in some countries employees are
+ not able to assign copyright to their employer, and it also covers any
+ time invested outside working hours.
+
+When multiple ``Signed-off-by`` tags are present, they should be strictly kept
+in order of authorship, from oldest to newest.
+
+Other commit tags
+~~~~~~~~~~~~~~~~~
+
+While the ``Signed-off-by`` tag is mandatory, there are a number of other tags
+that are commonly used during QEMU development:
+
+ * **``Reviewed-by``**: when a QEMU community member reviews a patch on the
+ mailing list, if they consider the patch acceptable, they should send an
+ email reply containing a ``Reviewed-by`` tag. Subsystem maintainers who
+ review a patch should add this even if they are also adding their
+ ``Signed-off-by`` to the same commit.
+
+ * **``Acked-by``**: when a QEMU subsystem maintainer approves a patch that
+ touches their subsystem, but intends to allow a different maintainer to
+ queue it and send a pull request, they would send a mail containing an
+ ``Acked-by`` tag. Where a patch touches multiple subsystems, ``Acked-by``
+ only implies review of the maintainers' own areas of responsibility. If a
+ maintainer wants to indicate they have done a full review they should use
+ a ``Reviewed-by`` tag.
+
+ * **``Tested-by``**: when a QEMU community member has functionally tested the
+ behaviour of the patch in some manner, they should send an email reply
+ containing a ``Tested-by`` tag.
+
+ * **``Reported-by``**: when a QEMU community member reports a problem via the
+ mailing list, or some other informal channel that is not the issue tracker,
+ it is good practice to credit them by including a ``Reported-by`` tag on
+ any patch fixing the issue. When the problem is reported via the GitLab
+ issue tracker, however, it is sufficient to just include a link to the
+ issue.
+
+ * **``Suggested-by``**: when a reviewer or other 3rd party makes non-trivial
+ suggestions for how to change a patch, it is good practice to credit them
+ by including a ``Suggested-by`` tag.
+
+Subsystem maintainer requirements
+~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
+
+When a subsystem maintainer accepts a patch from a contributor, in addition to
+the normal code review points, they are expected to validate the presence of
+suitable ``Signed-off-by`` tags.
+
+At the time they queue the patch in their subsystem tree, the maintainer
+**must** also then add their own ``Signed-off-by`` to indicate that they have
+done the aforementioned validation. This is in addition to any of their own
+``Reviewed-by`` tags the subsystem maintainer may wish to include.
+
+When the maintainer modifies the patch after pulling into their tree, they
+should record their contribution. This is typically done via a note in the
+commit message, just prior to the maintainer's ``Signed-off-by``::
+
+ Signed-off-by: Cory Contributor <cory.contributor@example.com>
+ [Comment rephrased for clarity]
+ Signed-off-by: Mary Maintainer <mary.maintainer@mycorp.test>
+
+
+Tools for adding ``Signed-off-by``
+~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
+
+There are a variety of ways tools can support adding ``Signed-off-by`` tags
+for patches, avoiding the need for contributors to manually type in this
+repetitive text each time.
+
+git commands
+^^^^^^^^^^^^
+
+When creating, or amending, a commit the ``-s`` flag to ``git commit`` will
+append a suitable line matching the configured git author details.
+
+If preparing patches using the ``git format-patch`` tool, the ``-s`` flag can
+be used to append a suitable line in the emails it creates, without modifying
+the local commits. Alternatively, to modify all the local commits on a branch::
+
+ git rebase master -x 'git commit --amend --no-edit -s'
+
+emacs
+^^^^^
+
+In the file ``$HOME/.emacs.d/abbrev_defs`` add:
+
+.. code:: elisp
+
+ (define-abbrev-table 'global-abbrev-table
+ '(
+ ("8rev" "Reviewed-by: YOUR NAME <your@email.addr>" nil 1)
+ ("8ack" "Acked-by: YOUR NAME <your@email.addr>" nil 1)
+ ("8test" "Tested-by: YOUR NAME <your@email.addr>" nil 1)
+ ("8sob" "Signed-off-by: YOUR NAME <your@email.addr>" nil 1)
+ ))
+
+with this change, if you type (for example) ``8rev`` followed by ``<space>``
+or ``<enter>`` it will expand to the whole phrase.
+
+vim
+^^^
+
+In the file ``$HOME/.vimrc`` add::
+
+ iabbrev 8rev Reviewed-by: YOUR NAME <your@email.addr>
+ iabbrev 8ack Acked-by: YOUR NAME <your@email.addr>
+ iabbrev 8test Tested-by: YOUR NAME <your@email.addr>
+ iabbrev 8sob Signed-off-by: YOUR NAME <your@email.addr>
+
+with this change, if you type (for example) ``8rev`` followed by ``<space>``
+or ``<enter>`` it will expand to the whole phrase.
+
+Re-starting abandoned work
+~~~~~~~~~~~~~~~~~~~~~~~~~~
+
+For a variety of reasons there are some patches that get submitted to QEMU but
+never merged. An unrelated contributor may decide (months or years later) to
+continue working from the abandoned patch and re-submit it with extra changes.
+
+The general principles when picking up abandoned work are:
+
+ * Continue to credit the original author for their work, by maintaining their
+ original ``Signed-off-by``
+ * Indicate where the original patch was obtained from (mailing list, bug
+ tracker, author's git repo, etc) when sending it for review
+ * Acknowledge the extra work of the new contributor by including their
+ ``Signed-off-by`` in the patch in addition to the original author's
+ * Indicate who is responsible for what parts of the patch. This is typically
+ done via a note in the commit message, just prior to the new contributor's
+ ``Signed-off-by``::
+
+ Signed-off-by: Some Person <some.person@example.com>
+ [Rebased and added support for 'foo']
+ Signed-off-by: New Person <new.person@mycorp.test>
+
+In complicated cases, or if otherwise unsure, ask for advice on the project
+mailing list.
+
+It is also recommended to attempt to contact the original author to let them
+know you are interested in taking over their work, in case they still intended
+to return to the work, or had any suggestions about the best way to continue.
diff --git a/docs/devel/index-process.rst b/docs/devel/index-process.rst
index cb7c6640fd..5807752d70 100644
--- a/docs/devel/index-process.rst
+++ b/docs/devel/index-process.rst
@@ -13,6 +13,7 @@ Notes about how to interact with the community and how and where to submit patch
maintainers
style
submitting-a-patch
+ code-provenance
trivial-patches
stable-process
submitting-a-pull-request
diff --git a/docs/devel/submitting-a-patch.rst b/docs/devel/submitting-a-patch.rst
index 65c64078cb..f7917b899f 100644
--- a/docs/devel/submitting-a-patch.rst
+++ b/docs/devel/submitting-a-patch.rst
@@ -344,28 +344,9 @@ Patch emails must include a ``Signed-off-by:`` line
Your patches **must** include a Signed-off-by: line. This is a hard
requirement because it's how you say "I'm legally okay to contribute
-this and happy for it to go into QEMU". The process is modelled after
-the `Linux kernel
-<http://git.kernel.org/cgit/linux/kernel/git/torvalds/linux.git/tree/Documentation/SubmittingPatches?id=f6f94e2ab1b33f0082ac22d71f66385a60d8157f#n297>`__
-policy.
+this and happy for it to go into QEMU". For full guidance, read the
+:ref:`code-provenance` documentation.
-If you wrote the patch, make sure your "From:" and "Signed-off-by:"
-lines use the same spelling. It's okay if you subscribe or contribute to
-the list via more than one address, but using multiple addresses in one
-commit just confuses things. If someone else wrote the patch, git will
-include a "From:" line in the body of the email (different from your
-envelope From:) that will give credit to the correct author; but again,
-that author's Signed-off-by: line is mandatory, with the same spelling.
-
-The name used with "Signed-off-by" does not need to be your legal name,
-nor birth name, nor appear on any government ID. It is the identity you
-choose to be known by in the community, but should not be anonymous,
-nor misrepresent whom you are.
-
-There are various tooling options for automatically adding these tags
-include using ``git commit -s`` or ``git format-patch -s``. For more
-information see `SubmittingPatches 1.12
-<http://git.kernel.org/cgit/linux/kernel/git/torvalds/linux.git/tree/Documentation/SubmittingPatches?id=f6f94e2ab1b33f0082ac22d71f66385a60d8157f#n297>`__.
.. _include_a_meaningful_cover_letter:
--
2.49.0
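
As an illustration of the sign-off conventions described in the code-provenance page above, the following is a minimal Python sketch of how a commit message might be checked for ``Signed-off-by`` lines. It is not part of the patch or of any QEMU tooling: the function names `signed_off_by` and `check_sign_off` are hypothetical, and the rule that the author must match one sign-off is treated here as a hard check, although the page only says this is "generally expected".

```python
import re

# Matches a tag such as "Signed-off-by: Some Person <some.person@example.com>".
SOB_RE = re.compile(r"^Signed-off-by:\s*(?P<name>.+?)\s*<(?P<email>[^<>]+)>\s*$")


def signed_off_by(message):
    """Return (name, email) pairs for each Signed-off-by line, in order."""
    pairs = []
    for line in message.splitlines():
        match = SOB_RE.match(line.strip())
        if match:
            pairs.append((match.group("name"), match.group("email")))
    return pairs


def check_sign_off(author_name, author_email, message):
    """Return a list of problems; an empty list means the checks passed."""
    sobs = signed_off_by(message)
    if not sobs:
        return ["missing Signed-off-by line"]
    if (author_name, author_email) not in sobs:
        return ["no Signed-off-by matches the commit author"]
    return []


# Hypothetical commit message, reusing the example names from the page.
example = """docs: fix typo

Signed-off-by: Cory Contributor <cory.contributor@example.com>
[Comment rephrased for clarity]
Signed-off-by: Mary Maintainer <mary.maintainer@mycorp.test>
"""

print(check_sign_off("Cory Contributor", "cory.contributor@example.com", example))
```

The sketch keeps the tags in the order they appear, so the oldest-to-newest ordering of authorship described in the page is preserved in the returned list.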
* [PATCH v5 2/3] docs: define policy limiting the inclusion of generated files
2025-06-16 9:22 [PATCH v5 0/3] docs: define policy forbidding use of "AI" / LLM code generators Markus Armbruster
2025-06-16 9:22 ` [PATCH v5 1/3] docs: introduce dedicated page about code provenance / sign-off Markus Armbruster
@ 2025-06-16 9:22 ` Markus Armbruster
2025-06-16 9:22 ` [PATCH v5 3/3] docs: define policy forbidding use of AI code generators Markus Armbruster
` (2 subsequent siblings)
4 siblings, 0 replies; 21+ messages in thread
From: Markus Armbruster @ 2025-06-16 9:22 UTC (permalink / raw)
To: qemu-devel
Cc: Daniel P . Berrangé, Thomas Huth, Alex Bennée,
Michael S . Tsirkin, Gerd Hoffmann, Mark Cave-Ayland,
Philippe Mathieu-Daudé, Kevin Wolf, Stefan Hajnoczi,
Alexander Graf, Paolo Bonzini, Richard Henderson, Peter Maydell
From: Daniel P. Berrangé <berrange@redhat.com>
Files contributed to QEMU are generally expected to be provided in the
preferred format for manipulation. IOW, we generally don't expect to
have generated / compiled code included in the tree, rather, we expect
to run the code generator / compiler as part of the build process.
There are some obvious exceptions to this seen in our existing tree, the
biggest one being the inclusion of many binary firmware ROMs. A more
niche example is the inclusion of a generated eBPF program. Or the CI
dockerfiles which are mostly auto-generated. In these cases, however,
the preferred format source code is still required to be included,
alongside the generated output.
Tools which perform user defined algorithmic transformations on code are
not considered to be "code generators", i.e., we permit use of Coccinelle,
spell checkers, and sed/awk/etc to manipulate code. Such use of automated
manipulation should still be declared in the commit message.
One-off generators which create a boilerplate file which the author then
fills in, are acceptable if their output has clear copyright and license
status. This could be where a contributor writes a throwaway python
script to automate creation of some mundane piece of code for example.
Signed-off-by: Daniel P. Berrangé <berrange@redhat.com>
Reviewed-by: Alex Bennée <alex.bennee@linaro.org>
Reviewed-by: Stefan Hajnoczi <stefanha@redhat.com>
Signed-off-by: Markus Armbruster <armbru@redhat.com>
---
docs/devel/code-provenance.rst | 55 ++++++++++++++++++++++++++++++++++
1 file changed, 55 insertions(+)
diff --git a/docs/devel/code-provenance.rst b/docs/devel/code-provenance.rst
index 95b2dd34e2..c25afed98d 100644
--- a/docs/devel/code-provenance.rst
+++ b/docs/devel/code-provenance.rst
@@ -228,3 +228,58 @@ mailing list.
It is also recommended to attempt to contact the original author to let them
know you are interested in taking over their work, in case they still intended
to return to the work, or had any suggestions about the best way to continue.
+
+Inclusion of generated files
+~~~~~~~~~~~~~~~~~~~~~~~~~~~~
+
+Files in patches contributed to QEMU are generally expected to be provided
+only in the preferred format for making modifications. The implication of
+this is that the output of code generators or compilers is usually not
+appropriate to contribute to QEMU.
+
+For reasons of practicality there are some exceptions to this rule, where
+generated code is permitted, provided it is also accompanied by the
+corresponding preferred source format. This is done where it is impractical
+to expect those building QEMU to run the code generation or compilation
+process. A non-exhaustive list of examples is:
+
+ * Images: where a bitmap image is created from a vector file it is common
+ to include the rendered bitmaps at desired resolution(s), since subtle
+ changes in the rasterization process / tools may affect quality. The
+ original vector file is expected to accompany any generated bitmaps.
+
+ * Firmware: QEMU includes pre-compiled binary ROMs for a variety of guest
+ firmwares. When such binary ROMs are contributed, the corresponding source
+ must also be provided, either directly, or through a git submodule link.
+
+ * Dockerfiles: the majority of the dockerfiles are automatically generated
+ from a canonical list of build dependencies maintained in tree, together
+ with the libvirt-ci git submodule link. The generated dockerfiles are
+ included in tree because it is desirable to be able to directly build
+ container images from a clean git checkout.
+
+ * eBPF: QEMU includes some generated eBPF machine code, since the required
+ eBPF compilation tools are not broadly available on all targeted OS
+ distributions. The corresponding eBPF C code for the binary is also
+ provided. This is a time-limited exception until the eBPF toolchain is
+ sufficiently broadly available in distros.
+
+In all cases above, the existence of generated files must be acknowledged
+and justified in the commit that introduces them.
+
+Tools which perform changes to existing code with deterministic algorithmic
+manipulation, driven by user specified inputs, are not generally considered
+to be "generators".
+
+For instance, using Coccinelle to convert code from one pattern to another
+pattern, or fixing documentation typos with a spell checker, or transforming
+code using sed / awk / etc, are not considered to be acts of code
+generation. Where an automated manipulation is performed on code, however,
+this should be declared in the commit message.
+
+At times contributors may use or create scripts/tools to generate an initial
+boilerplate code template which is then filled in to produce the final patch.
+The output of such a tool would still be considered the "preferred format",
+since it is intended to be a foundation for further human authored changes.
+Such tools are acceptable to use, provided there is clearly defined copyright
+and licensing for their output.
--
2.49.0
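
To make the "throwaway script" case from the patch above concrete, here is a minimal Python sketch of a one-off boilerplate generator whose output carries an explicit license tag. Everything in it is an assumption for illustration: the `make_header` function, the template layout, and the choice of `GPL-2.0-or-later` as the license identifier are not taken from the patch.

```python
# Hypothetical one-off generator: emits a C header skeleton that the author
# then fills in by hand. The default license tag is an assumption.
HEADER_TEMPLATE = """\
/*
 * {title}
 *
 * SPDX-License-Identifier: {license}
 */

#ifndef {guard}
#define {guard}

/* TODO: declarations go here */

#endif /* {guard} */
"""


def make_header(name, title, license_id="GPL-2.0-or-later"):
    """Return the text of a skeleton header for the given base name."""
    # Derive an include guard such as FOO_DEVICE_H from "foo-device".
    guard = name.upper().replace("-", "_") + "_H"
    return HEADER_TEMPLATE.format(title=title, license=license_id, guard=guard)


print(make_header("foo-device", "Foo device model"))
```

Because the template and its license line are written by the contributor, the copyright and licensing of the generated skeleton are clear, which is the condition the policy sets for such tools.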
* [PATCH v5 3/3] docs: define policy forbidding use of AI code generators
2025-06-16 9:22 [PATCH v5 0/3] docs: define policy forbidding use of "AI" / LLM code generators Markus Armbruster
2025-06-16 9:22 ` [PATCH v5 1/3] docs: introduce dedicated page about code provenance / sign-off Markus Armbruster
2025-06-16 9:22 ` [PATCH v5 2/3] docs: define policy limiting the inclusion of generated files Markus Armbruster
@ 2025-06-16 9:22 ` Markus Armbruster
2025-06-25 19:16 ` Michael S. Tsirkin
2025-06-26 6:34 ` Michael S. Tsirkin
2025-06-23 19:30 ` [PATCH v5 0/3] docs: define policy forbidding use of "AI" / LLM " Stefan Hajnoczi
2025-06-24 17:33 ` Stefan Hajnoczi
4 siblings, 2 replies; 21+ messages in thread
From: Markus Armbruster @ 2025-06-16 9:22 UTC (permalink / raw)
To: qemu-devel
Cc: Daniel P . Berrangé, Thomas Huth, Alex Bennée,
Michael S . Tsirkin, Gerd Hoffmann, Mark Cave-Ayland,
Philippe Mathieu-Daudé, Kevin Wolf, Stefan Hajnoczi,
Alexander Graf, Paolo Bonzini, Richard Henderson, Peter Maydell
From: Daniel P. Berrangé <berrange@redhat.com>
There has been an explosion of interest in so called AI code
generators. Thus far though, this has not been matched by a broadly
accepted legal interpretation of the licensing implications for code
generator outputs. While the vendors may claim there is no problem and
a free choice of license is possible, they have an inherent conflict
of interest in promoting this interpretation. More broadly there is,
as yet, no broad consensus on the licensing implications of code
generators trained on inputs under a wide variety of licenses.
The DCO requires contributors to assert they have the right to
contribute under the designated project license. Given the lack of
consensus on the licensing of AI code generator output, it is not
considered credible to assert compliance with the DCO clause (b) or (c)
where a patch includes such generated code.
This patch thus defines a policy that the QEMU project will currently
not accept contributions where use of AI code generators is either
known, or suspected.
These are early days of AI-assisted software development. The legal
questions will be resolved eventually. The tools will mature, and we
can expect some to become safely usable in free software projects.
The policy we set now must be for today, and be open to revision. It's
best to start strict and safe, then relax.
Meanwhile requests for exceptions can also be considered on a case by
case basis.
Signed-off-by: Daniel P. Berrangé <berrange@redhat.com>
Reviewed-by: Kevin Wolf <kwolf@redhat.com>
Reviewed-by: Stefan Hajnoczi <stefanha@redhat.com>
Reviewed-by: Alex Bennée <alex.bennee@linaro.org>
Signed-off-by: Markus Armbruster <armbru@redhat.com>
---
docs/devel/code-provenance.rst | 55 +++++++++++++++++++++++++++++++++-
1 file changed, 54 insertions(+), 1 deletion(-)
diff --git a/docs/devel/code-provenance.rst b/docs/devel/code-provenance.rst
index c25afed98d..b5aae2e253 100644
--- a/docs/devel/code-provenance.rst
+++ b/docs/devel/code-provenance.rst
@@ -282,4 +282,57 @@ boilerplate code template which is then filled in to produce the final patch.
The output of such a tool would still be considered the "preferred format",
since it is intended to be a foundation for further human authored changes.
Such tools are acceptable to use, provided there is clearly defined copyright
-and licensing for their output.
+and licensing for their output. Note in particular the caveats applying to AI
+content generators below.
+
+Use of AI content generators
+~~~~~~~~~~~~~~~~~~~~~~~~~~~~
+
+TL;DR:
+
+ **Current QEMU project policy is to DECLINE any contributions which are
+ believed to include or derive from AI generated content. This includes
+ ChatGPT, Claude, Copilot, Llama and similar tools.**
+
+The increasing prevalence of AI-assisted software development results in a
+number of difficult legal questions and risks for software projects, including
+QEMU. Of particular concern is content generated by `Large Language Models
+<https://en.wikipedia.org/wiki/Large_language_model>`__ (LLMs).
+
+The QEMU community requires that contributors certify their patch submissions
+are made in accordance with the rules of the `Developer's Certificate of
+Origin (DCO) <dco>`.
+
+To satisfy the DCO, the patch contributor has to fully understand the
+copyright and license status of content they are contributing to QEMU. With AI
+content generators, the copyright and license status of the output is
+ill-defined with no generally accepted, settled legal foundation.
+
+Where the training material is known, it is common for it to include large
+volumes of material under restrictive licensing/copyright terms. Even where
+the training material is all known to be under open source licenses, it is
+likely to be under a variety of terms, not all of which will be compatible
+with QEMU's licensing requirements.
+
+How contributors could comply with DCO terms (b) or (c) for the output of AI
+content generators commonly available today is unclear. The QEMU project is
+not willing or able to accept the legal risks of non-compliance.
+
+The QEMU project thus requires that contributors refrain from using AI content
+generators on patches intended to be submitted to the project, and will
+decline any contribution if use of AI is either known or suspected.
+
+This policy does not apply to other uses of AI, such as researching APIs or
+algorithms, static analysis, or debugging, provided their output is not to be
+included in contributions.
+
+Examples of tools impacted by this policy include GitHub's Copilot, OpenAI's
+ChatGPT, Anthropic's Claude, and Meta's Code Llama, and code/content
+generation agents which are built on top of such tools.
+
+This policy may evolve as AI tools mature and the legal situation is
+clarified. In the meanwhile, requests for exceptions to this policy will be
+evaluated by the QEMU project on a case by case basis. To be granted an
+exception, a contributor will need to demonstrate clarity of the license and
+copyright status for the tool's output in relation to its training model and
+code, to the satisfaction of the project maintainers.
--
2.49.0
* Re: [PATCH v5 3/3] docs: define policy forbidding use of AI code generators
2025-06-16 9:22 ` [PATCH v5 3/3] docs: define policy forbidding use of AI code generators Markus Armbruster
@ 2025-06-25 19:16 ` Michael S. Tsirkin
2025-06-25 19:46 ` Daniel P. Berrangé
2025-06-25 20:38 ` Kevin Wolf
2025-06-26 6:34 ` Michael S. Tsirkin
1 sibling, 2 replies; 21+ messages in thread
From: Michael S. Tsirkin @ 2025-06-25 19:16 UTC (permalink / raw)
To: Markus Armbruster
Cc: qemu-devel, Daniel P . Berrangé, Thomas Huth,
Alex Bennée, Gerd Hoffmann, Mark Cave-Ayland,
Philippe Mathieu-Daudé, Kevin Wolf, Stefan Hajnoczi,
Alexander Graf, Paolo Bonzini, Richard Henderson, Peter Maydell
On Mon, Jun 16, 2025 at 11:22:41AM +0200, Markus Armbruster wrote:
> From: Daniel P. Berrangé <berrange@redhat.com>
>
> There has been an explosion of interest in so called AI code
> generators. Thus far though, this has not been matched by a broadly
> accepted legal interpretation of the licensing implications for code
> generator outputs. While the vendors may claim there is no problem and
> a free choice of license is possible, they have an inherent conflict
> of interest in promoting this interpretation. More broadly there is,
> as yet, no broad consensus on the licensing implications of code
> generators trained on inputs under a wide variety of licenses
>
> The DCO requires contributors to assert they have the right to
> contribute under the designated project license. Given the lack of
> consensus on the licensing of AI code generator output, it is not
> considered credible to assert compliance with the DCO clause (b) or (c)
> where a patch includes such generated code.
>
> This patch thus defines a policy that the QEMU project will currently
> not accept contributions where use of AI code generators is either
> known, or suspected.
>
> These are early days of AI-assisted software development. The legal
> questions will be resolved eventually. The tools will mature, and we
> can expect some to become safely usable in free software projects.
> The policy we set now must be for today, and be open to revision. It's
> best to start strict and safe, then relax.
>
> Meanwhile requests for exceptions can also be considered on a case by
> case basis.
>
> Signed-off-by: Daniel P. Berrangé <berrange@redhat.com>
> Reviewed-by: Kevin Wolf <kwolf@redhat.com>
> Reviewed-by: Stefan Hajnoczi <stefanha@redhat.com>
> Reviewed-by: Alex Bennée <alex.bennee@linaro.org>
> Signed-off-by: Markus Armbruster <armbru@redhat.com>
Sorry about only reacting now, was AFK.
So one use case that seems entirely valid to me is refactoring.
For example, change a function prototype, or a structure,
and have an LLM update all callers.
The only part of the patch that is expressive is the
actual change, the rest is a technicality and has IMHO nothing to do with
copyright. LLMs can just do it with no hassle.
Can we soften this to only apply to expressive code?
I feel a lot of cleanups would be enabled by this.
> ---
> docs/devel/code-provenance.rst | 55 +++++++++++++++++++++++++++++++++-
> 1 file changed, 54 insertions(+), 1 deletion(-)
>
> diff --git a/docs/devel/code-provenance.rst b/docs/devel/code-provenance.rst
> index c25afed98d..b5aae2e253 100644
> --- a/docs/devel/code-provenance.rst
> +++ b/docs/devel/code-provenance.rst
> @@ -282,4 +282,57 @@ boilerplate code template which is then filled in to produce the final patch.
> The output of such a tool would still be considered the "preferred format",
> since it is intended to be a foundation for further human authored changes.
> Such tools are acceptable to use, provided there is clearly defined copyright
> -and licensing for their output.
> +and licensing for their output. Note in particular the caveats applying to AI
> +content generators below.
> +
> +Use of AI content generators
> +~~~~~~~~~~~~~~~~~~~~~~~~~~~~
> +
> +TL;DR:
> +
> + **Current QEMU project policy is to DECLINE any contributions which are
> + believed to include or derive from AI generated content. This includes
> + ChatGPT, Claude, Copilot, Llama and similar tools.**
> +
> +The increasing prevalence of AI-assisted software development results in a
> +number of difficult legal questions and risks for software projects, including
> +QEMU. Of particular concern is content generated by `Large Language Models
> +<https://en.wikipedia.org/wiki/Large_language_model>`__ (LLMs).
> +
> +The QEMU community requires that contributors certify their patch submissions
> +are made in accordance with the rules of the `Developer's Certificate of
> +Origin (DCO) <dco>`.
> +
> +To satisfy the DCO, the patch contributor has to fully understand the
> +copyright and license status of content they are contributing to QEMU. With AI
> +content generators, the copyright and license status of the output is
> +ill-defined with no generally accepted, settled legal foundation.
> +
> +Where the training material is known, it is common for it to include large
> +volumes of material under restrictive licensing/copyright terms. Even where
> +the training material is all known to be under open source licenses, it is
> +likely to be under a variety of terms, not all of which will be compatible
> +with QEMU's licensing requirements.
> +
> +How contributors could comply with DCO terms (b) or (c) for the output of AI
> +content generators commonly available today is unclear. The QEMU project is
> +not willing or able to accept the legal risks of non-compliance.
> +
> +The QEMU project thus requires that contributors refrain from using AI content
> +generators on patches intended to be submitted to the project, and will
> +decline any contribution if use of AI is either known or suspected.
> +
> +This policy does not apply to other uses of AI, such as researching APIs or
> +algorithms, static analysis, or debugging, provided their output is not to be
> +included in contributions.
> +
> +Examples of tools impacted by this policy include GitHub's Copilot, OpenAI's
> +ChatGPT, Anthropic's Claude, and Meta's Code Llama, and code/content
> +generation agents which are built on top of such tools.
> +
> +This policy may evolve as AI tools mature and the legal situation is
> +clarified. In the meanwhile, requests for exceptions to this policy will be
> +evaluated by the QEMU project on a case by case basis. To be granted an
> +exception, a contributor will need to demonstrate clarity of the license and
> +copyright status for the tool's output in relation to its training model and
> +code, to the satisfaction of the project maintainers.
> --
> 2.49.0
* Re: [PATCH v5 3/3] docs: define policy forbidding use of AI code generators
2025-06-25 19:16 ` Michael S. Tsirkin
@ 2025-06-25 19:46 ` Daniel P. Berrangé
2025-06-25 20:01 ` Michael S. Tsirkin
2025-06-25 20:38 ` Kevin Wolf
1 sibling, 1 reply; 21+ messages in thread
From: Daniel P. Berrangé @ 2025-06-25 19:46 UTC (permalink / raw)
To: Michael S. Tsirkin
Cc: Markus Armbruster, qemu-devel, Thomas Huth, Alex Bennée,
Gerd Hoffmann, Mark Cave-Ayland, Philippe Mathieu-Daudé,
Kevin Wolf, Stefan Hajnoczi, Alexander Graf, Paolo Bonzini,
Richard Henderson, Peter Maydell
On Wed, Jun 25, 2025 at 03:16:52PM -0400, Michael S. Tsirkin wrote:
> On Mon, Jun 16, 2025 at 11:22:41AM +0200, Markus Armbruster wrote:
> > From: Daniel P. Berrangé <berrange@redhat.com>
> >
> > There has been an explosion of interest in so called AI code
> > generators. Thus far though, this has not been matched by a broadly
> > accepted legal interpretation of the licensing implications for code
> > generator outputs. While the vendors may claim there is no problem and
> > a free choice of license is possible, they have an inherent conflict
> > of interest in promoting this interpretation. More broadly there is,
> > as yet, no broad consensus on the licensing implications of code
> > generators trained on inputs under a wide variety of licenses
> >
> > The DCO requires contributors to assert they have the right to
> > contribute under the designated project license. Given the lack of
> > consensus on the licensing of AI code generator output, it is not
> > considered credible to assert compliance with the DCO clause (b) or (c)
> > where a patch includes such generated code.
> >
> > This patch thus defines a policy that the QEMU project will currently
> > not accept contributions where use of AI code generators is either
> > known, or suspected.
> >
> > These are early days of AI-assisted software development. The legal
> > questions will be resolved eventually. The tools will mature, and we
> > can expect some to become safely usable in free software projects.
> > The policy we set now must be for today, and be open to revision. It's
> > best to start strict and safe, then relax.
> >
> > Meanwhile requests for exceptions can also be considered on a case by
> > case basis.
> >
> > Signed-off-by: Daniel P. Berrangé <berrange@redhat.com>
> > Reviewed-by: Kevin Wolf <kwolf@redhat.com>
> > Reviewed-by: Stefan Hajnoczi <stefanha@redhat.com>
> > Reviewed-by: Alex Bennée <alex.bennee@linaro.org>
> > Signed-off-by: Markus Armbruster <armbru@redhat.com>
>
> Sorry about only reacting now, was AFK.
>
> So one usecase that to me seems entirely valid, is refactoring.
>
> For example, change a function prototype, or a structure,
> and have an LLM update all callers.
>
> The only part of the patch that is expressive is the
> actual change, the rest is a technicality and has IMHO nothing to do with
> copyright. LLMs can just do it with no hassle.
Well the policy is defined in terms of requirements to comply with
the DCO, and that implicitly indicates that the code in question
is eligible for copyright protection to begin with.
IOW, if a change is such that it is not considered eligible for
copyright protection, then you can take the view that it is trivially
DCO compliant, whether you wrote the code, an arbitrary 3rd party
wrote the code, or whether an AI wrote the code.
> Can we soften this to only apply to expressive code?
>
> I feel a lot of cleanups would be enabled by this.
Trying to detail every possible scenario is impractical and would
make the document too onerous for people to read, remember & apply.
It is better to leave it up to the contributor to decide whether a
change is non-copyrightable, than to try to draw that line crudely
in text. Even for refactoring that line will be fuzzy and contextual,
so not a scenario where we should say any use of AI for refactoring
is OK, as that will lull contributors into having a false sense of
acceptability, rather than being aware of the need to question it.
With regards,
Daniel
--
|: https://berrange.com -o- https://www.flickr.com/photos/dberrange :|
|: https://libvirt.org -o- https://fstop138.berrange.com :|
|: https://entangle-photo.org -o- https://www.instagram.com/dberrange :|
* Re: [PATCH v5 3/3] docs: define policy forbidding use of AI code generators
2025-06-25 19:46 ` Daniel P. Berrangé
@ 2025-06-25 20:01 ` Michael S. Tsirkin
2025-06-26 10:41 ` Markus Armbruster
0 siblings, 1 reply; 21+ messages in thread
From: Michael S. Tsirkin @ 2025-06-25 20:01 UTC (permalink / raw)
To: Daniel P. Berrangé
Cc: Markus Armbruster, qemu-devel, Thomas Huth, Alex Bennée,
Gerd Hoffmann, Mark Cave-Ayland, Philippe Mathieu-Daudé,
Kevin Wolf, Stefan Hajnoczi, Alexander Graf, Paolo Bonzini,
Richard Henderson, Peter Maydell
On Wed, Jun 25, 2025 at 08:46:54PM +0100, Daniel P. Berrangé wrote:
> On Wed, Jun 25, 2025 at 03:16:52PM -0400, Michael S. Tsirkin wrote:
> > On Mon, Jun 16, 2025 at 11:22:41AM +0200, Markus Armbruster wrote:
> > > From: Daniel P. Berrangé <berrange@redhat.com>
> > >
> > > There has been an explosion of interest in so called AI code
> > > generators. Thus far though, this has not been matched by a broadly
> > > accepted legal interpretation of the licensing implications for code
> > > generator outputs. While the vendors may claim there is no problem and
> > > a free choice of license is possible, they have an inherent conflict
> > > of interest in promoting this interpretation. More broadly there is,
> > > as yet, no broad consensus on the licensing implications of code
> > > generators trained on inputs under a wide variety of licenses
> > >
> > > The DCO requires contributors to assert they have the right to
> > > contribute under the designated project license. Given the lack of
> > > consensus on the licensing of AI code generator output, it is not
> > > considered credible to assert compliance with the DCO clause (b) or (c)
> > > where a patch includes such generated code.
> > >
> > > This patch thus defines a policy that the QEMU project will currently
> > > not accept contributions where use of AI code generators is either
> > > known, or suspected.
> > >
> > > These are early days of AI-assisted software development. The legal
> > > questions will be resolved eventually. The tools will mature, and we
> > > can expect some to become safely usable in free software projects.
> > > The policy we set now must be for today, and be open to revision. It's
> > > best to start strict and safe, then relax.
> > >
> > > Meanwhile requests for exceptions can also be considered on a case by
> > > case basis.
> > >
> > > Signed-off-by: Daniel P. Berrangé <berrange@redhat.com>
> > > Reviewed-by: Kevin Wolf <kwolf@redhat.com>
> > > Reviewed-by: Stefan Hajnoczi <stefanha@redhat.com>
> > > Reviewed-by: Alex Bennée <alex.bennee@linaro.org>
> > > Signed-off-by: Markus Armbruster <armbru@redhat.com>
> >
> > Sorry about only reacting now, was AFK.
> >
> > So one usecase that to me seems entirely valid, is refactoring.
> >
> > For example, change a function prototype, or a structure,
> > and have an LLM update all callers.
> >
> > The only part of the patch that is expressive is the
> > actual change, the rest is a technicality and has IMHO nothing to do with
> > copyright. LLMs can just do it with no hassle.
>
> Well the policy is defined in terms of requirements to comply with
> the DCO, and that implicitly indicates that the code in question
> is eligible for copyright protection to begin with.
>
> IOW, if a change is such that it is not considered eligible for
> copyright protection, then you can take the view that it is trivially
> DCO compliant, whether you wrote the code, an arbitrary 3rd party
> wrote the code, or whether an AI wrote the code.
Exactly. I agree! However the patch states:
+The QEMU project thus requires that contributors refrain from using AI content
+generators on patches intended to be submitted to the project, and will
+decline any contribution if use of AI is either known or suspected.
and makes no exception for non-copyrightable parts of the patch.
Or do I misunderstand?
> > Can we soften this to only apply to expressive code?
> >
> > I feel a lot of cleanups would be enabled by this.
>
> Trying to detail every possible scenario is impractical and would
> make the document too onerous for people to read, remember & apply.
> It is better to leave it up to the contributor to decide whether a
> change is non-copyrightable, than to try to draw that line crudely
> in text. Even for refactoring that line will be fuzzy and contextual,
> so not a scenario where we should say any use of AI for refactoring
> is OK, as that will lull contributors into having a false sense of
> acceptability, rather than being aware of the need to question it.
Agree again! What worries me is that the patch as posted here does
not make contributors question anything. It just flatly forbids using "AI
content generators".
--
MST
* Re: [PATCH v5 3/3] docs: define policy forbidding use of AI code generators
2025-06-25 20:01 ` Michael S. Tsirkin
@ 2025-06-26 10:41 ` Markus Armbruster
0 siblings, 0 replies; 21+ messages in thread
From: Markus Armbruster @ 2025-06-26 10:41 UTC (permalink / raw)
To: Michael S. Tsirkin
Cc: Daniel P. Berrangé, qemu-devel, Thomas Huth,
Alex Bennée, Gerd Hoffmann, Mark Cave-Ayland,
Philippe Mathieu-Daudé, Kevin Wolf, Stefan Hajnoczi,
Alexander Graf, Paolo Bonzini, Richard Henderson, Peter Maydell
"Michael S. Tsirkin" <mst@redhat.com> writes:
> On Wed, Jun 25, 2025 at 08:46:54PM +0100, Daniel P. Berrangé wrote:
>> On Wed, Jun 25, 2025 at 03:16:52PM -0400, Michael S. Tsirkin wrote:
>> > On Mon, Jun 16, 2025 at 11:22:41AM +0200, Markus Armbruster wrote:
>> > > From: Daniel P. Berrangé <berrange@redhat.com>
>> > >
>> > > There has been an explosion of interest in so called AI code
>> > > generators. Thus far though, this has not been matched by a broadly
>> > > accepted legal interpretation of the licensing implications for code
>> > > generator outputs. While the vendors may claim there is no problem and
>> > > a free choice of license is possible, they have an inherent conflict
>> > > of interest in promoting this interpretation. More broadly there is,
>> > > as yet, no broad consensus on the licensing implications of code
>> > > generators trained on inputs under a wide variety of licenses
>> > >
>> > > The DCO requires contributors to assert they have the right to
>> > > contribute under the designated project license. Given the lack of
>> > > consensus on the licensing of AI code generator output, it is not
>> > > considered credible to assert compliance with the DCO clause (b) or (c)
>> > > where a patch includes such generated code.
>> > >
>> > > This patch thus defines a policy that the QEMU project will currently
>> > > not accept contributions where use of AI code generators is either
>> > > known, or suspected.
>> > >
>> > > These are early days of AI-assisted software development. The legal
>> > > questions will be resolved eventually. The tools will mature, and we
>> > > can expect some to become safely usable in free software projects.
>> > > The policy we set now must be for today, and be open to revision. It's
>> > > best to start strict and safe, then relax.
>> > >
>> > > Meanwhile requests for exceptions can also be considered on a case by
>> > > case basis.
>> > >
>> > > Signed-off-by: Daniel P. Berrangé <berrange@redhat.com>
>> > > Reviewed-by: Kevin Wolf <kwolf@redhat.com>
>> > > Reviewed-by: Stefan Hajnoczi <stefanha@redhat.com>
>> > > Reviewed-by: Alex Bennée <alex.bennee@linaro.org>
>> > > Signed-off-by: Markus Armbruster <armbru@redhat.com>
>> >
>> > Sorry about only reacting now, was AFK.
>> >
>> > So one usecase that to me seems entirely valid, is refactoring.
>> >
>> > For example, change a function prototype, or a structure,
>> > and have an LLM update all callers.
>> >
>> > The only part of the patch that is expressive is the
>> > actual change, the rest is a technicality and has IMHO nothing to do with
>> > copyright. LLMs can just do it with no hassle.
>>
>> Well the policy is defined in terms of requirements to comply with
>> the DCO, and that implicitly indicates that the code in question
>> is eligible for copyright protection to begin with.
>>
>> IOW, if a change is such that it is not considered eligible for
>> copyright protection, then you can take the view that it is trivially
>> DCO compliant, whether you wrote the code, an arbitrary 3rd party
>> wrote the code, or whether an AI wrote the code.
>
> Exactly. I agree! However the patch states:
>
> +The QEMU project thus requires that contributors refrain from using AI content
> +generators on patches intended to be submitted to the project, and will
> +decline any contribution if use of AI is either known or suspected.
>
> and makes no exception for non-copyrightable parts of the patch.
>
> Or do I misunderstand?
>
>> > Can we soften this to only apply to expressive code?
>> >
>> > I feel a lot of cleanups would be enabled by this.
>>
>> Trying to detail every possible scenario is impractical and would
>> make the document too onerous for people to read, remember & apply.
>> It is better to leave it up to the contributor to decide whether a
>> change is non-copyrightable, than to try to draw that line crudely
>> in text. Even for refactoring that line will be fuzzy and contextual,
>> so not a scenario where we should say any use of AI for refactoring
>> is OK, as that will lull contributors into having a false sense of
>> acceptability, rather than being aware of the need to question it.
>
> Agree again! What worries me is that the patch as posted here does
> not make contributors question anything. It just flatly forbids using "AI
> content generators".
Only if you stop reading before the last paragraph :)
I agree with Daniel that trying to legislate exceptions is not going to
work. Instead, we put in this:
This policy may evolve as AI tools mature and the legal situation is
clarified. In the meanwhile, requests for exceptions to this policy will be
evaluated by the QEMU project on a case by case basis. To be granted an
exception, a contributor will need to demonstrate clarity of the license and
copyright status for the tool's output in relation to its training model and
code, to the satisfaction of the project maintainers.
Last paragraph, i.e. a fairly prominent spot.
If you can make a convincing case that the tool's output is not
copyrightable, I like your chances of being granted an exception.
As always, if you think doc text is insufficiently clear, let's work on
improving it.
* Re: [PATCH v5 3/3] docs: define policy forbidding use of AI code generators
2025-06-25 19:16 ` Michael S. Tsirkin
2025-06-25 19:46 ` Daniel P. Berrangé
@ 2025-06-25 20:38 ` Kevin Wolf
2025-06-25 20:45 ` Michael S. Tsirkin
2025-06-25 20:47 ` Stefan Hajnoczi
1 sibling, 2 replies; 21+ messages in thread
From: Kevin Wolf @ 2025-06-25 20:38 UTC (permalink / raw)
To: Michael S. Tsirkin
Cc: Markus Armbruster, qemu-devel, Daniel P . Berrangé,
Thomas Huth, Alex Bennée, Gerd Hoffmann, Mark Cave-Ayland,
Philippe Mathieu-Daudé, Stefan Hajnoczi, Alexander Graf,
Paolo Bonzini, Richard Henderson, Peter Maydell
Am 25.06.2025 um 21:16 hat Michael S. Tsirkin geschrieben:
> On Mon, Jun 16, 2025 at 11:22:41AM +0200, Markus Armbruster wrote:
> > From: Daniel P. Berrangé <berrange@redhat.com>
> >
> > There has been an explosion of interest in so called AI code
> > generators. Thus far though, this has not been matched by a broadly
> > accepted legal interpretation of the licensing implications for code
> > generator outputs. While the vendors may claim there is no problem and
> > a free choice of license is possible, they have an inherent conflict
> > of interest in promoting this interpretation. More broadly there is,
> > as yet, no broad consensus on the licensing implications of code
> > generators trained on inputs under a wide variety of licenses
> >
> > The DCO requires contributors to assert they have the right to
> > contribute under the designated project license. Given the lack of
> > consensus on the licensing of AI code generator output, it is not
> > considered credible to assert compliance with the DCO clause (b) or (c)
> > where a patch includes such generated code.
> >
> > This patch thus defines a policy that the QEMU project will currently
> > not accept contributions where use of AI code generators is either
> > known, or suspected.
> >
> > These are early days of AI-assisted software development. The legal
> > questions will be resolved eventually. The tools will mature, and we
> > can expect some to become safely usable in free software projects.
> > The policy we set now must be for today, and be open to revision. It's
> > best to start strict and safe, then relax.
> >
> > Meanwhile requests for exceptions can also be considered on a case by
> > case basis.
> >
> > Signed-off-by: Daniel P. Berrangé <berrange@redhat.com>
> > Reviewed-by: Kevin Wolf <kwolf@redhat.com>
> > Reviewed-by: Stefan Hajnoczi <stefanha@redhat.com>
> > Reviewed-by: Alex Bennée <alex.bennee@linaro.org>
> > Signed-off-by: Markus Armbruster <armbru@redhat.com>
>
> Sorry about only reacting now, was AFK.
>
> So one usecase that to me seems entirely valid, is refactoring.
>
> For example, change a function prototype, or a structure,
> and have an LLM update all callers.
>
> The only part of the patch that is expressive is the
> actual change, the rest is a technicality and has IMHO nothing to do with
> copyright. LLMs can just do it with no hassle.
>
>
> Can we soften this to only apply to expressive code?
>
> I feel a lot of cleanups would be enabled by this.
Hasn't refactoring been a (deterministically) solved problem long before
LLMs became capable of doing the same with a good enough probability?
Kevin
* Re: [PATCH v5 3/3] docs: define policy forbidding use of AI code generators
2025-06-25 20:38 ` Kevin Wolf
@ 2025-06-25 20:45 ` Michael S. Tsirkin
2025-06-25 20:47 ` Stefan Hajnoczi
1 sibling, 0 replies; 21+ messages in thread
From: Michael S. Tsirkin @ 2025-06-25 20:45 UTC (permalink / raw)
To: Kevin Wolf
Cc: Markus Armbruster, qemu-devel, Daniel P . Berrangé,
Thomas Huth, Alex Bennée, Gerd Hoffmann, Mark Cave-Ayland,
Philippe Mathieu-Daudé, Stefan Hajnoczi, Alexander Graf,
Paolo Bonzini, Richard Henderson, Peter Maydell
On Wed, Jun 25, 2025 at 10:38:21PM +0200, Kevin Wolf wrote:
> Am 25.06.2025 um 21:16 hat Michael S. Tsirkin geschrieben:
> > On Mon, Jun 16, 2025 at 11:22:41AM +0200, Markus Armbruster wrote:
> > > From: Daniel P. Berrangé <berrange@redhat.com>
> > >
> > > There has been an explosion of interest in so called AI code
> > > generators. Thus far though, this has not been matched by a broadly
> > > accepted legal interpretation of the licensing implications for code
> > > generator outputs. While the vendors may claim there is no problem and
> > > a free choice of license is possible, they have an inherent conflict
> > > of interest in promoting this interpretation. More broadly there is,
> > > as yet, no broad consensus on the licensing implications of code
> > > generators trained on inputs under a wide variety of licenses
> > >
> > > The DCO requires contributors to assert they have the right to
> > > contribute under the designated project license. Given the lack of
> > > consensus on the licensing of AI code generator output, it is not
> > > considered credible to assert compliance with the DCO clause (b) or (c)
> > > where a patch includes such generated code.
> > >
> > > This patch thus defines a policy that the QEMU project will currently
> > > not accept contributions where use of AI code generators is either
> > > known, or suspected.
> > >
> > > These are early days of AI-assisted software development. The legal
> > > questions will be resolved eventually. The tools will mature, and we
> > > can expect some to become safely usable in free software projects.
> > > The policy we set now must be for today, and be open to revision. It's
> > > best to start strict and safe, then relax.
> > >
> > > Meanwhile requests for exceptions can also be considered on a case by
> > > case basis.
> > >
> > > Signed-off-by: Daniel P. Berrangé <berrange@redhat.com>
> > > Reviewed-by: Kevin Wolf <kwolf@redhat.com>
> > > Reviewed-by: Stefan Hajnoczi <stefanha@redhat.com>
> > > Reviewed-by: Alex Bennée <alex.bennee@linaro.org>
> > > Signed-off-by: Markus Armbruster <armbru@redhat.com>
> >
> > Sorry about only reacting now, was AFK.
> >
> > So one usecase that to me seems entirely valid, is refactoring.
> >
> > For example, change a function prototype, or a structure,
> > and have an LLM update all callers.
> >
> > The only part of the patch that is expressive is the
> > actual change, the rest is a technicality and has IMHO nothing to do with
> > copyright. LLMs can just do it with no hassle.
> >
> >
> > Can we soften this to only apply to expressive code?
> >
> > I feel a lot of cleanups would be enabled by this.
>
> Hasn't refactoring been a (deterministically) solved problem long before
> LLMs became capable of doing the same with a good enough probability?
>
> Kevin
Interesting. For example, I recently wanted to refactor a bunch of bool
fields to bit flags. Know of any tool that would do it without major
pain?
--
MST
* Re: [PATCH v5 3/3] docs: define policy forbidding use of AI code generators
2025-06-25 20:38 ` Kevin Wolf
2025-06-25 20:45 ` Michael S. Tsirkin
@ 2025-06-25 20:47 ` Stefan Hajnoczi
2025-06-25 20:49 ` Michael S. Tsirkin
1 sibling, 1 reply; 21+ messages in thread
From: Stefan Hajnoczi @ 2025-06-25 20:47 UTC (permalink / raw)
To: Kevin Wolf
Cc: Michael S. Tsirkin, Markus Armbruster, qemu-devel,
Daniel P . Berrangé, Thomas Huth, Alex Bennée,
Gerd Hoffmann, Mark Cave-Ayland, Philippe Mathieu-Daudé,
Stefan Hajnoczi, Alexander Graf, Paolo Bonzini, Richard Henderson,
Peter Maydell
On Wed, Jun 25, 2025 at 4:39 PM Kevin Wolf <kwolf@redhat.com> wrote:
>
> Am 25.06.2025 um 21:16 hat Michael S. Tsirkin geschrieben:
> > On Mon, Jun 16, 2025 at 11:22:41AM +0200, Markus Armbruster wrote:
> > > From: Daniel P. Berrangé <berrange@redhat.com>
> > >
> > > There has been an explosion of interest in so called AI code
> > > generators. Thus far though, this has not been matched by a broadly
> > > accepted legal interpretation of the licensing implications for code
> > > generator outputs. While the vendors may claim there is no problem and
> > > a free choice of license is possible, they have an inherent conflict
> > > of interest in promoting this interpretation. More broadly there is,
> > > as yet, no broad consensus on the licensing implications of code
> > > generators trained on inputs under a wide variety of licenses
> > >
> > > The DCO requires contributors to assert they have the right to
> > > contribute under the designated project license. Given the lack of
> > > consensus on the licensing of AI code generator output, it is not
> > > considered credible to assert compliance with the DCO clause (b) or (c)
> > > where a patch includes such generated code.
> > >
> > > This patch thus defines a policy that the QEMU project will currently
> > > not accept contributions where use of AI code generators is either
> > > known, or suspected.
> > >
> > > These are early days of AI-assisted software development. The legal
> > > questions will be resolved eventually. The tools will mature, and we
> > > can expect some to become safely usable in free software projects.
> > > The policy we set now must be for today, and be open to revision. It's
> > > best to start strict and safe, then relax.
> > >
> > > Meanwhile requests for exceptions can also be considered on a case by
> > > case basis.
> > >
> > > Signed-off-by: Daniel P. Berrangé <berrange@redhat.com>
> > > Reviewed-by: Kevin Wolf <kwolf@redhat.com>
> > > Reviewed-by: Stefan Hajnoczi <stefanha@redhat.com>
> > > Reviewed-by: Alex Bennée <alex.bennee@linaro.org>
> > > Signed-off-by: Markus Armbruster <armbru@redhat.com>
> >
> > Sorry about only reacting now, was AFK.
> >
> > So one usecase that to me seems entirely valid, is refactoring.
> >
> > For example, change a function prototype, or a structure,
> > and have an LLM update all callers.
> >
> > The only part of the patch that is expressive is the
> > actual change, the rest is a technicality and has IMHO nothing to do with
> > copyright. LLMs can just do it with no hassle.
> >
> >
> > Can we soften this to only apply to expressive code?
> >
> > I feel a lot of cleanups would be enabled by this.
>
> Hasn't refactoring been a (deterministically) solved problem long before
> LLMs became capable of doing the same with a good enough probability?
It's easier to describe a desired refactoring to an LLM in natural
language than to figure out the regexes, semantic patches, etc needed
for traditional refactoring tools.
Also, LLMs can perform higher level refactorings that might not be
supported by traditional tools. Things like "split this interface into
callbacks that take a Foo * argument and implement the callbacks for
both a.c and b.c".
I think what Daniel mentioned is a good guide: if it's something that
you think is copyrightable, then avoid it.
Stefan
* Re: [PATCH v5 3/3] docs: define policy forbidding use of AI code generators
2025-06-25 20:47 ` Stefan Hajnoczi
@ 2025-06-25 20:49 ` Michael S. Tsirkin
2025-06-26 8:18 ` Daniel P. Berrangé
0 siblings, 1 reply; 21+ messages in thread
From: Michael S. Tsirkin @ 2025-06-25 20:49 UTC (permalink / raw)
To: Stefan Hajnoczi
Cc: Kevin Wolf, Markus Armbruster, qemu-devel,
Daniel P . Berrangé, Thomas Huth, Alex Bennée,
Gerd Hoffmann, Mark Cave-Ayland, Philippe Mathieu-Daudé,
Stefan Hajnoczi, Alexander Graf, Paolo Bonzini, Richard Henderson,
Peter Maydell
On Wed, Jun 25, 2025 at 04:47:06PM -0400, Stefan Hajnoczi wrote:
> On Wed, Jun 25, 2025 at 4:39 PM Kevin Wolf <kwolf@redhat.com> wrote:
> >
> > Am 25.06.2025 um 21:16 hat Michael S. Tsirkin geschrieben:
> > > On Mon, Jun 16, 2025 at 11:22:41AM +0200, Markus Armbruster wrote:
> > > > From: Daniel P. Berrangé <berrange@redhat.com>
> > > >
> > > > There has been an explosion of interest in so called AI code
> > > > generators. Thus far though, this has not been matched by a broadly
> > > > accepted legal interpretation of the licensing implications for code
> > > > generator outputs. While the vendors may claim there is no problem and
> > > > a free choice of license is possible, they have an inherent conflict
> > > > of interest in promoting this interpretation. More broadly there is,
> > > > as yet, no broad consensus on the licensing implications of code
> > > > generators trained on inputs under a wide variety of licenses
> > > >
> > > > The DCO requires contributors to assert they have the right to
> > > > contribute under the designated project license. Given the lack of
> > > > consensus on the licensing of AI code generator output, it is not
> > > > considered credible to assert compliance with the DCO clause (b) or (c)
> > > > where a patch includes such generated code.
> > > >
> > > > This patch thus defines a policy that the QEMU project will currently
> > > > not accept contributions where use of AI code generators is either
> > > > known, or suspected.
> > > >
> > > > These are early days of AI-assisted software development. The legal
> > > > questions will be resolved eventually. The tools will mature, and we
> > > > can expect some to become safely usable in free software projects.
> > > > The policy we set now must be for today, and be open to revision. It's
> > > > best to start strict and safe, then relax.
> > > >
> > > > Meanwhile requests for exceptions can also be considered on a case by
> > > > case basis.
> > > >
> > > > Signed-off-by: Daniel P. Berrangé <berrange@redhat.com>
> > > > Reviewed-by: Kevin Wolf <kwolf@redhat.com>
> > > > Reviewed-by: Stefan Hajnoczi <stefanha@redhat.com>
> > > > Reviewed-by: Alex Bennée <alex.bennee@linaro.org>
> > > > Signed-off-by: Markus Armbruster <armbru@redhat.com>
> > >
> > > Sorry about only reacting now, was AFK.
> > >
> > > So one usecase that to me seems entirely valid, is refactoring.
> > >
> > > For example, change a function prototype, or a structure,
> > > and have an LLM update all callers.
> > >
> > > The only part of the patch that is expressive is the
> > > actual change, the rest is a technicality and has IMHO nothing to do with
> > > copyright. LLMs can just do it with no hassle.
> > >
> > >
> > > Can we soften this to only apply to expressive code?
> > >
> > > I feel a lot of cleanups would be enabled by this.
> >
> > Hasn't refactoring been a (deterministically) solved problem long before
> > LLMs became capable of doing the same with a good enough probability?
>
> It's easier to describe a desired refactoring to an LLM in natural
> language than to figure out the regexes, semantic patches, etc needed
> for traditional refactoring tools.
>
> Also, LLMs can perform higher level refactorings that might not be
> supported by traditional tools. Things like "split this interface into
> callbacks that take a Foo * argument and implement the callbacks for
> both a.c and b.c".
>
> I think what Daniel mentioned is a good guide: if it's something that
> you think is copyrightable, then avoid it.
>
> Stefan
Right. Let's put that in the doc?
--
MST
* Re: [PATCH v5 3/3] docs: define policy forbidding use of AI code generators
2025-06-25 20:49 ` Michael S. Tsirkin
@ 2025-06-26 8:18 ` Daniel P. Berrangé
2025-06-26 8:38 ` Michael S. Tsirkin
0 siblings, 1 reply; 21+ messages in thread
From: Daniel P. Berrangé @ 2025-06-26 8:18 UTC (permalink / raw)
To: Michael S. Tsirkin
Cc: Stefan Hajnoczi, Kevin Wolf, Markus Armbruster, qemu-devel,
Thomas Huth, Alex Bennée, Gerd Hoffmann, Mark Cave-Ayland,
Philippe Mathieu-Daudé, Stefan Hajnoczi, Alexander Graf,
Paolo Bonzini, Richard Henderson, Peter Maydell
On Wed, Jun 25, 2025 at 04:49:17PM -0400, Michael S. Tsirkin wrote:
> On Wed, Jun 25, 2025 at 04:47:06PM -0400, Stefan Hajnoczi wrote:
> > On Wed, Jun 25, 2025 at 4:39 PM Kevin Wolf <kwolf@redhat.com> wrote:
> > >
> > > Am 25.06.2025 um 21:16 hat Michael S. Tsirkin geschrieben:
> > > > On Mon, Jun 16, 2025 at 11:22:41AM +0200, Markus Armbruster wrote:
> > > > > From: Daniel P. Berrangé <berrange@redhat.com>
> > > > >
> > > > > There has been an explosion of interest in so called AI code
> > > > > generators. Thus far though, this has not been matched by a broadly
> > > > > accepted legal interpretation of the licensing implications for code
> > > > > generator outputs. While the vendors may claim there is no problem and
> > > > > a free choice of license is possible, they have an inherent conflict
> > > > > of interest in promoting this interpretation. More broadly there is,
> > > > > as yet, no broad consensus on the licensing implications of code
> > > > > generators trained on inputs under a wide variety of licenses
> > > > >
> > > > > The DCO requires contributors to assert they have the right to
> > > > > contribute under the designated project license. Given the lack of
> > > > > consensus on the licensing of AI code generator output, it is not
> > > > > considered credible to assert compliance with the DCO clause (b) or (c)
> > > > > where a patch includes such generated code.
> > > > >
> > > > > This patch thus defines a policy that the QEMU project will currently
> > > > > not accept contributions where use of AI code generators is either
> > > > > known, or suspected.
> > > > >
> > > > > These are early days of AI-assisted software development. The legal
> > > > > questions will be resolved eventually. The tools will mature, and we
> > > > > can expect some to become safely usable in free software projects.
> > > > > The policy we set now must be for today, and be open to revision. It's
> > > > > best to start strict and safe, then relax.
> > > > >
> > > > > Meanwhile requests for exceptions can also be considered on a case by
> > > > > case basis.
> > > > >
> > > > > Signed-off-by: Daniel P. Berrangé <berrange@redhat.com>
> > > > > Reviewed-by: Kevin Wolf <kwolf@redhat.com>
> > > > > Reviewed-by: Stefan Hajnoczi <stefanha@redhat.com>
> > > > > Reviewed-by: Alex Bennée <alex.bennee@linaro.org>
> > > > > Signed-off-by: Markus Armbruster <armbru@redhat.com>
> > > >
> > > > Sorry about only reacting now, was AFK.
> > > >
> > > > So one usecase that to me seems entirely valid, is refactoring.
> > > >
> > > > For example, change a function prototype, or a structure,
> > > > and have an LLM update all callers.
> > > >
> > > > The only part of the patch that is expressive is the
> > > > actual change, the rest is a technicality and has IMHO nothing to do with
> > > > copyright. LLMs can just do it with no hassle.
> > > >
> > > >
> > > > Can we soften this to only apply to expressive code?
> > > >
> > > > I feel a lot of cleanups would be enabled by this.
> > >
> > > Hasn't refactoring been a (deterministically) solved problem long before
> > > LLMs became capable of doing the same with a good enough probability?
> >
> > It's easier to describe a desired refactoring to an LLM in natural
> > language than to figure out the regexes, semantic patches, etc needed
> > for traditional refactoring tools.
> >
> > Also, LLMs can perform higher level refactorings that might not be
> > supported by traditional tools. Things like "split this interface into
> > callbacks that take a Foo * argument and implement the callbacks for
> > both a.c and b.c".
> >
> > I think what Daniel mentioned is a good guide: if it's something that
> > you think is copyrightable, then avoid it.
>
> Right. Let's put that in the doc?
In terms of mitigating risk, I think it is better to avoid saying that
explicitly, and thereby being seen to actively encourage acceptance of
AI generated code. The boundary between copyrightable and
non-copyrightable code is always pretty fuzzy and a matter of differing
opinions.
With regards,
Daniel
--
|: https://berrange.com -o- https://www.flickr.com/photos/dberrange :|
|: https://libvirt.org -o- https://fstop138.berrange.com :|
|: https://entangle-photo.org -o- https://www.instagram.com/dberrange :|
* Re: [PATCH v5 3/3] docs: define policy forbidding use of AI code generators
2025-06-26 8:18 ` Daniel P. Berrangé
@ 2025-06-26 8:38 ` Michael S. Tsirkin
0 siblings, 0 replies; 21+ messages in thread
From: Michael S. Tsirkin @ 2025-06-26 8:38 UTC (permalink / raw)
To: Daniel P. Berrangé
Cc: Stefan Hajnoczi, Kevin Wolf, Markus Armbruster, qemu-devel,
Thomas Huth, Alex Bennée, Gerd Hoffmann, Mark Cave-Ayland,
Philippe Mathieu-Daudé, Stefan Hajnoczi, Alexander Graf,
Paolo Bonzini, Richard Henderson, Peter Maydell
On Thu, Jun 26, 2025 at 09:18:22AM +0100, Daniel P. Berrangé wrote:
> On Wed, Jun 25, 2025 at 04:49:17PM -0400, Michael S. Tsirkin wrote:
> > On Wed, Jun 25, 2025 at 04:47:06PM -0400, Stefan Hajnoczi wrote:
> > > On Wed, Jun 25, 2025 at 4:39 PM Kevin Wolf <kwolf@redhat.com> wrote:
> > > >
> > > > Am 25.06.2025 um 21:16 hat Michael S. Tsirkin geschrieben:
> > > > > On Mon, Jun 16, 2025 at 11:22:41AM +0200, Markus Armbruster wrote:
> > > > > > From: Daniel P. Berrangé <berrange@redhat.com>
> > > > > >
> > > > > > There has been an explosion of interest in so called AI code
> > > > > > > generators. Thus far though, this has not been matched by a broadly
> > > > > > accepted legal interpretation of the licensing implications for code
> > > > > > generator outputs. While the vendors may claim there is no problem and
> > > > > > a free choice of license is possible, they have an inherent conflict
> > > > > > of interest in promoting this interpretation. More broadly there is,
> > > > > > as yet, no broad consensus on the licensing implications of code
> > > > > > generators trained on inputs under a wide variety of licenses
> > > > > >
> > > > > > The DCO requires contributors to assert they have the right to
> > > > > > contribute under the designated project license. Given the lack of
> > > > > > consensus on the licensing of AI code generator output, it is not
> > > > > > considered credible to assert compliance with the DCO clause (b) or (c)
> > > > > > where a patch includes such generated code.
> > > > > >
> > > > > > This patch thus defines a policy that the QEMU project will currently
> > > > > > not accept contributions where use of AI code generators is either
> > > > > > known, or suspected.
> > > > > >
> > > > > > These are early days of AI-assisted software development. The legal
> > > > > > questions will be resolved eventually. The tools will mature, and we
> > > > > > can expect some to become safely usable in free software projects.
> > > > > > The policy we set now must be for today, and be open to revision. It's
> > > > > > best to start strict and safe, then relax.
> > > > > >
> > > > > > Meanwhile requests for exceptions can also be considered on a case by
> > > > > > case basis.
> > > > > >
> > > > > > Signed-off-by: Daniel P. Berrangé <berrange@redhat.com>
> > > > > > Reviewed-by: Kevin Wolf <kwolf@redhat.com>
> > > > > > Reviewed-by: Stefan Hajnoczi <stefanha@redhat.com>
> > > > > > Reviewed-by: Alex Bennée <alex.bennee@linaro.org>
> > > > > > Signed-off-by: Markus Armbruster <armbru@redhat.com>
> > > > >
> > > > > Sorry about only reacting now, was AFK.
> > > > >
> > > > > So one usecase that to me seems entirely valid, is refactoring.
> > > > >
> > > > > For example, change a function prototype, or a structure,
> > > > > and have an LLM update all callers.
> > > > >
> > > > > The only part of the patch that is expressive is the
> > > > > actual change, the rest is a technicality and has IMHO nothing to do with
> > > > > copyright. LLMs can just do it with no hassle.
> > > > >
> > > > >
> > > > > Can we soften this to only apply to expressive code?
> > > > >
> > > > > I feel a lot of cleanups would be enabled by this.
> > > >
> > > > Hasn't refactoring been a (deterministically) solved problem long before
> > > > LLMs became capable of doing the same with a good enough probability?
> > >
> > > It's easier to describe a desired refactoring to an LLM in natural
> > > language than to figure out the regexes, semantic patches, etc needed
> > > for traditional refactoring tools.
> > >
> > > Also, LLMs can perform higher level refactorings that might not be
> > > supported by traditional tools. Things like "split this interface into
> > > callbacks that take a Foo * argument and implement the callbacks for
> > > both a.c and b.c".
> > >
> > > I think what Daniel mentioned is a good guide: if it's something that
> > > you think is copyrightable, then avoid it.
> >
> > Right. Let's put that in the doc?
>
> In terms of mitigating risk I think it is better to avoid saying that
> explicitly, and be seen to actively encourage acceptance of AI generated
> code. The boundary between copyrightable and non-copyrightable code is
> always pretty fuzzy and a matter of differing opinions.
>
> With regards,
> Daniel
Well, fuzzy is not what this doc does...
--
MST
* Re: [PATCH v5 3/3] docs: define policy forbidding use of AI code generators
2025-06-16 9:22 ` [PATCH v5 3/3] docs: define policy forbidding use of AI code generators Markus Armbruster
2025-06-25 19:16 ` Michael S. Tsirkin
@ 2025-06-26 6:34 ` Michael S. Tsirkin
2025-06-26 7:56 ` Daniel P. Berrangé
1 sibling, 1 reply; 21+ messages in thread
From: Michael S. Tsirkin @ 2025-06-26 6:34 UTC (permalink / raw)
To: Markus Armbruster
Cc: qemu-devel, Daniel P . Berrangé, Thomas Huth,
Alex Bennée, Gerd Hoffmann, Mark Cave-Ayland,
Philippe Mathieu-Daudé, Kevin Wolf, Stefan Hajnoczi,
Alexander Graf, Paolo Bonzini, Richard Henderson, Peter Maydell
On Mon, Jun 16, 2025 at 11:22:41AM +0200, Markus Armbruster wrote:
> +The QEMU project thus requires that contributors refrain from using AI content
> +generators on patches intended to be submitted to the project, and will
> +decline any contribution if use of AI is either known or suspected.
What is this suspected thing by the way? Suspected by whom? You do not
think this is draconian?
--
MST
* Re: [PATCH v5 3/3] docs: define policy forbidding use of AI code generators
2025-06-26 6:34 ` Michael S. Tsirkin
@ 2025-06-26 7:56 ` Daniel P. Berrangé
0 siblings, 0 replies; 21+ messages in thread
From: Daniel P. Berrangé @ 2025-06-26 7:56 UTC (permalink / raw)
To: Michael S. Tsirkin
Cc: Markus Armbruster, qemu-devel, Thomas Huth, Alex Bennée,
Gerd Hoffmann, Mark Cave-Ayland, Philippe Mathieu-Daudé,
Kevin Wolf, Stefan Hajnoczi, Alexander Graf, Paolo Bonzini,
Richard Henderson, Peter Maydell
On Thu, Jun 26, 2025 at 02:34:57AM -0400, Michael S. Tsirkin wrote:
> On Mon, Jun 16, 2025 at 11:22:41AM +0200, Markus Armbruster wrote:
> > +The QEMU project thus requires that contributors refrain from using AI content
> > +generators on patches intended to be submitted to the project, and will
> > +decline any contribution if use of AI is either known or suspected.
>
> What is this suspected thing by the way? Suspected by whom? You do not
> think this is draconian?
Suspected as in: as a reviewer, you see obvious signs of LLM slop
and/or hallucinations in the contribution, while the contributor has not
declared any use of such tools.
With regards,
Daniel
--
|: https://berrange.com -o- https://www.flickr.com/photos/dberrange :|
|: https://libvirt.org -o- https://fstop138.berrange.com :|
|: https://entangle-photo.org -o- https://www.instagram.com/dberrange :|
* Re: [PATCH v5 0/3] docs: define policy forbidding use of "AI" / LLM code generators
2025-06-16 9:22 [PATCH v5 0/3] docs: define policy forbidding use of "AI" / LLM code generators Markus Armbruster
` (2 preceding siblings ...)
2025-06-16 9:22 ` [PATCH v5 3/3] docs: define policy forbidding use of AI code generators Markus Armbruster
@ 2025-06-23 19:30 ` Stefan Hajnoczi
2025-06-23 22:25 ` Alex Bennée
2025-06-24 17:33 ` Stefan Hajnoczi
4 siblings, 1 reply; 21+ messages in thread
From: Stefan Hajnoczi @ 2025-06-23 19:30 UTC (permalink / raw)
To: Markus Armbruster
Cc: qemu-devel, Daniel P . Berrangé, Thomas Huth,
Alex Bennée, Michael S . Tsirkin, Gerd Hoffmann,
Mark Cave-Ayland, Philippe Mathieu-Daudé, Kevin Wolf,
Stefan Hajnoczi, Alexander Graf, Paolo Bonzini, Richard Henderson,
Peter Maydell
On Mon, Jun 16, 2025 at 5:27 AM Markus Armbruster <armbru@redhat.com> wrote:
>
> More than a year ago, Daniel posted patches to put an AI policy in
> writing. Reception was mostly positive. A v2 to address feedback
> followed with some delay. But no pull request.
>
> I asked Daniel why, and he told me he was concerned it might go too
> far in its interpretation of the DCO requirements. After a bit of
> discussion, I think Daniel's text is basically fine. The policy it
> describes is simple and strict. Relaxing policy is easier than
> tightening it. I softened the phrasing slightly, addressed open
> review comments, and fixed a few minor things I found myself.
>
> Here's Daniel's cover letter for v2:
>
> This patch kicks the hornet's nest of AI / LLM code generators.
>
> With the increasing interest in code generators in recent times,
> it is inevitable that QEMU contributions will include AI generated
> code. Thus far we have remained silent on the matter. Given that
> everyone knows these tools exist, our current position has to be
> considered tacit acceptance of the use of AI generated code in QEMU.
>
> The question for the project is whether that is a good position for
> QEMU to take or not ?
>
> IANAL, but I like to think I'm reasonably proficient at understanding
> open source licensing. I am not inherently against the use of AI tools,
> rather I am anti-risk. I also want to see OSS licenses respected and
> complied with.
>
> AFAICT at its current state of (im)maturity the question of licensing
> of AI code generator output does not have a broadly accepted / settled
> legal position. This is an inherent bias/self-interest from the vendors
> promoting their usage, who tend to minimize/dismiss the legal questions.
> From my POV, this puts such tools in a position of elevated legal risk.
>
> Given the fuzziness over the legal position of generated code from
> such tools, I don't consider it credible (today) for a contributor
> to assert compliance with the DCO terms (b) or (c) (which is a stated
> pre-requisite for QEMU accepting patches) when a patch includes (or is
> derived from) AI generated code.
>
> By implication, I think that QEMU must (for now) explicitly decline
> to (knowingly) accept AI generated code.
>
> Perhaps a few years down the line the legal uncertainty will have
> reduced and we can re-evaluate this policy.
>
> Discuss...
Any final comments before I merge this?
Stefan
>
> Changes in v5 [Markus Armbruster]:
> * PATCH 2:
> - Drop "follow a deterministic process" clause [Peter]
>
> Changes in v4 [Markus Armbruster]:
> * PATCH 1:
> - Revert v3's "known identity", and instead move existing paragraph
> from submitting-a-patch.rst to code-provenance.rst [Philippe]
> - Add a paragraph on recording maintainer modifications [Alex]
> * PATCH 3:
> - Talk about "AI-assisted software development", "AI content
> generators", and "content", not just "AI code generators" and
> "code" [Stefan, Daniel]
> - Fix spelling of Copilot, and mention Claude [Stefan]
> - Fix link text for reference to the DCO
> - Reiterate the policy does not apply to other uses of AI [Stefan,
> Daniel]
> - Add agents to the examples of tools impacted by the policy
> [Daniel]
>
> Changes in v3 [Markus Armbruster]:
>
> * PATCH 1:
> - Require "known identity" (phrasing stolen from Linux kernel docs)
> [Peter]
> - Clarify use of multiple addresses [Michael]
> - Improve markup
> - Fix a few misspellings
> - Left for later: explain our use of Message-Id: [Alex]
> * PATCH 2:
> - Minor phrasing tweaks and spelling fixes
> * PATCH 3:
> - Don't claim DCO compliance is currently impossible, do point out
> it's unclear how, and that we consider the legal risk not
> acceptable.
> - Stress that the policy is open to revision some more by adding
> "as AI tools mature". Also rephrase the commit message.
> - Improve markup
>
> Changes in v2 [Daniel Berrangé]:
>
> * Fix a huge number of typos in docs
> * Clarify that maintainers should still add R-b where relevant, even
> if they are already adding their own S-oB.
> * Clarify situation when contributor re-starts previously abandoned
> work from another contributor.
> * Add info about Suggested-by tag
> * Add new docs section dealing with the broad topic of "generated
> files" (whether code generators or compilers)
> * Simplify the section related to prohibition of AI generated files
> and give further examples of tools considered covered
> * Remove repeated references to "LLM" as a specific technology, just
> use the broad "AI" term, except for one use of LLM as an example.
> * Add note that the policy may evolve if the legal clarity improves
> * Add note that exceptions can be requested on case-by-case basis
> if contributor thinks they can demonstrate a credible copyright
> and licensing status
>
> Daniel P. Berrangé (3):
> docs: introduce dedicated page about code provenance / sign-off
> docs: define policy limiting the inclusion of generated files
> docs: define policy forbidding use of AI code generators
>
> docs/devel/code-provenance.rst | 338 ++++++++++++++++++++++++++++++
> docs/devel/index-process.rst | 1 +
> docs/devel/submitting-a-patch.rst | 23 +-
> 3 files changed, 341 insertions(+), 21 deletions(-)
> create mode 100644 docs/devel/code-provenance.rst
>
> --
> 2.49.0
>
>
* Re: [PATCH v5 0/3] docs: define policy forbidding use of "AI" / LLM code generators
2025-06-23 19:30 ` [PATCH v5 0/3] docs: define policy forbidding use of "AI" / LLM " Stefan Hajnoczi
@ 2025-06-23 22:25 ` Alex Bennée
2025-06-24 5:02 ` Markus Armbruster
0 siblings, 1 reply; 21+ messages in thread
From: Alex Bennée @ 2025-06-23 22:25 UTC (permalink / raw)
To: Stefan Hajnoczi
Cc: Markus Armbruster, qemu-devel, Daniel P . Berrangé,
Thomas Huth, Michael S . Tsirkin, Gerd Hoffmann, Mark Cave-Ayland,
Philippe Mathieu-Daudé, Kevin Wolf, Stefan Hajnoczi,
Alexander Graf, Paolo Bonzini, Richard Henderson, Peter Maydell
Stefan Hajnoczi <stefanha@gmail.com> writes:
> On Mon, Jun 16, 2025 at 5:27 AM Markus Armbruster <armbru@redhat.com> wrote:
>>
>> More than a year ago, Daniel posted patches to put an AI policy in
>> writing. Reception was mostly positive. A v2 to address feedback
>> followed with some delay. But no pull request.
>>
>> I asked Daniel why, and he told me he was concerned it might go too
>> far in its interpretation of the DCO requirements. After a bit of
>> discussion, I think Daniel's text is basically fine. The policy it
>> describes is simple and strict. Relaxing policy is easier than
>> tightening it. I softened the phrasing slightly, addressed open
>> review comments, and fixed a few minor things I found myself.
>>
>> Here's Daniel's cover letter for v2:
>>
>> This patch kicks the hornet's nest of AI / LLM code generators.
>>
>> With the increasing interest in code generators in recent times,
>> it is inevitable that QEMU contributions will include AI generated
>> code. Thus far we have remained silent on the matter. Given that
>> everyone knows these tools exist, our current position has to be
>> considered tacit acceptance of the use of AI generated code in QEMU.
>>
>> The question for the project is whether that is a good position for
>> QEMU to take or not ?
>>
>> IANAL, but I like to think I'm reasonably proficient at understanding
>> open source licensing. I am not inherently against the use of AI tools,
>> rather I am anti-risk. I also want to see OSS licenses respected and
>> complied with.
>>
>> AFAICT at its current state of (im)maturity the question of licensing
>> of AI code generator output does not have a broadly accepted / settled
>> legal position. This is an inherent bias/self-interest from the vendors
>> promoting their usage, who tend to minimize/dismiss the legal questions.
>> From my POV, this puts such tools in a position of elevated legal risk.
>>
>> Given the fuzziness over the legal position of generated code from
>> such tools, I don't consider it credible (today) for a contributor
>> to assert compliance with the DCO terms (b) or (c) (which is a stated
>> pre-requisite for QEMU accepting patches) when a patch includes (or is
>> derived from) AI generated code.
>>
>> By implication, I think that QEMU must (for now) explicitly decline
>> to (knowingly) accept AI generated code.
>>
>> Perhaps a few years down the line the legal uncertainty will have
>> reduced and we can re-evaluate this policy.
>>
>> Discuss...
>
> Any final comments before I merge this?
It's well reviewed, let's get it merged.
--
Alex Bennée
Virtualisation Tech Lead @ Linaro
* Re: [PATCH v5 0/3] docs: define policy forbidding use of "AI" / LLM code generators
2025-06-23 22:25 ` Alex Bennée
@ 2025-06-24 5:02 ` Markus Armbruster
2025-06-24 10:41 ` Stefan Hajnoczi
0 siblings, 1 reply; 21+ messages in thread
From: Markus Armbruster @ 2025-06-24 5:02 UTC (permalink / raw)
To: Alex Bennée
Cc: Stefan Hajnoczi, qemu-devel, Daniel P . Berrangé,
Thomas Huth, Michael S . Tsirkin, Gerd Hoffmann, Mark Cave-Ayland,
Philippe Mathieu-Daudé, Kevin Wolf, Stefan Hajnoczi,
Alexander Graf, Paolo Bonzini, Richard Henderson, Peter Maydell
Alex Bennée <alex.bennee@linaro.org> writes:
> Stefan Hajnoczi <stefanha@gmail.com> writes:
>
>> Any final comments before I merge this?
>
> It's well reviewed, let's get it merged.
Stefan, would you like a PR from me?
* Re: [PATCH v5 0/3] docs: define policy forbidding use of "AI" / LLM code generators
2025-06-24 5:02 ` Markus Armbruster
@ 2025-06-24 10:41 ` Stefan Hajnoczi
0 siblings, 0 replies; 21+ messages in thread
From: Stefan Hajnoczi @ 2025-06-24 10:41 UTC (permalink / raw)
To: Markus Armbruster
Cc: Alex Bennée, qemu-devel, Daniel P . Berrangé,
Thomas Huth, Michael S . Tsirkin, Gerd Hoffmann, Mark Cave-Ayland,
Philippe Mathieu-Daudé, Kevin Wolf, Stefan Hajnoczi,
Alexander Graf, Paolo Bonzini, Richard Henderson, Peter Maydell
On Tue, Jun 24, 2025 at 1:02 AM Markus Armbruster <armbru@redhat.com> wrote:
>
> Alex Bennée <alex.bennee@linaro.org> writes:
>
> > Stefan Hajnoczi <stefanha@gmail.com> writes:
> >
> >> Any final comments before I merge this?
> >
> > > It's well reviewed, let's get it merged.
>
> Stefan, would you like a PR from me?
No, that won't be necessary. I will merge the series directly.
Stefan
* Re: [PATCH v5 0/3] docs: define policy forbidding use of "AI" / LLM code generators
2025-06-16 9:22 [PATCH v5 0/3] docs: define policy forbidding use of "AI" / LLM code generators Markus Armbruster
` (3 preceding siblings ...)
2025-06-23 19:30 ` [PATCH v5 0/3] docs: define policy forbidding use of "AI" / LLM " Stefan Hajnoczi
@ 2025-06-24 17:33 ` Stefan Hajnoczi
4 siblings, 0 replies; 21+ messages in thread
From: Stefan Hajnoczi @ 2025-06-24 17:33 UTC (permalink / raw)
To: Markus Armbruster
Cc: qemu-devel, Daniel P . Berrangé, Thomas Huth,
Alex Bennée, Michael S . Tsirkin, Gerd Hoffmann,
Mark Cave-Ayland, Philippe Mathieu-Daudé, Kevin Wolf,
Alexander Graf, Paolo Bonzini, Richard Henderson, Peter Maydell
On Mon, Jun 16, 2025 at 11:22:38AM +0200, Markus Armbruster wrote:
> More than a year ago, Daniel posted patches to put an AI policy in
> writing. Reception was mostly positive. A v2 to address feedback
> followed with some delay. But no pull request.
>
> I asked Daniel why, and he told me he was concerned it might go too
> far in its interpretation of the DCO requirements. After a bit of
> discussion, I think Daniel's text is basically fine. The policy it
> describes is simple and strict. Relaxing policy is easier than
> tightening it. I softened the phrasing slightly, addressed open
> review comments, and fixed a few minor things I found myself.
>
> Here's Daniel's cover letter for v2:
>
> This patch kicks the hornet's nest of AI / LLM code generators.
>
> With the increasing interest in code generators in recent times,
> it is inevitable that QEMU contributions will include AI generated
> code. Thus far we have remained silent on the matter. Given that
> everyone knows these tools exist, our current position has to be
> considered tacit acceptance of the use of AI generated code in QEMU.
>
> The question for the project is whether that is a good position for
> QEMU to take or not ?
>
> IANAL, but I like to think I'm reasonably proficient at understanding
> open source licensing. I am not inherently against the use of AI tools,
> rather I am anti-risk. I also want to see OSS licenses respected and
> complied with.
>
> AFAICT at its current state of (im)maturity the question of licensing
> of AI code generator output does not have a broadly accepted / settled
> legal position. This is an inherent bias/self-interest from the vendors
> promoting their usage, who tend to minimize/dismiss the legal questions.
> From my POV, this puts such tools in a position of elevated legal risk.
>
> Given the fuzziness over the legal position of generated code from
> such tools, I don't consider it credible (today) for a contributor
> to assert compliance with the DCO terms (b) or (c) (which is a stated
> pre-requisite for QEMU accepting patches) when a patch includes (or is
> derived from) AI generated code.
>
> By implication, I think that QEMU must (for now) explicitly decline
> to (knowingly) accept AI generated code.
>
> Perhaps a few years down the line the legal uncertainty will have
> reduced and we can re-evaluate this policy.
>
> Discuss...
>
> Changes in v5 [Markus Armbruster]:
> * PATCH 2:
> - Drop "follow a deterministic process" clause [Peter]
>
> Changes in v4 [Markus Armbruster]:
> * PATCH 1:
> - Revert v3's "known identity", and instead move existing paragraph
> from submitting-a-patch.rst to code-provenance.rst [Philippe]
> - Add a paragraph on recording maintainer modifications [Alex]
> * PATCH 3:
> - Talk about "AI-assisted software development", "AI content
> generators", and "content", not just "AI code generators" and
> "code" [Stefan, Daniel]
> - Fix spelling of Copilot, and mention Claude [Stefan]
> - Fix link text for reference to the DCO
> - Reiterate the policy does not apply to other uses of AI [Stefan,
> Daniel]
> - Add agents to the examples of tools impacted by the policy
> [Daniel]
>
> Changes in v3 [Markus Armbruster]:
>
> * PATCH 1:
> - Require "known identity" (phrasing stolen from Linux kernel docs)
> [Peter]
> - Clarify use of multiple addresses [Michael]
> - Improve markup
> - Fix a few misspellings
> - Left for later: explain our use of Message-Id: [Alex]
> * PATCH 2:
> - Minor phrasing tweaks and spelling fixes
> * PATCH 3:
> - Don't claim DCO compliance is currently impossible, do point out
> it's unclear how, and that we consider the legal risk not
> acceptable.
> - Stress that the policy is open to revision some more by adding
> "as AI tools mature". Also rephrase the commit message.
> - Improve markup
>
> Changes in v2 [Daniel Berrangé]:
>
> * Fix a huge number of typos in docs
> * Clarify that maintainers should still add R-b where relevant, even
> if they are already adding their own S-oB.
> * Clarify situation when contributor re-starts previously abandoned
> work from another contributor.
> * Add info about Suggested-by tag
> * Add new docs section dealing with the broad topic of "generated
> files" (whether code generators or compilers)
> * Simplify the section related to prohibition of AI generated files
> and give further examples of tools considered covered
> * Remove repeated references to "LLM" as a specific technology, just
> use the broad "AI" term, except for one use of LLM as an example.
> * Add note that the policy may evolve if the legal clarity improves
> * Add note that exceptions can be requested on case-by-case basis
> if contributor thinks they can demonstrate a credible copyright
> and licensing status
>
> Daniel P. Berrangé (3):
> docs: introduce dedicated page about code provenance / sign-off
> docs: define policy limiting the inclusion of generated files
> docs: define policy forbidding use of AI code generators
>
> docs/devel/code-provenance.rst | 338 ++++++++++++++++++++++++++++++
> docs/devel/index-process.rst | 1 +
> docs/devel/submitting-a-patch.rst | 23 +-
> 3 files changed, 341 insertions(+), 21 deletions(-)
> create mode 100644 docs/devel/code-provenance.rst
>
> --
> 2.49.0
>
Thanks, applied:
https://gitlab.com/qemu-project/qemu/-/commits/master
Stefan