[PATCH v3 0/3] docs: define policy forbidding use of "AI" / LLM code generators

* [PATCH v3 0/3] docs: define policy forbidding use of "AI" / LLM code generators
@ 2025-06-03 14:25 Markus Armbruster
  2025-06-03 14:25 ` [PATCH v3 1/3] docs: introduce dedicated page about code provenance / sign-off Markus Armbruster
                   ` (3 more replies)
  0 siblings, 4 replies; 29+ messages in thread
From: Markus Armbruster @ 2025-06-03 14:25 UTC (permalink / raw)
  To: qemu-devel
  Cc: Daniel P . Berrangé, Thomas Huth, Alex Bennée,
	Michael S . Tsirkin, Gerd Hoffmann, Mark Cave-Ayland,
	Philippe Mathieu-Daudé, Kevin Wolf, Stefan Hajnoczi,
	Alexander Graf, Paolo Bonzini, Richard Henderson, Peter Maydell

More than a year ago, Daniel posted patches to put an AI policy in
writing.  Reception was mostly positive.  A v2 to address feedback
followed with some delay.  But no pull request.

I asked Daniel why, and he told me he was concerned it might go too
far in its interpretation of the DCO requirements.  After a bit of
discussion, I think Daniel's text is basically fine.  The policy it
describes is simple and strict.  Relaxing policy is easier than
tightening it.  I softened the phrasing slightly, addressed open
review comments, and fixed a few minor things I found myself.

Here's Daniel's cover letter for v2:

This patch kicks the hornet's nest of AI / LLM code generators.

With the increasing interest in code generators in recent times,
it is inevitable that QEMU contributions will include AI generated
code. Thus far we have remained silent on the matter. Given that
everyone knows these tools exist, our current position has to be
considered tacit acceptance of the use of AI generated code in QEMU.

The question for the project is whether that is a good position for
QEMU to take or not?

IANAL, but I like to think I'm reasonably proficient at understanding
open source licensing. I am not inherently against the use of AI tools,
rather I am anti-risk. I also want to see OSS licenses respected and
complied with.

AFAICT at its current state of (im)maturity the question of licensing
of AI code generator output does not have a broadly accepted / settled
legal position. There is an inherent bias/self-interest from the vendors
promoting their usage, who tend to minimize/dismiss the legal questions.
From my POV, this puts such tools in a position of elevated legal risk.

Given the fuzziness over the legal position of generated code from
such tools, I don't consider it credible (today) for a contributor
to assert compliance with the DCO terms (b) or (c) (which is a stated
pre-requisite for QEMU accepting patches) when a patch includes (or is
derived from) AI generated code.

By implication, I think that QEMU must (for now) explicitly decline
to (knowingly) accept AI generated code.

Perhaps a few years down the line the legal uncertainty will have
reduced and we can re-evaluate this policy.

Discuss...

Changes in v3 [Markus Armbruster]:

 * PATCH 1:
   - Require "known identity" (phrasing stolen from Linux kernel docs)
     [Peter]
   - Clarify use of multiple addresses [Michael]
   - Improve markup
   - Fix a few misspellings
   - Left for later: explain our use of Message-Id: [Alex]
 * PATCH 2:
   - Minor phrasing tweaks and spelling fixes
 * PATCH 3:
   - Don't claim DCO compliance is currently impossible, do point out
     it's unclear how, and that we consider the legal risk not
     acceptable.
   - Stress that the policy is open to revision some more by adding
     "as AI tools mature".  Also rephrase the commit message.
   - Improve markup

Changes in v2 [Daniel Berrangé]:

 * Fix a huge number of typos in docs
 * Clarify that maintainers should still add R-b where relevant, even
   if they are already adding their own S-oB.
 * Clarify situation when contributor re-starts previously abandoned
   work from another contributor.
 * Add info about Suggested-by tag
 * Add new docs section dealing with the broad topic of "generated
   files" (whether code generators or compilers)
 * Simplify the section related to prohibition of AI generated files
   and give further examples of tools considered covered
 * Remove repeated references to "LLM" as a specific technology, just
   use the broad "AI" term, except for one use of LLM as an example.
 * Add note that the policy may evolve if the legal clarity improves
 * Add note that exceptions can be requested on case-by-case basis
   if contributor thinks they can demonstrate a credible copyright
   and licensing status

Daniel P. Berrangé (3):
  docs: introduce dedicated page about code provenance / sign-off
  docs: define policy limiting the inclusion of generated files
  docs: define policy forbidding use of AI code generators

 docs/devel/code-provenance.rst    | 321 ++++++++++++++++++++++++++++++
 docs/devel/index-process.rst      |   1 +
 docs/devel/submitting-a-patch.rst |  18 +-
 3 files changed, 324 insertions(+), 16 deletions(-)
 create mode 100644 docs/devel/code-provenance.rst

-- 
2.48.1

* [PATCH v3 1/3] docs: introduce dedicated page about code provenance / sign-off
  2025-06-03 14:25 [PATCH v3 0/3] docs: define policy forbidding use of "AI" / LLM code generators Markus Armbruster
@ 2025-06-03 14:25 ` Markus Armbruster
  2025-06-03 16:53   ` Alex Bennée
  2025-06-03 14:25 ` [PATCH v3 2/3] docs: define policy limiting the inclusion of generated files Markus Armbruster
                   ` (2 subsequent siblings)
  3 siblings, 1 reply; 29+ messages in thread
From: Markus Armbruster @ 2025-06-03 14:25 UTC (permalink / raw)
  To: qemu-devel
  Cc: Daniel P . Berrangé, Thomas Huth, Alex Bennée,
	Michael S . Tsirkin, Gerd Hoffmann, Mark Cave-Ayland,
	Philippe Mathieu-Daudé, Kevin Wolf, Stefan Hajnoczi,
	Alexander Graf, Paolo Bonzini, Richard Henderson, Peter Maydell

From: Daniel P. Berrangé <berrange@redhat.com>

Currently we have a short paragraph saying that patches must include
a Signed-off-by line, and merely link to the kernel documentation.
The linked kernel docs have a lot of content beyond the part about
sign-off and thus are misleading/distracting to QEMU contributors.

This introduces a dedicated 'code-provenance' page in QEMU talking
about why we require sign-off, explaining the other tags we commonly
use, and what to do in some edge cases.

Signed-off-by: Daniel P. Berrangé <berrange@redhat.com>
Reviewed-by: Peter Maydell <peter.maydell@linaro.org>
Signed-off-by: Markus Armbruster <armbru@redhat.com>
---
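Illustration only, not part of the commit: a minimal sketch of how the
sign-off expectation described in the new page (a ``Signed-off-by`` line
that matches the commit author) could be checked mechanically. The helper
names signed_off_by() and author_has_signed_off() are hypothetical and
exist only for this example; this is not a tool shipped with QEMU.

```python
import re

# Matches a trailer such as
#   Signed-off-by: Some Person <some.person@example.com>
SOB_RE = re.compile(
    r"^Signed-off-by:\s*(?P<name>.+?)\s*<(?P<email>[^<>\s]+)>\s*$"
)


def signed_off_by(message):
    """Return (name, email) pairs for each Signed-off-by line, in order."""
    pairs = []
    for line in message.splitlines():
        match = SOB_RE.match(line)
        if match:
            pairs.append((match.group("name"), match.group("email")))
    return pairs


def author_has_signed_off(message, author_name, author_email):
    """True if one Signed-off-by line matches the commit author exactly."""
    return (author_name, author_email) in signed_off_by(message)
```

Because signed_off_by() keeps the lines in the order they appear, the same
list could also be used to eyeball the oldest-to-newest ordering that the
page asks for when several sign-offs are present.
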
 docs/devel/code-provenance.rst    | 218 ++++++++++++++++++++++++++++++
 docs/devel/index-process.rst      |   1 +
 docs/devel/submitting-a-patch.rst |  18 +--
 3 files changed, 221 insertions(+), 16 deletions(-)
 create mode 100644 docs/devel/code-provenance.rst

diff --git a/docs/devel/code-provenance.rst b/docs/devel/code-provenance.rst
new file mode 100644
index 0000000000..4fc12061b5
--- /dev/null
+++ b/docs/devel/code-provenance.rst
@@ -0,0 +1,218 @@
+.. _code-provenance:
+
+Code provenance
+===============
+
+Certifying patch submissions
+~~~~~~~~~~~~~~~~~~~~~~~~~~~~
+
+The QEMU community **mandates** all contributors to certify provenance of
+patch submissions they make to the project. To put it another way,
+contributors must indicate that they are legally permitted to contribute to
+the project.
+
+Certification is achieved with a low overhead by adding a single line to the
+bottom of every git commit::
+
+   Signed-off-by: YOUR NAME <YOUR@EMAIL>
+
+using a known identity (sorry, no anonymous contributions.)
+
+The addition of this line asserts that the author of the patch is contributing
+in accordance with the clauses specified in the
+`Developer's Certificate of Origin <https://developercertificate.org>`__:
+
+.. _dco:
+
+  Developer's Certificate of Origin 1.1
+
+  By making a contribution to this project, I certify that:
+
+  (a) The contribution was created in whole or in part by me and I
+      have the right to submit it under the open source license
+      indicated in the file; or
+
+  (b) The contribution is based upon previous work that, to the best
+      of my knowledge, is covered under an appropriate open source
+      license and I have the right under that license to submit that
+      work with modifications, whether created in whole or in part
+      by me, under the same open source license (unless I am
+      permitted to submit under a different license), as indicated
+      in the file; or
+
+  (c) The contribution was provided directly to me by some other
+      person who certified (a), (b) or (c) and I have not modified
+      it.
+
+  (d) I understand and agree that this project and the contribution
+      are public and that a record of the contribution (including all
+      personal information I submit with it, including my sign-off) is
+      maintained indefinitely and may be redistributed consistent with
+      this project or the open source license(s) involved.
+
+It is generally expected that the name and email address used in one of the
+``Signed-off-by`` lines match those of the git commit ``Author`` field.
+It's okay if you subscribe or contribute to the list via more than one
+address, but using multiple addresses in one commit just confuses
+things.
+
+If the person sending the mail is not one of the patch authors, they are
+nonetheless expected to add their own ``Signed-off-by`` to comply with the
+DCO clause (c).
+
+Multiple authorship
+~~~~~~~~~~~~~~~~~~~
+
+It is not uncommon for a patch to have contributions from multiple authors. In
+this scenario, git commits will usually be expected to have a ``Signed-off-by``
+line for each contributor involved in creation of the patch. Some edge cases:
+
+  * The non-primary author's contributions were so trivial that they can be
+    considered not subject to copyright. In this case the secondary authors
+    need not include a ``Signed-off-by``.
+
+    This case most commonly applies where QEMU reviewers give short snippets
+    of code as suggested fixes to a patch. The reviewers don't need to have
+    their own ``Signed-off-by`` added unless their code suggestion was
+    unusually large, but it is common to add ``Suggested-by`` as a credit
+    for non-trivial code.
+
+  * Both contributors work for the same employer and the employer requires
+    copyright assignment.
+
+    It can be said that in this case a ``Signed-off-by`` is indicating that
+    the person has permission to contribute from their employer who is the
+    copyright holder. It is nonetheless still preferable to include a
+    ``Signed-off-by`` for each contributor, as in some countries employees are
+    not able to assign copyright to their employer, and it also covers any
+    time invested outside working hours.
+
+When multiple ``Signed-off-by`` tags are present, they should be strictly kept
+in order of authorship, from oldest to newest.
+
+Other commit tags
+~~~~~~~~~~~~~~~~~
+
+While the ``Signed-off-by`` tag is mandatory, there are a number of other tags
+that are commonly used during QEMU development:
+
+ * **``Reviewed-by``**: when a QEMU community member reviews a patch on the
+   mailing list, if they consider the patch acceptable, they should send an
+   email reply containing a ``Reviewed-by`` tag. Subsystem maintainers who
+   review a patch should add this even if they are also adding their
+   ``Signed-off-by`` to the same commit.
+
+ * **``Acked-by``**: when a QEMU subsystem maintainer approves a patch that
+   touches their subsystem, but intends to allow a different maintainer to
+   queue it and send a pull request, they would send a mail containing an
+   ``Acked-by`` tag. Where a patch touches multiple subsystems, ``Acked-by``
+   only implies review of the maintainers' own areas of responsibility. If a
+   maintainer wants to indicate they have done a full review they should use
+   a ``Reviewed-by`` tag.
+
+ * **``Tested-by``**: when a QEMU community member has functionally tested the
+   behaviour of the patch in some manner, they should send an email reply
+   containing a ``Tested-by`` tag.
+
+ * **``Reported-by``**: when a QEMU community member reports a problem via the
+   mailing list, or some other informal channel that is not the issue tracker,
+   it is good practice to credit them by including a ``Reported-by`` tag on
+   any patch fixing the issue. When the problem is reported via the GitLab
+   issue tracker, however, it is sufficient to just include a link to the
+   issue.
+
+ * **``Suggested-by``**: when a reviewer or other 3rd party makes non-trivial
+   suggestions for how to change a patch, it is good practice to credit them
+   by including a ``Suggested-by`` tag.
+
+Subsystem maintainer requirements
+~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
+
+When a subsystem maintainer accepts a patch from a contributor, in addition to
+the normal code review points, they are expected to validate the presence of
+suitable ``Signed-off-by`` tags.
+
+At the time they queue the patch in their subsystem tree, the maintainer
+**must** also then add their own ``Signed-off-by`` to indicate that they have
+done the aforementioned validation. This is in addition to any of their own
+``Reviewed-by`` tags the subsystem maintainer may wish to include.
+
+Tools for adding ``Signed-off-by``
+~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
+
+There are a variety of ways tools can support adding ``Signed-off-by`` tags
+for patches, avoiding the need for contributors to manually type in this
+repetitive text each time.
+
+git commands
+^^^^^^^^^^^^
+
+When creating, or amending, a commit the ``-s`` flag to ``git commit`` will
+append a suitable line matching the configured git author details.
+
+If preparing patches using the ``git format-patch`` tool, the ``-s`` flag can
+be used to append a suitable line in the emails it creates, without modifying
+the local commits. Alternatively, to modify all the local commits on a branch::
+
+  git rebase master -x 'git commit --amend --no-edit -s'
+
+emacs
+^^^^^
+
+In the file ``$HOME/.emacs.d/abbrev_defs`` add:
+
+.. code:: elisp
+
+  (define-abbrev-table 'global-abbrev-table
+    '(
+      ("8rev" "Reviewed-by: YOUR NAME <your@email.addr>" nil 1)
+      ("8ack" "Acked-by: YOUR NAME <your@email.addr>" nil 1)
+      ("8test" "Tested-by: YOUR NAME <your@email.addr>" nil 1)
+      ("8sob" "Signed-off-by: YOUR NAME <your@email.addr>" nil 1)
+     ))
+
+with this change, if you type (for example) ``8rev`` followed by ``<space>``
+or ``<enter>`` it will expand to the whole phrase.
+
+vim
+^^^
+
+In the file ``$HOME/.vimrc`` add::
+
+  iabbrev 8rev Reviewed-by: YOUR NAME <your@email.addr>
+  iabbrev 8ack Acked-by: YOUR NAME <your@email.addr>
+  iabbrev 8test Tested-by: YOUR NAME <your@email.addr>
+  iabbrev 8sob Signed-off-by: YOUR NAME <your@email.addr>
+
+with this change, if you type (for example) ``8rev`` followed by ``<space>``
+or ``<enter>`` it will expand to the whole phrase.
+
+Re-starting abandoned work
+~~~~~~~~~~~~~~~~~~~~~~~~~~
+
+For a variety of reasons there are some patches that get submitted to QEMU but
+never merged. An unrelated contributor may decide (months or years later) to
+continue working from the abandoned patch and re-submit it with extra changes.
+
+The general principles when picking up abandoned work are:
+
+ * Continue to credit the original author for their work, by maintaining their
+   original ``Signed-off-by``
+ * Indicate where the original patch was obtained from (mailing list, bug
+   tracker, author's git repo, etc) when sending it for review
+ * Acknowledge the extra work of the new contributor by including their
+   ``Signed-off-by`` in the patch in addition to the original author's
+ * Indicate who is responsible for what parts of the patch. This is typically
+   done via a note in the commit message, just prior to the new contributor's
+   ``Signed-off-by``::
+
+    Signed-off-by: Some Person <some.person@example.com>
+    [Rebased and added support for 'foo']
+    Signed-off-by: New Person <new.person@mycorp.test>
+
+In complicated cases, or if otherwise unsure, ask for advice on the project
+mailing list.
+
+It is also recommended to attempt to contact the original author to let them
+know you are interested in taking over their work, in case they still intended
+to return to the work, or had any suggestions about the best way to continue.
diff --git a/docs/devel/index-process.rst b/docs/devel/index-process.rst
index cb7c6640fd..5807752d70 100644
--- a/docs/devel/index-process.rst
+++ b/docs/devel/index-process.rst
@@ -13,6 +13,7 @@ Notes about how to interact with the community and how and where to submit patch
    maintainers
    style
    submitting-a-patch
+   code-provenance
    trivial-patches
    stable-process
    submitting-a-pull-request
diff --git a/docs/devel/submitting-a-patch.rst b/docs/devel/submitting-a-patch.rst
index 65c64078cb..8624f21673 100644
--- a/docs/devel/submitting-a-patch.rst
+++ b/docs/devel/submitting-a-patch.rst
@@ -344,28 +344,14 @@ Patch emails must include a ``Signed-off-by:`` line
 
 Your patches **must** include a Signed-off-by: line. This is a hard
 requirement because it's how you say "I'm legally okay to contribute
-this and happy for it to go into QEMU". The process is modelled after
-the `Linux kernel
-<http://git.kernel.org/cgit/linux/kernel/git/torvalds/linux.git/tree/Documentation/SubmittingPatches?id=f6f94e2ab1b33f0082ac22d71f66385a60d8157f#n297>`__
-policy.
-
-If you wrote the patch, make sure your "From:" and "Signed-off-by:"
-lines use the same spelling. It's okay if you subscribe or contribute to
-the list via more than one address, but using multiple addresses in one
-commit just confuses things. If someone else wrote the patch, git will
-include a "From:" line in the body of the email (different from your
-envelope From:) that will give credit to the correct author; but again,
-that author's Signed-off-by: line is mandatory, with the same spelling.
+this and happy for it to go into QEMU". For full guidance, read the
+:ref:`code-provenance` documentation.
 
 The name used with "Signed-off-by" does not need to be your legal name,
 nor birth name, nor appear on any government ID. It is the identity you
 choose to be known by in the community, but should not be anonymous,
 nor misrepresent whom you are.
 
-There are various tooling options for automatically adding these tags
-include using ``git commit -s`` or ``git format-patch -s``. For more
-information see `SubmittingPatches 1.12
-<http://git.kernel.org/cgit/linux/kernel/git/torvalds/linux.git/tree/Documentation/SubmittingPatches?id=f6f94e2ab1b33f0082ac22d71f66385a60d8157f#n297>`__.
 
 .. _include_a_meaningful_cover_letter:
 
-- 
2.48.1



* [PATCH v3 2/3] docs: define policy limiting the inclusion of generated files
  2025-06-03 14:25 [PATCH v3 0/3] docs: define policy forbidding use of "AI" / LLM code generators Markus Armbruster
  2025-06-03 14:25 ` [PATCH v3 1/3] docs: introduce dedicated page about code provenance / sign-off Markus Armbruster
@ 2025-06-03 14:25 ` Markus Armbruster
  2025-06-03 14:25 ` [PATCH v3 3/3] docs: define policy forbidding use of AI code generators Markus Armbruster
  2025-06-03 15:25 ` [PATCH v3 0/3] docs: define policy forbidding use of "AI" / LLM " Kevin Wolf
  3 siblings, 0 replies; 29+ messages in thread
From: Markus Armbruster @ 2025-06-03 14:25 UTC (permalink / raw)
  To: qemu-devel
  Cc: Daniel P . Berrangé, Thomas Huth, Alex Bennée,
	Michael S . Tsirkin, Gerd Hoffmann, Mark Cave-Ayland,
	Philippe Mathieu-Daudé, Kevin Wolf, Stefan Hajnoczi,
	Alexander Graf, Paolo Bonzini, Richard Henderson, Peter Maydell

From: Daniel P. Berrangé <berrange@redhat.com>

Files contributed to QEMU are generally expected to be provided in the
preferred format for manipulation. IOW, we generally don't expect to
have generated / compiled code included in the tree, rather, we expect
to run the code generator / compiler as part of the build process.

There are some obvious exceptions to this seen in our existing tree, the
biggest one being the inclusion of many binary firmware ROMs. A more
niche example is the inclusion of a generated eBPF program. Or the CI
dockerfiles which are mostly auto-generated. In these cases, however,
the preferred format source code is still required to be included,
alongside the generated output.

Tools which perform user defined algorithmic transformations on code are
not considered to be "code generators". I.e., we permit use of Coccinelle,
spell checkers, and sed/awk/etc to manipulate code. Such use of automated
manipulation should still be declared in the commit message.

One off generators which create a boilerplate file which the author then
fills in, are acceptable if their output has clear copyright and license
status. This could be where a contributor writes a throwaway python
script to automate creation of some mundane piece of code for example.

Signed-off-by: Daniel P. Berrangé <berrange@redhat.com>
Reviewed-by: Alex Bennée <alex.bennee@linaro.org>
Signed-off-by: Markus Armbruster <armbru@redhat.com>
---
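Illustration only, not part of the commit: a sketch of the kind of
throwaway, one-off boilerplate generator the last paragraph above has in
mind. It is deterministic (the same inputs always yield the same text) and
its output is meant to be filled in by hand. The function make_stub(), the
template, and the SPDX identifier shown are assumptions made for this
example; the license line is a placeholder for whatever the contributor
actually intends.

```python
from string import Template

# Hypothetical template. The SPDX identifier is only a placeholder and
# must reflect the license the contributor really intends to use.
STUB = Template('''\
/*
 * $title
 *
 * SPDX-License-Identifier: $spdx
 */

static void ${name}_init(void)
{
    /* TODO: fill in by hand */
}
''')


def make_stub(name, title, spdx="GPL-2.0-or-later"):
    """Return boilerplate C source; same inputs always give the same text."""
    return STUB.substitute(name=name, title=title, spdx=spdx)
```

For example, make_stub("foo", "Foo device") returns a C skeleton with a
foo_init() function. Per the policy text, use of such a script should still
be mentioned in the commit message.
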
 docs/devel/code-provenance.rst | 55 ++++++++++++++++++++++++++++++++++
 1 file changed, 55 insertions(+)

diff --git a/docs/devel/code-provenance.rst b/docs/devel/code-provenance.rst
index 4fc12061b5..c27d8fe649 100644
--- a/docs/devel/code-provenance.rst
+++ b/docs/devel/code-provenance.rst
@@ -216,3 +216,58 @@ mailing list.
 It is also recommended to attempt to contact the original author to let them
 know you are interested in taking over their work, in case they still intended
 to return to the work, or had any suggestions about the best way to continue.
+
+Inclusion of generated files
+~~~~~~~~~~~~~~~~~~~~~~~~~~~~
+
+Files in patches contributed to QEMU are generally expected to be provided
+only in the preferred format for making modifications. The implication of
+this is that the output of code generators or compilers is usually not
+appropriate to contribute to QEMU.
+
+For reasons of practicality there are some exceptions to this rule, where
+generated code is permitted, provided it is also accompanied by the
+corresponding preferred source format. This is done where it is impractical
+to expect those building QEMU to run the code generation or compilation
+process. A non-exhaustive list of examples is:
+
+ * Images: where a bitmap image is created from a vector file it is common
+   to include the rendered bitmaps at desired resolution(s), since subtle
+   changes in the rasterization process / tools may affect quality. The
+   original vector file is expected to accompany any generated bitmaps.
+
+ * Firmware: QEMU includes pre-compiled binary ROMs for a variety of guest
+   firmwares. When such binary ROMs are contributed, the corresponding source
+   must also be provided, either directly, or through a git submodule link.
+
+ * Dockerfiles: the majority of the dockerfiles are automatically generated
+   from a canonical list of build dependencies maintained in tree, together
+   with the libvirt-ci git submodule link. The generated dockerfiles are
+   included in tree because it is desirable to be able to directly build
+   container images from a clean git checkout.
+
+ * eBPF: QEMU includes some generated eBPF machine code, since the required
+   eBPF compilation tools are not broadly available on all targeted OS
+   distributions. The corresponding eBPF C code for the binary is also
+   provided. This is a time-limited exception until the eBPF toolchain is
+   sufficiently broadly available in distros.
+
+In all cases above, the existence of generated files must be acknowledged
+and justified in the commit that introduces them.
+
+Tools which perform changes to existing code with deterministic algorithmic
+manipulation, driven by user specified inputs, are not generally considered
+to be "generators".
+
+For instance, using Coccinelle to convert code from one pattern to another
+pattern, or fixing documentation typos with a spell checker, or transforming
+code using sed / awk / etc, are not considered to be acts of code
+generation. Where an automated manipulation is performed on code, however,
+this should be declared in the commit message.
+
+At times contributors may use or create scripts/tools to generate an initial
+boilerplate code template which is then filled in to produce the final patch.
+The output of such a tool would still be considered the "preferred format",
+since it is intended to be a foundation for further human authored changes.
+Such tools are acceptable to use, provided they follow a deterministic process
+and there is clearly defined copyright and licensing for their output.
-- 
2.48.1



* [PATCH v3 3/3] docs: define policy forbidding use of AI code generators
  2025-06-03 14:25 [PATCH v3 0/3] docs: define policy forbidding use of "AI" / LLM code generators Markus Armbruster
  2025-06-03 14:25 ` [PATCH v3 1/3] docs: introduce dedicated page about code provenance / sign-off Markus Armbruster
  2025-06-03 14:25 ` [PATCH v3 2/3] docs: define policy limiting the inclusion of generated files Markus Armbruster
@ 2025-06-03 14:25 ` Markus Armbruster
  2025-06-03 15:37   ` Kevin Wolf
  2025-06-03 18:25   ` Stefan Hajnoczi
  2025-06-03 15:25 ` [PATCH v3 0/3] docs: define policy forbidding use of "AI" / LLM " Kevin Wolf
  3 siblings, 2 replies; 29+ messages in thread
From: Markus Armbruster @ 2025-06-03 14:25 UTC (permalink / raw)
  To: qemu-devel
  Cc: Daniel P . Berrangé, Thomas Huth, Alex Bennée,
	Michael S . Tsirkin, Gerd Hoffmann, Mark Cave-Ayland,
	Philippe Mathieu-Daudé, Kevin Wolf, Stefan Hajnoczi,
	Alexander Graf, Paolo Bonzini, Richard Henderson, Peter Maydell,
	Stefan Hajnoczi

From: Daniel P. Berrangé <berrange@redhat.com>

There has been an explosion of interest in so called AI code
generators. Thus far though, this has not been matched by a broadly
accepted legal interpretation of the licensing implications for code
generator outputs. While the vendors may claim there is no problem and
a free choice of license is possible, they have an inherent conflict
of interest in promoting this interpretation. More broadly there is,
as yet, no broad consensus on the licensing implications of code
generators trained on inputs under a wide variety of licenses.

The DCO requires contributors to assert they have the right to
contribute under the designated project license. Given the lack of
consensus on the licensing of AI code generator output, it is not
considered credible to assert compliance with the DCO clause (b) or (c)
where a patch includes such generated code.

This patch thus defines a policy that the QEMU project will currently
not accept contributions where use of AI code generators is either
known, or suspected.

These are early days of AI-assisted software development. The legal
questions will be resolved eventually. The tools will mature, and we
can expect some to become safely usable in free software projects.
The policy we set now must be for today, and be open to revision. It's
best to start strict and safe, then relax.

Meanwhile requests for exceptions can also be considered on a case by
case basis.

Signed-off-by: Daniel P. Berrangé <berrange@redhat.com>
Acked-by: Stefan Hajnoczi <stefanha@gmail.com>
Reviewed-by: Kevin Wolf <kwolf@redhat.com>
Signed-off-by: Markus Armbruster <armbru@redhat.com>
---
 docs/devel/code-provenance.rst | 50 +++++++++++++++++++++++++++++++++-
 1 file changed, 49 insertions(+), 1 deletion(-)

diff --git a/docs/devel/code-provenance.rst b/docs/devel/code-provenance.rst
index c27d8fe649..261263cfba 100644
--- a/docs/devel/code-provenance.rst
+++ b/docs/devel/code-provenance.rst
@@ -270,4 +270,52 @@ boilerplate code template which is then filled in to produce the final patch.
 The output of such a tool would still be considered the "preferred format",
 since it is intended to be a foundation for further human authored changes.
 Such tools are acceptable to use, provided they follow a deterministic process
-and there is clearly defined copyright and licensing for their output.
+and there is clearly defined copyright and licensing for their output. Note
+in particular the caveats applying to AI code generators below.
+
+Use of AI code generators
+~~~~~~~~~~~~~~~~~~~~~~~~~
+
+TL;DR:
+
+  **Current QEMU project policy is to DECLINE any contributions which are
+  believed to include or derive from AI generated code. This includes ChatGPT,
+  CoPilot, Llama and similar tools**
+
+The increasing prevalence of AI code generators, most notably but not limited
+to, `Large Language Models <https://en.wikipedia.org/wiki/Large_language_model>`__
+(LLMs) results in a number of difficult legal questions and risks for software
+projects, including QEMU.
+
+The QEMU community requires that contributors certify their patch submissions
+are made in accordance with the rules of the dco_ (DCO).
+
+To satisfy the DCO, the patch contributor has to fully understand the
+copyright and license status of code they are contributing to QEMU. With AI
+code generators, the copyright and license status of the output is ill-defined
+with no generally accepted, settled legal foundation.
+
+Where the training material is known, it is common for it to include large
+volumes of material under restrictive licensing/copyright terms. Even where
+the training material is all known to be under open source licenses, it is
+likely to be under a variety of terms, not all of which will be compatible
+with QEMU's licensing requirements.
+
+How contributors could comply with DCO terms (b) or (c) for the output of AI
+code generators commonly available today is unclear.  The QEMU project is not
+willing or able to accept the legal risks of non-compliance.
+
+The QEMU project thus requires that contributors refrain from using AI code
+generators on patches intended to be submitted to the project, and will
+decline any contribution if use of AI is either known or suspected.
+
+Examples of tools impacted by this policy include GitHub's CoPilot,
+OpenAI's ChatGPT, and Meta's Code Llama, amongst many others which are less
+well known.
+
+This policy may evolve as AI tools mature and the legal situation is
+clarified. In the meanwhile, requests for exceptions to this policy will be
+evaluated by the QEMU project on a case by case basis. To be granted an
+exception, a contributor will need to demonstrate clarity of the license and
+copyright status for the tool's output in relation to its training model and
+code, to the satisfaction of the project maintainers.
-- 
2.48.1



* Re: [PATCH v3 0/3] docs: define policy forbidding use of "AI" / LLM code generators
  2025-06-03 14:25 [PATCH v3 0/3] docs: define policy forbidding use of "AI" / LLM code generators Markus Armbruster
                   ` (2 preceding siblings ...)
  2025-06-03 14:25 ` [PATCH v3 3/3] docs: define policy forbidding use of AI code generators Markus Armbruster
@ 2025-06-03 15:25 ` Kevin Wolf
  3 siblings, 0 replies; 29+ messages in thread
From: Kevin Wolf @ 2025-06-03 15:25 UTC (permalink / raw)
  To: Markus Armbruster
  Cc: qemu-devel, Daniel P . Berrangé, Thomas Huth,
	Alex Bennée, Michael S . Tsirkin, Gerd Hoffmann,
	Mark Cave-Ayland, Philippe Mathieu-Daudé, Stefan Hajnoczi,
	Alexander Graf, Paolo Bonzini, Richard Henderson, Peter Maydell

Am 03.06.2025 um 16:25 hat Markus Armbruster geschrieben:
> More than a year ago, Daniel posted patches to put an AI policy in
> writing.  Reception was mostly positive.  A v2 to address feedback
> followed with some delay.  But no pull request.
> 
> I asked Daniel why, and he told me he was concerned it might go too
> far in its interpretation of the DCO requirements.  After a bit of
> discussion, I think Daniel's text is basically fine.  The policy it
> describes is simple and strict.  Relaxing policy is easier than
> tightening it.  I softened the phrasing slightly, addressed open
> review comments, and fixed a few minor things I found myself.
>
> [...]

Reviewed-by: Kevin Wolf <kwolf@redhat.com>



* Re: [PATCH v3 3/3] docs: define policy forbidding use of AI code generators
  2025-06-03 14:25 ` [PATCH v3 3/3] docs: define policy forbidding use of AI code generators Markus Armbruster
@ 2025-06-03 15:37   ` Kevin Wolf
  2025-06-04  6:18     ` Markus Armbruster
  2025-06-03 18:25   ` Stefan Hajnoczi
  1 sibling, 1 reply; 29+ messages in thread
From: Kevin Wolf @ 2025-06-03 15:37 UTC (permalink / raw)
  To: Markus Armbruster
  Cc: qemu-devel, Daniel P . Berrangé, Thomas Huth,
	Alex Bennée, Michael S . Tsirkin, Gerd Hoffmann,
	Mark Cave-Ayland, Philippe Mathieu-Daudé, Stefan Hajnoczi,
	Alexander Graf, Paolo Bonzini, Richard Henderson, Peter Maydell,
	Stefan Hajnoczi

On 03.06.2025 at 16:25, Markus Armbruster wrote:
> +TL;DR:
> +
> +  **Current QEMU project policy is to DECLINE any contributions which are
> +  believed to include or derive from AI generated code. This includes ChatGPT,
> +  CoPilot, Llama and similar tools**

[...]

> +Examples of tools impacted by this policy include GitHub's CoPilot,
> +OpenAI's ChatGPT, and Meta's Code Llama, amongst many others which are less
> +well known.

I wonder if the best list of examples is still the same now, a year
after the original version of the document was written. In particular,
maybe including an example of popular vibe coding IDEs like Cursor would
make sense?

But it's only examples anyway, so either way is fine.

Kevin



* Re: [PATCH v3 1/3] docs: introduce dedicated page about code provenance / sign-off
  2025-06-03 14:25 ` [PATCH v3 1/3] docs: introduce dedicated page about code provenance / sign-off Markus Armbruster
@ 2025-06-03 16:53   ` Alex Bennée
  2025-06-04  6:44     ` Markus Armbruster
  0 siblings, 1 reply; 29+ messages in thread
From: Alex Bennée @ 2025-06-03 16:53 UTC (permalink / raw)
  To: Markus Armbruster
  Cc: qemu-devel, Daniel P . Berrangé, Thomas Huth,
	Michael S . Tsirkin, Gerd Hoffmann, Mark Cave-Ayland,
	Philippe Mathieu-Daudé, Kevin Wolf, Stefan Hajnoczi,
	Alexander Graf, Paolo Bonzini, Richard Henderson, Peter Maydell

Markus Armbruster <armbru@redhat.com> writes:

> From: Daniel P. Berrangé <berrange@redhat.com>
>
> Currently we have a short paragraph saying that patches must include
> a Signed-off-by line, and merely link to the kernel documentation.
> The linked kernel docs have a lot of content beyond the part about
> sign-off and thus are misleading/distracting to QEMU contributors.
>
> This introduces a dedicated 'code-provenance' page in QEMU talking
> about why we require sign-off, explaining the other tags we commonly
> use, and what to do in some edge cases.
>
> Signed-off-by: Daniel P. Berrangé <berrange@redhat.com>
> Reviewed-by: Peter Maydell <peter.maydell@linaro.org>
> Signed-off-by: Markus Armbruster <armbru@redhat.com>
> ---
>  docs/devel/code-provenance.rst    | 218 ++++++++++++++++++++++++++++++
>  docs/devel/index-process.rst      |   1 +
>  docs/devel/submitting-a-patch.rst |  18 +--
>  3 files changed, 221 insertions(+), 16 deletions(-)
>  create mode 100644 docs/devel/code-provenance.rst
>
> diff --git a/docs/devel/code-provenance.rst b/docs/devel/code-provenance.rst
> new file mode 100644
> index 0000000000..4fc12061b5
> --- /dev/null
> +++ b/docs/devel/code-provenance.rst
> @@ -0,0 +1,218 @@
> +.. _code-provenance:
> +
> +Code provenance
> +===============
> +
> +Certifying patch submissions
> +~~~~~~~~~~~~~~~~~~~~~~~~~~~~
> +
> +The QEMU community **mandates** all contributors to certify provenance of
> +patch submissions they make to the project. To put it another way,
> +contributors must indicate that they are legally permitted to contribute to
> +the project.
> +
> +Certification is achieved with a low overhead by adding a single line to the
> +bottom of every git commit::

s/git commit/commit/ throughout?

> +
> +   Signed-off-by: YOUR NAME <YOUR@EMAIL>
> +
> +using a known identity (sorry, no anonymous contributions.)
> +

maybe "(contributions cannot be anonymous)" is more direct?

> +The addition of this line asserts that the author of the patch is contributing
> +in accordance with the clauses specified in the
> +`Developer's Certificate of Origin <https://developercertificate.org>`__:
> +
> +.. _dco:
> +
> +  Developer's Certificate of Origin 1.1
> +
> +  By making a contribution to this project, I certify that:
> +
> +  (a) The contribution was created in whole or in part by me and I
> +      have the right to submit it under the open source license
> +      indicated in the file; or
> +
> +  (b) The contribution is based upon previous work that, to the best
> +      of my knowledge, is covered under an appropriate open source
> +      license and I have the right under that license to submit that
> +      work with modifications, whether created in whole or in part
> +      by me, under the same open source license (unless I am
> +      permitted to submit under a different license), as indicated
> +      in the file; or
> +
> +  (c) The contribution was provided directly to me by some other
> +      person who certified (a), (b) or (c) and I have not modified
> +      it.
> +
> +  (d) I understand and agree that this project and the contribution
> +      are public and that a record of the contribution (including all
> +      personal information I submit with it, including my sign-off) is
> +      maintained indefinitely and may be redistributed consistent with
> +      this project or the open source license(s) involved.
> +
> +It is generally expected that the name and email address used in one of the
> +``Signed-off-by`` lines match those of the git commit ``Author`` field.
> +It's okay if you subscribe or contribute to the list via more than one
> +address, but using multiple addresses in one commit just confuses
> +things.
> +
> +If the person sending the mail is not one of the patch authors, they are
> +nonetheless expected to add their own ``Signed-off-by`` to comply with the
> +DCO clause (c).

We should probably mention that sometimes the committer may update the
patch after they have pulled it into the tree. In those cases we preface
the S-o-B tag with a comment:

  Signed-off-by: Original Hacker <hacker@domain>
  [MH: tweaked the commit message for clarity]
  Signed-off-by: Maintainer Hacker <hacker@another.com>
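
As a rough illustration only (not part of the patch), the flow above can
be reproduced in a throwaway repository. The names and addresses are the
placeholders from the example; the file name and commit subject are made
up. The sign-off added by ``-s`` uses the committer identity, so the
maintainer's line ends up last; exact blank-line placement is left to
git's trailer handling.

```shell
set -eu
# Throwaway repository so nothing real is touched.
repo=$(mktemp -d)
cd "$repo"
git init -q

# The original author commits with their own sign-off.
echo hello > f.txt
git add f.txt
git -c user.name="Original Hacker" -c user.email="hacker@domain" \
    commit -q -s -m "demo: add f.txt"

# The maintainer appends a bracketed note to the existing message,
# then amends with -s so their own sign-off follows the note.
old_msg=$(git log -1 --format=%B)
new_msg=$(printf '%s\n%s' "$old_msg" \
    '[MH: tweaked the commit message for clarity]')
git -c user.name="Maintainer Hacker" -c user.email="hacker@another.com" \
    commit -q --amend -s -m "$new_msg"

# Show the resulting commit message.
git log -1 --format=%B
```

The author of the amended commit is still the original hacker; only the
message and committer change.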

> +
> +Multiple authorship
> +~~~~~~~~~~~~~~~~~~~
> +
> +It is not uncommon for a patch to have contributions from multiple authors. In
> +this scenario, git commits will usually be expected to have a ``Signed-off-by``
> +line for each contributor involved in creation of the patch. Some edge cases:
> +
> +  * The non-primary author's contributions were so trivial that they can be
> +    considered not subject to copyright. In this case the secondary authors
> +    need not include a ``Signed-off-by``.
> +
> +    This case most commonly applies where QEMU reviewers give short snippets
> +    of code as suggested fixes to a patch. The reviewers don't need to have
> +    their own ``Signed-off-by`` added unless their code suggestion was
> +    unusually large, but it is common to add ``Suggested-by`` as a credit
> +    for non-trivial code.
> +
> +  * Both contributors work for the same employer and the employer requires
> +    copyright assignment.
> +
> +    It can be said that in this case a ``Signed-off-by`` is indicating that
> +    the person has permission to contribute from their employer who is the
> +    copyright holder. It is nonetheless still preferable to include a
> +    ``Signed-off-by`` for each contributor, as in some countries employees are
> +    not able to assign copyright to their employer, and it also covers any
> +    time invested outside working hours.
> +
> +When multiple ``Signed-off-by`` tags are present, they should be strictly kept
> +in order of authorship, from oldest to newest.
> +
> +Other commit tags
> +~~~~~~~~~~~~~~~~~
> +
> +While the ``Signed-off-by`` tag is mandatory, there are a number of other tags
> +that are commonly used during QEMU development:
> +
> + * **``Reviewed-by``**: when a QEMU community member reviews a patch on the
> +   mailing list, if they consider the patch acceptable, they should send an
> +   email reply containing a ``Reviewed-by`` tag. Subsystem maintainers who
> +   review a patch should add this even if they are also adding their
> +   ``Signed-off-by`` to the same commit.
> +
> + * **``Acked-by``**: when a QEMU subsystem maintainer approves a patch that
> +   touches their subsystem, but intends to allow a different maintainer to
> +   queue it and send a pull request, they would send a mail containing a
> +   an ``Acked-by`` tag. Where a patch touches multiple subsystems, ``Acked-by``
> +   only implies review of the maintainers' own areas of responsibility. If a
> +   maintainer wants to indicate they have done a full review they should use
> +   a ``Reviewed-by`` tag.
> +
> + * **``Tested-by``**: when a QEMU community member has functionally tested the
> +   behaviour of the patch in some manner, they should send an email reply
> +   containing a ``Tested-by`` tag.
> +
> + * **``Reported-by``**: when a QEMU community member reports a problem via the
> +   mailing list, or some other informal channel that is not the issue tracker,
> +   it is good practice to credit them by including a ``Reported-by`` tag on
> +   any patch fixing the issue. When the problem is reported via the GitLab
> +   issue tracker, however, it is sufficient to just include a link to the
> +   issue.

We don't mention the Link: or Message-Id: tags.

> +
> + * **``Suggested-by``**: when a reviewer or other 3rd party makes non-trivial
> +   suggestions for how to change a patch, it is good practice to credit them
> +   by including a ``Suggested-by`` tag.
> +
> +Subsystem maintainer requirements
> +~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
> +
> +When a subsystem maintainer accepts a patch from a contributor, in addition to
> +the normal code review points, they are expected to validate the presence of
> +suitable ``Signed-off-by`` tags.
> +
> +At the time they queue the patch in their subsystem tree, the maintainer
> +**must** also then add their own ``Signed-off-by`` to indicate that they have
> +done the aforementioned validation. This is in addition to any of their own
> +``Reviewed-by`` tags the subsystem maintainer may wish to include.
> +
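
For illustration only: a minimal sketch of how the presence check could
be scripted before queueing a series. ``missing_signoff`` is a
hypothetical helper, not an existing QEMU script, and it only checks
that each commit carries a ``Signed-off-by`` matching its author; it
says nothing about whether the certification itself is credible.

```shell
set -eu
# Throwaway repository with one signed and one unsigned commit.
repo=$(mktemp -d)
cd "$repo"
git init -q
git config user.name "Example Hacker"
git config user.email "hacker@example.com"

echo base > f.txt
git add f.txt
git commit -q -s -m "demo: base"
base=$(git rev-parse HEAD)

echo one >> f.txt
git commit -q -a -s -m "demo: signed"
echo two >> f.txt
git commit -q -a -m "demo: unsigned"

# Hypothetical helper: print the subject of every commit in the given
# range whose message lacks a Signed-off-by line matching its author.
missing_signoff() {
    for c in $(git rev-list --reverse "$1"); do
        author=$(git log -1 --format='%an <%ae>' "$c")
        if ! git log -1 --format=%B "$c" |
                grep -qxF "Signed-off-by: $author"; then
            git log -1 --format='%s' "$c"
        fi
    done
}

missing_signoff "$base..HEAD"
```

A real check would also need to cope with multiple authors and with
sign-offs added by people other than the author.
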
> +Tools for adding ``Signed-off-by``
> +~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
> +
> +There are a variety of ways tools can support adding ``Signed-off-by`` tags
> +for patches, avoiding the need for contributors to manually type in this
> +repetitive text each time.
> +
> +git commands
> +^^^^^^^^^^^^
> +
> +When creating, or amending, a commit the ``-s`` flag to ``git commit`` will
> +append a suitable line matching the configured git author details.
> +
> +If preparing patches using the ``git format-patch`` tool, the ``-s`` flag can
> +be used to append a suitable line in the emails it creates, without modifying
> +the local commits. Alternatively, to modify all the local commits on a branch::
> +
> +  git rebase master -x 'git commit --amend --no-edit -s'
> +
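
A small sketch of the two ``-s`` variants described in the quoted text,
run in a throwaway repository. The identity, file names and the ``out``
directory are made up for illustration; nothing here is QEMU-specific.

```shell
set -eu
# Throwaway repository; the identity below is a placeholder.
repo=$(mktemp -d)
cd "$repo"
git init -q
git config user.name "Example Hacker"
git config user.email "hacker@example.com"

# Variant 1: commit -s records the sign-off in the commit itself.
echo one > a.txt
git add a.txt
git commit -q -s -m "demo: add a.txt"

# Variant 2: commit without -s, then let format-patch -s add the
# sign-off to the generated email only; the local commit is unchanged.
echo two > b.txt
git add b.txt
git commit -q -m "demo: add b.txt"
git format-patch -q -s -1 -o "$repo/out" HEAD

# Show both commit messages, newest first.
git log -2 --format='%s%n%b'
```

The second commit still has no sign-off locally, which is why the
``git rebase ... -x`` form is the one to use when the commits themselves
should carry it.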

Much as I love Emacs I wonder if this next section is worth it given the
multiple ways you can solve this (I use yas-snippet expansions for
example).

If we do want to mention the editors we should probably also mention b4.

> +emacs
> +^^^^^
> +
> +In the file ``$HOME/.emacs.d/abbrev_defs`` add:
> +
> +.. code:: elisp
> +
> +  (define-abbrev-table 'global-abbrev-table
> +    '(
> +      ("8rev" "Reviewed-by: YOUR NAME <your@email.addr>" nil 1)
> +      ("8ack" "Acked-by: YOUR NAME <your@email.addr>" nil 1)
> +      ("8test" "Tested-by: YOUR NAME <your@email.addr>" nil 1)
> +      ("8sob" "Signed-off-by: YOUR NAME <your@email.addr>" nil 1)
> +     ))
> +
> +with this change, if you type (for example) ``8rev`` followed by ``<space>``
> +or ``<enter>`` it will expand to the whole phrase.
> +
> +vim
> +^^^
> +
> +In the file ``$HOME/.vimrc`` add::
> +
> +  iabbrev 8rev Reviewed-by: YOUR NAME <your@email.addr>
> +  iabbrev 8ack Acked-by: YOUR NAME <your@email.addr>
> +  iabbrev 8test Tested-by: YOUR NAME <your@email.addr>
> +  iabbrev 8sob Signed-off-by: YOUR NAME <your@email.addr>
> +
> +with this change, if you type (for example) ``8rev`` followed by ``<space>``
> +or ``<enter>`` it will expand to the whole phrase.
> +
> +Re-starting abandoned work
> +~~~~~~~~~~~~~~~~~~~~~~~~~~
> +
> +For a variety of reasons there are some patches that get submitted to QEMU but
> +never merged. An unrelated contributor may decide (months or years later) to
> +continue working from the abandoned patch and re-submit it with extra changes.
> +
> +The general principles when picking up abandoned work are:
> +
> + * Continue to credit the original author for their work, by maintaining their
> +   original ``Signed-off-by``
> + * Indicate where the original patch was obtained from (mailing list, bug
> +   tracker, author's git repo, etc) when sending it for review
> + * Acknowledge the extra work of the new contributor by including their
> +   ``Signed-off-by`` in the patch in addition to the original author's
> + * Indicate who is responsible for what parts of the patch. This is typically
> +   done via a note in the commit message, just prior to the new contributor's
> +   ``Signed-off-by``::
> +
> +    Signed-off-by: Some Person <some.person@example.com>
> +    [Rebased and added support for 'foo']
> +    Signed-off-by: New Person <new.person@mycorp.test>
> +
> +In complicated cases, or if otherwise unsure, ask for advice on the project
> +mailing list.
> +
> +It is also recommended to attempt to contact the original author to let them
> +know you are interested in taking over their work, in case they still intended
> +to return to the work, or had any suggestions about the best way to continue.
> diff --git a/docs/devel/index-process.rst b/docs/devel/index-process.rst
> index cb7c6640fd..5807752d70 100644
> --- a/docs/devel/index-process.rst
> +++ b/docs/devel/index-process.rst
> @@ -13,6 +13,7 @@ Notes about how to interact with the community and how and where to submit patch
>     maintainers
>     style
>     submitting-a-patch
> +   code-provenance
>     trivial-patches
>     stable-process
>     submitting-a-pull-request
> diff --git a/docs/devel/submitting-a-patch.rst b/docs/devel/submitting-a-patch.rst
> index 65c64078cb..8624f21673 100644
> --- a/docs/devel/submitting-a-patch.rst
> +++ b/docs/devel/submitting-a-patch.rst
> @@ -344,28 +344,14 @@ Patch emails must include a ``Signed-off-by:`` line
>  
>  Your patches **must** include a Signed-off-by: line. This is a hard
>  requirement because it's how you say "I'm legally okay to contribute
> -this and happy for it to go into QEMU". The process is modelled after
> -the `Linux kernel
> -<http://git.kernel.org/cgit/linux/kernel/git/torvalds/linux.git/tree/Documentation/SubmittingPatches?id=f6f94e2ab1b33f0082ac22d71f66385a60d8157f#n297>`__
> -policy.
> -
> -If you wrote the patch, make sure your "From:" and "Signed-off-by:"
> -lines use the same spelling. It's okay if you subscribe or contribute to
> -the list via more than one address, but using multiple addresses in one
> -commit just confuses things. If someone else wrote the patch, git will
> -include a "From:" line in the body of the email (different from your
> -envelope From:) that will give credit to the correct author; but again,
> -that author's Signed-off-by: line is mandatory, with the same spelling.
> +this and happy for it to go into QEMU". For full guidance, read the
> +:ref:`code-provenance` documentation.
>  
>  The name used with "Signed-off-by" does not need to be your legal name,
>  nor birth name, nor appear on any government ID. It is the identity you
>  choose to be known by in the community, but should not be anonymous,
>  nor misrepresent whom you are.
>  
> -There are various tooling options for automatically adding these tags
> -include using ``git commit -s`` or ``git format-patch -s``. For more
> -information see `SubmittingPatches 1.12
> -<http://git.kernel.org/cgit/linux/kernel/git/torvalds/linux.git/tree/Documentation/SubmittingPatches?id=f6f94e2ab1b33f0082ac22d71f66385a60d8157f#n297>`__.
>  
>  .. _include_a_meaningful_cover_letter:

-- 
Alex Bennée
Virtualisation Tech Lead @ Linaro


* Re: [PATCH v3 3/3] docs: define policy forbidding use of AI code generators
  2025-06-03 14:25 ` [PATCH v3 3/3] docs: define policy forbidding use of AI code generators Markus Armbruster
  2025-06-03 15:37   ` Kevin Wolf
@ 2025-06-03 18:25   ` Stefan Hajnoczi
  2025-06-04  6:17     ` Markus Armbruster
  2025-06-04  9:10     ` Daniel P. Berrangé
  1 sibling, 2 replies; 29+ messages in thread
From: Stefan Hajnoczi @ 2025-06-03 18:25 UTC (permalink / raw)
  To: Markus Armbruster
  Cc: qemu-devel, Daniel P . Berrangé, Thomas Huth,
	Alex Bennée, Michael S . Tsirkin, Gerd Hoffmann,
	Mark Cave-Ayland, Philippe Mathieu-Daudé, Kevin Wolf,
	Stefan Hajnoczi, Alexander Graf, Paolo Bonzini, Richard Henderson,
	Peter Maydell

On Tue, Jun 3, 2025 at 10:25 AM Markus Armbruster <armbru@redhat.com> wrote:
>
> From: Daniel P. Berrangé <berrange@redhat.com>
>
> There has been an explosion of interest in so called AI code
> generators. Thus far though, this has not been matched by a broadly
> accepted legal interpretation of the licensing implications for code
> generator outputs. While the vendors may claim there is no problem and
> a free choice of license is possible, they have an inherent conflict
> of interest in promoting this interpretation. More broadly there is,
> as yet, no broad consensus on the licensing implications of code
> generators trained on inputs under a wide variety of licenses.
>
> The DCO requires contributors to assert they have the right to
> contribute under the designated project license. Given the lack of
> consensus on the licensing of AI code generator output, it is not
> considered credible to assert compliance with the DCO clause (b) or (c)
> where a patch includes such generated code.
>
> This patch thus defines a policy that the QEMU project will currently
> not accept contributions where use of AI code generators is either
> known, or suspected.
>
> These are early days of AI-assisted software development. The legal
> questions will be resolved eventually. The tools will mature, and we
> can expect some to become safely usable in free software projects.
> The policy we set now must be for today, and be open to revision. It's
> best to start strict and safe, then relax.
>
> Meanwhile requests for exceptions can also be considered on a case by
> case basis.
>
> Signed-off-by: Daniel P. Berrangé <berrange@redhat.com>
> Acked-by: Stefan Hajnoczi <stefanha@gmail.com>
> Reviewed-by: Kevin Wolf <kwolf@redhat.com>
> Signed-off-by: Markus Armbruster <armbru@redhat.com>
> ---
>  docs/devel/code-provenance.rst | 50 +++++++++++++++++++++++++++++++++-
>  1 file changed, 49 insertions(+), 1 deletion(-)
>
> diff --git a/docs/devel/code-provenance.rst b/docs/devel/code-provenance.rst
> index c27d8fe649..261263cfba 100644
> --- a/docs/devel/code-provenance.rst
> +++ b/docs/devel/code-provenance.rst
> @@ -270,4 +270,52 @@ boilerplate code template which is then filled in to produce the final patch.
>  The output of such a tool would still be considered the "preferred format",
>  since it is intended to be a foundation for further human authored changes.
>  Such tools are acceptable to use, provided they follow a deterministic process
> -and there is clearly defined copyright and licensing for their output.
> +and there is clearly defined copyright and licensing for their output. Note
> +in particular the caveats applying to AI code generators below.
> +
> +Use of AI code generators
> +~~~~~~~~~~~~~~~~~~~~~~~~~
> +
> +TL;DR:
> +
> +  **Current QEMU project policy is to DECLINE any contributions which are
> +  believed to include or derive from AI generated code. This includes ChatGPT,
> +  CoPilot, Llama and similar tools**

GitHub spells it "Copilot".

Claude is very popular for coding at the moment and probably worth mentioning.

> +
> +The increasing prevalence of AI code generators, most notably but not limited

More detail is needed on what an "AI code generator" is. Coding
assistant tools range from autocompletion to linters to automatic code
generators. In addition there are other AI-related tools like ChatGPT
or Gemini as a chatbot that people can use like Stackoverflow or an
API documentation summarizer.

I think the intent is to say: do not put code that comes from _any_ AI
tool into QEMU.

It would be okay to use AI to research APIs, algorithms, brainstorm
ideas, debug the code, analyze the code, etc but the actual code
changes must not be generated by AI.

> +to, `Large Language Models <https://en.wikipedia.org/wiki/Large_language_model>`__
> +(LLMs) results in a number of difficult legal questions and risks for software
> +projects, including QEMU.
> +
> +The QEMU community requires that contributors certify their patch submissions
> +are made in accordance with the rules of the dco_ (DCO).
> +
> +To satisfy the DCO, the patch contributor has to fully understand the
> +copyright and license status of code they are contributing to QEMU. With AI
> +code generators, the copyright and license status of the output is ill-defined
> +with no generally accepted, settled legal foundation.
> +
> +Where the training material is known, it is common for it to include large
> +volumes of material under restrictive licensing/copyright terms. Even where
> +the training material is all known to be under open source licenses, it is
> +likely to be under a variety of terms, not all of which will be compatible
> +with QEMU's licensing requirements.
> +
> +How contributors could comply with DCO terms (b) or (c) for the output of AI
> +code generators commonly available today is unclear.  The QEMU project is not
> +willing or able to accept the legal risks of non-compliance.
> +
> +The QEMU project thus requires that contributors refrain from using AI code
> +generators on patches intended to be submitted to the project, and will
> +decline any contribution if use of AI is either known or suspected.
> +
> +Examples of tools impacted by this policy include GitHub's CoPilot,

Copilot

> +OpenAI's ChatGPT, and Meta's Code Llama, amongst many others which are less
> +well known.
> +
> +This policy may evolve as AI tools mature and the legal situation is
> +clarified. In the meanwhile, requests for exceptions to this policy will be
> +evaluated by the QEMU project on a case by case basis. To be granted an
> +exception, a contributor will need to demonstrate clarity of the license and
> +copyright status for the tool's output in relation to its training model and
> +code, to the satisfaction of the project maintainers.
> --
> 2.48.1
>


* Re: [PATCH v3 3/3] docs: define policy forbidding use of AI code generators
  2025-06-03 18:25   ` Stefan Hajnoczi
@ 2025-06-04  6:17     ` Markus Armbruster
  2025-06-04  7:15       ` Daniel P. Berrangé
  2025-06-04  9:10     ` Daniel P. Berrangé
  1 sibling, 1 reply; 29+ messages in thread
From: Markus Armbruster @ 2025-06-04  6:17 UTC (permalink / raw)
  To: Stefan Hajnoczi
  Cc: qemu-devel, Daniel P . Berrangé, Thomas Huth,
	Alex Bennée, Michael S . Tsirkin, Gerd Hoffmann,
	Mark Cave-Ayland, Philippe Mathieu-Daudé, Kevin Wolf,
	Stefan Hajnoczi, Alexander Graf, Paolo Bonzini, Richard Henderson,
	Peter Maydell

Stefan Hajnoczi <stefanha@gmail.com> writes:

> On Tue, Jun 3, 2025 at 10:25 AM Markus Armbruster <armbru@redhat.com> wrote:
>>
>> From: Daniel P. Berrangé <berrange@redhat.com>
>>
>> There has been an explosion of interest in so called AI code
>> generators. Thus far though, this has not been matched by a broadly
>> accepted legal interpretation of the licensing implications for code
>> generator outputs. While the vendors may claim there is no problem and
>> a free choice of license is possible, they have an inherent conflict
>> of interest in promoting this interpretation. More broadly there is,
>> as yet, no broad consensus on the licensing implications of code
>> generators trained on inputs under a wide variety of licenses.
>>
>> The DCO requires contributors to assert they have the right to
>> contribute under the designated project license. Given the lack of
>> consensus on the licensing of AI code generator output, it is not
>> considered credible to assert compliance with the DCO clause (b) or (c)
>> where a patch includes such generated code.
>>
>> This patch thus defines a policy that the QEMU project will currently
>> not accept contributions where use of AI code generators is either
>> known, or suspected.
>>
>> These are early days of AI-assisted software development. The legal
>> questions will be resolved eventually. The tools will mature, and we
>> can expect some to become safely usable in free software projects.
>> The policy we set now must be for today, and be open to revision. It's
>> best to start strict and safe, then relax.
>>
>> Meanwhile requests for exceptions can also be considered on a case by
>> case basis.
>>
>> Signed-off-by: Daniel P. Berrangé <berrange@redhat.com>
>> Acked-by: Stefan Hajnoczi <stefanha@gmail.com>
>> Reviewed-by: Kevin Wolf <kwolf@redhat.com>
>> Signed-off-by: Markus Armbruster <armbru@redhat.com>
>> ---
>>  docs/devel/code-provenance.rst | 50 +++++++++++++++++++++++++++++++++-
>>  1 file changed, 49 insertions(+), 1 deletion(-)
>>
>> diff --git a/docs/devel/code-provenance.rst b/docs/devel/code-provenance.rst
>> index c27d8fe649..261263cfba 100644
>> --- a/docs/devel/code-provenance.rst
>> +++ b/docs/devel/code-provenance.rst
>> @@ -270,4 +270,52 @@ boilerplate code template which is then filled in to produce the final patch.
>>  The output of such a tool would still be considered the "preferred format",
>>  since it is intended to be a foundation for further human authored changes.
>>  Such tools are acceptable to use, provided they follow a deterministic process
>> -and there is clearly defined copyright and licensing for their output.
>> +and there is clearly defined copyright and licensing for their output. Note
>> +in particular the caveats applying to AI code generators below.
>> +
>> +Use of AI code generators
>> +~~~~~~~~~~~~~~~~~~~~~~~~~
>> +
>> +TL;DR:
>> +
>> +  **Current QEMU project policy is to DECLINE any contributions which are
>> +  believed to include or derive from AI generated code. This includes ChatGPT,
>> +  CoPilot, Llama and similar tools**
>
> GitHub spells it "Copilot".

I'll fix it.

> Claude is very popular for coding at the moment and probably worth mentioning.

Will do.

>> +
>> +The increasing prevalence of AI code generators, most notably but not limited
>
> More detail is needed on what an "AI code generator" is. Coding
> assistant tools range from autocompletion to linters to automatic code
> generators. In addition there are other AI-related tools like ChatGPT
> or Gemini as a chatbot that people can use like Stackoverflow or an
> API documentation summarizer.
>
> I think the intent is to say: do not put code that comes from _any_ AI
> tool into QEMU.
>
> It would be okay to use AI to research APIs, algorithms, brainstorm
> ideas, debug the code, analyze the code, etc but the actual code
> changes must not be generated by AI.

The existing text is about "AI code generators".  However, the "most
notably LLMs" that follows it could lead readers to believe it's about
more than just code generation, because LLMs are in fact used for more.
I figure this is your concern.

We could instead start wide, then narrow the focus to code generation.
Here's my try:

  The increasing prevalence of AI-assisted software development results
  in a number of difficult legal questions and risks for software
  projects, including QEMU.  Of particular concern is code generated by
  `Large Language Models
  <https://en.wikipedia.org/wiki/Large_language_model>`__ (LLMs).

If we want to mention uses of AI we consider okay, I'd do so further
down, to not distract from the main point here.  Perhaps:

  The QEMU project thus requires that contributors refrain from using AI code
  generators on patches intended to be submitted to the project, and will
  decline any contribution if use of AI is either known or suspected.

  This policy does not apply to other uses of AI, such as researching APIs or
  algorithms, static analysis, or debugging.

  Examples of tools impacted by this policy include GitHub's CoPilot,
  OpenAI's ChatGPT, and Meta's Code Llama, amongst many others which are less
  well known.

The paragraph in the middle is new, the other two are unchanged.

Thoughts?

>> +to, `Large Language Models <https://en.wikipedia.org/wiki/Large_language_model>`__
>> +(LLMs) results in a number of difficult legal questions and risks for software
>> +projects, including QEMU.

Thanks!

[...]



* Re: [PATCH v3 3/3] docs: define policy forbidding use of AI code generators
  2025-06-03 15:37   ` Kevin Wolf
@ 2025-06-04  6:18     ` Markus Armbruster
  0 siblings, 0 replies; 29+ messages in thread
From: Markus Armbruster @ 2025-06-04  6:18 UTC (permalink / raw)
  To: Kevin Wolf
  Cc: qemu-devel, Daniel P . Berrangé, Thomas Huth,
	Alex Bennée, Michael S . Tsirkin, Gerd Hoffmann,
	Mark Cave-Ayland, Philippe Mathieu-Daudé, Stefan Hajnoczi,
	Alexander Graf, Paolo Bonzini, Richard Henderson, Peter Maydell,
	Stefan Hajnoczi

Kevin Wolf <kwolf@redhat.com> writes:

> On 03.06.2025 at 16:25, Markus Armbruster wrote:
>> +TL;DR:
>> +
>> +  **Current QEMU project policy is to DECLINE any contributions which are
>> +  believed to include or derive from AI generated code. This includes ChatGPT,
>> +  CoPilot, Llama and similar tools**
>
> [...]
>
>> +Examples of tools impacted by this policy include GitHub's CoPilot,
>> +OpenAI's ChatGPT, and Meta's Code Llama, amongst many others which are less
>> +well known.
>
> I wonder if the best list of examples is still the same now, a year
> after the original version of the document was written. In particular,
> maybe including an example of popular vibe coding IDEs like Cursor would
> make sense?
>
> But it's only examples anyway, so either way is fine.

Stefan suggested a few more, and I'll add them.

Thanks!



* Re: [PATCH v3 1/3] docs: introduce dedicated page about code provenance / sign-off
  2025-06-03 16:53   ` Alex Bennée
@ 2025-06-04  6:44     ` Markus Armbruster
  2025-06-04  7:18       ` Daniel P. Berrangé
                         ` (3 more replies)
  0 siblings, 4 replies; 29+ messages in thread
From: Markus Armbruster @ 2025-06-04  6:44 UTC (permalink / raw)
  To: Alex Bennée
  Cc: qemu-devel, Daniel P . Berrangé, Thomas Huth,
	Michael S . Tsirkin, Gerd Hoffmann, Mark Cave-Ayland,
	Philippe Mathieu-Daudé, Kevin Wolf, Stefan Hajnoczi,
	Alexander Graf, Paolo Bonzini, Richard Henderson, Peter Maydell

Alex Bennée <alex.bennee@linaro.org> writes:

> Markus Armbruster <armbru@redhat.com> writes:
>
>> From: Daniel P. Berrangé <berrange@redhat.com>
>>
>> Currently we have a short paragraph saying that patches must include
>> a Signed-off-by line, and merely link to the kernel documentation.
>> The linked kernel docs have a lot of content beyond the part about
>> sign-off and thus are misleading/distracting to QEMU contributors.
>>
>> This introduces a dedicated 'code-provenance' page in QEMU talking
>> about why we require sign-off, explaining the other tags we commonly
>> use, and what to do in some edge cases.
>>
>> Signed-off-by: Daniel P. Berrangé <berrange@redhat.com>
>> Reviewed-by: Peter Maydell <peter.maydell@linaro.org>
>> Signed-off-by: Markus Armbruster <armbru@redhat.com>
>> ---
>>  docs/devel/code-provenance.rst    | 218 ++++++++++++++++++++++++++++++
>>  docs/devel/index-process.rst      |   1 +
>>  docs/devel/submitting-a-patch.rst |  18 +--
>>  3 files changed, 221 insertions(+), 16 deletions(-)
>>  create mode 100644 docs/devel/code-provenance.rst
>>
>> diff --git a/docs/devel/code-provenance.rst b/docs/devel/code-provenance.rst
>> new file mode 100644
>> index 0000000000..4fc12061b5
>> --- /dev/null
>> +++ b/docs/devel/code-provenance.rst
>> @@ -0,0 +1,218 @@
>> +.. _code-provenance:
>> +
>> +Code provenance
>> +===============
>> +
>> +Certifying patch submissions
>> +~~~~~~~~~~~~~~~~~~~~~~~~~~~~
>> +
>> +The QEMU community **mandates** all contributors to certify provenance of
>> +patch submissions they make to the project. To put it another way,
>> +contributors must indicate that they are legally permitted to contribute to
>> +the project.
>> +
>> +Certification is achieved with a low overhead by adding a single line to the
>> +bottom of every git commit::
>
> s/git commit/commit/ throughout?

Yes.

>> +
>> +   Signed-off-by: YOUR NAME <YOUR@EMAIL>
>> +
>> +using a known identity (sorry, no anonymous contributions.)
>> +
>
> maybe "(contributions cannot be anonymous)" is more direct?

If we're deviating from the kernel's text (which is *fine*), let's get
rid of the parentheses:

    using a known identity.  Contributions cannot be anonymous.

or in active voice:

    using a known identity.  We cannot accept anonymous contributions.

I like this one the best.

>> +The addition of this line asserts that the author of the patch is contributing
>> +in accordance with the clauses specified in the
>> +`Developer's Certificate of Origin <https://developercertificate.org>`__:
>> +
>> +.. _dco:
>> +
>> +  Developer's Certificate of Origin 1.1
>> +
>> +  By making a contribution to this project, I certify that:
>> +
>> +  (a) The contribution was created in whole or in part by me and I
>> +      have the right to submit it under the open source license
>> +      indicated in the file; or
>> +
>> +  (b) The contribution is based upon previous work that, to the best
>> +      of my knowledge, is covered under an appropriate open source
>> +      license and I have the right under that license to submit that
>> +      work with modifications, whether created in whole or in part
>> +      by me, under the same open source license (unless I am
>> +      permitted to submit under a different license), as indicated
>> +      in the file; or
>> +
>> +  (c) The contribution was provided directly to me by some other
>> +      person who certified (a), (b) or (c) and I have not modified
>> +      it.
>> +
>> +  (d) I understand and agree that this project and the contribution
>> +      are public and that a record of the contribution (including all
>> +      personal information I submit with it, including my sign-off) is
>> +      maintained indefinitely and may be redistributed consistent with
>> +      this project or the open source license(s) involved.
>> +
>> +It is generally expected that the name and email addresses used in one of the
>> +``Signed-off-by`` lines, matches that of the git commit ``Author`` field.
>> +It's okay if you subscribe or contribute to the list via more than one
>> +address, but using multiple addresses in one commit just confuses
>> +things.
>> +
>> +If the person sending the mail is not one of the patch authors, they are
>> +nonetheless expected to add their own ``Signed-off-by`` to comply with the
>> +DCO clause (c).
>
> We should probably mention that sometimes the committer may update the
> patch after they have pulled it into the tree. In those cases we preface
> the S-o-B tag with a comment:
>
>   Signed-off-by: Original Hacker <hacker@domain>
>   [MH: tweaked the commit message for clarity]
>   Signed-off-by: Maintainer Hacker <hacker@another.com>

Good idea.  Should this go here or under "Subsystem maintainer
requirements"?
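
For illustration only, the convention Alex describes could be sketched
in Python as below.  The helper name append_maintainer_signoff and its
parameters are hypothetical, not part of any QEMU tooling; it merely
appends a bracketed note and the maintainer's own Signed-off-by below
the existing tags, so the order of sign-offs is preserved:

```python
def append_maintainer_signoff(message, initials, note, name, email):
    """Return message with a bracketed maintainer note and sign-off appended.

    Hypothetical helper for illustration; not part of QEMU's tooling.
    """
    # Drop trailing newlines so the new tags directly follow the old ones.
    lines = message.rstrip("\n").split("\n")
    lines.append(f"[{initials}: {note}]")
    lines.append(f"Signed-off-by: {name} <{email}>")
    return "\n".join(lines) + "\n"


original = (
    "docs: fix typo\n"
    "\n"
    "Signed-off-by: Original Hacker <hacker@domain>\n"
)
print(append_maintainer_signoff(
    original, "MH", "tweaked the commit message for clarity",
    "Maintainer Hacker", "hacker@another.com"), end="")
```

Wherever the text ends up, the point is the same: the note sits between
the contributor's tag and the maintainer's tag.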

>> +
>> +Multiple authorship
>> +~~~~~~~~~~~~~~~~~~~
>> +
>> +It is not uncommon for a patch to have contributions from multiple authors. In
>> +this scenario, git commits will usually be expected to have a ``Signed-off-by``
>> +line for each contributor involved in creation of the patch. Some edge cases:
>> +
>> +  * The non-primary author's contributions were so trivial that they can be
>> +    considered not subject to copyright. In this case the secondary authors
>> +    need not include a ``Signed-off-by``.
>> +
>> +    This case most commonly applies where QEMU reviewers give short snippets
>> +    of code as suggested fixes to a patch. The reviewers don't need to have
>> +    their own ``Signed-off-by`` added unless their code suggestion was
>> +    unusually large, but it is common to add ``Suggested-by`` as a credit
>> +    for non-trivial code.
>> +
>> +  * Both contributors work for the same employer and the employer requires
>> +    copyright assignment.
>> +
>> +    It can be said that in this case a ``Signed-off-by`` is indicating that
>> +    the person has permission to contribute from their employer who is the
>> +    copyright holder. It is nonetheless still preferable to include a
>> +    ``Signed-off-by`` for each contributor, as in some countries employees are
>> +    not able to assign copyright to their employer, and it also covers any
>> +    time invested outside working hours.
>> +
>> +When multiple ``Signed-off-by`` tags are present, they should be strictly kept
>> +in order of authorship, from oldest to newest.
>> +
>> +Other commit tags
>> +~~~~~~~~~~~~~~~~~
>> +
>> +While the ``Signed-off-by`` tag is mandatory, there are a number of other tags
>> +that are commonly used during QEMU development:
>> +
>> + * **``Reviewed-by``**: when a QEMU community member reviews a patch on the
>> +   mailing list, if they consider the patch acceptable, they should send an
>> +   email reply containing a ``Reviewed-by`` tag. Subsystem maintainers who
>> +   review a patch should add this even if they are also adding their
>> +   ``Signed-off-by`` to the same commit.
>> +
>> + * **``Acked-by``**: when a QEMU subsystem maintainer approves a patch that
>> +   touches their subsystem, but intends to allow a different maintainer to
>> +   queue it and send a pull request, they would send a mail containing a
>> +   ``Acked-by`` tag. Where a patch touches multiple subsystems, ``Acked-by``
>> +   only implies review of the maintainers' own areas of responsibility. If a
>> +   maintainer wants to indicate they have done a full review they should use
>> +   a ``Reviewed-by`` tag.
>> +
>> + * **``Tested-by``**: when a QEMU community member has functionally tested the
>> +   behaviour of the patch in some manner, they should send an email reply
>> +   containing a ``Tested-by`` tag.
>> +
>> + * **``Reported-by``**: when a QEMU community member reports a problem via the
>> +   mailing list, or some other informal channel that is not the issue tracker,
>> +   it is good practice to credit them by including a ``Reported-by`` tag on
>> +   any patch fixing the issue. When the problem is reported via the GitLab
>> +   issue tracker, however, it is sufficient to just include a link to the
>> +   issue.
>
> We don't mention the Link: or Message-Id: tags.

Yes, but should it go into code-provenance.rst or
submitting-a-patch.rst?

You asked for guidance on use of "Message-Id:" in your review of v2.  I
understand the practice, and can write guidance, but I wanted to get
this out before my vacation next week, so I left it for later, as
mentioned in the cover letter.

How do we use "Link:"?  What about "Closes:"?

Here's what the kernel's submitting-patches.rst has to say:

    Describe your changes
    ---------------------

    [...]

    If related discussions or any other background information behind the change
    can be found on the web, add 'Link:' tags pointing to it. If the patch is a
    result of some earlier mailing list discussions or something documented on the
    web, point to it.

    When linking to mailing list archives, preferably use the lore.kernel.org
    message archiver service. To create the link URL, use the contents of the
    ``Message-ID`` header of the message without the surrounding angle brackets.
    For example::

        Link: https://lore.kernel.org/30th.anniversary.repost@klaava.Helsinki.FI

    Please check the link to make sure that it is actually working and points
    to the relevant message.

    However, try to make your explanation understandable without external
    resources. In addition to giving a URL to a mailing list archive or bug,
    summarize the relevant points of the discussion that led to the
    patch as submitted.

    In case your patch fixes a bug, use the 'Closes:' tag with a URL referencing
    the report in the mailing list archives or a public bug tracker. For example::

            Closes: https://example.com/issues/1234

    Some bug trackers have the ability to close issues automatically when a
    commit with such a tag is applied. Some bots monitoring mailing lists can
    also track such tags and take certain actions. Private bug trackers and
    invalid URLs are forbidden.

and

    Using Reported-by:, Tested-by:, Reviewed-by:, Suggested-by: and Fixes:
    ----------------------------------------------------------------------

    The Reported-by tag gives credit to people who find bugs and report them and it
    hopefully inspires them to help us again in the future. The tag is intended for
    bugs; please do not use it to credit feature requests. The tag should be
    followed by a Closes: tag pointing to the report, unless the report is not
    available on the web. The Link: tag can be used instead of Closes: if the patch
    fixes a part of the issue(s) being reported. Note, the Reported-by tag is one
    of only three tags you might be able to use without explicit permission of the
    person named (see 'Tagging people requires permission' below for details).
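
As a rough sketch of the Link: rule quoted above (take the Message-ID,
drop the surrounding angle brackets, append it to the archive URL), in
Python.  The function name link_trailer is made up for this example,
and using lore.kernel.org as the default archive is an assumption
carried over from the kernel text, not settled QEMU practice:

```python
LORE_BASE = "https://lore.kernel.org/"  # archive named in the kernel text


def link_trailer(message_id, base=LORE_BASE):
    """Build a Link: trailer from a Message-ID header value.

    Hypothetical helper: strips surrounding whitespace and angle brackets,
    then appends the bare ID to the archive base URL.
    """
    bare = message_id.strip()
    if bare.startswith("<") and bare.endswith(">"):
        bare = bare[1:-1]
    # Reject input that cannot be a Message-ID rather than emit a dead link.
    if not bare or "@" not in bare:
        raise ValueError(f"not a Message-ID: {message_id!r}")
    return f"Link: {base}{bare}"


print(link_trailer("<30th.anniversary.repost@klaava.Helsinki.FI>"))
```

This only builds the URL; checking that the link actually resolves, as
the kernel text asks, remains a manual step.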


>> +
>> + * **``Suggested-by``**: when a reviewer or other 3rd party makes non-trivial
>> +   suggestions for how to change a patch, it is good practice to credit them
>> +   by including a ``Suggested-by`` tag.
>> +
>> +Subsystem maintainer requirements
>> +~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
>> +
>> +When a subsystem maintainer accepts a patch from a contributor, in addition to
>> +the normal code review points, they are expected to validate the presence of
>> +suitable ``Signed-off-by`` tags.
>> +
>> +At the time they queue the patch in their subsystem tree, the maintainer
>> +**must** also then add their own ``Signed-off-by`` to indicate that they have
>> +done the aforementioned validation. This is in addition to any of their own
>> +``Reviewed-by`` tags the subsystem maintainer may wish to include.
>> +
>> +Tools for adding ``Signed-off-by``
>> +~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
>> +
>> +There are a variety of ways tools can support adding ``Signed-off-by`` tags
>> +for patches, avoiding the need for contributors to manually type in this
>> +repetitive text each time.
>> +
>> +git commands
>> +^^^^^^^^^^^^
>> +
>> +When creating, or amending, a commit the ``-s`` flag to ``git commit`` will
>> +append a suitable line matching the configured git author details.
>> +
>> +If preparing patches using the ``git format-patch`` tool, the ``-s`` flag can
>> +be used to append a suitable line in the emails it creates, without modifying
>> +the local commits. Alternatively to modify all the local commits on a branch::
>> +
>> +  git rebase master -x 'git commit --amend --no-edit -s'
>> +
>
> Much as I love Emacs I wonder if this next section is worth it given the
> multiple ways you can solve this (I use yas-snippet expansions for
> example).

Showing one of them could still be useful for less experienced Emacs
users.  We could mention it's just one of many ways.

> If we do want to mention the editors we should probably also mention b4.

Can do if somebody contributes a suitable configuration snippet.

Thanks!

[...]




* Re: [PATCH v3 3/3] docs: define policy forbidding use of AI code generators
  2025-06-04  6:17     ` Markus Armbruster
@ 2025-06-04  7:15       ` Daniel P. Berrangé
  2025-06-04  7:54         ` Philippe Mathieu-Daudé
  2025-06-04  8:58         ` Markus Armbruster
  0 siblings, 2 replies; 29+ messages in thread
From: Daniel P. Berrangé @ 2025-06-04  7:15 UTC (permalink / raw)
  To: Markus Armbruster
  Cc: Stefan Hajnoczi, qemu-devel, Thomas Huth, Alex Bennée,
	Michael S . Tsirkin, Gerd Hoffmann, Mark Cave-Ayland,
	Philippe Mathieu-Daudé, Kevin Wolf, Stefan Hajnoczi,
	Alexander Graf, Paolo Bonzini, Richard Henderson, Peter Maydell

On Wed, Jun 04, 2025 at 08:17:27AM +0200, Markus Armbruster wrote:
> Stefan Hajnoczi <stefanha@gmail.com> writes:
> 
> > On Tue, Jun 3, 2025 at 10:25 AM Markus Armbruster <armbru@redhat.com> wrote:
> >>
> >> From: Daniel P. Berrangé <berrange@redhat.com>
> >> +
> >> +The increasing prevalence of AI code generators, most notably but not limited
> >
> > More detail is needed on what an "AI code generator" is. Coding
> > assistant tools range from autocompletion to linters to automatic code
> > generators. In addition there are other AI-related tools like ChatGPT
> > or Gemini as a chatbot that people can use like Stackoverflow or an
> > API documentation summarizer.
> >
> > I think the intent is to say: do not put code that comes from _any_ AI
> > tool into QEMU.
> >
> > It would be okay to use AI to research APIs, algorithms, brainstorm
> > ideas, debug the code, analyze the code, etc but the actual code
> > changes must not be generated by AI.

The scope of the policy is around contributions we receive as
patches with SoB. Researching / brainstorming / analysis etc
are not contribution activities, so not covered by the policy
IMHO.

> 
> The existing text is about "AI code generators".  However, the "most
> notably LLMs" that follows it could lead readers to believe it's about
> more than just code generation, because LLMs are in fact used for more.
> I figure this is your concern.
> 
> We could instead start wide, then narrow the focus to code generation.
> Here's my try:
> 
>   The increasing prevalence of AI-assisted software development results
>   in a number of difficult legal questions and risks for software
>   projects, including QEMU.  Of particular concern is code generated by
>   `Large Language Models
>   <https://en.wikipedia.org/wiki/Large_language_model>`__ (LLMs).

Documentation we maintain has the same concerns as code.
So I'd suggest to substitute 'code' with 'code / content'.

> If we want to mention uses of AI we consider okay, I'd do so further
> down, to not distract from the main point here.  Perhaps:
> 
>   The QEMU project thus requires that contributors refrain from using AI code
>   generators on patches intended to be submitted to the project, and will
>   decline any contribution if use of AI is either known or suspected.
> 
>   This policy does not apply to other uses of AI, such as researching APIs or
>   algorithms, static analysis, or debugging.
> 
>   Examples of tools impacted by this policy includes both GitHub's CoPilot,
>   OpenAI's ChatGPT, and Meta's Code Llama, amongst many others which are less
>   well known.
> 
> The paragraph in the middle is new, the other two are unchanged.
> 
> Thoughts?

IMHO it's redundant, as the policy is expressly around contribution of
code/content, and those activities are not contribution related, so
they are outside the scope already.

> 
> >> +to, `Large Language Models <https://en.wikipedia.org/wiki/Large_language_model>`__
> >> +(LLMs) results in a number of difficult legal questions and risks for software
> >> +projects, including QEMU.
> 
> Thanks!
> 
> [...]
> 

With regards,
Daniel
-- 
|: https://berrange.com      -o-    https://www.flickr.com/photos/dberrange :|
|: https://libvirt.org         -o-            https://fstop138.berrange.com :|
|: https://entangle-photo.org    -o-    https://www.instagram.com/dberrange :|




* Re: [PATCH v3 1/3] docs: introduce dedicated page about code provenance / sign-off
  2025-06-04  6:44     ` Markus Armbruster
@ 2025-06-04  7:18       ` Daniel P. Berrangé
  2025-06-04  7:46       ` Philippe Mathieu-Daudé
                         ` (2 subsequent siblings)
  3 siblings, 0 replies; 29+ messages in thread
From: Daniel P. Berrangé @ 2025-06-04  7:18 UTC (permalink / raw)
  To: Markus Armbruster
  Cc: Alex Bennée, qemu-devel, Thomas Huth, Michael S . Tsirkin,
	Gerd Hoffmann, Mark Cave-Ayland, Philippe Mathieu-Daudé,
	Kevin Wolf, Stefan Hajnoczi, Alexander Graf, Paolo Bonzini,
	Richard Henderson, Peter Maydell

On Wed, Jun 04, 2025 at 08:44:55AM +0200, Markus Armbruster wrote:
> Alex Bennée <alex.bennee@linaro.org> writes:
> 
> > Markus Armbruster <armbru@redhat.com> writes:
> >
> >> From: Daniel P. Berrangé <berrange@redhat.com>
> >>
> >> Currently we have a short paragraph saying that patches must include
> >> a Signed-off-by line, and merely link to the kernel documentation.
> >> The linked kernel docs have a lot of content beyond the part about
> >> sign-off and thus are misleading/distracting to QEMU contributors.
> >>
> >> This introduces a dedicated 'code-provenance' page in QEMU talking
> >> about why we require sign-off, explaining the other tags we commonly
> >> use, and what to do in some edge cases.
> >>
> >> Signed-off-by: Daniel P. Berrangé <berrange@redhat.com>
> >> Reviewed-by: Peter Maydell <peter.maydell@linaro.org>
> >> Signed-off-by: Markus Armbruster <armbru@redhat.com>
> >> ---
> >>  docs/devel/code-provenance.rst    | 218 ++++++++++++++++++++++++++++++
> >>  docs/devel/index-process.rst      |   1 +
> >>  docs/devel/submitting-a-patch.rst |  18 +--
> >>  3 files changed, 221 insertions(+), 16 deletions(-)
> >>  create mode 100644 docs/devel/code-provenance.rst
> >>
> >> diff --git a/docs/devel/code-provenance.rst b/docs/devel/code-provenance.rst
> >> new file mode 100644
> >> index 0000000000..4fc12061b5
> >> --- /dev/null
> >> +++ b/docs/devel/code-provenance.rst

> >> +Other commit tags
> >> +~~~~~~~~~~~~~~~~~
> >> +
> >> +While the ``Signed-off-by`` tag is mandatory, there are a number of other tags
> >> +that are commonly used during QEMU development:
> >> +
> >> + * **``Reviewed-by``**: when a QEMU community member reviews a patch on the
> >> +   mailing list, if they consider the patch acceptable, they should send an
> >> +   email reply containing a ``Reviewed-by`` tag. Subsystem maintainers who
> >> +   review a patch should add this even if they are also adding their
> >> +   ``Signed-off-by`` to the same commit.
> >> +
> >> + * **``Acked-by``**: when a QEMU subsystem maintainer approves a patch that
> >> +   touches their subsystem, but intends to allow a different maintainer to
> >> +   queue it and send a pull request, they would send a mail containing a
> >> +   ``Acked-by`` tag. Where a patch touches multiple subsystems, ``Acked-by``
> >> +   only implies review of the maintainers' own areas of responsibility. If a
> >> +   maintainer wants to indicate they have done a full review they should use
> >> +   a ``Reviewed-by`` tag.
> >> +
> >> + * **``Tested-by``**: when a QEMU community member has functionally tested the
> >> +   behaviour of the patch in some manner, they should send an email reply
> >> +   containing a ``Tested-by`` tag.
> >> +
> >> + * **``Reported-by``**: when a QEMU community member reports a problem via the
> >> +   mailing list, or some other informal channel that is not the issue tracker,
> >> +   it is good practice to credit them by including a ``Reported-by`` tag on
> >> +   any patch fixing the issue. When the problem is reported via the GitLab
> >> +   issue tracker, however, it is sufficient to just include a link to the
> >> +   issue.
> >
> > We don't mention the Link: or Message-Id: tags.
> 
> Yes, but should it go into code-provenance.rst or
> submitting-a-patch.rst?

I considered those other general tags to be under the scope
of submitting-a-patch.rst, as they're not directly related
to the legal code provenance.

> 
> You asked for guidance on use of "Message-Id:" in your review of v2.  I
> understand the practice, and can write guidance, but I wanted to get
> this out before my vacation next week, so I left it for later, as
> mentioned in the cover letter.
> 
> How do we use "Link:"?  What about "Closes:"?
> 
> Here's what the kernel's submitting-patches.rst has to say:
> 
>     Describe your changes
>     ---------------------
> 
>     [...]
> 
>     If related discussions or any other background information behind the change
>     can be found on the web, add 'Link:' tags pointing to it. If the patch is a
>     result of some earlier mailing list discussions or something documented on the
>     web, point to it.
> 
>     When linking to mailing list archives, preferably use the lore.kernel.org
>     message archiver service. To create the link URL, use the contents of the
>     ``Message-ID`` header of the message without the surrounding angle brackets.
>     For example::
> 
>         Link: https://lore.kernel.org/30th.anniversary.repost@klaava.Helsinki.FI
> 
>     Please check the link to make sure that it is actually working and points
>     to the relevant message.
> 
>     However, try to make your explanation understandable without external
>     resources. In addition to giving a URL to a mailing list archive or bug,
>     summarize the relevant points of the discussion that led to the
>     patch as submitted.
> 
>     In case your patch fixes a bug, use the 'Closes:' tag with a URL referencing
>     the report in the mailing list archives or a public bug tracker. For example::
> 
>             Closes: https://example.com/issues/1234
> 
>     Some bug trackers have the ability to close issues automatically when a
>     commit with such a tag is applied. Some bots monitoring mailing lists can
>     also track such tags and take certain actions. Private bug trackers and
>     invalid URLs are forbidden.
> 
> and
> 
>     Using Reported-by:, Tested-by:, Reviewed-by:, Suggested-by: and Fixes:
>     ----------------------------------------------------------------------
> 
>     The Reported-by tag gives credit to people who find bugs and report them and it
>     hopefully inspires them to help us again in the future. The tag is intended for
>     bugs; please do not use it to credit feature requests. The tag should be
>     followed by a Closes: tag pointing to the report, unless the report is not
>     available on the web. The Link: tag can be used instead of Closes: if the patch
>     fixes a part of the issue(s) being reported. Note, the Reported-by tag is one
>     of only three tags you might be able to use without explicit permission of the
>     person named (see 'Tagging people requires permission' below for details).
> 

> >> +git commands
> >> +^^^^^^^^^^^^
> >> +
> >> +When creating, or amending, a commit the ``-s`` flag to ``git commit`` will
> >> +append a suitable line matching the configured git author details.
> >> +
> >> +If preparing patches using the ``git format-patch`` tool, the ``-s`` flag can
> >> +be used to append a suitable line in the emails it creates, without modifying
> >> +the local commits. Alternatively to modify all the local commits on a branch::
> >> +
> >> +  git rebase master -x 'git commit --amend --no-edit -s'
> >> +
> >
> > Much as I love Emacs I wonder if this next section is worth it given the
> > multiple ways you can solve this (I use yas-snippet expansions for
> > example).
> 
> Showing one of them could still be useful for less experienced Emacs
> users.  We could mention it's just one of many ways.

Yep, IMHO it is worth guiding users to a simple example that works.
If they are advanced users of emacs or other editors wanting to figure
out other options they can ignore this guidance.

> 
> > If we do want to mention the editors we should probably also mention b4.
> 
> Can do if somebody contributes a suitable configuration snippet.
> 
> Thanks!
> 
> [...]
> 

With regards,
Daniel
-- 
|: https://berrange.com      -o-    https://www.flickr.com/photos/dberrange :|
|: https://libvirt.org         -o-            https://fstop138.berrange.com :|
|: https://entangle-photo.org    -o-    https://www.instagram.com/dberrange :|




* Re: [PATCH v3 1/3] docs: introduce dedicated page about code provenance / sign-off
  2025-06-04  6:44     ` Markus Armbruster
  2025-06-04  7:18       ` Daniel P. Berrangé
@ 2025-06-04  7:46       ` Philippe Mathieu-Daudé
  2025-06-04  8:52         ` Markus Armbruster
  2025-06-04  7:58       ` Gerd Hoffmann
  2025-06-05 14:52       ` Markus Armbruster
  3 siblings, 1 reply; 29+ messages in thread
From: Philippe Mathieu-Daudé @ 2025-06-04  7:46 UTC (permalink / raw)
  To: Markus Armbruster, Alex Bennée
  Cc: qemu-devel, Daniel P.Berrangé, Thomas Huth,
	Michael S . Tsirkin, Gerd Hoffmann, Mark Cave-Ayland, Kevin Wolf,
	Stefan Hajnoczi, Alexander Graf, Paolo Bonzini, Richard Henderson,
	Peter Maydell

On 4/6/25 08:44, Markus Armbruster wrote:
> Alex Bennée <alex.bennee@linaro.org> writes:
> 
>> Markus Armbruster <armbru@redhat.com> writes:
>>
>>> From: Daniel P. Berrangé <berrange@redhat.com>
>>>
>>> Currently we have a short paragraph saying that patches must include
>>> a Signed-off-by line, and merely link to the kernel documentation.
>>> The linked kernel docs have a lot of content beyond the part about
>>> sign-off and thus are misleading/distracting to QEMU contributors.
>>>
>>> This introduces a dedicated 'code-provenance' page in QEMU talking
>>> about why we require sign-off, explaining the other tags we commonly
>>> use, and what to do in some edge cases.
>>>
>>> Signed-off-by: Daniel P. Berrangé <berrange@redhat.com>
>>> Reviewed-by: Peter Maydell <peter.maydell@linaro.org>
>>> Signed-off-by: Markus Armbruster <armbru@redhat.com>
>>> ---
>>>   docs/devel/code-provenance.rst    | 218 ++++++++++++++++++++++++++++++
>>>   docs/devel/index-process.rst      |   1 +
>>>   docs/devel/submitting-a-patch.rst |  18 +--
>>>   3 files changed, 221 insertions(+), 16 deletions(-)
>>>   create mode 100644 docs/devel/code-provenance.rst


>>> +
>>> +   Signed-off-by: YOUR NAME <YOUR@EMAIL>
>>> +
>>> +using a known identity (sorry, no anonymous contributions.)
>>> +
>>
>> maybe "(contributions cannot be anonymous)" is more direct?
> 
> If we're deviating from the kernel's text (which is *fine*), let's get
> rid of the parenthesis:
> 
>      using a known identity.  Contributions cannot be anonymous.
> 
> or in active voice:
> 
>      using a known identity.  We cannot accept anonymous contributions.

I'd add an anchor in the "commonly known identity" paragraph added in
commit 270c81b7d59 and link to it from here.

> 
> I like this one the best.




* Re: [PATCH v3 3/3] docs: define policy forbidding use of AI code generators
  2025-06-04  7:15       ` Daniel P. Berrangé
@ 2025-06-04  7:54         ` Philippe Mathieu-Daudé
  2025-06-04  8:40           ` Daniel P. Berrangé
  2025-06-04  9:04           ` Markus Armbruster
  2025-06-04  8:58         ` Markus Armbruster
  1 sibling, 2 replies; 29+ messages in thread
From: Philippe Mathieu-Daudé @ 2025-06-04  7:54 UTC (permalink / raw)
  To: Daniel P. Berrangé, Markus Armbruster
  Cc: Stefan Hajnoczi, qemu-devel, Thomas Huth, Alex Bennée,
	Michael S . Tsirkin, Gerd Hoffmann, Mark Cave-Ayland, Kevin Wolf,
	Stefan Hajnoczi, Alexander Graf, Paolo Bonzini, Richard Henderson,
	Peter Maydell, Pierrick Bouvier

On 4/6/25 09:15, Daniel P. Berrangé wrote:
> On Wed, Jun 04, 2025 at 08:17:27AM +0200, Markus Armbruster wrote:
>> Stefan Hajnoczi <stefanha@gmail.com> writes:
>>
>>> On Tue, Jun 3, 2025 at 10:25 AM Markus Armbruster <armbru@redhat.com> wrote:
>>>>
>>>> From: Daniel P. Berrangé <berrange@redhat.com>
>>>> +
>>>> +The increasing prevalence of AI code generators, most notably but not limited
>>>
>>> More detail is needed on what an "AI code generator" is. Coding
>>> assistant tools range from autocompletion to linters to automatic code
>>> generators. In addition there are other AI-related tools like ChatGPT
>>> or Gemini as a chatbot that people can use like Stackoverflow or an
>>> API documentation summarizer.
>>>
>>> I think the intent is to say: do not put code that comes from _any_ AI
>>> tool into QEMU.
>>>
>>> It would be okay to use AI to research APIs, algorithms, brainstorm
>>> ideas, debug the code, analyze the code, etc but the actual code
>>> changes must not be generated by AI.
> 
> The scope of the policy is around contributions we receive as
> patches with SoB. Researching / brainstorming / analysis etc
> are not contribution activities, so not covered by the policy
> IMHO.
> 
>>
>> The existing text is about "AI code generators".  However, the "most
>> notably LLMs" that follows it could lead readers to believe it's about
>> more than just code generation, because LLMs are in fact used for more.
>> I figure this is your concern.
>>
>> We could instead start wide, then narrow the focus to code generation.
>> Here's my try:
>>
>>    The increasing prevalence of AI-assisted software development results
>>    in a number of difficult legal questions and risks for software
>>    projects, including QEMU.  Of particular concern is code generated by
>>    `Large Language Models
>>    <https://en.wikipedia.org/wiki/Large_language_model>`__ (LLMs).
> 
> Documentation we maintain has the same concerns as code.
> So I'd suggest to substitute 'code' with 'code / content'.

Why couldn't we accept documentation patches improved using an LLM?

As a non-native English speaker who often gets stuck trying to describe
function APIs, I'm very tempted to use an LLM to review my sentences
and make them easier to understand.

>> If we want to mention uses of AI we consider okay, I'd do so further
>> down, to not distract from the main point here.  Perhaps:
>>
>>    The QEMU project thus requires that contributors refrain from using AI code
>>    generators on patches intended to be submitted to the project, and will
>>    decline any contribution if use of AI is either known or suspected.
>>
>>    This policy does not apply to other uses of AI, such as researching APIs or
>>    algorithms, static analysis, or debugging.
>>
>>    Examples of tools impacted by this policy includes both GitHub's CoPilot,
>>    OpenAI's ChatGPT, and Meta's Code Llama, amongst many others which are less
>>    well known.
>>
>> The paragraph in the middle is new, the other two are unchanged.
>>
>> Thoughts?
> 
> IMHO it's redundant, as the policy is expressly around contribution of
> code/content, and those activities are not contribution related, so
> they are outside the scope already.
> 
>>
>>>> +to, `Large Language Models <https://en.wikipedia.org/wiki/Large_language_model>`__
>>>> +(LLMs) results in a number of difficult legal questions and risks for software
>>>> +projects, including QEMU.
>>
>> Thanks!
>>
>> [...]
>>
> 
> With regards,
> Daniel




* Re: [PATCH v3 1/3] docs: introduce dedicated page about code provenance / sign-off
  2025-06-04  6:44     ` Markus Armbruster
  2025-06-04  7:18       ` Daniel P. Berrangé
  2025-06-04  7:46       ` Philippe Mathieu-Daudé
@ 2025-06-04  7:58       ` Gerd Hoffmann
  2025-06-05 14:52       ` Markus Armbruster
  3 siblings, 0 replies; 29+ messages in thread
From: Gerd Hoffmann @ 2025-06-04  7:58 UTC (permalink / raw)
  To: Markus Armbruster
  Cc: Alex Bennée, qemu-devel, Daniel P . Berrangé,
	Thomas Huth, Michael S . Tsirkin, Mark Cave-Ayland,
	Philippe Mathieu-Daudé, Kevin Wolf, Stefan Hajnoczi,
	Alexander Graf, Paolo Bonzini, Richard Henderson, Peter Maydell

  Hi,

> > If we do want to mention the editors we should probably also mention b4.
> 
> Can do if somebody contributes a suitable configuration snippet.

Nothing to configure ;)

Simplest usage is 'b4 shazam $msgid' and b4 will go fetch the complete
thread from lore.kernel.org, collect all the review tags from the
replies, add them to the patches and apply the whole series to the
current branch.

You can also ask b4 to generate an mbox file that you can feed to 'git am'
yourself (this is 'b4 am $msgid'), which can be useful if you want to build
your maintainer scripting workflow around it.
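
For anyone scripting around this, here is a minimal sketch of a wrapper,
not a definitive implementation. It assumes Python for the scripting and
that b4 is on PATH, and it uses only the two invocations named above
('b4 shazam $msgid' and 'b4 am $msgid'). The function names, the dry-run
default and the message ID in the usage note are hypothetical.

```python
import shlex
import subprocess
from typing import List

# Only the two b4 subcommands mentioned in this mail are handled here.
SUPPORTED_ACTIONS = ("shazam", "am")


def b4_argv(action: str, msgid: str) -> List[str]:
    """Build the argument vector for 'b4 shazam $msgid' or 'b4 am $msgid'."""
    if action not in SUPPORTED_ACTIONS:
        raise ValueError(f"unsupported b4 action: {action!r}")
    if not msgid:
        raise ValueError("a message ID is required")
    return ["b4", action, msgid]


def run_b4(action: str, msgid: str, dry_run: bool = True) -> str:
    """Run b4, or only print the command when dry_run is set.

    Returns the shell-quoted command line so callers can log it.
    """
    argv = b4_argv(action, msgid)
    line = shlex.join(argv)
    if dry_run:
        # Hypothetical safety default: show the command, change nothing.
        print(line)
    else:
        # check=True raises CalledProcessError if b4 exits non-zero.
        subprocess.run(argv, check=True)
    return line
```

Calling run_b4("shazam", "series@example.org") only prints the command;
passing dry_run=False inside a git checkout would actually run b4.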

take care,
  Gerd


* Re: [PATCH v3 3/3] docs: define policy forbidding use of AI code generators
  2025-06-04  7:54         ` Philippe Mathieu-Daudé
@ 2025-06-04  8:40           ` Daniel P. Berrangé
  2025-06-04  9:19             ` Philippe Mathieu-Daudé
  2025-06-04  9:04           ` Markus Armbruster
  1 sibling, 1 reply; 29+ messages in thread
From: Daniel P. Berrangé @ 2025-06-04  8:40 UTC (permalink / raw)
  To: Philippe Mathieu-Daudé
  Cc: Markus Armbruster, Stefan Hajnoczi, qemu-devel, Thomas Huth,
	Alex Bennée, Michael S . Tsirkin, Gerd Hoffmann,
	Mark Cave-Ayland, Kevin Wolf, Stefan Hajnoczi, Alexander Graf,
	Paolo Bonzini, Richard Henderson, Peter Maydell, Pierrick Bouvier

On Wed, Jun 04, 2025 at 09:54:33AM +0200, Philippe Mathieu-Daudé wrote:
> On 4/6/25 09:15, Daniel P. Berrangé wrote:
> > On Wed, Jun 04, 2025 at 08:17:27AM +0200, Markus Armbruster wrote:
> > > Stefan Hajnoczi <stefanha@gmail.com> writes:
> > > 
> > > > On Tue, Jun 3, 2025 at 10:25 AM Markus Armbruster <armbru@redhat.com> wrote:
> > > > > 
> > > > > From: Daniel P. Berrangé <berrange@redhat.com>
> > > > > > +
> > > > > +The increasing prevalence of AI code generators, most notably but not limited
> > > > 
> > > > More detail is needed on what an "AI code generator" is. Coding
> > > > assistant tools range from autocompletion to linters to automatic code
> > > > generators. In addition there are other AI-related tools like ChatGPT
> > > > or Gemini as a chatbot that can people use like Stackoverflow or an
> > > > API documentation summarizer.
> > > > 
> > > > I think the intent is to say: do not put code that comes from _any_ AI
> > > > tool into QEMU.
> > > > 
> > > > It would be okay to use AI to research APIs, algorithms, brainstorm
> > > > ideas, debug the code, analyze the code, etc but the actual code
> > > > changes must not be generated by AI.
> > 
> > The scope of the policy is around contributions we receive as
> > patches with SoB. Researching / brainstorming / analysis etc
> > are not contribution activities, so not covered by the policy
> > IMHO.
> > 
> > > 
> > > The existing text is about "AI code generators".  However, the "most
> > > notably LLMs" that follows it could lead readers to believe it's about
> > > more than just code generation, because LLMs are in fact used for more.
> > > I figure this is your concern.
> > > 
> > > We could instead start wide, then narrow the focus to code generation.
> > > Here's my try:
> > > 
> > >    The increasing prevalence of AI-assisted software development results
> > >    in a number of difficult legal questions and risks for software
> > >    projects, including QEMU.  Of particular concern is code generated by
> > >    `Large Language Models
> > >    <https://en.wikipedia.org/wiki/Large_language_model>`__ (LLMs).
> > 
> > Documentation we maintain has the same concerns as code.
> > So I'd suggest to substitute 'code' with 'code / content'.
> 
> Why couldn't we accept documentation patches improved using LLM?

I would flip it around and ask why documentation would not be held
to the same standard as code when it comes to licensing and legal
compliance?

This is all copyrighted content that we merge & distribute under the
same QEMU licensing terms, and we have the same legal obligations
whether it is "source code" or "documentation" or other content
that is not traditional "source code" (images for example).

> As a non-native English speaker being often stuck trying to describe
> function APIs, I'm very tempted to use a LLM to review my sentences
> and make them better understandable.

I can understand that desire, and it is an admittedly tricky situation
and tradeoff for which I don't have a great answer.

As a starting point we (as reviewers/maintainers) must be broadly
very tolerant & accepting of content that is not perfect English,
because we know many (probably even the majority of) contributors
won't have English as their first language.

As a reviewer I don't mind imperfect language in submissions. Even
if language is not perfect it is at least a direct expression of
the author's understanding and thus we can have a level of trust
in the docs based on our community experience with the contributor.

If docs have been altered in any significant manner by an LLM,
even if they are linguistically improved, IMHO, knowing that an LLM
was used would reduce my personal trust in the technical accuracy
of the contribution.

This is straying into the debate around the accuracy of LLMs though,
which is interesting, but tangential to the purpose of this policy,
which aims to focus on the code provenance / legal side.

So, back on track, an important point is that this policy (& the
legal concerns/risks it attempts to address) is implicitly
about contributions that can be considered copyrightable.

Some so-called "trivial" work can be so simplistic as to not meet
the threshold for copyright protection, and it is thus easy for the
DCO requirements to be satisfied.

As a person, when you write the API documentation from scratch,
your output would generally be considered to be a copyrightable
contribution by the author.

When a reviewer then suggests changes to your docs, most of the
time those changes are so trivial that the reviewer wouldn't be
claiming copyright over the resulting work.

If the reviewer completely rewrites entire sentences in the
docs though, they would be able to claim copyright over part
of the resulting work.

The tipping point between copyrightable/non-copyrightable is
hard to define in a policy. It is inherently fuzzy, and somewhat
of a "you'll know it when you see it" or "let's debate it in court"
situation...

So back to LLMs.

If you ask the LLM (or an agent using an LLM) to entirely write
the API docs from scratch, I think that should be expected to
fall under this proposed contribution policy in general.

If you write the API docs yourself and ask the LLM to review and
suggest improvements, that MAY or MAY NOT fall under this policy.

If the LLM-suggested tweaks were minor enough to be considered
not to meet the threshold to be copyrightable, it would be fine;
this is little different to a human reviewer suggesting tweaks.

If the LLM suggested large-scale rewriting, the line would be harder
to draw, but it would tend towards falling under this
contribution policy.

So it depends on the scope of what the LLM suggested as a change
to your docs.

IOW, LLM-as-sparkling-auto-correct is probably OK, but
LLM-as-book-editor / LLM-as-ghost-writer is probably NOT OK

This is a scenario where the QEMU contributor has to use their
personal judgement as to whether their use of LLM in a docs context
is compliant with this policy, or not. I don't think we should try
to describe this in the policy given how fuzzy the situation is.

NB, this copyrightable/non-copyrightable situation applies to source
code too, not just docs.

With regards,
Daniel
-- 
|: https://berrange.com      -o-    https://www.flickr.com/photos/dberrange :|
|: https://libvirt.org         -o-            https://fstop138.berrange.com :|
|: https://entangle-photo.org    -o-    https://www.instagram.com/dberrange :|


* Re: [PATCH v3 1/3] docs: introduce dedicated page about code provenance / sign-off
  2025-06-04  7:46       ` Philippe Mathieu-Daudé
@ 2025-06-04  8:52         ` Markus Armbruster
  2025-06-05  9:04           ` Markus Armbruster
  0 siblings, 1 reply; 29+ messages in thread
From: Markus Armbruster @ 2025-06-04  8:52 UTC (permalink / raw)
  To: Philippe Mathieu-Daudé
  Cc: Alex Bennée, qemu-devel, Daniel P.Berrangé, Thomas Huth,
	Michael S . Tsirkin, Gerd Hoffmann, Mark Cave-Ayland, Kevin Wolf,
	Stefan Hajnoczi, Alexander Graf, Paolo Bonzini, Richard Henderson,
	Peter Maydell

Philippe Mathieu-Daudé <philmd@linaro.org> writes:

> On 4/6/25 08:44, Markus Armbruster wrote:
>> Alex Bennée <alex.bennee@linaro.org> writes:
>> 
>>> Markus Armbruster <armbru@redhat.com> writes:
>>>
>>>> From: Daniel P. Berrangé <berrange@redhat.com>
>>>>
>>>> Currently we have a short paragraph saying that patches must include
>>>> a Signed-off-by line, and merely link to the kernel documentation.
>>>> The linked kernel docs have a lot of content beyond the part about
>>>> sign-off and thus are misleading/distracting to QEMU contributors.
>>>>
>>>> This introduces a dedicated 'code-provenance' page in QEMU talking
>>>> about why we require sign-off, explaining the other tags we commonly
>>>> use, and what to do in some edge cases.
>>>>
>>>> Signed-off-by: Daniel P. Berrangé <berrange@redhat.com>
>>>> Reviewed-by: Peter Maydell <peter.maydell@linaro.org>
>>>> Signed-off-by: Markus Armbruster <armbru@redhat.com>
>>>> ---
>>>>   docs/devel/code-provenance.rst    | 218 ++++++++++++++++++++++++++++++
>>>>   docs/devel/index-process.rst      |   1 +
>>>>   docs/devel/submitting-a-patch.rst |  18 +--
>>>>   3 files changed, 221 insertions(+), 16 deletions(-)
>>>>   create mode 100644 docs/devel/code-provenance.rst
>
>
>>>> +
>>>> +   Signed-off-by: YOUR NAME <YOUR@EMAIL>
>>>> +
>>>> +using a known identity (sorry, no anonymous contributions.)
>>>> +
>>>
>>> maybe "(contributions cannot be anonymous)" is more direct?
>> If we're deviating from the kernel's text (which is *fine*), let's get
>> rid of the parenthesis:
>>      using a known identity.  Contributions cannot be anonymous.
>> or in active voice:
>>      using a known identity.  We cannot accept anonymous contributions.
>
> I'd add an anchor in the "commonly known identity" paragraph added in
> commit 270c81b7d59 and here link to it.

Makes sense, thanks!

>> I like this one the best.




* Re: [PATCH v3 3/3] docs: define policy forbidding use of AI code generators
  2025-06-04  7:15       ` Daniel P. Berrangé
  2025-06-04  7:54         ` Philippe Mathieu-Daudé
@ 2025-06-04  8:58         ` Markus Armbruster
  2025-06-04  9:22           ` Daniel P. Berrangé
  1 sibling, 1 reply; 29+ messages in thread
From: Markus Armbruster @ 2025-06-04  8:58 UTC (permalink / raw)
  To: Daniel P. Berrangé
  Cc: Stefan Hajnoczi, qemu-devel, Thomas Huth, Alex Bennée,
	Michael S . Tsirkin, Gerd Hoffmann, Mark Cave-Ayland,
	Philippe Mathieu-Daudé, Kevin Wolf, Stefan Hajnoczi,
	Alexander Graf, Paolo Bonzini, Richard Henderson, Peter Maydell

Daniel P. Berrangé <berrange@redhat.com> writes:

> On Wed, Jun 04, 2025 at 08:17:27AM +0200, Markus Armbruster wrote:
>> Stefan Hajnoczi <stefanha@gmail.com> writes:
>> 
>> > On Tue, Jun 3, 2025 at 10:25 AM Markus Armbruster <armbru@redhat.com> wrote:
>> >>
>> >> From: Daniel P. Berrangé <berrange@redhat.com>
>> >> +
>> >> +The increasing prevalence of AI code generators, most notably but not limited
>> >
>> > More detail is needed on what an "AI code generator" is. Coding
>> > assistant tools range from autocompletion to linters to automatic code
>> > generators. In addition there are other AI-related tools like ChatGPT
>> > or Gemini as a chatbot that can people use like Stackoverflow or an
>> > API documentation summarizer.
>> >
>> > I think the intent is to say: do not put code that comes from _any_ AI
>> > tool into QEMU.
>> >
>> > It would be okay to use AI to research APIs, algorithms, brainstorm
>> > ideas, debug the code, analyze the code, etc but the actual code
>> > changes must not be generated by AI.
>
> The scope of the policy is around contributions we receive as
> patches with SoB. Researching / brainstorming / analysis etc
> are not contribution activities, so not covered by the policy
> IMHO.

Yes.  More below.

>> The existing text is about "AI code generators".  However, the "most
>> notably LLMs" that follows it could lead readers to believe it's about
>> more than just code generation, because LLMs are in fact used for more.
>> I figure this is your concern.
>> 
>> We could instead start wide, then narrow the focus to code generation.
>> Here's my try:
>> 
>>   The increasing prevalence of AI-assisted software development results
>>   in a number of difficult legal questions and risks for software
>>   projects, including QEMU.  Of particular concern is code generated by
>>   `Large Language Models
>>   <https://en.wikipedia.org/wiki/Large_language_model>`__ (LLMs).
>
> Documentation we maintain has the same concerns as code.
> So I'd suggest to substitute 'code' with 'code / content'.

Makes sense, thanks!

>> If we want to mention uses of AI we consider okay, I'd do so further
>> down, to not distract from the main point here.  Perhaps:
>> 
>>   The QEMU project thus requires that contributors refrain from using AI code
>>   generators on patches intended to be submitted to the project, and will
>>   decline any contribution if use of AI is either known or suspected.
>> 
>>   This policy does not apply to other uses of AI, such as researching APIs or
>>   algorithms, static analysis, or debugging.
>> 
>>   Examples of tools impacted by this policy includes both GitHub's CoPilot,
>>   OpenAI's ChatGPT, and Meta's Code Llama, amongst many others which are less
>>   well known.
>> 
>> The paragraph in the middle is new, the other two are unchanged.
>> 
>> Thoughts?
>
> IMHO its redundant, as the policy is expressly around contribution of
> code/content, and those activities as not contribution related, so
> outside the scope already.

The very first paragraph in this file already sets the scope: "provenance
of patch submissions [...] to the project", so you have a point here.
But does repeating the scope here hurt or help?

>> >> +to, `Large Language Models <https://en.wikipedia.org/wiki/Large_language_model>`__
>> >> +(LLMs) results in a number of difficult legal questions and risks for software
>> >> +projects, including QEMU.
>> 
>> Thanks!
>> 
>> [...]
>> 
>
> With regards,
> Daniel




* Re: [PATCH v3 3/3] docs: define policy forbidding use of AI code generators
  2025-06-04  7:54         ` Philippe Mathieu-Daudé
  2025-06-04  8:40           ` Daniel P. Berrangé
@ 2025-06-04  9:04           ` Markus Armbruster
  1 sibling, 0 replies; 29+ messages in thread
From: Markus Armbruster @ 2025-06-04  9:04 UTC (permalink / raw)
  To: Philippe Mathieu-Daudé
  Cc: Daniel P. Berrangé, Stefan Hajnoczi, qemu-devel, Thomas Huth,
	Alex Bennée, Michael S . Tsirkin, Gerd Hoffmann,
	Mark Cave-Ayland, Kevin Wolf, Stefan Hajnoczi, Alexander Graf,
	Paolo Bonzini, Richard Henderson, Peter Maydell, Pierrick Bouvier

Philippe Mathieu-Daudé <philmd@linaro.org> writes:

> On 4/6/25 09:15, Daniel P. Berrangé wrote:
>> On Wed, Jun 04, 2025 at 08:17:27AM +0200, Markus Armbruster wrote:
>>> Stefan Hajnoczi <stefanha@gmail.com> writes:
>>>
>>>> On Tue, Jun 3, 2025 at 10:25 AM Markus Armbruster <armbru@redhat.com> wrote:
>>>>>
>>>>> From: Daniel P. Berrangé <berrange@redhat.com>
>>>>> +The increasing prevalence of AI code generators, most notably but not limited
>>>>
>>>> More detail is needed on what an "AI code generator" is. Coding
>>>> assistant tools range from autocompletion to linters to automatic code
>>>> generators. In addition there are other AI-related tools like ChatGPT
>>>> or Gemini as a chatbot that can people use like Stackoverflow or an
>>>> API documentation summarizer.
>>>>
>>>> I think the intent is to say: do not put code that comes from _any_ AI
>>>> tool into QEMU.
>>>>
>>>> It would be okay to use AI to research APIs, algorithms, brainstorm
>>>> ideas, debug the code, analyze the code, etc but the actual code
>>>> changes must not be generated by AI.
>> 
>> The scope of the policy is around contributions we receive as
>> patches with SoB. Researching / brainstorming / analysis etc
>> are not contribution activities, so not covered by the policy
>> IMHO.
>> 
>>>
>>> The existing text is about "AI code generators".  However, the "most
>>> notably LLMs" that follows it could lead readers to believe it's about
>>> more than just code generation, because LLMs are in fact used for more.
>>> I figure this is your concern.
>>>
>>> We could instead start wide, then narrow the focus to code generation.
>>> Here's my try:
>>>
>>>    The increasing prevalence of AI-assisted software development results
>>>    in a number of difficult legal questions and risks for software
>>>    projects, including QEMU.  Of particular concern is code generated by
>>>    `Large Language Models
>>>    <https://en.wikipedia.org/wiki/Large_language_model>`__ (LLMs).
>> 
>> Documentation we maintain has the same concerns as code.
>> So I'd suggest to substitute 'code' with 'code / content'.
>
> Why couldn't we accept documentation patches improved using LLM?
>
> As a non-native English speaker being often stuck trying to describe
> function APIs, I'm very tempted to use a LLM to review my sentences
> and make them better understandable.

I understand the temptation!  Unfortunately, the "legal questions and
risks" Daniel described apply to *any* kind of copyrightable material,
not just to code.

Quote:

    To satisfy the DCO, the patch contributor has to fully understand the
    copyright and license status of code they are contributing to QEMU. With AI
    code generators, the copyright and license status of the output is ill-defined
    with no generally accepted, settled legal foundation.

    Where the training material is known, it is common for it to include large
    volumes of material under restrictive licensing/copyright terms. Even where
    the training material is all known to be under open source licenses, it is
    likely to be under a variety of terms, not all of which will be compatible
    with QEMU's licensing requirements.

    How contributors could comply with DCO terms (b) or (c) for the output of AI
    code generators commonly available today is unclear.  The QEMU project is not
    willing or able to accept the legal risks of non-compliance.

[...]




* Re: [PATCH v3 3/3] docs: define policy forbidding use of AI code generators
  2025-06-03 18:25   ` Stefan Hajnoczi
  2025-06-04  6:17     ` Markus Armbruster
@ 2025-06-04  9:10     ` Daniel P. Berrangé
  2025-06-04 11:01       ` Stefan Hajnoczi
  1 sibling, 1 reply; 29+ messages in thread
From: Daniel P. Berrangé @ 2025-06-04  9:10 UTC (permalink / raw)
  To: Stefan Hajnoczi
  Cc: Markus Armbruster, qemu-devel, Thomas Huth, Alex Bennée,
	Michael S . Tsirkin, Gerd Hoffmann, Mark Cave-Ayland,
	Philippe Mathieu-Daudé, Kevin Wolf, Stefan Hajnoczi,
	Alexander Graf, Paolo Bonzini, Richard Henderson, Peter Maydell

On Tue, Jun 03, 2025 at 02:25:42PM -0400, Stefan Hajnoczi wrote:
> On Tue, Jun 3, 2025 at 10:25 AM Markus Armbruster <armbru@redhat.com> wrote:
> >
> > From: Daniel P. Berrangé <berrange@redhat.com>
> >
> > There has been an explosion of interest in so-called AI code
> > generators. Thus far though, this has not been matched by a broadly
> > accepted legal interpretation of the licensing implications for code
> > generator outputs. While the vendors may claim there is no problem and
> > a free choice of license is possible, they have an inherent conflict
> > of interest in promoting this interpretation. More broadly there is,
> > as yet, no broad consensus on the licensing implications of code
> > generators trained on inputs under a wide variety of licenses.
> >
> > The DCO requires contributors to assert they have the right to
> > contribute under the designated project license. Given the lack of
> > consensus on the licensing of AI code generator output, it is not
> > considered credible to assert compliance with the DCO clause (b) or (c)
> > where a patch includes such generated code.
> >
> > This patch thus defines a policy that the QEMU project will currently
> > not accept contributions where use of AI code generators is either
> > known, or suspected.
> >
> > These are early days of AI-assisted software development. The legal
> > questions will be resolved eventually. The tools will mature, and we
> > can expect some to become safely usable in free software projects.
> > The policy we set now must be for today, and be open to revision. It's
> > best to start strict and safe, then relax.
> >
> > Meanwhile requests for exceptions can also be considered on a case by
> > case basis.
> >
> > Signed-off-by: Daniel P. Berrangé <berrange@redhat.com>
> > Acked-by: Stefan Hajnoczi <stefanha@gmail.com>
> > Reviewed-by: Kevin Wolf <kwolf@redhat.com>
> > Signed-off-by: Markus Armbruster <armbru@redhat.com>
> > ---
> >  docs/devel/code-provenance.rst | 50 +++++++++++++++++++++++++++++++++-
> >  1 file changed, 49 insertions(+), 1 deletion(-)
> >
> > diff --git a/docs/devel/code-provenance.rst b/docs/devel/code-provenance.rst
> > index c27d8fe649..261263cfba 100644
> > --- a/docs/devel/code-provenance.rst
> > +++ b/docs/devel/code-provenance.rst
> > @@ -270,4 +270,52 @@ boilerplate code template which is then filled in to produce the final patch.
> >  The output of such a tool would still be considered the "preferred format",
> >  since it is intended to be a foundation for further human authored changes.
> >  Such tools are acceptable to use, provided they follow a deterministic process
> > -and there is clearly defined copyright and licensing for their output.
> > +and there is clearly defined copyright and licensing for their output. Note
> > +in particular the caveats applying to AI code generators below.
> > +
> > +Use of AI code generators
> > +~~~~~~~~~~~~~~~~~~~~~~~~~
> > +
> > +TL;DR:
> > +
> > +  **Current QEMU project policy is to DECLINE any contributions which are
> > +  believed to include or derive from AI generated code. This includes ChatGPT,
> > +  CoPilot, Llama and similar tools**
> 
> GitHub spells it "Copilot".
> 
> Claude is very popular for coding at the moment and probably worth mentioning.
> 
> > +
> > +The increasing prevalence of AI code generators, most notably but not limited
> 
> More detail is needed on what an "AI code generator" is. Coding
> assistant tools range from autocompletion to linters to automatic code
> generators. In addition there are other AI-related tools like ChatGPT
> or Gemini as a chatbot that can people use like Stackoverflow or an
> API documentation summarizer.
> 
> I think the intent is to say: do not put code that comes from _any_ AI
> tool into QEMU.

Right, the intent is that any copyrightable portion of a commit must
not have come directly from an AI/LLM tool, or from an agent which
indirectly/internally uses an AI/LLM tool.

"code generator" is possibly a little overly specific, as this is really
about any type of tool which emits content that will make its way into
qemu.git, whether code or non-code content (docs, images, etc).

> It would be okay to use AI to research APIs, algorithms, brainstorm
> ideas, debug the code, analyze the code, etc but the actual code
> changes must not be generated by AI.

Mostly yes - there's a fuzzy boundary in the debug/analyze use cases,
if the tool is also suggesting code changes to fix issues.

If the scope of the suggested changes meets the threshold for being
(likely) copyrightable code, that would fall under the policy.

With regards,
Daniel
-- 
|: https://berrange.com      -o-    https://www.flickr.com/photos/dberrange :|
|: https://libvirt.org         -o-            https://fstop138.berrange.com :|
|: https://entangle-photo.org    -o-    https://www.instagram.com/dberrange :|




* Re: [PATCH v3 3/3] docs: define policy forbidding use of AI code generators
  2025-06-04  8:40           ` Daniel P. Berrangé
@ 2025-06-04  9:19             ` Philippe Mathieu-Daudé
  0 siblings, 0 replies; 29+ messages in thread
From: Philippe Mathieu-Daudé @ 2025-06-04  9:19 UTC (permalink / raw)
  To: Daniel P. Berrangé
  Cc: Markus Armbruster, Stefan Hajnoczi, qemu-devel, Thomas Huth,
	Alex Bennée, Michael S . Tsirkin, Gerd Hoffmann,
	Mark Cave-Ayland, Kevin Wolf, Stefan Hajnoczi, Alexander Graf,
	Paolo Bonzini, Richard Henderson, Peter Maydell, Pierrick Bouvier

On 4/6/25 10:40, Daniel P. Berrangé wrote:
> On Wed, Jun 04, 2025 at 09:54:33AM +0200, Philippe Mathieu-Daudé wrote:
>> On 4/6/25 09:15, Daniel P. Berrangé wrote:
>>> On Wed, Jun 04, 2025 at 08:17:27AM +0200, Markus Armbruster wrote:
>>>> Stefan Hajnoczi <stefanha@gmail.com> writes:
>>>>
>>>>> On Tue, Jun 3, 2025 at 10:25 AM Markus Armbruster <armbru@redhat.com> wrote:
>>>>>>
>>>>>> From: Daniel P. Berrangé <berrange@redhat.com>
>>>>>> +
>>>>>> +The increasing prevalence of AI code generators, most notably but not limited
>>>>>
>>>>> More detail is needed on what an "AI code generator" is. Coding
>>>>> assistant tools range from autocompletion to linters to automatic code
>>>>> generators. In addition there are other AI-related tools like ChatGPT
>>>>> or Gemini as a chatbot that can people use like Stackoverflow or an
>>>>> API documentation summarizer.
>>>>>
>>>>> I think the intent is to say: do not put code that comes from _any_ AI
>>>>> tool into QEMU.
>>>>>
>>>>> It would be okay to use AI to research APIs, algorithms, brainstorm
>>>>> ideas, debug the code, analyze the code, etc but the actual code
>>>>> changes must not be generated by AI.
>>>
>>> The scope of the policy is around contributions we receive as
>>> patches with SoB. Researching / brainstorming / analysis etc
>>> are not contribution activities, so not covered by the policy
>>> IMHO.
>>>
>>>>
>>>> The existing text is about "AI code generators".  However, the "most
>>>> notably LLMs" that follows it could lead readers to believe it's about
>>>> more than just code generation, because LLMs are in fact used for more.
>>>> I figure this is your concern.
>>>>
>>>> We could instead start wide, then narrow the focus to code generation.
>>>> Here's my try:
>>>>
>>>>     The increasing prevalence of AI-assisted software development results
>>>>     in a number of difficult legal questions and risks for software
>>>>     projects, including QEMU.  Of particular concern is code generated by
>>>>     `Large Language Models
>>>>     <https://en.wikipedia.org/wiki/Large_language_model>`__ (LLMs).
>>>
>>> Documentation we maintain has the same concerns as code.
>>> So I'd suggest to substitute 'code' with 'code / content'.
>>
>> Why couldn't we accept documentation patches improved using LLM?
> 
> I would flip it around and ask why would documentation not be held
> to the same standard as code, when it comes to licensing and legal
> compliance ?
> 
> This is all copyright content that we merge & distribute under the
> same QEMU licensing terms, and we have the same legal obligations
> whether it is "source code" or "documentation" or other content
> that is not traditional "source code" (images for example).
> 
> 
>> As a non-native English speaker being often stuck trying to describe
>> function APIs, I'm very tempted to use a LLM to review my sentences
>> and make them better understandable.
> 
> I can understand that desire, and it is an admittedly tricky situation
> and tradeoff for which I don't have a great answer.
> 
> As a starting point we (as reviewers/maintainers) must be broadly
> very tolerant & accepting of content that is not perfect English,
> because we know many (probably even the majority of) contributors
> won't have English as their first language.
> 
> As a reviewer I don't mind imperfect language in submissions. Even
> if language is not perfect it is at least a direct expression of
> the author's understanding and thus we can have a level of trust
> in the docs based on our community experience with the contributor.
> 
> If docs have been altered in any significant manner by an LLM,
> even if they are linguistically improved, IMHO, knowing that an LLM
> was used would reduce my personal trust in the technical accuracy
> of the contribution.
> 
> This is straying into the debate around the accuracy of LLMs though,
> which is interesting, but tangential to the purpose of this policy,
> which aims to focus on the code provenance / legal side.
> 
> 
> 
> So, back on track, an important point is that this policy (& the
> legal concerns/risks it attempts to address) is implicitly
> about contributions that can be considered copyrightable.
> 
> Some so-called "trivial" work can be so simplistic as to not meet
> the threshold for copyright protection, and it is thus easy for the
> DCO requirements to be satisfied.
> 
> 
> As a person, when you write the API documentation from scratch,
> your output would generally be considered to be a copyrightable
> contribution by the author.
> 
> When a reviewer then suggests changes to your docs, most of the
> time those changes are so trivial that the reviewer wouldn't be
> claiming copyright over the resulting work.
> 
> If the reviewer completely rewrites entire sentences in the
> docs though, they would be able to claim copyright over part
> of the resulting work.
> 
> 
> The tipping point between copyrightable/non-copyrightable is
> hard to define in a policy. It is inherently fuzzy, and somewhat
> of a "you'll know it when you see it" or "let's debate it in court"
> situation...
> 
> 
> So back to LLMs.
> 
> 
> If you ask the LLM (or an agent using an LLM) to entirely write
> the API docs from scratch, I think that should be expected to
> fall under this proposed contribution policy in general.
> 
> 
> If you write the API docs yourself and ask the LLM to review and
> suggest improvements, that MAY or MAY NOT fall under this policy.
> 
> If the LLM-suggested tweaks were minor enough to be considered
> not to meet the threshold to be copyrightable, it would be fine;
> this is little different to a human reviewer suggesting tweaks.

Good.

> If the LLM suggested large-scale rewriting, the line would be harder
> to draw, but it would tend towards falling under this
> contribution policy.
> 
> So it depends on the scope of what the LLM suggested as a change
> to your docs.
> 
> IOW, LLM-as-sparkling-auto-correct is probably OK, but
> LLM-as-book-editor / LLM-as-ghost-writer is probably NOT OK

OK.

> This is a scenario where the QEMU contributor has to use their
> personal judgement as to whether their use of LLM in a docs context
> is compliant with this policy, or not. I don't think we should try
> to describe this in the policy given how fuzzy the situation is.

Thank you very much for this detailed explanation!

> 
> NB, this copyrightable/non-copyrightable situation applies to source
> code too, not just docs.
> 
> With regards,
> Daniel




* Re: [PATCH v3 3/3] docs: define policy forbidding use of AI code generators
  2025-06-04  8:58         ` Markus Armbruster
@ 2025-06-04  9:22           ` Daniel P. Berrangé
  2025-06-04  9:40             ` Markus Armbruster
  2025-06-04 12:35             ` Yan Vugenfirer
  0 siblings, 2 replies; 29+ messages in thread
From: Daniel P. Berrangé @ 2025-06-04  9:22 UTC (permalink / raw)
  To: Markus Armbruster
  Cc: Stefan Hajnoczi, qemu-devel, Thomas Huth, Alex Bennée,
	Michael S . Tsirkin, Gerd Hoffmann, Mark Cave-Ayland,
	Philippe Mathieu-Daudé, Kevin Wolf, Stefan Hajnoczi,
	Alexander Graf, Paolo Bonzini, Richard Henderson, Peter Maydell

On Wed, Jun 04, 2025 at 10:58:38AM +0200, Markus Armbruster wrote:
> Daniel P. Berrangé <berrange@redhat.com> writes:
> 
> > On Wed, Jun 04, 2025 at 08:17:27AM +0200, Markus Armbruster wrote:
> >> Stefan Hajnoczi <stefanha@gmail.com> writes:
> >> 
> >> > On Tue, Jun 3, 2025 at 10:25 AM Markus Armbruster <armbru@redhat.com> wrote:
> >> >>
> >> >> From: Daniel P. Berrangé <berrange@redhat.com>
> >  >> +
> >> >> +The increasing prevalence of AI code generators, most notably but not limited
> >> >
> >> > More detail is needed on what an "AI code generator" is. Coding
> >> > assistant tools range from autocompletion to linters to automatic code
> >> > generators. In addition there are other AI-related tools like ChatGPT
> >> > or Gemini as a chatbot that people can use like Stackoverflow or an
> >> > API documentation summarizer.
> >> >
> >> > I think the intent is to say: do not put code that comes from _any_ AI
> >> > tool into QEMU.
> >> >
> >> > It would be okay to use AI to research APIs, algorithms, brainstorm
> >> > ideas, debug the code, analyze the code, etc but the actual code
> >> > changes must not be generated by AI.
> >
> > The scope of the policy is around contributions we receive as
> > patches with SoB. Researching / brainstorming / analysis etc
> > are not contribution activities, so not covered by the policy
> > IMHO.
> 
> Yes.  More below.
> 
> >> The existing text is about "AI code generators".  However, the "most
> >> notably LLMs" that follows it could lead readers to believe it's about
> >> more than just code generation, because LLMs are in fact used for more.
> >> I figure this is your concern.
> >> 
> >> We could instead start wide, then narrow the focus to code generation.
> >> Here's my try:
> >> 
> >>   The increasing prevalence of AI-assisted software development results
> >>   in a number of difficult legal questions and risks for software
> >>   projects, including QEMU.  Of particular concern is code generated by
> >>   `Large Language Models
> >>   <https://en.wikipedia.org/wiki/Large_language_model>`__ (LLMs).
> >
> > Documentation we maintain has the same concerns as code.
> > So I'd suggest to substitute 'code' with 'code / content'.
> 
> Makes sense, thanks!
> 
> >> If we want to mention uses of AI we consider okay, I'd do so further
> >> down, to not distract from the main point here.  Perhaps:
> >> 
> >>   The QEMU project thus requires that contributors refrain from using AI code
> >>   generators on patches intended to be submitted to the project, and will
> >>   decline any contribution if use of AI is either known or suspected.
> >> 
> >>   This policy does not apply to other uses of AI, such as researching APIs or
> >>   algorithms, static analysis, or debugging.
> >> 
> >>   Examples of tools impacted by this policy includes both GitHub's CoPilot,
> >>   OpenAI's ChatGPT, and Meta's Code Llama, amongst many others which are less
> >>   well known.
> >> 
> >> The paragraph in the middle is new, the other two are unchanged.
> >> 
> >> Thoughts?
> >
> > > IMHO it's redundant, as the policy is expressly around contribution of
> > > code/content, and those activities are not contribution related, so
> > outside the scope already.
> 
> The very first paragraph in this file already set the scope: "provenance
> of patch submissions [...] to the project", so you have a point here.
> But does repeating the scope here hurt or help?

I guess it probably doesn't hurt to have it. Perhaps tweak to

 This policy does not apply to other uses of AI, such as researching APIs or
 algorithms, static analysis, or debugging, provided their output is not
 to be included in contributions.

and for the last paragraph remove 'both' and add a trailer

   Examples of tools impacted by this policy include GitHub's CoPilot,
   OpenAI's ChatGPT, and Meta's Code Llama (amongst many others which are less
   well known), and code/content generation agents which are built on top of
   such tools.

With regards,
Daniel
-- 
|: https://berrange.com      -o-    https://www.flickr.com/photos/dberrange :|
|: https://libvirt.org         -o-            https://fstop138.berrange.com :|
|: https://entangle-photo.org    -o-    https://www.instagram.com/dberrange :|




* Re: [PATCH v3 3/3] docs: define policy forbidding use of AI code generators
  2025-06-04  9:22           ` Daniel P. Berrangé
@ 2025-06-04  9:40             ` Markus Armbruster
  2025-06-04 12:35             ` Yan Vugenfirer
  1 sibling, 0 replies; 29+ messages in thread
From: Markus Armbruster @ 2025-06-04  9:40 UTC (permalink / raw)
  To: Daniel P. Berrangé
  Cc: Stefan Hajnoczi, qemu-devel, Thomas Huth, Alex Bennée,
	Michael S . Tsirkin, Gerd Hoffmann, Mark Cave-Ayland,
	Philippe Mathieu-Daudé, Kevin Wolf, Stefan Hajnoczi,
	Alexander Graf, Paolo Bonzini, Richard Henderson, Peter Maydell

Daniel P. Berrangé <berrange@redhat.com> writes:

> On Wed, Jun 04, 2025 at 10:58:38AM +0200, Markus Armbruster wrote:
>> Daniel P. Berrangé <berrange@redhat.com> writes:
>> 
>> > On Wed, Jun 04, 2025 at 08:17:27AM +0200, Markus Armbruster wrote:
>> >> Stefan Hajnoczi <stefanha@gmail.com> writes:
>> >> 
>> >> > On Tue, Jun 3, 2025 at 10:25 AM Markus Armbruster <armbru@redhat.com> wrote:
>> >> >>
>> >> >> From: Daniel P. Berrangé <berrange@redhat.com>
>> >  >> +
>> >> >> +The increasing prevalence of AI code generators, most notably but not limited
>> >> >
>> >> > More detail is needed on what an "AI code generator" is. Coding
>> >> > assistant tools range from autocompletion to linters to automatic code
>> >> > generators. In addition there are other AI-related tools like ChatGPT
>> >> > or Gemini as a chatbot that people can use like Stackoverflow or an
>> >> > API documentation summarizer.
>> >> >
>> >> > I think the intent is to say: do not put code that comes from _any_ AI
>> >> > tool into QEMU.
>> >> >
>> >> > It would be okay to use AI to research APIs, algorithms, brainstorm
>> >> > ideas, debug the code, analyze the code, etc but the actual code
>> >> > changes must not be generated by AI.
>> >
>> > The scope of the policy is around contributions we receive as
>> > patches with SoB. Researching / brainstorming / analysis etc
>> > are not contribution activities, so not covered by the policy
>> > IMHO.
>> 
>> Yes.  More below.
>> 
>> >> The existing text is about "AI code generators".  However, the "most
>> >> notably LLMs" that follows it could lead readers to believe it's about
>> >> more than just code generation, because LLMs are in fact used for more.
>> >> I figure this is your concern.
>> >> 
>> >> We could instead start wide, then narrow the focus to code generation.
>> >> Here's my try:
>> >> 
>> >>   The increasing prevalence of AI-assisted software development results
>> >>   in a number of difficult legal questions and risks for software
>> >>   projects, including QEMU.  Of particular concern is code generated by
>> >>   `Large Language Models
>> >>   <https://en.wikipedia.org/wiki/Large_language_model>`__ (LLMs).
>> >
>> > Documentation we maintain has the same concerns as code.
>> > So I'd suggest to substitute 'code' with 'code / content'.
>> 
>> Makes sense, thanks!
>> 
>> >> If we want to mention uses of AI we consider okay, I'd do so further
>> >> down, to not distract from the main point here.  Perhaps:
>> >> 
>> >>   The QEMU project thus requires that contributors refrain from using AI code
>> >>   generators on patches intended to be submitted to the project, and will
>> >>   decline any contribution if use of AI is either known or suspected.
>> >> 
>> >>   This policy does not apply to other uses of AI, such as researching APIs or
>> >>   algorithms, static analysis, or debugging.
>> >> 
>> >>   Examples of tools impacted by this policy includes both GitHub's CoPilot,
>> >>   OpenAI's ChatGPT, and Meta's Code Llama, amongst many others which are less
>> >>   well known.
>> >> 
>> >> The paragraph in the middle is new, the other two are unchanged.
>> >> 
>> >> Thoughts?
>> >
>> > IMHO it's redundant, as the policy is expressly around contribution of
>> > code/content, and those activities are not contribution related, so
>> > outside the scope already.
>> 
>> The very first paragraph in this file already set the scope: "provenance
>> of patch submissions [...] to the project", so you have a point here.
>> But does repeating the scope here hurt or help?
>
> I guess it probably doesn't hurt to have it. Perhaps tweak to
>
>  This policy does not apply to other uses of AI, such as researching APIs or
>  algorithms, static analysis, or debugging, provided their output is not
>  to be included in contributions.
>
> and for the last paragraph remove 'both' and add a trailer
>
>    Examples of tools impacted by this policy include GitHub's CoPilot,
>    OpenAI's ChatGPT, and Meta's Code Llama (amongst many others which are less
>    well known), and code/content generation agents which are built on top of
>    such tools.

Sold!




* Re: [PATCH v3 3/3] docs: define policy forbidding use of AI code generators
  2025-06-04  9:10     ` Daniel P. Berrangé
@ 2025-06-04 11:01       ` Stefan Hajnoczi
  0 siblings, 0 replies; 29+ messages in thread
From: Stefan Hajnoczi @ 2025-06-04 11:01 UTC (permalink / raw)
  To: Daniel P. Berrangé
  Cc: Markus Armbruster, qemu-devel, Thomas Huth, Alex Bennée,
	Michael S . Tsirkin, Gerd Hoffmann, Mark Cave-Ayland,
	Philippe Mathieu-Daudé, Kevin Wolf, Stefan Hajnoczi,
	Alexander Graf, Paolo Bonzini, Richard Henderson, Peter Maydell

On Wed, Jun 4, 2025, 05:10 Daniel P. Berrangé <berrange@redhat.com> wrote:

> On Tue, Jun 03, 2025 at 02:25:42PM -0400, Stefan Hajnoczi wrote:
> > On Tue, Jun 3, 2025 at 10:25 AM Markus Armbruster <armbru@redhat.com>
> wrote:
> > >
> > > From: Daniel P. Berrangé <berrange@redhat.com>
> > >
> > > There has been an explosion of interest in so called AI code
> > > generators. Thus far though, this has not been matched by a broadly
> > > accepted legal interpretation of the licensing implications for code
> > > generator outputs. While the vendors may claim there is no problem and
> > > a free choice of license is possible, they have an inherent conflict
> > > of interest in promoting this interpretation. More broadly there is,
> > > as yet, no broad consensus on the licensing implications of code
> > > generators trained on inputs under a wide variety of licenses.
> > >
> > > The DCO requires contributors to assert they have the right to
> > > contribute under the designated project license. Given the lack of
> > > consensus on the licensing of AI code generator output, it is not
> > > considered credible to assert compliance with the DCO clause (b) or (c)
> > > where a patch includes such generated code.
> > >
> > > This patch thus defines a policy that the QEMU project will currently
> > > not accept contributions where use of AI code generators is either
> > > known, or suspected.
> > >
> > > These are early days of AI-assisted software development. The legal
> > > questions will be resolved eventually. The tools will mature, and we
> > > can expect some to become safely usable in free software projects.
> > > The policy we set now must be for today, and be open to revision. It's
> > > best to start strict and safe, then relax.
> > >
> > > Meanwhile requests for exceptions can also be considered on a case by
> > > case basis.
> > >
> > > Signed-off-by: Daniel P. Berrangé <berrange@redhat.com>
> > > Acked-by: Stefan Hajnoczi <stefanha@gmail.com>
> > > Reviewed-by: Kevin Wolf <kwolf@redhat.com>
> > > Signed-off-by: Markus Armbruster <armbru@redhat.com>
> > > ---
> > >  docs/devel/code-provenance.rst | 50 +++++++++++++++++++++++++++++++++-
> > >  1 file changed, 49 insertions(+), 1 deletion(-)
> > >
> > > diff --git a/docs/devel/code-provenance.rst
> b/docs/devel/code-provenance.rst
> > > index c27d8fe649..261263cfba 100644
> > > --- a/docs/devel/code-provenance.rst
> > > +++ b/docs/devel/code-provenance.rst
> > > @@ -270,4 +270,52 @@ boilerplate code template which is then filled in
> to produce the final patch.
> > >  The output of such a tool would still be considered the "preferred
> format",
> > >  since it is intended to be a foundation for further human authored
> changes.
> > >  Such tools are acceptable to use, provided they follow a
> deterministic process
> > > -and there is clearly defined copyright and licensing for their output.
> > > +and there is clearly defined copyright and licensing for their
> output. Note
> > > +in particular the caveats applying to AI code generators below.
> > > +
> > > +Use of AI code generators
> > > +~~~~~~~~~~~~~~~~~~~~~~~~~
> > > +
> > > +TL;DR:
> > > +
> > > +  **Current QEMU project policy is to DECLINE any contributions which
> are
> > > +  believed to include or derive from AI generated code. This includes
> ChatGPT,
> > > +  CoPilot, Llama and similar tools**
> >
> > GitHub spells it "Copilot".
> >
> > Claude is very popular for coding at the moment and probably worth
> mentioning.
> >
> > > +
> > > +The increasing prevalence of AI code generators, most notably but not
> limited
> >
> > More detail is needed on what an "AI code generator" is. Coding
> > assistant tools range from autocompletion to linters to automatic code
> > generators. In addition there are other AI-related tools like ChatGPT
> > or Gemini as a chatbot that people can use like Stackoverflow or an
> > API documentation summarizer.
> >
> > I think the intent is to say: do not put code that comes from _any_ AI
> > tool into QEMU.
>
> Right, the intent is that any copyrightable portion of a commit must
> not have come directly from an AI/LLM tool, or from an agent which
> indirectly/internally uses an AI/LLM tool.
>
> "code generator" is possibly a little overly specific, as this is really
> about any type of tool which emits content that will make its way into
> qemu.git, whether code or non-code content (docs, images, etc).
>

Okay. The use case where AI is used to formulate code comments is common
enough that it is worth pointing out explicitly in the policy. Many people
wouldn't consider that an "AI code generator" use case.

Stefan


> > It would be okay to use AI to research APIs, algorithms, brainstorm
> > ideas, debug the code, analyze the code, etc but the actual code
> > changes must not be generated by AI.
>
> Mostly yes - there's a fuzzy boundary in the debug/analyze use cases,
> if the tool is also suggesting code changes to fix issues.
>
> If the scope of the suggested changes meets the threshold for being
> (likely) copyrightable code, that would fall under the policy.
>
> With regards,
> Daniel
> --
> |: https://berrange.com      -o-
> https://www.flickr.com/photos/dberrange :|
> |: https://libvirt.org         -o-
> https://fstop138.berrange.com :|
> |: https://entangle-photo.org    -o-
> https://www.instagram.com/dberrange :|
>
>


* Re: [PATCH v3 3/3] docs: define policy forbidding use of AI code generators
  2025-06-04  9:22           ` Daniel P. Berrangé
  2025-06-04  9:40             ` Markus Armbruster
@ 2025-06-04 12:35             ` Yan Vugenfirer
  1 sibling, 0 replies; 29+ messages in thread
From: Yan Vugenfirer @ 2025-06-04 12:35 UTC (permalink / raw)
  To: Daniel P. Berrangé
  Cc: Markus Armbruster, Stefan Hajnoczi, qemu-devel, Thomas Huth,
	Alex Bennée, Michael S . Tsirkin, Gerd Hoffmann,
	Mark Cave-Ayland, Philippe Mathieu-Daudé, Kevin Wolf,
	Stefan Hajnoczi, Alexander Graf, Paolo Bonzini, Richard Henderson,
	Peter Maydell

On Wed, Jun 4, 2025 at 12:23 PM Daniel P. Berrangé <berrange@redhat.com>
wrote:

> On Wed, Jun 04, 2025 at 10:58:38AM +0200, Markus Armbruster wrote:
> > Daniel P. Berrangé <berrange@redhat.com> writes:
> >
> > > On Wed, Jun 04, 2025 at 08:17:27AM +0200, Markus Armbruster wrote:
> > >> Stefan Hajnoczi <stefanha@gmail.com> writes:
> > >>
> > >> > On Tue, Jun 3, 2025 at 10:25 AM Markus Armbruster <
> armbru@redhat.com> wrote:
> > >> >>
> > >> >> From: Daniel P. Berrangé <berrange@redhat.com>
> > >  >> +
> > >> >> +The increasing prevalence of AI code generators, most notably but
> not limited
> > >> >
> > >> > More detail is needed on what an "AI code generator" is. Coding
> > >> > assistant tools range from autocompletion to linters to automatic
> code
> > >> > generators. In addition there are other AI-related tools like
> ChatGPT
> > >> > or Gemini as a chatbot that people can use like Stackoverflow or an
> > >> > API documentation summarizer.
> > >> >
> > >> > I think the intent is to say: do not put code that comes from _any_
> AI
> > >> > tool into QEMU.
> > >> >
> > >> > It would be okay to use AI to research APIs, algorithms, brainstorm
> > >> > ideas, debug the code, analyze the code, etc but the actual code
> > >> > changes must not be generated by AI.
> > >
> > > The scope of the policy is around contributions we receive as
> > > patches with SoB. Researching / brainstorming / analysis etc
> > > are not contribution activities, so not covered by the policy
> > > IMHO.
> >
> > Yes.  More below.
> >
> > >> The existing text is about "AI code generators".  However, the "most
> > >> notably LLMs" that follows it could lead readers to believe it's about
> > >> more than just code generation, because LLMs are in fact used for
> more.
> > >> I figure this is your concern.
> > >>
> > >> We could instead start wide, then narrow the focus to code generation.
> > >> Here's my try:
> > >>
> > >>   The increasing prevalence of AI-assisted software development
> results
> > >>   in a number of difficult legal questions and risks for software
> > >>   projects, including QEMU.  Of particular concern is code generated
> by
> > >>   `Large Language Models
> > >>   <https://en.wikipedia.org/wiki/Large_language_model>`__ (LLMs).
> > >
> > > Documentation we maintain has the same concerns as code.
> > > So I'd suggest to substitute 'code' with 'code / content'.
> >
> > Makes sense, thanks!
> >
> > >> If we want to mention uses of AI we consider okay, I'd do so further
> > >> down, to not distract from the main point here.  Perhaps:
> > >>
> > >>   The QEMU project thus requires that contributors refrain from using
> AI code
> > >>   generators on patches intended to be submitted to the project, and
> will
> > >>   decline any contribution if use of AI is either known or suspected.
> > >>
> > >>   This policy does not apply to other uses of AI, such as researching
> APIs or
> > >>   algorithms, static analysis, or debugging.
> > >>
> > >>   Examples of tools impacted by this policy includes both GitHub's
> CoPilot,
> > >>   OpenAI's ChatGPT, and Meta's Code Llama, amongst many others which
> are less
> > >>   well known.
> > >>
> > >> The paragraph in the middle is new, the other two are unchanged.
> > >>
> > >> Thoughts?
> > >
> > > > IMHO it's redundant, as the policy is expressly around contribution of
> > > > code/content, and those activities are not contribution related, so
> > > outside the scope already.
> >
> > The very first paragraph in this file already set the scope: "provenance
> > of patch submissions [...] to the project", so you have a point here.
> > But does repeating the scope here hurt or help?
>
> I guess it probably doesn't hurt to have it. Perhaps tweak to
>
>  This policy does not apply to other uses of AI, such as researching APIs
> or
>  algorithms, static analysis, or debugging, provided their output is not
>  to be included in contributions.
>
> and for the last paragraph remove 'both' and add a trailer
>
>    Examples of tools impacted by this policy include GitHub's CoPilot,
>    OpenAI's ChatGPT, and Meta's Code Llama (amongst many others which are
> less
>    well known), and code/content generation agents which are built on top
> of
>    such tools.
>

I suggest emphasizing AI code completion as well (for example Copilot
integrated with Visual Studio Code does it). As such code is not generated
as a result of the prompt but by the "usual" code completion operation, the
developer might not be aware that this is actually AI generated code.

Best regards,
Yan.


> With regards,
> Daniel
> --
> |: https://berrange.com      -o-
> https://www.flickr.com/photos/dberrange :|
> |: https://libvirt.org         -o-
> https://fstop138.berrange.com :|
> |: https://entangle-photo.org    -o-
> https://www.instagram.com/dberrange :|
>
>
>


* Re: [PATCH v3 1/3] docs: introduce dedicated page about code provenance / sign-off
  2025-06-04  8:52         ` Markus Armbruster
@ 2025-06-05  9:04           ` Markus Armbruster
  0 siblings, 0 replies; 29+ messages in thread
From: Markus Armbruster @ 2025-06-05  9:04 UTC (permalink / raw)
  To: Philippe Mathieu-Daudé
  Cc: Alex Bennée, qemu-devel, Daniel P.Berrangé, Thomas Huth,
	Michael S . Tsirkin, Gerd Hoffmann, Mark Cave-Ayland, Kevin Wolf,
	Stefan Hajnoczi, Alexander Graf, Paolo Bonzini, Richard Henderson,
	Peter Maydell

Markus Armbruster <armbru@redhat.com> writes:

> Philippe Mathieu-Daudé <philmd@linaro.org> writes:
>
>> On 4/6/25 08:44, Markus Armbruster wrote:
>>> Alex Bennée <alex.bennee@linaro.org> writes:
>>> 
>>>> Markus Armbruster <armbru@redhat.com> writes:
>>>>
>>>>> From: Daniel P. Berrangé <berrange@redhat.com>
>>>>>
>>>>> Currently we have a short paragraph saying that patches must include
>>>>> a Signed-off-by line, and merely link to the kernel documentation.
>>>>> The linked kernel docs have a lot of content beyond the part about
>>>>> sign-off and thus are misleading/distracting to QEMU contributors.
>>>>>
>>>>> This introduces a dedicated 'code-provenance' page in QEMU talking
>>>>> about why we require sign-off, explaining the other tags we commonly
>>>>> use, and what to do in some edge cases.
>>>>>
>>>>> Signed-off-by: Daniel P. Berrangé <berrange@redhat.com>
>>>>> Reviewed-by: Peter Maydell <peter.maydell@linaro.org>
>>>>> Signed-off-by: Markus Armbruster <armbru@redhat.com>
>>>>> ---
>>>>>   docs/devel/code-provenance.rst    | 218 ++++++++++++++++++++++++++++++
>>>>>   docs/devel/index-process.rst      |   1 +
>>>>>   docs/devel/submitting-a-patch.rst |  18 +--
>>>>>   3 files changed, 221 insertions(+), 16 deletions(-)
>>>>>   create mode 100644 docs/devel/code-provenance.rst
>>
>>
>>>>> +
>>>>> +   Signed-off-by: YOUR NAME <YOUR@EMAIL>
>>>>> +
>>>>> +using a known identity (sorry, no anonymous contributions.)
>>>>> +
>>>>
>>>> maybe "(contributions cannot be anonymous)" is more direct?
>>> If we're deviating from the kernel's text (which is *fine*), let's get
>>> rid of the parenthesis:
>>>      using a known identity.  Contributions cannot be anonymous.
>>> or in active voice:
>>>      using a known identity.  We cannot accept anonymous contributions.
>>
>> I'd add an anchor in the "commonly known identity" paragraph added in
>> commit 270c81b7d59 and here link to it.
>
> Makes sense, thanks!

Hmm, this splits the information between code-provenance.rst and
submitting-a-patch.rst.  The latter spot already links to the former.
Let's move the paragraph here.

>>> I like this one the best.




* Re: [PATCH v3 1/3] docs: introduce dedicated page about code provenance / sign-off
  2025-06-04  6:44     ` Markus Armbruster
                         ` (2 preceding siblings ...)
  2025-06-04  7:58       ` Gerd Hoffmann
@ 2025-06-05 14:52       ` Markus Armbruster
  2025-06-05 15:07         ` Alex Bennée
  3 siblings, 1 reply; 29+ messages in thread
From: Markus Armbruster @ 2025-06-05 14:52 UTC (permalink / raw)
  To: Alex Bennée
  Cc: qemu-devel, Daniel P . Berrangé, Thomas Huth,
	Michael S . Tsirkin, Gerd Hoffmann, Mark Cave-Ayland,
	Philippe Mathieu-Daudé, Kevin Wolf, Stefan Hajnoczi,
	Alexander Graf, Paolo Bonzini, Richard Henderson, Peter Maydell

Markus Armbruster <armbru@redhat.com> writes:

> Alex Bennée <alex.bennee@linaro.org> writes:

[...]

>> We don't mention the Link: or Message-Id: tags.
>
> Yes, but should it go into code-provenance.rst or
> submitting-a-patch.rst?
>
> You asked for guidance on use of "Message-Id:" in your review of v2.  I
> understand the practice, and can write guidance, but I wanted to get
> this out before my vacation next week, so I left it for later, as
> mentioned in the cover letter.
>
> How do we use "Link:"?  What about "Closes:"?

I didn't address this in v4.  I could try in a later revision, but I'd
prefer to do it on top.

[...]




* Re: [PATCH v3 1/3] docs: introduce dedicated page about code provenance / sign-off
  2025-06-05 14:52       ` Markus Armbruster
@ 2025-06-05 15:07         ` Alex Bennée
  0 siblings, 0 replies; 29+ messages in thread
From: Alex Bennée @ 2025-06-05 15:07 UTC (permalink / raw)
  To: Markus Armbruster
  Cc: qemu-devel, Daniel P . Berrangé, Thomas Huth,
	Michael S . Tsirkin, Gerd Hoffmann, Mark Cave-Ayland,
	Philippe Mathieu-Daudé, Kevin Wolf, Stefan Hajnoczi,
	Alexander Graf, Paolo Bonzini, Richard Henderson, Peter Maydell

Markus Armbruster <armbru@redhat.com> writes:

> Markus Armbruster <armbru@redhat.com> writes:
>
>> Alex Bennée <alex.bennee@linaro.org> writes:
>
> [...]
>
>>> We don't mention the Link: or Message-Id: tags.
>>
>> Yes, but should it go into code-provenance.rst or
>> submitting-a-patch.rst?
>>
>> You asked for guidance on use of "Message-Id:" in your review of v2.  I
>> understand the practice, and can write guidance, but I wanted to get
>> this out before my vacation next week, so I left it for later, as
>> mentioned in the cover letter.
>>
>> How do we use "Link:"?  What about "Closes:"?
>
> I didn't address this in v4.  I could try in a later revision, but I'd
> prefer to do it on top.

Sure - no problem.

>
> [...]

-- 
Alex Bennée
Virtualisation Tech Lead @ Linaro



Thread overview: 29+ messages
2025-06-03 14:25 [PATCH v3 0/3] docs: define policy forbidding use of "AI" / LLM code generators Markus Armbruster
2025-06-03 14:25 ` [PATCH v3 1/3] docs: introduce dedicated page about code provenance / sign-off Markus Armbruster
2025-06-03 16:53   ` Alex Bennée
2025-06-04  6:44     ` Markus Armbruster
2025-06-04  7:18       ` Daniel P. Berrangé
2025-06-04  7:46       ` Philippe Mathieu-Daudé
2025-06-04  8:52         ` Markus Armbruster
2025-06-05  9:04           ` Markus Armbruster
2025-06-04  7:58       ` Gerd Hoffmann
2025-06-05 14:52       ` Markus Armbruster
2025-06-05 15:07         ` Alex Bennée
2025-06-03 14:25 ` [PATCH v3 2/3] docs: define policy limiting the inclusion of generated files Markus Armbruster
2025-06-03 14:25 ` [PATCH v3 3/3] docs: define policy forbidding use of AI code generators Markus Armbruster
2025-06-03 15:37   ` Kevin Wolf
2025-06-04  6:18     ` Markus Armbruster
2025-06-03 18:25   ` Stefan Hajnoczi
2025-06-04  6:17     ` Markus Armbruster
2025-06-04  7:15       ` Daniel P. Berrangé
2025-06-04  7:54         ` Philippe Mathieu-Daudé
2025-06-04  8:40           ` Daniel P. Berrangé
2025-06-04  9:19             ` Philippe Mathieu-Daudé
2025-06-04  9:04           ` Markus Armbruster
2025-06-04  8:58         ` Markus Armbruster
2025-06-04  9:22           ` Daniel P. Berrangé
2025-06-04  9:40             ` Markus Armbruster
2025-06-04 12:35             ` Yan Vugenfirer
2025-06-04  9:10     ` Daniel P. Berrangé
2025-06-04 11:01       ` Stefan Hajnoczi
2025-06-03 15:25 ` [PATCH v3 0/3] docs: define policy forbidding use of "AI" / LLM " Kevin Wolf
