[PATCH 0/2] docs: define policy forbidding use of "AI" / LLM code generators

qemu-devel.nongnu.org archive mirror
 help / color / mirror / Atom feed

* [PATCH 0/2] docs: define policy forbidding use of "AI" / LLM code generators
@ 2023-11-23 11:40 Daniel P. Berrangé
  2023-11-23 11:40 ` [PATCH 1/2] docs: introduce dedicated page about code provenance / sign-off Daniel P. Berrangé
  2023-11-23 11:40 ` [PATCH 2/2] docs: define policy forbidding use of "AI" / LLM code generators Daniel P. Berrangé
  0 siblings, 2 replies; 57+ messages in thread
From: Daniel P. Berrangé @ 2023-11-23 11:40 UTC (permalink / raw)
  To: qemu-devel
  Cc: Richard Henderson, Alexander Graf, Alex Bennée,
	Paolo Bonzini, Michael S. Tsirkin, Markus Armbruster,
	Phil Mathieu-Daudé, Stefan Hajnoczi, Thomas Huth, Kevin Wolf,
	Gerd Hoffmann, Mark Cave-Ayland, Peter Maydell,
	Daniel P. Berrangé

This patch kicks the hornet's nest of AI / LLM code generators.

With the increasing interest in code generators in recent times,
it is inevitable that QEMU contributions will include AI generated
code. Thus far we have remained silent on the matter. Given that
everyone knows these tools exist, our current position has to be
considered tacit acceptance of the use of AI generated code in QEMU.

The question for the project is whether that is a good position for
QEMU to take or not ?

IANAL, but I like to think I'm reasonably proficient at understanding
open source licensing. I am not inherantly against the use of AI tools,
rather I am anti-risk. I also want to see OSS licenses respected and
complied with.

AFAICT at its current state of (im)maturity the question of licensing
of AI code generator output does not have a broadly accepted / settled
legal position. This is an inherant bias/self-interest from the vendors
promoting their usage, who tend to minimize/dismiss the legal questions.
From my POV, this puts such tools in a position of elevated legal risk.

Given the fuzziness over the legal position of generated code from
such tools, I don't consider it credible (today) for a contributor
to assert compliance with the DCO terms (b) or (c) (which is a stated
pre-requisite for QEMU accepting patches) when a patch includes (or is
derived from) AI generated code.

By implication, I think that QEMU must (for now) explicitly decline
to (knowingly) accept AI generated code.

Perhaps a few years down the line the legal uncertainty will have
reduced and we can re-evaluate this policy.

NB I say "knowingly" because as reviewers we do ultimately have to
trust what contributors tell us about their patch origins, and this
has always been the case. Our policies and the use of the DCO, serve
to shift legal risk/exposure away from the project. They let us as a
project demonstrate that we took steps to set out our expectations /
requirements, and thus any contravention is the responsibility of the
contributor invovled, not the project.

Discuss...

Daniel P. Berrangé (2):
  docs: introduce dedicated page about code provenance / sign-off
  docs: define policy forbidding use of "AI" / LLM code generators

 docs/devel/code-provenance.rst    | 237 ++++++++++++++++++++++++++++++
 docs/devel/index-process.rst      |   1 +
 docs/devel/submitting-a-patch.rst |  18 +--
 3 files changed, 241 insertions(+), 15 deletions(-)
 create mode 100644 docs/devel/code-provenance.rst

-- 
2.41.0

^ permalink raw reply	[flat|nested] 57+ messages in thread

* [PATCH 1/2] docs: introduce dedicated page about code provenance / sign-off
  2023-11-23 11:40 [PATCH 0/2] docs: define policy forbidding use of "AI" / LLM code generators Daniel P. Berrangé
@ 2023-11-23 11:40 ` Daniel P. Berrangé
  2023-11-23 11:58   ` Philippe Mathieu-Daudé
                     ` (5 more replies)
  2023-11-23 11:40 ` [PATCH 2/2] docs: define policy forbidding use of "AI" / LLM code generators Daniel P. Berrangé
  1 sibling, 6 replies; 57+ messages in thread
From: Daniel P. Berrangé @ 2023-11-23 11:40 UTC (permalink / raw)
  To: qemu-devel
  Cc: Richard Henderson, Alexander Graf, Alex Bennée,
	Paolo Bonzini, Michael S. Tsirkin, Markus Armbruster,
	Phil Mathieu-Daudé, Stefan Hajnoczi, Thomas Huth, Kevin Wolf,
	Gerd Hoffmann, Mark Cave-Ayland, Peter Maydell,
	Daniel P. Berrangé

Currently we have a short paragraph saying that patches must include
a Signed-off-by line, and merely link to the kernel documentation.
The linked kernel docs have alot of content beyond the part about
sign-off an thus is misleading/distracting to QEMU contributors.

This introduces a dedicated 'code-provenance' page in QEMU talking
about why we require sign-off, explaining the other tags we commonly
use, and what to do in some edge cases.

Signed-off-by: Daniel P. Berrangé <berrange@redhat.com>
---
 docs/devel/code-provenance.rst    | 197 ++++++++++++++++++++++++++++++
 docs/devel/index-process.rst      |   1 +
 docs/devel/submitting-a-patch.rst |  18 +--
 3 files changed, 201 insertions(+), 15 deletions(-)
 create mode 100644 docs/devel/code-provenance.rst

diff --git a/docs/devel/code-provenance.rst b/docs/devel/code-provenance.rst
new file mode 100644
index 0000000000..b4591a2dec
--- /dev/null
+++ b/docs/devel/code-provenance.rst
@@ -0,0 +1,197 @@
+.. _code-provenance:
+
+Code provenance
+===============
+
+Certifying patch submissions
+~~~~~~~~~~~~~~~~~~~~~~~~~~~~
+
+The QEMU community **mandates** all contributors to certify provenance
+of patch submissions they make to the project. To put it another way,
+contributors must indicate that they are legally permitted to contribute
+to the project.
+
+Certification is achieved with a low overhead by adding a single line
+to the bottom of every git commit::
+
+   Signed-off-by: YOUR NAME <YOUR@EMAIL>
+
+This existence of this line asserts that the author of the patch is
+contributing in accordance with the `Developer's Certificate of
+Origin <https://developercertifcate.org>`__:
+
+.. _dco:
+
+::
+  Developer's Certificate of Origin 1.1
+
+  By making a contribution to this project, I certify that:
+
+  (a) The contribution was created in whole or in part by me and I
+      have the right to submit it under the open source license
+      indicated in the file; or
+
+  (b) The contribution is based upon previous work that, to the best
+      of my knowledge, is covered under an appropriate open source
+      license and I have the right under that license to submit that
+      work with modifications, whether created in whole or in part
+      by me, under the same open source license (unless I am
+      permitted to submit under a different license), as indicated
+      in the file; or
+
+  (c) The contribution was provided directly to me by some other
+      person who certified (a), (b) or (c) and I have not modified
+      it.
+
+  (d) I understand and agree that this project and the contribution
+      are public and that a record of the contribution (including all
+      personal information I submit with it, including my sign-off) is
+      maintained indefinitely and may be redistributed consistent with
+      this project or the open source license(s) involved.
+
+It is generally expected that the name and email addresses used in one
+of the ``Signed-off-by`` lines, matches that of the git commit ``Author``
+field. If the person sending the mail is also one of the patch authors,
+it is further expected that the mail ``From:`` line name & address match
+one of the ``Signed-off-by`` lines. 
+
+Multiple authorship
+~~~~~~~~~~~~~~~~~~~
+
+It is not uncommon for a patch to have contributions from multiple
+authors. In such a scenario, a git commit will usually be expected
+to have a ``Signed-off-by`` line for each contributor involved in
+creatin of the patch. Some edge cases:
+
+  * The non-primary author's contributions were so trivial that
+    they can be considered not subject to copyright. In this case
+    the secondary authors need not include a ``Signed-off-by``.
+
+    This case most commonly applies where QEMU reviewers give short
+    snippets of code as suggested fixes to a patch. The reviewers
+    don't need to have their own ``Signed-off-by`` added unless
+    their code suggestion was unusually large.
+
+  * Both contributors work for the same employer and the employer
+    requires copyright assignment.
+
+    It can be said that in this case a ``Signed-off-by`` is indicating
+    that the person has permission to contributeo from their employer
+    who is the copyright holder. It is none the less still preferrable
+    to include a ``Signed-off-by`` for each contributor, as in some
+    countries employees are not able to assign copyright to their
+    employer, and it also covers any time invested outside working
+    hours.
+
+Other commit tags
+~~~~~~~~~~~~~~~~~
+
+While the ``Signed-off-by`` tag is mandatory, there are a number of
+other tags that are commonly used during QEMU development
+
+ * **``Reviewed-by``**: when a QEMU community member reviews a patch
+   on the mailing list, if they consider the patch acceptable, they
+   should send an email reply containing a ``Reviewed-by`` tag.
+
+   NB: a subsystem maintainer sending a pull request would replace
+   their own ``Reviewed-by`` with another ``Signed-off-by``
+
+ * **``Acked-by``**: when a QEMU subsystem maintainer approves a patch
+   that touches their subsystem, but intends to allow a different
+   maintainer to queue it and send a pull request, they would send
+   a mail containing a ``Acked-by`` tag.
+   
+ * **``Tested-by``**: when a QEMU community member has functionally
+   tested the behaviour of the patch in some manner, they should
+   send an email reply conmtaning a ``Tested-by`` tag.
+
+ * **``Reported-by``**: when a QEMU community member reports a problem
+   via the mailing list, or some other informal channel that is not
+   the issue tracker, it is good practice to credit them by including
+   a ``Reported-by`` tag on any patch fixing the issue. When the
+   problem is reported via the GitLab issue tracker, however, it is
+   sufficient to just include a link to the issue.
+
+Subsystem maintainer requirements
+~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
+
+When a subsystem maintainer accepts a patch from a contributor, in
+addition to the normal code review points, they are expected to validate
+the presence of suitable ``Signed-off-by`` tags.
+
+At the time they queue the patch in their subsystem tree, the maintainer
+**MUST** also then add their own ``Signed-off-by`` to indicate that they
+have done the aforementioned validation.
+
+The subsystem maintainer submitting a pull request is **NOT** expected to
+have a ``Reviewed-by`` tag on the patch, since this is implied by their
+own ``Signed-off-by``.
+  
+Tools for adding ``Signed-of-by``
+~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
+
+There are a variety of ways tools can support adding ``Signed-off-by``
+tags for patches, avoiding the need for contributors to manually
+type in this repetitive text each time.
+
+git commands
+^^^^^^^^^^^^
+
+When creating, or amending, a commit the ``-s`` flag to ``git commit``
+will append a suitable line matching the configuring git author
+details.
+
+If preparing patches using the ``git format-patch`` tool, the ``-s``
+flag can be used to append a suitable line in the emails it creates,
+without modifying the local commits. Alternatively to modify the
+local commits on a branch en-mass::
+
+  git rebase master -x 'git commit --amend --no-edit -s'
+
+emacs
+^^^^^
+
+In the file ``$HOME/.emacs.d/abbrev_defs`` add::
+
+  (define-abbrev-table 'global-abbrev-table
+    '(
+      ("8rev" "Reviewed-by: YOUR NAME <your@email.addr>" nil 1)
+      ("8ack" "Acked-by: YOUR NAME <your@email.addr>" nil 1)
+      ("8test" "Tested-by: YOUR NAME <your@email.addr>" nil 1)
+      ("8sob" "Signed-off-by: YOUR NAME <your@email.addr>" nil 1)
+     ))
+
+with this change, if you type (for example) ``8rev`` followed
+by ``<space>`` or ``<enter>`` it will expand to the whole phrase. 
+
+vim
+^^^
+
+In the file ``$HOME/.vimrc`` add::
+
+  iabbrev 8rev Reviewed-by: YOUR NAME <your@email.addr>
+  iabbrev 8ack Acked-by: YOUR NAME <your@email.addr>
+  iabbrev 8test Tested-by: YOUR NAME <your@email.addr>
+  iabbrev 8sob Signed-off-by: YOUR NAME <your@email.addr>
+
+with this change, if you type (for example) ``8rev`` followed
+by ``<space>`` or ``<enter>`` it will expand to the whole phrase. 
+
+Re-starting abandoned work
+~~~~~~~~~~~~~~~~~~~~~~~~~~
+
+For a variety of reasons there are some patches that get submitted to
+QEMU but never merged. An unrelated contributor may decide (months or
+years later) to continue working from the abandoned patch and re-submit
+it with extra changes.
+
+If the abandoned patch already had a ``Signed-off-by`` from the original
+author this **must** be preserved. The new contributor **must** then add
+their own ``Signed-off-by`` after the original one if they made any
+further changes to it. It is common to include a comment just prior to
+the new ``Signed-off-by`` indicating what extra changes were made. For
+example::
+
+  Signed-off-by: Some Person <some.person@example.com>
+  [Rebased and added support for 'foo']
+  Signed-off-by: New Person <new.person@example.com>
diff --git a/docs/devel/index-process.rst b/docs/devel/index-process.rst
index 362f97ee30..b54e58105e 100644
--- a/docs/devel/index-process.rst
+++ b/docs/devel/index-process.rst
@@ -13,6 +13,7 @@ Notes about how to interact with the community and how and where to submit patch
    maintainers
    style
    submitting-a-patch
+   code-provenance
    trivial-patches
    stable-process
    submitting-a-pull-request
diff --git a/docs/devel/submitting-a-patch.rst b/docs/devel/submitting-a-patch.rst
index c641d948f1..ec541b3d15 100644
--- a/docs/devel/submitting-a-patch.rst
+++ b/docs/devel/submitting-a-patch.rst
@@ -322,21 +322,9 @@ Patch emails must include a ``Signed-off-by:`` line
 
 Your patches **must** include a Signed-off-by: line. This is a hard
 requirement because it's how you say "I'm legally okay to contribute
-this and happy for it to go into QEMU". The process is modelled after
-the `Linux kernel
-<http://git.kernel.org/cgit/linux/kernel/git/torvalds/linux.git/tree/Documentation/SubmittingPatches?id=f6f94e2ab1b33f0082ac22d71f66385a60d8157f#n297>`__
-policy.
-
-If you wrote the patch, make sure your "From:" and "Signed-off-by:"
-lines use the same spelling. It's okay if you subscribe or contribute to
-the list via more than one address, but using multiple addresses in one
-commit just confuses things. If someone else wrote the patch, git will
-include a "From:" line in the body of the email (different from your
-envelope From:) that will give credit to the correct author; but again,
-that author's Signed-off-by: line is mandatory, with the same spelling.
-
-There are various tooling options for automatically adding these tags
-include using ``git commit -s`` or ``git format-patch -s``. For more
+this and happy for it to go into QEMU". For full guidance, read the
+:ref:`code-provenance` documentation.
+
 information see `SubmittingPatches 1.12
 <http://git.kernel.org/cgit/linux/kernel/git/torvalds/linux.git/tree/Documentation/SubmittingPatches?id=f6f94e2ab1b33f0082ac22d71f66385a60d8157f#n297>`__.
 
-- 
2.41.0



^ permalink raw reply related	[flat|nested] 57+ messages in thread

* [PATCH 2/2] docs: define policy forbidding use of "AI" / LLM code generators
  2023-11-23 11:40 [PATCH 0/2] docs: define policy forbidding use of "AI" / LLM code generators Daniel P. Berrangé
  2023-11-23 11:40 ` [PATCH 1/2] docs: introduce dedicated page about code provenance / sign-off Daniel P. Berrangé
@ 2023-11-23 11:40 ` Daniel P. Berrangé
  2023-11-23 12:57   ` Alex Bennée
                     ` (3 more replies)
  1 sibling, 4 replies; 57+ messages in thread
From: Daniel P. Berrangé @ 2023-11-23 11:40 UTC (permalink / raw)
  To: qemu-devel
  Cc: Richard Henderson, Alexander Graf, Alex Bennée,
	Paolo Bonzini, Michael S. Tsirkin, Markus Armbruster,
	Phil Mathieu-Daudé, Stefan Hajnoczi, Thomas Huth, Kevin Wolf,
	Gerd Hoffmann, Mark Cave-Ayland, Peter Maydell,
	Daniel P. Berrangé

There has been an explosion of interest in so called "AI" (LLM)
code generators in the past year or so. Thus far though, this is
has not been matched by a broadly accepted legal interpretation
of the licensing implications for code generator outputs. While
the vendors may claim there is no problem and a free choice of
license is possible, they have an inherent conflict of interest
in promoting this interpretation. More broadly there is, as yet,
no broad consensus on the licensing implications of code generators
trained on inputs under a wide variety of licenses.

The DCO requires contributors to assert they have the right to
contribute under the designated project license. Given the lack
of consensus on the licensing of "AI" (LLM) code generator output,
it is not considered credible to assert compliance with the DCO
clause (b) or (c) where a patch includes such generated code.

This patch thus defines a policy that the QEMU project will not
accept contributions where use of "AI" (LLM) code generators is
either known, or suspected.

Signed-off-by: Daniel P. Berrangé <berrange@redhat.com>
---
 docs/devel/code-provenance.rst | 40 ++++++++++++++++++++++++++++++++++
 1 file changed, 40 insertions(+)

diff --git a/docs/devel/code-provenance.rst b/docs/devel/code-provenance.rst
index b4591a2dec..a6e42c6b1b 100644
--- a/docs/devel/code-provenance.rst
+++ b/docs/devel/code-provenance.rst
@@ -195,3 +195,43 @@ example::
   Signed-off-by: Some Person <some.person@example.com>
   [Rebased and added support for 'foo']
   Signed-off-by: New Person <new.person@example.com>
+
+Use of "AI" (LLM) code generators
+~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
+
+TL;DR:
+
+  **Current QEMU project policy is to DECLINE any contributions
+  which are believed to include or derive from "AI" (LLM)
+  generated code.**
+
+The existence of "AI" (`Large Language Model <https://en.wikipedia.org/wiki/Large_language_model>`__
+/ LLM) code generators raises a number of difficult legal questions, a
+number of which impact on Open Source projects. As noted earlier, the
+QEMU community requires that contributors certify their patch submissions
+are made in accordance with the rules of the :ref:`dco` (DCO). When a
+patch contains "AI" generated code this raises difficulties with code
+provenence and thus DCO compliance.
+
+To satisfy the DCO, the patch contributor has to fully understand
+the origins and license of code they are contributing to QEMU. The
+license terms that should apply to the output of an "AI" code generator
+are ill-defined, given that both training data and operation of the
+"AI" are typically opaque to the user. Even where the training data
+is said to all be open source, it will likely be under a wide variety
+of license terms.
+
+While the vendor's of "AI" code generators may promote the idea that
+code output can be taken under a free choice of license, this is not
+yet considered to be a generally accepted, nor tested, legal opinion.
+
+With this in mind, the QEMU maintainers does not consider it is
+currently possible to comply with DCO terms (b) or (c) for most "AI"
+generated code.
+
+The QEMU maintainers thus require that contributors refrain from using
+"AI" code generators on patches intended to be submitted to the project,
+and will decline any contribution if use of "AI" is known or suspected.
+
+Examples of tools impacted by this policy includes both GitHub CoPilot,
+and ChatGPT, amongst many others which are less well known.
-- 
2.41.0



^ permalink raw reply related	[flat|nested] 57+ messages in thread

* Re: [PATCH 1/2] docs: introduce dedicated page about code provenance / sign-off
  2023-11-23 11:40 ` [PATCH 1/2] docs: introduce dedicated page about code provenance / sign-off Daniel P. Berrangé
@ 2023-11-23 11:58   ` Philippe Mathieu-Daudé
  2023-11-23 17:08     ` Daniel P. Berrangé
  2023-11-23 13:01   ` Peter Maydell
                     ` (4 subsequent siblings)
  5 siblings, 1 reply; 57+ messages in thread
From: Philippe Mathieu-Daudé @ 2023-11-23 11:58 UTC (permalink / raw)
  To: Daniel P. Berrangé, qemu-devel
  Cc: Richard Henderson, Alexander Graf, Alex Bennée,
	Paolo Bonzini, Michael S. Tsirkin, Markus Armbruster,
	Stefan Hajnoczi, Thomas Huth, Kevin Wolf, Gerd Hoffmann,
	Mark Cave-Ayland, Peter Maydell

On 23/11/23 12:40, Daniel P. Berrangé wrote:
> Currently we have a short paragraph saying that patches must include
> a Signed-off-by line, and merely link to the kernel documentation.
> The linked kernel docs have alot of content beyond the part about
> sign-off an thus is misleading/distracting to QEMU contributors.
> 
> This introduces a dedicated 'code-provenance' page in QEMU talking
> about why we require sign-off, explaining the other tags we commonly
> use, and what to do in some edge cases.
> 
> Signed-off-by: Daniel P. Berrangé <berrange@redhat.com>
> ---
>   docs/devel/code-provenance.rst    | 197 ++++++++++++++++++++++++++++++
>   docs/devel/index-process.rst      |   1 +
>   docs/devel/submitting-a-patch.rst |  18 +--
>   3 files changed, 201 insertions(+), 15 deletions(-)
>   create mode 100644 docs/devel/code-provenance.rst
> 
> diff --git a/docs/devel/code-provenance.rst b/docs/devel/code-provenance.rst
> new file mode 100644
> index 0000000000..b4591a2dec
> --- /dev/null
> +++ b/docs/devel/code-provenance.rst
> @@ -0,0 +1,197 @@
> +.. _code-provenance:
> +
> +Code provenance
> +===============
> +
> +Certifying patch submissions
> +~~~~~~~~~~~~~~~~~~~~~~~~~~~~
> +
> +The QEMU community **mandates** all contributors to certify provenance
> +of patch submissions they make to the project. To put it another way,
> +contributors must indicate that they are legally permitted to contribute
> +to the project.
> +
> +Certification is achieved with a low overhead by adding a single line
> +to the bottom of every git commit::
> +
> +   Signed-off-by: YOUR NAME <YOUR@EMAIL>
> +
> +This existence of this line asserts that the author of the patch is
> +contributing in accordance with the `Developer's Certificate of
> +Origin <https://developercertifcate.org>`__:

Typo: https://developercertificate.org/

> +
> +.. _dco:
> +
> +::
> +  Developer's Certificate of Origin 1.1
> +
> +  By making a contribution to this project, I certify that:
> +
> +  (a) The contribution was created in whole or in part by me and I
> +      have the right to submit it under the open source license
> +      indicated in the file; or
> +
> +  (b) The contribution is based upon previous work that, to the best
> +      of my knowledge, is covered under an appropriate open source
> +      license and I have the right under that license to submit that
> +      work with modifications, whether created in whole or in part
> +      by me, under the same open source license (unless I am
> +      permitted to submit under a different license), as indicated
> +      in the file; or
> +
> +  (c) The contribution was provided directly to me by some other
> +      person who certified (a), (b) or (c) and I have not modified
> +      it.
> +
> +  (d) I understand and agree that this project and the contribution
> +      are public and that a record of the contribution (including all
> +      personal information I submit with it, including my sign-off) is
> +      maintained indefinitely and may be redistributed consistent with
> +      this project or the open source license(s) involved.
> +
> +It is generally expected that the name and email addresses used in one
> +of the ``Signed-off-by`` lines, matches that of the git commit ``Author``
> +field. If the person sending the mail is also one of the patch authors,
> +it is further expected that the mail ``From:`` line name & address match
> +one of the ``Signed-off-by`` lines.
> +
> +Multiple authorship
> +~~~~~~~~~~~~~~~~~~~
> +
> +It is not uncommon for a patch to have contributions from multiple
> +authors. In such a scenario, a git commit will usually be expected
> +to have a ``Signed-off-by`` line for each contributor involved in
> +creatin of the patch. Some edge cases:

"creating"

> +
> +  * The non-primary author's contributions were so trivial that
> +    they can be considered not subject to copyright. In this case
> +    the secondary authors need not include a ``Signed-off-by``.
> +
> +    This case most commonly applies where QEMU reviewers give short
> +    snippets of code as suggested fixes to a patch. The reviewers
> +    don't need to have their own ``Signed-off-by`` added unless
> +    their code suggestion was unusually large.
> +
> +  * Both contributors work for the same employer and the employer
> +    requires copyright assignment.
> +
> +    It can be said that in this case a ``Signed-off-by`` is indicating
> +    that the person has permission to contributeo from their employer

"contribute"

> +    who is the copyright holder. It is none the less still preferrable

"preferable"

> +    to include a ``Signed-off-by`` for each contributor, as in some
> +    countries employees are not able to assign copyright to their
> +    employer, and it also covers any time invested outside working
> +    hours.
> +
> +Other commit tags
> +~~~~~~~~~~~~~~~~~
> +
> +While the ``Signed-off-by`` tag is mandatory, there are a number of
> +other tags that are commonly used during QEMU development
> +
> + * **``Reviewed-by``**: when a QEMU community member reviews a patch
> +   on the mailing list, if they consider the patch acceptable, they
> +   should send an email reply containing a ``Reviewed-by`` tag.
> +
> +   NB: a subsystem maintainer sending a pull request would replace
> +   their own ``Reviewed-by`` with another ``Signed-off-by``

Hmm not sure about replacing, they have different meaning. You can merge
patch you haven't reviewed. But as a maintainer you must S-o-b what you
end merging (what is mentioned below in "subsystem maintainer").

> +
> + * **``Acked-by``**: when a QEMU subsystem maintainer approves a patch
> +   that touches their subsystem, but intends to allow a different
> +   maintainer to queue it and send a pull request, they would send
> +   a mail containing a ``Acked-by`` tag.
> +
> + * **``Tested-by``**: when a QEMU community member has functionally
> +   tested the behaviour of the patch in some manner, they should
> +   send an email reply conmtaning a ``Tested-by`` tag.

"containing"

> +
> + * **``Reported-by``**: when a QEMU community member reports a problem
> +   via the mailing list, or some other informal channel that is not
> +   the issue tracker, it is good practice to credit them by including
> +   a ``Reported-by`` tag on any patch fixing the issue. When the
> +   problem is reported via the GitLab issue tracker, however, it is
> +   sufficient to just include a link to the issue.

Hmm isn't related to the "Resolves:" tag?

> +
> +Subsystem maintainer requirements
> +~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
> +
> +When a subsystem maintainer accepts a patch from a contributor, in
> +addition to the normal code review points, they are expected to validate
> +the presence of suitable ``Signed-off-by`` tags.
> +
> +At the time they queue the patch in their subsystem tree, the maintainer
> +**MUST** also then add their own ``Signed-off-by`` to indicate that they
> +have done the aforementioned validation.
> +
> +The subsystem maintainer submitting a pull request is **NOT** expected to
> +have a ``Reviewed-by`` tag on the patch, since this is implied by their
> +own ``Signed-off-by``.
> +
> +Tools for adding ``Signed-of-by``

"Signed-off-by"

> +~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
> +
> +There are a variety of ways tools can support adding ``Signed-off-by``
> +tags for patches, avoiding the need for contributors to manually
> +type in this repetitive text each time.
> +
> +git commands
> +^^^^^^^^^^^^
> +
> +When creating, or amending, a commit the ``-s`` flag to ``git commit``
> +will append a suitable line matching the configuring git author
> +details.
> +
> +If preparing patches using the ``git format-patch`` tool, the ``-s``
> +flag can be used to append a suitable line in the emails it creates,
> +without modifying the local commits. Alternatively to modify the
> +local commits on a branch en-mass::
> +
> +  git rebase master -x 'git commit --amend --no-edit -s'
> +
> +emacs
> +^^^^^
> +
> +In the file ``$HOME/.emacs.d/abbrev_defs`` add::
> +
> +  (define-abbrev-table 'global-abbrev-table
> +    '(
> +      ("8rev" "Reviewed-by: YOUR NAME <your@email.addr>" nil 1)
> +      ("8ack" "Acked-by: YOUR NAME <your@email.addr>" nil 1)
> +      ("8test" "Tested-by: YOUR NAME <your@email.addr>" nil 1)
> +      ("8sob" "Signed-off-by: YOUR NAME <your@email.addr>" nil 1)
> +     ))
> +
> +with this change, if you type (for example) ``8rev`` followed
> +by ``<space>`` or ``<enter>`` it will expand to the whole phrase.
> +
> +vim
> +^^^
> +
> +In the file ``$HOME/.vimrc`` add::
> +
> +  iabbrev 8rev Reviewed-by: YOUR NAME <your@email.addr>
> +  iabbrev 8ack Acked-by: YOUR NAME <your@email.addr>
> +  iabbrev 8test Tested-by: YOUR NAME <your@email.addr>
> +  iabbrev 8sob Signed-off-by: YOUR NAME <your@email.addr>
> +
> +with this change, if you type (for example) ``8rev`` followed
> +by ``<space>`` or ``<enter>`` it will expand to the whole phrase.
> +
> +Re-starting abandoned work
> +~~~~~~~~~~~~~~~~~~~~~~~~~~
> +
> +For a variety of reasons there are some patches that get submitted to
> +QEMU but never merged. An unrelated contributor may decide (months or
> +years later) to continue working from the abandoned patch and re-submit
> +it with extra changes.
> +
> +If the abandoned patch already had a ``Signed-off-by`` from the original
> +author this **must** be preserved. The new contributor **must** then add
> +their own ``Signed-off-by`` after the original one if they made any
> +further changes to it. It is common to include a comment just prior to
> +the new ``Signed-off-by`` indicating what extra changes were made. For
> +example::
> +
> +  Signed-off-by: Some Person <some.person@example.com>
> +  [Rebased and added support for 'foo']
> +  Signed-off-by: New Person <new.person@example.com>
> diff --git a/docs/devel/index-process.rst b/docs/devel/index-process.rst
> index 362f97ee30..b54e58105e 100644
> --- a/docs/devel/index-process.rst
> +++ b/docs/devel/index-process.rst
> @@ -13,6 +13,7 @@ Notes about how to interact with the community and how and where to submit patch
>      maintainers
>      style
>      submitting-a-patch
> +   code-provenance
>      trivial-patches
>      stable-process
>      submitting-a-pull-request
> diff --git a/docs/devel/submitting-a-patch.rst b/docs/devel/submitting-a-patch.rst
> index c641d948f1..ec541b3d15 100644
> --- a/docs/devel/submitting-a-patch.rst
> +++ b/docs/devel/submitting-a-patch.rst
> @@ -322,21 +322,9 @@ Patch emails must include a ``Signed-off-by:`` line
>   
>   Your patches **must** include a Signed-off-by: line. This is a hard
>   requirement because it's how you say "I'm legally okay to contribute
> -this and happy for it to go into QEMU". The process is modelled after
> -the `Linux kernel
> -<http://git.kernel.org/cgit/linux/kernel/git/torvalds/linux.git/tree/Documentation/SubmittingPatches?id=f6f94e2ab1b33f0082ac22d71f66385a60d8157f#n297>`__
> -policy.
> -
> -If you wrote the patch, make sure your "From:" and "Signed-off-by:"
> -lines use the same spelling. It's okay if you subscribe or contribute to
> -the list via more than one address, but using multiple addresses in one
> -commit just confuses things. If someone else wrote the patch, git will
> -include a "From:" line in the body of the email (different from your
> -envelope From:) that will give credit to the correct author; but again,
> -that author's Signed-off-by: line is mandatory, with the same spelling.
> -
> -There are various tooling options for automatically adding these tags
> -include using ``git commit -s`` or ``git format-patch -s``. For more
> +this and happy for it to go into QEMU". For full guidance, read the
> +:ref:`code-provenance` documentation.
> +
>   information see `SubmittingPatches 1.12
>   <http://git.kernel.org/cgit/linux/kernel/git/torvalds/linux.git/tree/Documentation/SubmittingPatches?id=f6f94e2ab1b33f0082ac22d71f66385a60d8157f#n297>`__.
>   



^ permalink raw reply	[flat|nested] 57+ messages in thread

* Re: [PATCH 2/2] docs: define policy forbidding use of "AI" / LLM code generators
  2023-11-23 11:40 ` [PATCH 2/2] docs: define policy forbidding use of "AI" / LLM code generators Daniel P. Berrangé
@ 2023-11-23 12:57   ` Alex Bennée
  2023-11-23 17:37     ` Michal Suchánek
  2023-11-23 17:46     ` Daniel P. Berrangé
  2023-11-23 13:20   ` Kevin Wolf
                     ` (2 subsequent siblings)
  3 siblings, 2 replies; 57+ messages in thread
From: Alex Bennée @ 2023-11-23 12:57 UTC (permalink / raw)
  To: Daniel P. Berrangé
  Cc: qemu-devel, Richard Henderson, Alexander Graf, Paolo Bonzini,
	Michael S. Tsirkin, Markus Armbruster, Phil Mathieu-Daudé,
	Stefan Hajnoczi, Thomas Huth, Kevin Wolf, Gerd Hoffmann,
	Mark Cave-Ayland, Peter Maydell

Daniel P. Berrangé <berrange@redhat.com> writes:

> There has been an explosion of interest in so called "AI" (LLM)
> code generators in the past year or so. Thus far though, this is
> has not been matched by a broadly accepted legal interpretation
> of the licensing implications for code generator outputs. While
> the vendors may claim there is no problem and a free choice of
> license is possible, they have an inherent conflict of interest
> in promoting this interpretation. More broadly there is, as yet,
> no broad consensus on the licensing implications of code generators
> trained on inputs under a wide variety of licenses.
>
> The DCO requires contributors to assert they have the right to
> contribute under the designated project license. Given the lack
> of consensus on the licensing of "AI" (LLM) code generator output,
> it is not considered credible to assert compliance with the DCO
> clause (b) or (c) where a patch includes such generated code.
>
> This patch thus defines a policy that the QEMU project will not
> accept contributions where use of "AI" (LLM) code generators is
> either known, or suspected.
>
> Signed-off-by: Daniel P. Berrangé <berrange@redhat.com>
> ---
>  docs/devel/code-provenance.rst | 40 ++++++++++++++++++++++++++++++++++
>  1 file changed, 40 insertions(+)
>
> diff --git a/docs/devel/code-provenance.rst b/docs/devel/code-provenance.rst
> index b4591a2dec..a6e42c6b1b 100644
> --- a/docs/devel/code-provenance.rst
> +++ b/docs/devel/code-provenance.rst
> @@ -195,3 +195,43 @@ example::
>    Signed-off-by: Some Person <some.person@example.com>
>    [Rebased and added support for 'foo']
>    Signed-off-by: New Person <new.person@example.com>
> +
> +Use of "AI" (LLM) code generators
> +~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
> +
> +TL;DR:
> +
> +  **Current QEMU project policy is to DECLINE any contributions
> +  which are believed to include or derive from "AI" (LLM)
> +  generated code.**
> +
> +The existence of "AI" (`Large Language Model <https://en.wikipedia.org/wiki/Large_language_model>`__
> +/ LLM) code generators raises a number of difficult legal questions, a
> +number of which impact on Open Source projects. As noted earlier, the
> +QEMU community requires that contributors certify their patch submissions
> +are made in accordance with the rules of the :ref:`dco` (DCO). When a
> +patch contains "AI" generated code this raises difficulties with code
> +provenence and thus DCO compliance.

I agree this is going to be a field that keeps lawyers well re-numerated
for the foreseeable future. However I suspect this elides over the main
use case for LLM generators which is non-novel transformation. One good
example is generating text fixtures where you write a piece of original
code and then ask the code completion engine to fill out some unit tests
to exercise the code. It's boring mechanical work but one an LLM is very
suited to (even if you might tweak the final result).

> +To satisfy the DCO, the patch contributor has to fully understand
> +the origins and license of code they are contributing to QEMU. The
> +license terms that should apply to the output of an "AI" code generator
> +are ill-defined, given that both training data and operation of the
> +"AI" are typically opaque to the user. Even where the training data
> +is said to all be open source, it will likely be under a wide variety
> +of license terms.
> +
> +While the vendor's of "AI" code generators may promote the idea that
> +code output can be taken under a free choice of license, this is not
> +yet considered to be a generally accepted, nor tested, legal opinion.
> +
> +With this in mind, the QEMU maintainers does not consider it is
> +currently possible to comply with DCO terms (b) or (c) for most "AI"
> +generated code.

There is a load of code out that isn't eligible for copyright projection
because it doesn't demonstrate much originality or creativity. In the
experimentation I've done so far I've not seen much sign of genuine
creativity. LLM's benefit from having access to a wide corpus of
training data and tend to do a better job of inferencing solutions from
semi-related posts than say for example human manually comparing posts
having pasted an error message in google.

> +
> +The QEMU maintainers thus require that contributors refrain from using
> +"AI" code generators on patches intended to be submitted to the project,
> +and will decline any contribution if use of "AI" is known or suspected.
> +
> +Examples of tools impacted by this policy includes both GitHub CoPilot,
> +and ChatGPT, amongst many others which are less well known.

What about if you took an LLM and then fine tuned it by using project
data so it could better help new users in making contributions to the
project? You would be biasing the model to your own data for the
purposes of helping developers write better QEMU code?

-- 
Alex Bennée
Virtualisation Tech Lead @ Linaro


^ permalink raw reply	[flat|nested] 57+ messages in thread

* Re: [PATCH 1/2] docs: introduce dedicated page about code provenance / sign-off
  2023-11-23 11:40 ` [PATCH 1/2] docs: introduce dedicated page about code provenance / sign-off Daniel P. Berrangé
  2023-11-23 11:58   ` Philippe Mathieu-Daudé
@ 2023-11-23 13:01   ` Peter Maydell
  2023-11-23 17:12     ` Daniel P. Berrangé
  2023-11-23 13:16   ` Kevin Wolf
                     ` (3 subsequent siblings)
  5 siblings, 1 reply; 57+ messages in thread
From: Peter Maydell @ 2023-11-23 13:01 UTC (permalink / raw)
  To: Daniel P. Berrangé
  Cc: qemu-devel, Richard Henderson, Alexander Graf, Alex Bennée,
	Paolo Bonzini, Michael S. Tsirkin, Markus Armbruster,
	Phil Mathieu-Daudé, Stefan Hajnoczi, Thomas Huth, Kevin Wolf,
	Gerd Hoffmann, Mark Cave-Ayland

On Thu, 23 Nov 2023 at 11:40, Daniel P. Berrangé <berrange@redhat.com> wrote:
>
> Currently we have a short paragraph saying that patches must include
> a Signed-off-by line, and merely link to the kernel documentation.
> The linked kernel docs have alot of content beyond the part about

"a lot"

> sign-off an thus is misleading/distracting to QEMU contributors.

"and thus are"

>
> This introduces a dedicated 'code-provenance' page in QEMU talking
> about why we require sign-off, explaining the other tags we commonly
> use, and what to do in some edge cases.

Good idea; I've felt for a while now that it was a little awkward
to have to point people at that big kernel doc page.


> Signed-off-by: Daniel P. Berrangé <berrange@redhat.com>
> ---
>  docs/devel/code-provenance.rst    | 197 ++++++++++++++++++++++++++++++
>  docs/devel/index-process.rst      |   1 +
>  docs/devel/submitting-a-patch.rst |  18 +--
>  3 files changed, 201 insertions(+), 15 deletions(-)
>  create mode 100644 docs/devel/code-provenance.rst
>
> diff --git a/docs/devel/code-provenance.rst b/docs/devel/code-provenance.rst
> new file mode 100644
> index 0000000000..b4591a2dec
> --- /dev/null
> +++ b/docs/devel/code-provenance.rst
> @@ -0,0 +1,197 @@
> +.. _code-provenance:
> +
> +Code provenance
> +===============
> +
> +Certifying patch submissions
> +~~~~~~~~~~~~~~~~~~~~~~~~~~~~
> +
> +The QEMU community **mandates** all contributors to certify provenance
> +of patch submissions they make to the project. To put it another way,
> +contributors must indicate that they are legally permitted to contribute
> +to the project.
> +
> +Certification is achieved with a low overhead by adding a single line
> +to the bottom of every git commit::
> +
> +   Signed-off-by: YOUR NAME <YOUR@EMAIL>
> +
> +This existence of this line asserts that the author of the patch is
> +contributing in accordance with the `Developer's Certificate of
> +Origin <https://developercertifcate.org>`__:
> +
> +.. _dco:
> +
> +::
> +  Developer's Certificate of Origin 1.1
> +
> +  By making a contribution to this project, I certify that:
> +
> +  (a) The contribution was created in whole or in part by me and I
> +      have the right to submit it under the open source license
> +      indicated in the file; or
> +
> +  (b) The contribution is based upon previous work that, to the best
> +      of my knowledge, is covered under an appropriate open source
> +      license and I have the right under that license to submit that
> +      work with modifications, whether created in whole or in part
> +      by me, under the same open source license (unless I am
> +      permitted to submit under a different license), as indicated
> +      in the file; or
> +
> +  (c) The contribution was provided directly to me by some other
> +      person who certified (a), (b) or (c) and I have not modified
> +      it.
> +
> +  (d) I understand and agree that this project and the contribution
> +      are public and that a record of the contribution (including all
> +      personal information I submit with it, including my sign-off) is
> +      maintained indefinitely and may be redistributed consistent with
> +      this project or the open source license(s) involved.
> +
> +It is generally expected that the name and email addresses used in one
> +of the ``Signed-off-by`` lines, matches that of the git commit ``Author``
> +field. If the person sending the mail is also one of the patch authors,
> +it is further expected that the mail ``From:`` line name & address match
> +one of the ``Signed-off-by`` lines.

Is it? Patches sent via the sr.ht service won't do that, and I'm
pretty sure we've had a few contributors in the past who send
patches from different addresses to avoid problems with their
corporate mail server mangling patches. I think this would be
better softened to something like a recommendation ("Generally
you should use the same email addresses ... ").

> +Multiple authorship
> +~~~~~~~~~~~~~~~~~~~
> +
> +It is not uncommon for a patch to have contributions from multiple
> +authors. In such a scenario, a git commit will usually be expected
> +to have a ``Signed-off-by`` line for each contributor involved in
> +creatin of the patch. Some edge cases:

"creation" (not "creating")

> +
> +  * The non-primary author's contributions were so trivial that
> +    they can be considered not subject to copyright. In this case
> +    the secondary authors need not include a ``Signed-off-by``.
> +
> +    This case most commonly applies where QEMU reviewers give short
> +    snippets of code as suggested fixes to a patch. The reviewers
> +    don't need to have their own ``Signed-off-by`` added unless
> +    their code suggestion was unusually large.
> +
> +  * Both contributors work for the same employer and the employer
> +    requires copyright assignment.
> +
> +    It can be said that in this case a ``Signed-off-by`` is indicating
> +    that the person has permission to contributeo from their employer
> +    who is the copyright holder. It is none the less still preferrable
> +    to include a ``Signed-off-by`` for each contributor, as in some
> +    countries employees are not able to assign copyright to their
> +    employer, and it also covers any time invested outside working
> +    hours.
> +
> +Other commit tags
> +~~~~~~~~~~~~~~~~~
> +
> +While the ``Signed-off-by`` tag is mandatory, there are a number of
> +other tags that are commonly used during QEMU development

missing '.' (or perhaps ':').

> +
> + * **``Reviewed-by``**: when a QEMU community member reviews a patch
> +   on the mailing list, if they consider the patch acceptable, they
> +   should send an email reply containing a ``Reviewed-by`` tag.
> +
> +   NB: a subsystem maintainer sending a pull request would replace
> +   their own ``Reviewed-by`` with another ``Signed-off-by``

I agree with Philippe here -- you add signed-off-by, you don't
replace reviewed-by.

> +
> + * **``Acked-by``**: when a QEMU subsystem maintainer approves a patch
> +   that touches their subsystem, but intends to allow a different
> +   maintainer to queue it and send a pull request, they would send
> +   a mail containing a ``Acked-by`` tag.

I would personally also say "Acked-by does not imply a full code
review of the patch; if the subsystem maintainer has done a full
review, they should use the Reviewed-by tag instead."

But I know that there are some differences of opinion on exactly
what Acked-by: means...

> +
> + * **``Tested-by``**: when a QEMU community member has functionally
> +   tested the behaviour of the patch in some manner, they should
> +   send an email reply conmtaning a ``Tested-by`` tag.
> +
> + * **``Reported-by``**: when a QEMU community member reports a problem
> +   via the mailing list, or some other informal channel that is not
> +   the issue tracker, it is good practice to credit them by including
> +   a ``Reported-by`` tag on any patch fixing the issue. When the
> +   problem is reported via the GitLab issue tracker, however, it is
> +   sufficient to just include a link to the issue.

Maybe we should add a bit of encouraging text here along the lines of:

Reviewing and testing is something anybody can do -- if you've
reviewed the code or tested it, feel free to send an email with
your tag to say you've done that, or to ask questions if there's
part of the patch you don't understand.

? Or perhaps that would be better elsewhere; IDK.

> +
> +Subsystem maintainer requirements
> +~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
> +
> +When a subsystem maintainer accepts a patch from a contributor, in
> +addition to the normal code review points, they are expected to validate
> +the presence of suitable ``Signed-off-by`` tags.
> +
> +At the time they queue the patch in their subsystem tree, the maintainer
> +**MUST** also then add their own ``Signed-off-by`` to indicate that they
> +have done the aforementioned validation.
> +
> +The subsystem maintainer submitting a pull request is **NOT** expected to
> +have a ``Reviewed-by`` tag on the patch, since this is implied by their
> +own ``Signed-off-by``.

As above, Signed-off-by doesn't imply Reviewed-by. If the
submaintainer has reviewed the patch, they add the R-by,
but if they haven't done that, then they only add the S-o-by.

> +
> +Tools for adding ``Signed-of-by``
> +~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
> +
> +There are a variety of ways tools can support adding ``Signed-off-by``
> +tags for patches, avoiding the need for contributors to manually
> +type in this repetitive text each time.
> +
> +git commands
> +^^^^^^^^^^^^
> +
> +When creating, or amending, a commit the ``-s`` flag to ``git commit``
> +will append a suitable line matching the configuring git author
> +details.
> +
> +If preparing patches using the ``git format-patch`` tool, the ``-s``
> +flag can be used to append a suitable line in the emails it creates,
> +without modifying the local commits. Alternatively to modify the
> +local commits on a branch en-mass::
> +
> +  git rebase master -x 'git commit --amend --no-edit -s'
> +
> +emacs
> +^^^^^
> +
> +In the file ``$HOME/.emacs.d/abbrev_defs`` add::
> +
> +  (define-abbrev-table 'global-abbrev-table
> +    '(
> +      ("8rev" "Reviewed-by: YOUR NAME <your@email.addr>" nil 1)
> +      ("8ack" "Acked-by: YOUR NAME <your@email.addr>" nil 1)
> +      ("8test" "Tested-by: YOUR NAME <your@email.addr>" nil 1)
> +      ("8sob" "Signed-off-by: YOUR NAME <your@email.addr>" nil 1)
> +     ))
> +
> +with this change, if you type (for example) ``8rev`` followed
> +by ``<space>`` or ``<enter>`` it will expand to the whole phrase.
> +
> +vim
> +^^^
> +
> +In the file ``$HOME/.vimrc`` add::
> +
> +  iabbrev 8rev Reviewed-by: YOUR NAME <your@email.addr>
> +  iabbrev 8ack Acked-by: YOUR NAME <your@email.addr>
> +  iabbrev 8test Tested-by: YOUR NAME <your@email.addr>
> +  iabbrev 8sob Signed-off-by: YOUR NAME <your@email.addr>
> +
> +with this change, if you type (for example) ``8rev`` followed
> +by ``<space>`` or ``<enter>`` it will expand to the whole phrase.
> +
> +Re-starting abandoned work
> +~~~~~~~~~~~~~~~~~~~~~~~~~~
> +
> +For a variety of reasons there are some patches that get submitted to
> +QEMU but never merged. An unrelated contributor may decide (months or
> +years later) to continue working from the abandoned patch and re-submit
> +it with extra changes.
> +
> +If the abandoned patch already had a ``Signed-off-by`` from the original
> +author this **must** be preserved. The new contributor **must** then add
> +their own ``Signed-off-by`` after the original one if they made any
> +further changes to it. It is common to include a comment just prior to
> +the new ``Signed-off-by`` indicating what extra changes were made. For
> +example::
> +
> +  Signed-off-by: Some Person <some.person@example.com>
> +  [Rebased and added support for 'foo']
> +  Signed-off-by: New Person <new.person@example.com>

You might want to use two different email domains in this example;
an abandoned project picked up by somebody from the same company
(assuming the usual copyright-belongs-to-company) is a bit different
from an abandoned project picked up by an entirely unrelated person.

I think in this case it's also worth stating the general principles:

===begin===
The general principles with picking up abandoned work are:
 * we should continue to credit the first author for their work
 * we should track the provenance of the code
 * we should also acknowledge the efforts of the person picking
   up the work
 * the commit messages should indicate who is responsible for
   what parts of the final patch

In complicated cases or if in doubt, you can always ask on the
mailing list for advice.

If the new work you'd need to do to resubmit the patches is
significant, it's worth dropping the original author a
friendly email to let them know, in case you might be
duplicating something the original author is still working on.
===endit===

perhaps ?

thanks
-- PMM


^ permalink raw reply	[flat|nested] 57+ messages in thread

* Re: [PATCH 1/2] docs: introduce dedicated page about code provenance / sign-off
  2023-11-23 11:40 ` [PATCH 1/2] docs: introduce dedicated page about code provenance / sign-off Daniel P. Berrangé
  2023-11-23 11:58   ` Philippe Mathieu-Daudé
  2023-11-23 13:01   ` Peter Maydell
@ 2023-11-23 13:16   ` Kevin Wolf
  2023-11-23 17:12     ` Daniel P. Berrangé
  2023-11-23 14:25   ` Michael S. Tsirkin
                     ` (2 subsequent siblings)
  5 siblings, 1 reply; 57+ messages in thread
From: Kevin Wolf @ 2023-11-23 13:16 UTC (permalink / raw)
  To: Daniel P. Berrangé
  Cc: qemu-devel, Richard Henderson, Alexander Graf, Alex Bennée,
	Paolo Bonzini, Michael S. Tsirkin, Markus Armbruster,
	Phil Mathieu-Daudé, Stefan Hajnoczi, Thomas Huth,
	Gerd Hoffmann, Mark Cave-Ayland, Peter Maydell

Am 23.11.2023 um 12:40 hat Daniel P. Berrangé geschrieben:
> Currently we have a short paragraph saying that patches must include
> a Signed-off-by line, and merely link to the kernel documentation.
> The linked kernel docs have alot of content beyond the part about
> sign-off an thus is misleading/distracting to QEMU contributors.
> 
> This introduces a dedicated 'code-provenance' page in QEMU talking
> about why we require sign-off, explaining the other tags we commonly
> use, and what to do in some edge cases.
> 
> Signed-off-by: Daniel P. Berrangé <berrange@redhat.com>
> ---
>  docs/devel/code-provenance.rst    | 197 ++++++++++++++++++++++++++++++
>  docs/devel/index-process.rst      |   1 +
>  docs/devel/submitting-a-patch.rst |  18 +--
>  3 files changed, 201 insertions(+), 15 deletions(-)
>  create mode 100644 docs/devel/code-provenance.rst
> 
> diff --git a/docs/devel/code-provenance.rst b/docs/devel/code-provenance.rst
> new file mode 100644
> index 0000000000..b4591a2dec
> --- /dev/null
> +++ b/docs/devel/code-provenance.rst
> @@ -0,0 +1,197 @@
> +.. _code-provenance:
> +
> +Code provenance
> +===============
> +
> +Certifying patch submissions
> +~~~~~~~~~~~~~~~~~~~~~~~~~~~~
> +
> +The QEMU community **mandates** all contributors to certify provenance
> +of patch submissions they make to the project. To put it another way,
> +contributors must indicate that they are legally permitted to contribute
> +to the project.
> +
> +Certification is achieved with a low overhead by adding a single line
> +to the bottom of every git commit::
> +
> +   Signed-off-by: YOUR NAME <YOUR@EMAIL>
> +
> +This existence of this line asserts that the author of the patch is
> +contributing in accordance with the `Developer's Certificate of
> +Origin <https://developercertifcate.org>`__:
> +
> +.. _dco:
> +
> +::
> +  Developer's Certificate of Origin 1.1
> +
> +  By making a contribution to this project, I certify that:
> +
> +  (a) The contribution was created in whole or in part by me and I
> +      have the right to submit it under the open source license
> +      indicated in the file; or
> +
> +  (b) The contribution is based upon previous work that, to the best
> +      of my knowledge, is covered under an appropriate open source
> +      license and I have the right under that license to submit that
> +      work with modifications, whether created in whole or in part
> +      by me, under the same open source license (unless I am
> +      permitted to submit under a different license), as indicated
> +      in the file; or
> +
> +  (c) The contribution was provided directly to me by some other
> +      person who certified (a), (b) or (c) and I have not modified
> +      it.
> +
> +  (d) I understand and agree that this project and the contribution
> +      are public and that a record of the contribution (including all
> +      personal information I submit with it, including my sign-off) is
> +      maintained indefinitely and may be redistributed consistent with
> +      this project or the open source license(s) involved.
> +
> +It is generally expected that the name and email addresses used in one
> +of the ``Signed-off-by`` lines, matches that of the git commit ``Author``
> +field. If the person sending the mail is also one of the patch authors,
> +it is further expected that the mail ``From:`` line name & address match
> +one of the ``Signed-off-by`` lines. 

Isn't the S-o-b expected even if the person sending the mail isn't one
of the patch authors, i.e. certifying (c) rather than (a) or (b) from
the DCO? This is essentially the same case as what a subsystem
maintainer does.

> +Multiple authorship
> +~~~~~~~~~~~~~~~~~~~
> +
> +It is not uncommon for a patch to have contributions from multiple
> +authors. In such a scenario, a git commit will usually be expected
> +to have a ``Signed-off-by`` line for each contributor involved in
> +creatin of the patch. Some edge cases:
> +
> +  * The non-primary author's contributions were so trivial that
> +    they can be considered not subject to copyright. In this case
> +    the secondary authors need not include a ``Signed-off-by``.
> +
> +    This case most commonly applies where QEMU reviewers give short
> +    snippets of code as suggested fixes to a patch. The reviewers
> +    don't need to have their own ``Signed-off-by`` added unless
> +    their code suggestion was unusually large.
> +
> +  * Both contributors work for the same employer and the employer
> +    requires copyright assignment.
> +
> +    It can be said that in this case a ``Signed-off-by`` is indicating
> +    that the person has permission to contributeo from their employer
> +    who is the copyright holder. It is none the less still preferrable
> +    to include a ``Signed-off-by`` for each contributor, as in some
> +    countries employees are not able to assign copyright to their
> +    employer, and it also covers any time invested outside working
> +    hours.
> +
> +Other commit tags
> +~~~~~~~~~~~~~~~~~
> +
> +While the ``Signed-off-by`` tag is mandatory, there are a number of
> +other tags that are commonly used during QEMU development
> +
> + * **``Reviewed-by``**: when a QEMU community member reviews a patch
> +   on the mailing list, if they consider the patch acceptable, they
> +   should send an email reply containing a ``Reviewed-by`` tag.
> +
> +   NB: a subsystem maintainer sending a pull request would replace
> +   their own ``Reviewed-by`` with another ``Signed-off-by``

As Philippe already mentioned, this isn't necessarily the case. It's a
common enough practice to add a S-o-b (which technically only certifies
the DCO) without removing the R-b (which tells that the content was
actually reviewed in detail - maintainers don't always do that if there
are already R-bs from trusted community members).

> + * **``Acked-by``**: when a QEMU subsystem maintainer approves a patch
> +   that touches their subsystem, but intends to allow a different
> +   maintainer to queue it and send a pull request, they would send
> +   a mail containing a ``Acked-by`` tag.
> +   

Trailing whitespace?

> + * **``Tested-by``**: when a QEMU community member has functionally
> +   tested the behaviour of the patch in some manner, they should
> +   send an email reply conmtaning a ``Tested-by`` tag.
> +
> + * **``Reported-by``**: when a QEMU community member reports a problem
> +   via the mailing list, or some other informal channel that is not
> +   the issue tracker, it is good practice to credit them by including
> +   a ``Reported-by`` tag on any patch fixing the issue. When the
> +   problem is reported via the GitLab issue tracker, however, it is
> +   sufficient to just include a link to the issue.
> +
> +Subsystem maintainer requirements
> +~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
> +
> +When a subsystem maintainer accepts a patch from a contributor, in
> +addition to the normal code review points, they are expected to validate
> +the presence of suitable ``Signed-off-by`` tags.
> +
> +At the time they queue the patch in their subsystem tree, the maintainer
> +**MUST** also then add their own ``Signed-off-by`` to indicate that they
> +have done the aforementioned validation.
> +
> +The subsystem maintainer submitting a pull request is **NOT** expected to
> +have a ``Reviewed-by`` tag on the patch, since this is implied by their
> +own ``Signed-off-by``.

Considering the above, I would remove this last paragraph.

Kevin



^ permalink raw reply	[flat|nested] 57+ messages in thread

* Re: [PATCH 2/2] docs: define policy forbidding use of "AI" / LLM code generators
  2023-11-23 11:40 ` [PATCH 2/2] docs: define policy forbidding use of "AI" / LLM code generators Daniel P. Berrangé
  2023-11-23 12:57   ` Alex Bennée
@ 2023-11-23 13:20   ` Kevin Wolf
  2023-11-23 14:35   ` Michael S. Tsirkin
  2023-11-23 15:22   ` Stefan Hajnoczi
  3 siblings, 0 replies; 57+ messages in thread
From: Kevin Wolf @ 2023-11-23 13:20 UTC (permalink / raw)
  To: Daniel P. Berrangé
  Cc: qemu-devel, Richard Henderson, Alexander Graf, Alex Bennée,
	Paolo Bonzini, Michael S. Tsirkin, Markus Armbruster,
	Phil Mathieu-Daudé, Stefan Hajnoczi, Thomas Huth,
	Gerd Hoffmann, Mark Cave-Ayland, Peter Maydell

Am 23.11.2023 um 12:40 hat Daniel P. Berrangé geschrieben:
> There has been an explosion of interest in so called "AI" (LLM)
> code generators in the past year or so. Thus far though, this is
> has not been matched by a broadly accepted legal interpretation
> of the licensing implications for code generator outputs. While
> the vendors may claim there is no problem and a free choice of
> license is possible, they have an inherent conflict of interest
> in promoting this interpretation. More broadly there is, as yet,
> no broad consensus on the licensing implications of code generators
> trained on inputs under a wide variety of licenses.
> 
> The DCO requires contributors to assert they have the right to
> contribute under the designated project license. Given the lack
> of consensus on the licensing of "AI" (LLM) code generator output,
> it is not considered credible to assert compliance with the DCO
> clause (b) or (c) where a patch includes such generated code.
> 
> This patch thus defines a policy that the QEMU project will not
> accept contributions where use of "AI" (LLM) code generators is
> either known, or suspected.
> 
> Signed-off-by: Daniel P. Berrangé <berrange@redhat.com>
> ---
>  docs/devel/code-provenance.rst | 40 ++++++++++++++++++++++++++++++++++
>  1 file changed, 40 insertions(+)
> 
> diff --git a/docs/devel/code-provenance.rst b/docs/devel/code-provenance.rst
> index b4591a2dec..a6e42c6b1b 100644
> --- a/docs/devel/code-provenance.rst
> +++ b/docs/devel/code-provenance.rst
> @@ -195,3 +195,43 @@ example::
>    Signed-off-by: Some Person <some.person@example.com>
>    [Rebased and added support for 'foo']
>    Signed-off-by: New Person <new.person@example.com>
> +
> +Use of "AI" (LLM) code generators
> +~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
> +
> +TL;DR:
> +
> +  **Current QEMU project policy is to DECLINE any contributions
> +  which are believed to include or derive from "AI" (LLM)
> +  generated code.**
> +
> +The existence of "AI" (`Large Language Model <https://en.wikipedia.org/wiki/Large_language_model>`__
> +/ LLM) code generators raises a number of difficult legal questions, a
> +number of which impact on Open Source projects. As noted earlier, the
> +QEMU community requires that contributors certify their patch submissions
> +are made in accordance with the rules of the :ref:`dco` (DCO). When a
> +patch contains "AI" generated code this raises difficulties with code
> +provenence and thus DCO compliance.
> +
> +To satisfy the DCO, the patch contributor has to fully understand
> +the origins and license of code they are contributing to QEMU. The
> +license terms that should apply to the output of an "AI" code generator
> +are ill-defined, given that both training data and operation of the
> +"AI" are typically opaque to the user. Even where the training data
> +is said to all be open source, it will likely be under a wide variety
> +of license terms.
> +
> +While the vendor's of "AI" code generators may promote the idea that
> +code output can be taken under a free choice of license, this is not
> +yet considered to be a generally accepted, nor tested, legal opinion.
> +
> +With this in mind, the QEMU maintainers does not consider it is

s/does/do/ or maybe s/maintainers/project/

> +currently possible to comply with DCO terms (b) or (c) for most "AI"
> +generated code.
> +
> +The QEMU maintainers thus require that contributors refrain from using
> +"AI" code generators on patches intended to be submitted to the project,
> +and will decline any contribution if use of "AI" is known or suspected.
> +
> +Examples of tools impacted by this policy includes both GitHub CoPilot,
> +and ChatGPT, amongst many others which are less well known.

Acked-by: Kevin Wolf <kwolf@redhat.com>



^ permalink raw reply	[flat|nested] 57+ messages in thread

* Re: [PATCH 1/2] docs: introduce dedicated page about code provenance / sign-off
  2023-11-23 11:40 ` [PATCH 1/2] docs: introduce dedicated page about code provenance / sign-off Daniel P. Berrangé
                     ` (2 preceding siblings ...)
  2023-11-23 13:16   ` Kevin Wolf
@ 2023-11-23 14:25   ` Michael S. Tsirkin
  2023-11-23 17:16     ` Daniel P. Berrangé
  2023-11-23 15:13   ` Stefan Hajnoczi
  2024-01-27 14:36   ` Zhao Liu
  5 siblings, 1 reply; 57+ messages in thread
From: Michael S. Tsirkin @ 2023-11-23 14:25 UTC (permalink / raw)
  To: Daniel P. Berrangé
  Cc: qemu-devel, Richard Henderson, Alexander Graf, Alex Bennée,
	Paolo Bonzini, Markus Armbruster, Phil Mathieu-Daudé,
	Stefan Hajnoczi, Thomas Huth, Kevin Wolf, Gerd Hoffmann,
	Mark Cave-Ayland, Peter Maydell

On Thu, Nov 23, 2023 at 11:40:25AM +0000, Daniel P. Berrangé wrote:
> Currently we have a short paragraph saying that patches must include
> a Signed-off-by line, and merely link to the kernel documentation.
> The linked kernel docs have alot of content beyond the part about
> sign-off an thus is misleading/distracting to QEMU contributors.
> 
> This introduces a dedicated 'code-provenance' page in QEMU talking
> about why we require sign-off, explaining the other tags we commonly
> use, and what to do in some edge cases.
> 
> Signed-off-by: Daniel P. Berrangé <berrange@redhat.com>

Great initiative! I think we needed this for a while now.



> ---
>  docs/devel/code-provenance.rst    | 197 ++++++++++++++++++++++++++++++
>  docs/devel/index-process.rst      |   1 +
>  docs/devel/submitting-a-patch.rst |  18 +--
>  3 files changed, 201 insertions(+), 15 deletions(-)
>  create mode 100644 docs/devel/code-provenance.rst
> 
> diff --git a/docs/devel/code-provenance.rst b/docs/devel/code-provenance.rst
> new file mode 100644
> index 0000000000..b4591a2dec
> --- /dev/null
> +++ b/docs/devel/code-provenance.rst
> @@ -0,0 +1,197 @@
> +.. _code-provenance:
> +
> +Code provenance
> +===============
> +
> +Certifying patch submissions
> +~~~~~~~~~~~~~~~~~~~~~~~~~~~~
> +
> +The QEMU community **mandates** all contributors to certify provenance
> +of patch submissions they make to the project. To put it another way,
> +contributors must indicate that they are legally permitted to contribute
> +to the project.
> +
> +Certification is achieved with a low overhead by adding a single line
> +to the bottom of every git commit::
> +
> +   Signed-off-by: YOUR NAME <YOUR@EMAIL>
> +
> +This existence of this line asserts that the author of the patch is

The existence?

> +contributing in accordance with the `Developer's Certificate of
> +Origin <https://developercertifcate.org>`__:
> +
> +.. _dco:
> +
> +::
> +  Developer's Certificate of Origin 1.1
> +
> +  By making a contribution to this project, I certify that:
> +
> +  (a) The contribution was created in whole or in part by me and I
> +      have the right to submit it under the open source license
> +      indicated in the file; or
> +
> +  (b) The contribution is based upon previous work that, to the best
> +      of my knowledge, is covered under an appropriate open source
> +      license and I have the right under that license to submit that
> +      work with modifications, whether created in whole or in part
> +      by me, under the same open source license (unless I am
> +      permitted to submit under a different license), as indicated
> +      in the file; or
> +
> +  (c) The contribution was provided directly to me by some other
> +      person who certified (a), (b) or (c) and I have not modified
> +      it.
> +
> +  (d) I understand and agree that this project and the contribution
> +      are public and that a record of the contribution (including all
> +      personal information I submit with it, including my sign-off) is
> +      maintained indefinitely and may be redistributed consistent with
> +      this project or the open source license(s) involved.
> +
> +It is generally expected that the name and email addresses used in one
> +of the ``Signed-off-by`` lines, matches that of the git commit ``Author``
> +field. If the person sending the mail is also one of the patch authors,
> +it is further expected that the mail ``From:`` line name & address match
> +one of the ``Signed-off-by`` lines. 
> +
> +Multiple authorship
> +~~~~~~~~~~~~~~~~~~~
> +
> +It is not uncommon for a patch to have contributions from multiple
> +authors. In such a scenario, a git commit will usually be expected
> +to have a ``Signed-off-by`` line for each contributor involved in
> +creatin of the patch. Some edge cases:

creation

> +
> +  * The non-primary author's contributions were so trivial that
> +    they can be considered not subject to copyright. In this case
> +    the secondary authors need not include a ``Signed-off-by``.
> +
> +    This case most commonly applies where QEMU reviewers give short
> +    snippets of code as suggested fixes to a patch. The reviewers
> +    don't need to have their own ``Signed-off-by`` added unless
> +    their code suggestion was unusually large.

It is still a good policy to include attribution, e.g.
by adding a Suggested-by tag.


> +
> +  * Both contributors work for the same employer and the employer
> +    requires copyright assignment.
> +
> +    It can be said that in this case a ``Signed-off-by`` is indicating
> +    that the person has permission to contributeo from their employer

contribute

> +    who is the copyright holder. It is none the less still preferrable
> +    to include a ``Signed-off-by`` for each contributor, as in some
> +    countries employees are not able to assign copyright to their
> +    employer, and it also covers any time invested outside working
> +    hours.
> +
> +Other commit tags
> +~~~~~~~~~~~~~~~~~
> +
> +While the ``Signed-off-by`` tag is mandatory, there are a number of
> +other tags that are commonly used during QEMU development
> +
> + * **``Reviewed-by``**: when a QEMU community member reviews a patch
> +   on the mailing list, if they consider the patch acceptable, they
> +   should send an email reply containing a ``Reviewed-by`` tag.
> +
> +   NB: a subsystem maintainer sending a pull request would replace
> +   their own ``Reviewed-by`` with another ``Signed-off-by``
> +
> + * **``Acked-by``**: when a QEMU subsystem maintainer approves a patch
> +   that touches their subsystem, but intends to allow a different
> +   maintainer to queue it and send a pull request, they would send
> +   a mail containing a ``Acked-by`` tag.
> +   
> + * **``Tested-by``**: when a QEMU community member has functionally
> +   tested the behaviour of the patch in some manner, they should
> +   send an email reply conmtaning a ``Tested-by`` tag.
> +
> + * **``Reported-by``**: when a QEMU community member reports a problem
> +   via the mailing list, or some other informal channel that is not
> +   the issue tracker, it is good practice to credit them by including
> +   a ``Reported-by`` tag on any patch fixing the issue. When the
> +   problem is reported via the GitLab issue tracker, however, it is
> +   sufficient to just include a link to the issue.


Suggested-by is also common.

As long as we are here, let's document Fixes: and Cc: ?


> +Subsystem maintainer requirements
> +~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
> +
> +When a subsystem maintainer accepts a patch from a contributor, in
> +addition to the normal code review points, they are expected to validate
> +the presence of suitable ``Signed-off-by`` tags.
> +
> +At the time they queue the patch in their subsystem tree, the maintainer
> +**MUST** also then add their own ``Signed-off-by`` to indicate that they
> +have done the aforementioned validation.


Below you say **must** - I think that is better, no need to shout.

> +
> +The subsystem maintainer submitting a pull request is **NOT** expected to
> +have a ``Reviewed-by`` tag on the patch, since this is implied by their
> +own ``Signed-off-by``.
> +  
> +Tools for adding ``Signed-of-by``


Signed-off-by

> +~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
> +
> +There are a variety of ways tools can support adding ``Signed-off-by``
> +tags for patches, avoiding the need for contributors to manually
> +type in this repetitive text each time.
> +
> +git commands
> +^^^^^^^^^^^^
> +
> +When creating, or amending, a commit the ``-s`` flag to ``git commit``
> +will append a suitable line matching the configuring git author
> +details.
> +
> +If preparing patches using the ``git format-patch`` tool, the ``-s``
> +flag can be used to append a suitable line in the emails it creates,
> +without modifying the local commits. Alternatively to modify the
> +local commits on a branch en-mass::
> +
> +  git rebase master -x 'git commit --amend --no-edit -s'
> +
> +emacs
> +^^^^^
> +
> +In the file ``$HOME/.emacs.d/abbrev_defs`` add::
> +
> +  (define-abbrev-table 'global-abbrev-table
> +    '(
> +      ("8rev" "Reviewed-by: YOUR NAME <your@email.addr>" nil 1)
> +      ("8ack" "Acked-by: YOUR NAME <your@email.addr>" nil 1)
> +      ("8test" "Tested-by: YOUR NAME <your@email.addr>" nil 1)
> +      ("8sob" "Signed-off-by: YOUR NAME <your@email.addr>" nil 1)
> +     ))
> +
> +with this change, if you type (for example) ``8rev`` followed
> +by ``<space>`` or ``<enter>`` it will expand to the whole phrase. 
> +
> +vim
> +^^^
> +
> +In the file ``$HOME/.vimrc`` add::
> +
> +  iabbrev 8rev Reviewed-by: YOUR NAME <your@email.addr>
> +  iabbrev 8ack Acked-by: YOUR NAME <your@email.addr>
> +  iabbrev 8test Tested-by: YOUR NAME <your@email.addr>
> +  iabbrev 8sob Signed-off-by: YOUR NAME <your@email.addr>
> +
> +with this change, if you type (for example) ``8rev`` followed
> +by ``<space>`` or ``<enter>`` it will expand to the whole phrase. 
> +
> +Re-starting abandoned work
> +~~~~~~~~~~~~~~~~~~~~~~~~~~
> +
> +For a variety of reasons there are some patches that get submitted to
> +QEMU but never merged. An unrelated contributor may decide (months or
> +years later) to continue working from the abandoned patch and re-submit
> +it with extra changes.
> +
> +If the abandoned patch already had a ``Signed-off-by`` from the original
> +author this **must** be preserved. The new contributor **must** then add
> +their own ``Signed-off-by`` after the original one if they made any
> +further changes to it. It is common to include a comment just prior to
> +the new ``Signed-off-by`` indicating what extra changes were made. For
> +example::
> +
> +  Signed-off-by: Some Person <some.person@example.com>
> +  [Rebased and added support for 'foo']
> +  Signed-off-by: New Person <new.person@example.com>
> diff --git a/docs/devel/index-process.rst b/docs/devel/index-process.rst
> index 362f97ee30..b54e58105e 100644
> --- a/docs/devel/index-process.rst
> +++ b/docs/devel/index-process.rst
> @@ -13,6 +13,7 @@ Notes about how to interact with the community and how and where to submit patch
>     maintainers
>     style
>     submitting-a-patch
> +   code-provenance
>     trivial-patches
>     stable-process
>     submitting-a-pull-request
> diff --git a/docs/devel/submitting-a-patch.rst b/docs/devel/submitting-a-patch.rst
> index c641d948f1..ec541b3d15 100644
> --- a/docs/devel/submitting-a-patch.rst
> +++ b/docs/devel/submitting-a-patch.rst
> @@ -322,21 +322,9 @@ Patch emails must include a ``Signed-off-by:`` line
>  
>  Your patches **must** include a Signed-off-by: line. This is a hard
>  requirement because it's how you say "I'm legally okay to contribute
> -this and happy for it to go into QEMU". The process is modelled after
> -the `Linux kernel
> -<http://git.kernel.org/cgit/linux/kernel/git/torvalds/linux.git/tree/Documentation/SubmittingPatches?id=f6f94e2ab1b33f0082ac22d71f66385a60d8157f#n297>`__
> -policy.
> -
> -If you wrote the patch, make sure your "From:" and "Signed-off-by:"
> -lines use the same spelling. It's okay if you subscribe or contribute to
> -the list via more than one address, but using multiple addresses in one
> -commit just confuses things. If someone else wrote the patch, git will
> -include a "From:" line in the body of the email (different from your
> -envelope From:) that will give credit to the correct author; but again,
> -that author's Signed-off-by: line is mandatory, with the same spelling.
> -
> -There are various tooling options for automatically adding these tags
> -include using ``git commit -s`` or ``git format-patch -s``. For more
> +this and happy for it to go into QEMU". For full guidance, read the
> +:ref:`code-provenance` documentation.
> +
>  information see `SubmittingPatches 1.12
>  <http://git.kernel.org/cgit/linux/kernel/git/torvalds/linux.git/tree/Documentation/SubmittingPatches?id=f6f94e2ab1b33f0082ac22d71f66385a60d8157f#n297>`__.

this "information" now looks orphaned or am I confused?


> -- 
> 2.41.0



^ permalink raw reply	[flat|nested] 57+ messages in thread

* Re: [PATCH 2/2] docs: define policy forbidding use of "AI" / LLM code generators
  2023-11-23 11:40 ` [PATCH 2/2] docs: define policy forbidding use of "AI" / LLM code generators Daniel P. Berrangé
  2023-11-23 12:57   ` Alex Bennée
  2023-11-23 13:20   ` Kevin Wolf
@ 2023-11-23 14:35   ` Michael S. Tsirkin
  2023-11-23 14:56     ` Manos Pitsidianakis
  2023-11-23 17:58     ` Daniel P. Berrangé
  2023-11-23 15:22   ` Stefan Hajnoczi
  3 siblings, 2 replies; 57+ messages in thread
From: Michael S. Tsirkin @ 2023-11-23 14:35 UTC (permalink / raw)
  To: Daniel P. Berrangé
  Cc: qemu-devel, Richard Henderson, Alexander Graf, Alex Bennée,
	Paolo Bonzini, Markus Armbruster, Phil Mathieu-Daudé,
	Stefan Hajnoczi, Thomas Huth, Kevin Wolf, Gerd Hoffmann,
	Mark Cave-Ayland, Peter Maydell

On Thu, Nov 23, 2023 at 11:40:26AM +0000, Daniel P. Berrangé wrote:
> There has been an explosion of interest in so called "AI" (LLM)
> code generators in the past year or so. Thus far though, this is
> has not been matched by a broadly accepted legal interpretation
> of the licensing implications for code generator outputs. While
> the vendors may claim there is no problem and a free choice of
> license is possible, they have an inherent conflict of interest
> in promoting this interpretation. More broadly there is, as yet,
> no broad consensus on the licensing implications of code generators
> trained on inputs under a wide variety of licenses.
> 
> The DCO requires contributors to assert they have the right to
> contribute under the designated project license. Given the lack
> of consensus on the licensing of "AI" (LLM) code generator output,
> it is not considered credible to assert compliance with the DCO
> clause (b) or (c) where a patch includes such generated code.
> 
> This patch thus defines a policy that the QEMU project will not
> accept contributions where use of "AI" (LLM) code generators is
> either known, or suspected.
> 
> Signed-off-by: Daniel P. Berrangé <berrange@redhat.com>
> ---
>  docs/devel/code-provenance.rst | 40 ++++++++++++++++++++++++++++++++++
>  1 file changed, 40 insertions(+)
> 
> diff --git a/docs/devel/code-provenance.rst b/docs/devel/code-provenance.rst
> index b4591a2dec..a6e42c6b1b 100644
> --- a/docs/devel/code-provenance.rst
> +++ b/docs/devel/code-provenance.rst
> @@ -195,3 +195,43 @@ example::
>    Signed-off-by: Some Person <some.person@example.com>
>    [Rebased and added support for 'foo']
>    Signed-off-by: New Person <new.person@example.com>
> +
> +Use of "AI" (LLM) code generators
> +~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
> +
> +TL;DR:
> +
> +  **Current QEMU project policy is to DECLINE any contributions
> +  which are believed to include or derive from "AI" (LLM)
> +  generated code.**
> +
> +The existence of "AI" (`Large Language Model <https://en.wikipedia.org/wiki/Large_language_model>`__
> +/ LLM) code generators raises a number of difficult legal questions, a
> +number of which impact on Open Source projects. As noted earlier, the
> +QEMU community requires that contributors certify their patch submissions
> +are made in accordance with the rules of the :ref:`dco` (DCO). When a
> +patch contains "AI" generated code this raises difficulties with code
> +provenence and thus DCO compliance.
> +
> +To satisfy the DCO, the patch contributor has to fully understand
> +the origins and license of code they are contributing to QEMU. The
> +license terms that should apply to the output of an "AI" code generator
> +are ill-defined, given that both training data and operation of the
> +"AI" are typically opaque to the user. Even where the training data
> +is said to all be open source, it will likely be under a wide variety
> +of license terms.
> +
> +While the vendor's of "AI" code generators may promote the idea that
> +code output can be taken under a free choice of license, this is not
> +yet considered to be a generally accepted, nor tested, legal opinion.
> +
> +With this in mind, the QEMU maintainers does not consider it is
> +currently possible to comply with DCO terms (b) or (c) for most "AI"
> +generated code.
> +
> +The QEMU maintainers thus require that contributors refrain from using
> +"AI" code generators on patches intended to be submitted to the project,
> +and will decline any contribution if use of "AI" is known or suspected.
> +
> +Examples of tools impacted by this policy includes both GitHub CoPilot,
> +and ChatGPT, amongst many others which are less well known.


So you called out these two by name, fine, but given "AI" is in scare
quotes I don't really know what is or is not allowed and I don't know
how will contributors know.  Is the "AI" that one must not use
necessarily an LLM?  And how do you define LLM even? Wikipedia says
"general-purpose language understanding and generation".


All this seems vague to me.


However, can't we define a simpler more specific policy?
For example, isn't it true that *any* automatically generated code
can only be included if the scripts producing said code
are also included or otherwise available under GPLv2?




> -- 
> 2.41.0



^ permalink raw reply	[flat|nested] 57+ messages in thread

* Re: [PATCH 2/2] docs: define policy forbidding use of "AI" / LLM code generators
  2023-11-23 14:35   ` Michael S. Tsirkin
@ 2023-11-23 14:56     ` Manos Pitsidianakis
  2023-11-23 15:13       ` Michael S. Tsirkin
                         ` (4 more replies)
  2023-11-23 17:58     ` Daniel P. Berrangé
  1 sibling, 5 replies; 57+ messages in thread
From: Manos Pitsidianakis @ 2023-11-23 14:56 UTC (permalink / raw)
  To: qemu-devel, Michael S. Tsirkin, Daniel P. Berrangé 
  Cc: qemu-devel, Richard Henderson, Alexander Graf, Alex Benné e,
	Paolo Bonzini, Markus Armbruster, Phil Mathieu-Daudé ,
	Stefan Hajnoczi, Thomas Huth, Kevin Wolf, Gerd Hoffmann,
	Mark Cave-Ayland, Peter Maydell

On Thu, 23 Nov 2023 16:35, "Michael S. Tsirkin" <mst@redhat.com> wrote:
>On Thu, Nov 23, 2023 at 11:40:26AM +0000, Daniel P. Berrangé wrote:
>> There has been an explosion of interest in so called "AI" (LLM)
>> code generators in the past year or so. Thus far though, this is
>> has not been matched by a broadly accepted legal interpretation
>> of the licensing implications for code generator outputs. While
>> the vendors may claim there is no problem and a free choice of
>> license is possible, they have an inherent conflict of interest
>> in promoting this interpretation. More broadly there is, as yet,
>> no broad consensus on the licensing implications of code generators
>> trained on inputs under a wide variety of licenses.
>> 
>> The DCO requires contributors to assert they have the right to
>> contribute under the designated project license. Given the lack
>> of consensus on the licensing of "AI" (LLM) code generator output,
>> it is not considered credible to assert compliance with the DCO
>> clause (b) or (c) where a patch includes such generated code.
>> 
>> This patch thus defines a policy that the QEMU project will not
>> accept contributions where use of "AI" (LLM) code generators is
>> either known, or suspected.
>> 
>> Signed-off-by: Daniel P. Berrangé <berrange@redhat.com>
>> ---
>>  docs/devel/code-provenance.rst | 40 ++++++++++++++++++++++++++++++++++
>>  1 file changed, 40 insertions(+)
>> 
>> diff --git a/docs/devel/code-provenance.rst b/docs/devel/code-provenance.rst
>> index b4591a2dec..a6e42c6b1b 100644
>> --- a/docs/devel/code-provenance.rst
>> +++ b/docs/devel/code-provenance.rst
>> @@ -195,3 +195,43 @@ example::
>>    Signed-off-by: Some Person <some.person@example.com>
>>    [Rebased and added support for 'foo']
>>    Signed-off-by: New Person <new.person@example.com>
>> +
>> +Use of "AI" (LLM) code generators
>> +~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
>> +
>> +TL;DR:
>> +
>> +  **Current QEMU project policy is to DECLINE any contributions
>> +  which are believed to include or derive from "AI" (LLM)
>> +  generated code.**
>> +
>> +The existence of "AI" (`Large Language Model <https://en.wikipedia.org/wiki/Large_language_model>`__
>> +/ LLM) code generators raises a number of difficult legal questions, a
>> +number of which impact on Open Source projects. As noted earlier, the
>> +QEMU community requires that contributors certify their patch submissions
>> +are made in accordance with the rules of the :ref:`dco` (DCO). When a
>> +patch contains "AI" generated code this raises difficulties with code
>> +provenence and thus DCO compliance.
>> +
>> +To satisfy the DCO, the patch contributor has to fully understand
>> +the origins and license of code they are contributing to QEMU. The
>> +license terms that should apply to the output of an "AI" code generator
>> +are ill-defined, given that both training data and operation of the
>> +"AI" are typically opaque to the user. Even where the training data
>> +is said to all be open source, it will likely be under a wide variety
>> +of license terms.
>> +
>> +While the vendor's of "AI" code generators may promote the idea that
>> +code output can be taken under a free choice of license, this is not
>> +yet considered to be a generally accepted, nor tested, legal opinion.
>> +
>> +With this in mind, the QEMU maintainers does not consider it is
>> +currently possible to comply with DCO terms (b) or (c) for most "AI"
>> +generated code.
>> +
>> +The QEMU maintainers thus require that contributors refrain from using
>> +"AI" code generators on patches intended to be submitted to the project,
>> +and will decline any contribution if use of "AI" is known or suspected.
>> +
>> +Examples of tools impacted by this policy includes both GitHub CoPilot,
>> +and ChatGPT, amongst many others which are less well known.
>
>
>So you called out these two by name, fine, but given "AI" is in scare
>quotes I don't really know what is or is not allowed and I don't know
>how will contributors know.  Is the "AI" that one must not use
>necessarily an LLM?  And how do you define LLM even? Wikipedia says
>"general-purpose language understanding and generation".
>
>
>All this seems vague to me.
>
>
>However, can't we define a simpler more specific policy?
>For example, isn't it true that *any* automatically generated code
>can only be included if the scripts producing said code
>are also included or otherwise available under GPLv2?

The following definition makes sense to me:

- Automated codegen tool must be idempotent.
- Automated codegen tool must not use statistical modelling.

I'd remove all AI or LLM references. These are non-specific, colloquial 
and in the case of `AI`, non-technical. This policy should apply the 
same to a Markov chain code generator.


^ permalink raw reply	[flat|nested] 57+ messages in thread

* Re: [PATCH 1/2] docs: introduce dedicated page about code provenance / sign-off
  2023-11-23 11:40 ` [PATCH 1/2] docs: introduce dedicated page about code provenance / sign-off Daniel P. Berrangé
                     ` (3 preceding siblings ...)
  2023-11-23 14:25   ` Michael S. Tsirkin
@ 2023-11-23 15:13   ` Stefan Hajnoczi
  2024-01-27 14:36   ` Zhao Liu
  5 siblings, 0 replies; 57+ messages in thread
From: Stefan Hajnoczi @ 2023-11-23 15:13 UTC (permalink / raw)
  To: Daniel P. Berrangé
  Cc: qemu-devel, Richard Henderson, Alexander Graf, Alex Bennée,
	Paolo Bonzini, Michael S. Tsirkin, Markus Armbruster,
	Phil Mathieu-Daudé, Thomas Huth, Kevin Wolf, Gerd Hoffmann,
	Mark Cave-Ayland, Peter Maydell

[-- Attachment #1: Type: text/plain, Size: 11979 bytes --]

On Thu, Nov 23, 2023 at 11:40:25AM +0000, Daniel P. Berrangé wrote:
> Currently we have a short paragraph saying that patches must include
> a Signed-off-by line, and merely link to the kernel documentation.
> The linked kernel docs have alot of content beyond the part about
> sign-off an thus is misleading/distracting to QEMU contributors.
> 
> This introduces a dedicated 'code-provenance' page in QEMU talking
> about why we require sign-off, explaining the other tags we commonly
> use, and what to do in some edge cases.
> 
> Signed-off-by: Daniel P. Berrangé <berrange@redhat.com>
> ---
>  docs/devel/code-provenance.rst    | 197 ++++++++++++++++++++++++++++++
>  docs/devel/index-process.rst      |   1 +
>  docs/devel/submitting-a-patch.rst |  18 +--
>  3 files changed, 201 insertions(+), 15 deletions(-)
>  create mode 100644 docs/devel/code-provenance.rst
> 
> diff --git a/docs/devel/code-provenance.rst b/docs/devel/code-provenance.rst
> new file mode 100644
> index 0000000000..b4591a2dec
> --- /dev/null
> +++ b/docs/devel/code-provenance.rst
> @@ -0,0 +1,197 @@
> +.. _code-provenance:
> +
> +Code provenance
> +===============
> +
> +Certifying patch submissions
> +~~~~~~~~~~~~~~~~~~~~~~~~~~~~
> +
> +The QEMU community **mandates** all contributors to certify provenance
> +of patch submissions they make to the project. To put it another way,
> +contributors must indicate that they are legally permitted to contribute
> +to the project.
> +
> +Certification is achieved with a low overhead by adding a single line
> +to the bottom of every git commit::
> +
> +   Signed-off-by: YOUR NAME <YOUR@EMAIL>
> +
> +This existence of this line asserts that the author of the patch is
> +contributing in accordance with the `Developer's Certificate of
> +Origin <https://developercertifcate.org>`__:
> +
> +.. _dco:
> +
> +::
> +  Developer's Certificate of Origin 1.1
> +
> +  By making a contribution to this project, I certify that:
> +
> +  (a) The contribution was created in whole or in part by me and I
> +      have the right to submit it under the open source license
> +      indicated in the file; or
> +
> +  (b) The contribution is based upon previous work that, to the best
> +      of my knowledge, is covered under an appropriate open source
> +      license and I have the right under that license to submit that
> +      work with modifications, whether created in whole or in part
> +      by me, under the same open source license (unless I am
> +      permitted to submit under a different license), as indicated
> +      in the file; or
> +
> +  (c) The contribution was provided directly to me by some other
> +      person who certified (a), (b) or (c) and I have not modified
> +      it.
> +
> +  (d) I understand and agree that this project and the contribution
> +      are public and that a record of the contribution (including all
> +      personal information I submit with it, including my sign-off) is
> +      maintained indefinitely and may be redistributed consistent with
> +      this project or the open source license(s) involved.
> +
> +It is generally expected that the name and email addresses used in one
> +of the ``Signed-off-by`` lines, matches that of the git commit ``Author``
> +field. If the person sending the mail is also one of the patch authors,
> +it is further expected that the mail ``From:`` line name & address match
> +one of the ``Signed-off-by`` lines. 
> +
> +Multiple authorship
> +~~~~~~~~~~~~~~~~~~~
> +
> +It is not uncommon for a patch to have contributions from multiple
> +authors. In such a scenario, a git commit will usually be expected
> +to have a ``Signed-off-by`` line for each contributor involved in
> +creatin of the patch. Some edge cases:
> +
> +  * The non-primary author's contributions were so trivial that
> +    they can be considered not subject to copyright. In this case
> +    the secondary authors need not include a ``Signed-off-by``.
> +
> +    This case most commonly applies where QEMU reviewers give short
> +    snippets of code as suggested fixes to a patch. The reviewers
> +    don't need to have their own ``Signed-off-by`` added unless
> +    their code suggestion was unusually large.
> +
> +  * Both contributors work for the same employer and the employer
> +    requires copyright assignment.
> +
> +    It can be said that in this case a ``Signed-off-by`` is indicating
> +    that the person has permission to contributeo from their employer

s/contributeo/contribute/

> +    who is the copyright holder. It is none the less still preferrable
> +    to include a ``Signed-off-by`` for each contributor, as in some
> +    countries employees are not able to assign copyright to their
> +    employer, and it also covers any time invested outside working
> +    hours.
> +
> +Other commit tags
> +~~~~~~~~~~~~~~~~~
> +
> +While the ``Signed-off-by`` tag is mandatory, there are a number of
> +other tags that are commonly used during QEMU development
> +
> + * **``Reviewed-by``**: when a QEMU community member reviews a patch
> +   on the mailing list, if they consider the patch acceptable, they
> +   should send an email reply containing a ``Reviewed-by`` tag.
> +
> +   NB: a subsystem maintainer sending a pull request would replace
> +   their own ``Reviewed-by`` with another ``Signed-off-by``
> +
> + * **``Acked-by``**: when a QEMU subsystem maintainer approves a patch
> +   that touches their subsystem, but intends to allow a different
> +   maintainer to queue it and send a pull request, they would send
> +   a mail containing a ``Acked-by`` tag.
> +   
> + * **``Tested-by``**: when a QEMU community member has functionally
> +   tested the behaviour of the patch in some manner, they should
> +   send an email reply conmtaning a ``Tested-by`` tag.

s/conmtaning/containing/

> +
> + * **``Reported-by``**: when a QEMU community member reports a problem
> +   via the mailing list, or some other informal channel that is not
> +   the issue tracker, it is good practice to credit them by including
> +   a ``Reported-by`` tag on any patch fixing the issue. When the
> +   problem is reported via the GitLab issue tracker, however, it is
> +   sufficient to just include a link to the issue.
> +
> +Subsystem maintainer requirements
> +~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
> +
> +When a subsystem maintainer accepts a patch from a contributor, in
> +addition to the normal code review points, they are expected to validate
> +the presence of suitable ``Signed-off-by`` tags.
> +
> +At the time they queue the patch in their subsystem tree, the maintainer
> +**MUST** also then add their own ``Signed-off-by`` to indicate that they
> +have done the aforementioned validation.
> +
> +The subsystem maintainer submitting a pull request is **NOT** expected to
> +have a ``Reviewed-by`` tag on the patch, since this is implied by their
> +own ``Signed-off-by``.
> +  
> +Tools for adding ``Signed-of-by``

s/Signed-of-by/Signed-off-by/

> +~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
> +
> +There are a variety of ways tools can support adding ``Signed-off-by``
> +tags for patches, avoiding the need for contributors to manually
> +type in this repetitive text each time.
> +
> +git commands
> +^^^^^^^^^^^^
> +
> +When creating, or amending, a commit the ``-s`` flag to ``git commit``
> +will append a suitable line matching the configuring git author
> +details.
> +
> +If preparing patches using the ``git format-patch`` tool, the ``-s``
> +flag can be used to append a suitable line in the emails it creates,
> +without modifying the local commits. Alternatively to modify the
> +local commits on a branch en-mass::
> +
> +  git rebase master -x 'git commit --amend --no-edit -s'
> +
> +emacs
> +^^^^^
> +
> +In the file ``$HOME/.emacs.d/abbrev_defs`` add::
> +
> +  (define-abbrev-table 'global-abbrev-table
> +    '(
> +      ("8rev" "Reviewed-by: YOUR NAME <your@email.addr>" nil 1)
> +      ("8ack" "Acked-by: YOUR NAME <your@email.addr>" nil 1)
> +      ("8test" "Tested-by: YOUR NAME <your@email.addr>" nil 1)
> +      ("8sob" "Signed-off-by: YOUR NAME <your@email.addr>" nil 1)
> +     ))
> +
> +with this change, if you type (for example) ``8rev`` followed
> +by ``<space>`` or ``<enter>`` it will expand to the whole phrase. 
> +
> +vim
> +^^^
> +
> +In the file ``$HOME/.vimrc`` add::
> +
> +  iabbrev 8rev Reviewed-by: YOUR NAME <your@email.addr>
> +  iabbrev 8ack Acked-by: YOUR NAME <your@email.addr>
> +  iabbrev 8test Tested-by: YOUR NAME <your@email.addr>
> +  iabbrev 8sob Signed-off-by: YOUR NAME <your@email.addr>
> +
> +with this change, if you type (for example) ``8rev`` followed
> +by ``<space>`` or ``<enter>`` it will expand to the whole phrase. 
> +
> +Re-starting abandoned work
> +~~~~~~~~~~~~~~~~~~~~~~~~~~
> +
> +For a variety of reasons there are some patches that get submitted to
> +QEMU but never merged. An unrelated contributor may decide (months or
> +years later) to continue working from the abandoned patch and re-submit
> +it with extra changes.
> +
> +If the abandoned patch already had a ``Signed-off-by`` from the original
> +author this **must** be preserved. The new contributor **must** then add
> +their own ``Signed-off-by`` after the original one if they made any
> +further changes to it. It is common to include a comment just prior to
> +the new ``Signed-off-by`` indicating what extra changes were made. For
> +example::
> +
> +  Signed-off-by: Some Person <some.person@example.com>
> +  [Rebased and added support for 'foo']
> +  Signed-off-by: New Person <new.person@example.com>
> diff --git a/docs/devel/index-process.rst b/docs/devel/index-process.rst
> index 362f97ee30..b54e58105e 100644
> --- a/docs/devel/index-process.rst
> +++ b/docs/devel/index-process.rst
> @@ -13,6 +13,7 @@ Notes about how to interact with the community and how and where to submit patch
>     maintainers
>     style
>     submitting-a-patch
> +   code-provenance
>     trivial-patches
>     stable-process
>     submitting-a-pull-request
> diff --git a/docs/devel/submitting-a-patch.rst b/docs/devel/submitting-a-patch.rst
> index c641d948f1..ec541b3d15 100644
> --- a/docs/devel/submitting-a-patch.rst
> +++ b/docs/devel/submitting-a-patch.rst
> @@ -322,21 +322,9 @@ Patch emails must include a ``Signed-off-by:`` line
>  
>  Your patches **must** include a Signed-off-by: line. This is a hard
>  requirement because it's how you say "I'm legally okay to contribute
> -this and happy for it to go into QEMU". The process is modelled after
> -the `Linux kernel
> -<http://git.kernel.org/cgit/linux/kernel/git/torvalds/linux.git/tree/Documentation/SubmittingPatches?id=f6f94e2ab1b33f0082ac22d71f66385a60d8157f#n297>`__
> -policy.
> -
> -If you wrote the patch, make sure your "From:" and "Signed-off-by:"
> -lines use the same spelling. It's okay if you subscribe or contribute to
> -the list via more than one address, but using multiple addresses in one
> -commit just confuses things. If someone else wrote the patch, git will
> -include a "From:" line in the body of the email (different from your
> -envelope From:) that will give credit to the correct author; but again,
> -that author's Signed-off-by: line is mandatory, with the same spelling.
> -
> -There are various tooling options for automatically adding these tags
> -include using ``git commit -s`` or ``git format-patch -s``. For more
> +this and happy for it to go into QEMU". For full guidance, read the
> +:ref:`code-provenance` documentation.
> +
>  information see `SubmittingPatches 1.12
>  <http://git.kernel.org/cgit/linux/kernel/git/torvalds/linux.git/tree/Documentation/SubmittingPatches?id=f6f94e2ab1b33f0082ac22d71f66385a60d8157f#n297>`__.
>  
> -- 
> 2.41.0
> 

[-- Attachment #2: signature.asc --]
[-- Type: application/pgp-signature, Size: 488 bytes --]

^ permalink raw reply	[flat|nested] 57+ messages in thread

* Re: [PATCH 2/2] docs: define policy forbidding use of "AI" / LLM code generators
  2023-11-23 14:56     ` Manos Pitsidianakis
@ 2023-11-23 15:13       ` Michael S. Tsirkin
  2023-11-23 15:29       ` Philippe Mathieu-Daudé
                         ` (3 subsequent siblings)
  4 siblings, 0 replies; 57+ messages in thread
From: Michael S. Tsirkin @ 2023-11-23 15:13 UTC (permalink / raw)
  To: Manos Pitsidianakis
  Cc: qemu-devel, Daniel P. Berrangé, Richard Henderson,
	Alexander Graf, Alex Benné e, Paolo Bonzini,
	Markus Armbruster, Phil Mathieu-Daudé, Stefan Hajnoczi,
	Thomas Huth, Kevin Wolf, Gerd Hoffmann, Mark Cave-Ayland,
	Peter Maydell

On Thu, Nov 23, 2023 at 04:56:28PM +0200, Manos Pitsidianakis wrote:
> > However, can't we define a simpler more specific policy?
> > For example, isn't it true that *any* automatically generated code
> > can only be included if the scripts producing said code
> > are also included or otherwise available under GPLv2?
> 
> The following definition makes sense to me:
> 
> - Automated codegen tool must be idempotent.
> - Automated codegen tool must not use statistical modelling.

Why does it matter so much?

> I'd remove all AI or LLM references. These are non-specific, colloquial and
> in the case of `AI`, non-technical. This policy should apply the same to a
> Markov chain code generator.

-- 
MST



^ permalink raw reply	[flat|nested] 57+ messages in thread

* Re: [PATCH 2/2] docs: define policy forbidding use of "AI" / LLM code generators
  2023-11-23 11:40 ` [PATCH 2/2] docs: define policy forbidding use of "AI" / LLM code generators Daniel P. Berrangé
                     ` (2 preceding siblings ...)
  2023-11-23 14:35   ` Michael S. Tsirkin
@ 2023-11-23 15:22   ` Stefan Hajnoczi
  3 siblings, 0 replies; 57+ messages in thread
From: Stefan Hajnoczi @ 2023-11-23 15:22 UTC (permalink / raw)
  To: Daniel P. Berrangé
  Cc: qemu-devel, Richard Henderson, Alexander Graf, Alex Bennée,
	Paolo Bonzini, Michael S. Tsirkin, Markus Armbruster,
	Phil Mathieu-Daudé, Thomas Huth, Kevin Wolf, Gerd Hoffmann,
	Mark Cave-Ayland, Peter Maydell

[-- Attachment #1: Type: text/plain, Size: 1685 bytes --]

On Thu, Nov 23, 2023 at 11:40:26AM +0000, Daniel P. Berrangé wrote:
> There has been an explosion of interest in so called "AI" (LLM)
> code generators in the past year or so. Thus far though, this is
> has not been matched by a broadly accepted legal interpretation
> of the licensing implications for code generator outputs. While
> the vendors may claim there is no problem and a free choice of
> license is possible, they have an inherent conflict of interest
> in promoting this interpretation. More broadly there is, as yet,
> no broad consensus on the licensing implications of code generators
> trained on inputs under a wide variety of licenses.
> 
> The DCO requires contributors to assert they have the right to
> contribute under the designated project license. Given the lack
> of consensus on the licensing of "AI" (LLM) code generator output,
> it is not considered credible to assert compliance with the DCO
> clause (b) or (c) where a patch includes such generated code.
> 
> This patch thus defines a policy that the QEMU project will not
> accept contributions where use of "AI" (LLM) code generators is
> either known, or suspected.
> 
> Signed-off-by: Daniel P. Berrangé <berrange@redhat.com>
> ---
>  docs/devel/code-provenance.rst | 40 ++++++++++++++++++++++++++++++++++
>  1 file changed, 40 insertions(+)

As open source LLMs mature, it may be possible to curate the training
data so that the output complies with software licenses and can be used
in QEMU.

For the time being, the position in this patch seems reasonable because
it prevents license problems down the road.

Reviewed-by: Stefan Hajnoczi <stefanha@redhat.com>

[-- Attachment #2: signature.asc --]
[-- Type: application/pgp-signature, Size: 488 bytes --]

^ permalink raw reply	[flat|nested] 57+ messages in thread

* Re: [PATCH 2/2] docs: define policy forbidding use of "AI" / LLM code generators
  2023-11-23 14:56     ` Manos Pitsidianakis
  2023-11-23 15:13       ` Michael S. Tsirkin
@ 2023-11-23 15:29       ` Philippe Mathieu-Daudé
  2023-11-23 17:06         ` Michael S. Tsirkin
  2023-11-23 15:32       ` Alex Bennée
                         ` (2 subsequent siblings)
  4 siblings, 1 reply; 57+ messages in thread
From: Philippe Mathieu-Daudé @ 2023-11-23 15:29 UTC (permalink / raw)
  To: Manos Pitsidianakis, qemu-devel, Michael S. Tsirkin,
	Daniel P. Berrangé
  Cc: Richard Henderson, Alexander Graf, Alex Benn é e,
	Paolo Bonzini, Markus Armbruster, Stefan Hajnoczi, Thomas Huth,
	Kevin Wolf, Gerd Hoffmann, Mark Cave-Ayland, Peter Maydell

On 23/11/23 15:56, Manos Pitsidianakis wrote:
> On Thu, 23 Nov 2023 16:35, "Michael S. Tsirkin" <mst@redhat.com> wrote:
>> On Thu, Nov 23, 2023 at 11:40:26AM +0000, Daniel P. Berrangé wrote:
>>> There has been an explosion of interest in so called "AI" (LLM)
>>> code generators in the past year or so. Thus far though, this is
>>> has not been matched by a broadly accepted legal interpretation
>>> of the licensing implications for code generator outputs. While
>>> the vendors may claim there is no problem and a free choice of
>>> license is possible, they have an inherent conflict of interest
>>> in promoting this interpretation. More broadly there is, as yet,
>>> no broad consensus on the licensing implications of code generators
>>> trained on inputs under a wide variety of licenses.
>>>
>>> The DCO requires contributors to assert they have the right to
>>> contribute under the designated project license. Given the lack
>>> of consensus on the licensing of "AI" (LLM) code generator output,
>>> it is not considered credible to assert compliance with the DCO
>>> clause (b) or (c) where a patch includes such generated code.
>>>
>>> This patch thus defines a policy that the QEMU project will not
>>> accept contributions where use of "AI" (LLM) code generators is
>>> either known, or suspected.
>>>
>>> Signed-off-by: Daniel P. Berrangé <berrange@redhat.com>
>>> ---
>>>  docs/devel/code-provenance.rst | 40 ++++++++++++++++++++++++++++++++++
>>>  1 file changed, 40 insertions(+)


>>> +Use of "AI" (LLM) code generators
>>> +~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
>>> +
>>> +TL;DR:
>>> +
>>> +  **Current QEMU project policy is to DECLINE any contributions
>>> +  which are believed to include or derive from "AI" (LLM)
>>> +  generated code.**
>>> +
>>> +The existence of "AI" (`Large Language Model 
>>> <https://en.wikipedia.org/wiki/Large_language_model>`__
>>> +/ LLM) code generators raises a number of difficult legal questions, a
>>> +number of which impact on Open Source projects. As noted earlier, the
>>> +QEMU community requires that contributors certify their patch 
>>> submissions
>>> +are made in accordance with the rules of the :ref:`dco` (DCO). When a
>>> +patch contains "AI" generated code this raises difficulties with code
>>> +provenence and thus DCO compliance.
>>> +
>>> +To satisfy the DCO, the patch contributor has to fully understand
>>> +the origins and license of code they are contributing to QEMU. The
>>> +license terms that should apply to the output of an "AI" code generator
>>> +are ill-defined, given that both training data and operation of the
>>> +"AI" are typically opaque to the user. Even where the training data
>>> +is said to all be open source, it will likely be under a wide variety
>>> +of license terms.
>>> +
>>> +While the vendor's of "AI" code generators may promote the idea that
>>> +code output can be taken under a free choice of license, this is not
>>> +yet considered to be a generally accepted, nor tested, legal opinion.
>>> +
>>> +With this in mind, the QEMU maintainers does not consider it is
>>> +currently possible to comply with DCO terms (b) or (c) for most "AI"
>>> +generated code.
>>> +
>>> +The QEMU maintainers thus require that contributors refrain from using
>>> +"AI" code generators on patches intended to be submitted to the 
>>> project,
>>> +and will decline any contribution if use of "AI" is known or suspected.
>>> +
>>> +Examples of tools impacted by this policy includes both GitHub CoPilot,
>>> +and ChatGPT, amongst many others which are less well known.
>>
>>
>> So you called out these two by name, fine, but given "AI" is in scare
>> quotes I don't really know what is or is not allowed and I don't know
>> how will contributors know.  Is the "AI" that one must not use
>> necessarily an LLM?  And how do you define LLM even? Wikipedia says
>> "general-purpose language understanding and generation".
>>
>>
>> All this seems vague to me.
>>
>>
>> However, can't we define a simpler more specific policy?
>> For example, isn't it true that *any* automatically generated code
>> can only be included if the scripts producing said code
>> are also included or otherwise available under GPLv2?
> 
> The following definition makes sense to me:
> 
> - Automated codegen tool must be idempotent.
> - Automated codegen tool must not use statistical modelling.
> 
> I'd remove all AI or LLM references. These are non-specific, colloquial 
> and in the case of `AI`, non-technical. This policy should apply the 
> same to a Markov chain code generator.

This document targets all contributors. Contributions can be typo
fix, translations, ... and don't have to be technical. Similarly,
contributors aren't expected to be technical experts. As a neophyte,
"AI" makes sense. "Idempotent code generator" or "LLM" don't :)


^ permalink raw reply	[flat|nested] 57+ messages in thread

* Re: [PATCH 2/2] docs: define policy forbidding use of "AI" / LLM code generators
  2023-11-23 14:56     ` Manos Pitsidianakis
  2023-11-23 15:13       ` Michael S. Tsirkin
  2023-11-23 15:29       ` Philippe Mathieu-Daudé
@ 2023-11-23 15:32       ` Alex Bennée
  2023-11-23 18:02       ` Daniel P. Berrangé
  2023-11-24 10:25       ` Kevin Wolf
  4 siblings, 0 replies; 57+ messages in thread
From: Alex Bennée @ 2023-11-23 15:32 UTC (permalink / raw)
  To: Manos Pitsidianakis
  Cc: qemu-devel, Michael S. Tsirkin, Daniel P. Berrangé ,
	Richard Henderson, Alexander Graf, Paolo Bonzini,
	Markus Armbruster, Phil Mathieu-Daudé, Stefan Hajnoczi,
	Thomas Huth, Kevin Wolf, Gerd Hoffmann, Mark Cave-Ayland,
	Peter Maydell

Manos Pitsidianakis <manos.pitsidianakis@linaro.org> writes:

> On Thu, 23 Nov 2023 16:35, "Michael S. Tsirkin" <mst@redhat.com> wrote:
>>On Thu, Nov 23, 2023 at 11:40:26AM +0000, Daniel P. Berrangé wrote:
>>> There has been an explosion of interest in so called "AI" (LLM)
>>> code generators in the past year or so. Thus far though, this is
>>> has not been matched by a broadly accepted legal interpretation
>>> of the licensing implications for code generator outputs. While
>>> the vendors may claim there is no problem and a free choice of
>>> license is possible, they have an inherent conflict of interest
>>> in promoting this interpretation. More broadly there is, as yet,
>>> no broad consensus on the licensing implications of code generators
>>> trained on inputs under a wide variety of licenses.
>>> The DCO requires contributors to assert they have the right to
>>> contribute under the designated project license. Given the lack
>>> of consensus on the licensing of "AI" (LLM) code generator output,
>>> it is not considered credible to assert compliance with the DCO
>>> clause (b) or (c) where a patch includes such generated code.
>>> This patch thus defines a policy that the QEMU project will not
>>> accept contributions where use of "AI" (LLM) code generators is
>>> either known, or suspected.
>>> Signed-off-by: Daniel P. Berrangé <berrange@redhat.com>
>>> ---
>>>  docs/devel/code-provenance.rst | 40 ++++++++++++++++++++++++++++++++++
>>>  1 file changed, 40 insertions(+)
>>> diff --git a/docs/devel/code-provenance.rst
>>> b/docs/devel/code-provenance.rst
>>> index b4591a2dec..a6e42c6b1b 100644
>>> --- a/docs/devel/code-provenance.rst
>>> +++ b/docs/devel/code-provenance.rst
>>> @@ -195,3 +195,43 @@ example::
>>>    Signed-off-by: Some Person <some.person@example.com>
>>>    [Rebased and added support for 'foo']
>>>    Signed-off-by: New Person <new.person@example.com>
>>> +
>>> +Use of "AI" (LLM) code generators
>>> +~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
>>> +
>>> +TL;DR:
>>> +
>>> +  **Current QEMU project policy is to DECLINE any contributions
>>> +  which are believed to include or derive from "AI" (LLM)
>>> +  generated code.**
>>> +
>>> +The existence of "AI" (`Large Language Model <https://en.wikipedia.org/wiki/Large_language_model>`__
>>> +/ LLM) code generators raises a number of difficult legal questions, a
>>> +number of which impact on Open Source projects. As noted earlier, the
>>> +QEMU community requires that contributors certify their patch submissions
>>> +are made in accordance with the rules of the :ref:`dco` (DCO). When a
>>> +patch contains "AI" generated code this raises difficulties with code
>>> +provenence and thus DCO compliance.
>>> +
<snip>
>>> +
>>> +The QEMU maintainers thus require that contributors refrain from using
>>> +"AI" code generators on patches intended to be submitted to the project,
>>> +and will decline any contribution if use of "AI" is known or suspected.
>>> +
>>> +Examples of tools impacted by this policy includes both GitHub CoPilot,
>>> +and ChatGPT, amongst many others which are less well known.
>>
>>
>>So you called out these two by name, fine, but given "AI" is in scare
>>quotes I don't really know what is or is not allowed and I don't know
>>how will contributors know.  Is the "AI" that one must not use
>>necessarily an LLM?  And how do you define LLM even? Wikipedia says
>>"general-purpose language understanding and generation".
>>
>>
>>All this seems vague to me.
>>
>>
>>However, can't we define a simpler more specific policy?
>>For example, isn't it true that *any* automatically generated code
>>can only be included if the scripts producing said code
>>are also included or otherwise available under GPLv2?
>
> The following definition makes sense to me:
>
> - Automated codegen tool must be idempotent.
> - Automated codegen tool must not use statistical modelling.
>
> I'd remove all AI or LLM references. These are non-specific,
> colloquial and in the case of `AI`, non-technical. This policy should
> apply the same to a Markov chain code generator.

I'm fairly sure my Emacs auto-complete would fail by that definition.

-- 
Alex Bennée
Virtualisation Tech Lead @ Linaro


^ permalink raw reply	[flat|nested] 57+ messages in thread

* Re: [PATCH 2/2] docs: define policy forbidding use of "AI" / LLM code generators
  2023-11-23 15:29       ` Philippe Mathieu-Daudé
@ 2023-11-23 17:06         ` Michael S. Tsirkin
  2023-11-23 17:29           ` Michal Suchánek
  0 siblings, 1 reply; 57+ messages in thread
From: Michael S. Tsirkin @ 2023-11-23 17:06 UTC (permalink / raw)
  To: Philippe Mathieu-Daudé
  Cc: Manos Pitsidianakis, qemu-devel, Daniel P. Berrangé,
	Richard Henderson, Alexander Graf, Alex Benn é e,
	Paolo Bonzini, Markus Armbruster, Stefan Hajnoczi, Thomas Huth,
	Kevin Wolf, Gerd Hoffmann, Mark Cave-Ayland, Peter Maydell

On Thu, Nov 23, 2023 at 04:29:52PM +0100, Philippe Mathieu-Daudé wrote:
> This document targets all contributors. Contributions can be typo
> fix, translations, ... and don't have to be technical. Similarly,
> contributors aren't expected to be technical experts. As a neophyte,
> "AI" makes sense. "Idempotent code generator" or "LLM" don't :)

I don't think there's any big deal in using AI for typo fixes.

-- 
MST



^ permalink raw reply	[flat|nested] 57+ messages in thread

* Re: [PATCH 1/2] docs: introduce dedicated page about code provenance / sign-off
  2023-11-23 11:58   ` Philippe Mathieu-Daudé
@ 2023-11-23 17:08     ` Daniel P. Berrangé
  2023-11-23 23:56       ` Michael S. Tsirkin
  0 siblings, 1 reply; 57+ messages in thread
From: Daniel P. Berrangé @ 2023-11-23 17:08 UTC (permalink / raw)
  To: Philippe Mathieu-Daudé
  Cc: qemu-devel, Richard Henderson, Alexander Graf, Alex Bennée,
	Paolo Bonzini, Michael S. Tsirkin, Markus Armbruster,
	Stefan Hajnoczi, Thomas Huth, Kevin Wolf, Gerd Hoffmann,
	Mark Cave-Ayland, Peter Maydell

On Thu, Nov 23, 2023 at 12:58:18PM +0100, Philippe Mathieu-Daudé wrote:
> On 23/11/23 12:40, Daniel P. Berrangé wrote:
> > Currently we have a short paragraph saying that patches must include
> > a Signed-off-by line, and merely link to the kernel documentation.
> > The linked kernel docs have alot of content beyond the part about
> > sign-off an thus is misleading/distracting to QEMU contributors.
> > 
> > This introduces a dedicated 'code-provenance' page in QEMU talking
> > about why we require sign-off, explaining the other tags we commonly
> > use, and what to do in some edge cases.
> > 
> > Signed-off-by: Daniel P. Berrangé <berrange@redhat.com>
> > ---
> >   docs/devel/code-provenance.rst    | 197 ++++++++++++++++++++++++++++++
> >   docs/devel/index-process.rst      |   1 +
> >   docs/devel/submitting-a-patch.rst |  18 +--
> >   3 files changed, 201 insertions(+), 15 deletions(-)
> >   create mode 100644 docs/devel/code-provenance.rst

> > +Other commit tags
> > +~~~~~~~~~~~~~~~~~
> > +
> > +While the ``Signed-off-by`` tag is mandatory, there are a number of
> > +other tags that are commonly used during QEMU development
> > +
> > + * **``Reviewed-by``**: when a QEMU community member reviews a patch
> > +   on the mailing list, if they consider the patch acceptable, they
> > +   should send an email reply containing a ``Reviewed-by`` tag.
> > +
> > +   NB: a subsystem maintainer sending a pull request would replace
> > +   their own ``Reviewed-by`` with another ``Signed-off-by``
> 
> Hmm not sure about replacing, they have different meaning. You can merge
> patch you haven't reviewed. But as a maintainer you must S-o-b what you
> end merging (what is mentioned below in "subsystem maintainer").

I've always taken it as implied that patches I queue are reviewed by me,
but replies here suggest I'm in a minority on that.  That shows why it is
worth documenting this for QEMU explicitly :-)

> > + * **``Reported-by``**: when a QEMU community member reports a problem
> > +   via the mailing list, or some other informal channel that is not
> > +   the issue tracker, it is good practice to credit them by including
> > +   a ``Reported-by`` tag on any patch fixing the issue. When the
> > +   problem is reported via the GitLab issue tracker, however, it is
> > +   sufficient to just include a link to the issue.
> 
> Hmm isn't related to the "Resolves:" tag?

Gitlab supports a huge varity - resolves/fixes/closes/etc

I don't think this wants to turn into a full guide on what info to include
in a commit message, as we already have that in the submitting-a-patch doc,
explaining the bug link syntax. So I'll still to just the tags that
explicitly credit humans.

With regards,
Daniel
-- 
|: https://berrange.com      -o-    https://www.flickr.com/photos/dberrange :|
|: https://libvirt.org         -o-            https://fstop138.berrange.com :|
|: https://entangle-photo.org    -o-    https://www.instagram.com/dberrange :|



^ permalink raw reply	[flat|nested] 57+ messages in thread

* Re: [PATCH 1/2] docs: introduce dedicated page about code provenance / sign-off
  2023-11-23 13:01   ` Peter Maydell
@ 2023-11-23 17:12     ` Daniel P. Berrangé
  0 siblings, 0 replies; 57+ messages in thread
From: Daniel P. Berrangé @ 2023-11-23 17:12 UTC (permalink / raw)
  To: Peter Maydell
  Cc: qemu-devel, Richard Henderson, Alexander Graf, Alex Bennée,
	Paolo Bonzini, Michael S. Tsirkin, Markus Armbruster,
	Phil Mathieu-Daudé, Stefan Hajnoczi, Thomas Huth, Kevin Wolf,
	Gerd Hoffmann, Mark Cave-Ayland

On Thu, Nov 23, 2023 at 01:01:00PM +0000, Peter Maydell wrote:
> On Thu, 23 Nov 2023 at 11:40, Daniel P. Berrangé <berrange@redhat.com> wrote:
> >
> > Currently we have a short paragraph saying that patches must include
> > a Signed-off-by line, and merely link to the kernel documentation.
> > The linked kernel docs have alot of content beyond the part about
> 
> "a lot"
> 
> > sign-off an thus is misleading/distracting to QEMU contributors.
> 
> "and thus are"
> 
> >
> > This introduces a dedicated 'code-provenance' page in QEMU talking
> > about why we require sign-off, explaining the other tags we commonly
> > use, and what to do in some edge cases.
> 
> Good idea; I've felt for a while now that it was a little awkward
> to have to point people at that big kernel doc page.
> 
> 
> > Signed-off-by: Daniel P. Berrangé <berrange@redhat.com>
> > ---
> >  docs/devel/code-provenance.rst    | 197 ++++++++++++++++++++++++++++++
> >  docs/devel/index-process.rst      |   1 +
> >  docs/devel/submitting-a-patch.rst |  18 +--
> >  3 files changed, 201 insertions(+), 15 deletions(-)
> >  create mode 100644 docs/devel/code-provenance.rst
> >
> > diff --git a/docs/devel/code-provenance.rst b/docs/devel/code-provenance.rst
> > new file mode 100644
> > index 0000000000..b4591a2dec
> > --- /dev/null
> > +++ b/docs/devel/code-provenance.rst
> > @@ -0,0 +1,197 @@
> > +.. _code-provenance:
> > +
> > +Code provenance
> > +===============
> > +
> > +Certifying patch submissions
> > +~~~~~~~~~~~~~~~~~~~~~~~~~~~~
> > +
> > +The QEMU community **mandates** all contributors to certify provenance
> > +of patch submissions they make to the project. To put it another way,
> > +contributors must indicate that they are legally permitted to contribute
> > +to the project.
> > +
> > +Certification is achieved with a low overhead by adding a single line
> > +to the bottom of every git commit::
> > +
> > +   Signed-off-by: YOUR NAME <YOUR@EMAIL>
> > +
> > +This existence of this line asserts that the author of the patch is
> > +contributing in accordance with the `Developer's Certificate of
> > +Origin <https://developercertifcate.org>`__:
> > +
> > +.. _dco:
> > +
> > +::
> > +  Developer's Certificate of Origin 1.1
> > +
> > +  By making a contribution to this project, I certify that:
> > +
> > +  (a) The contribution was created in whole or in part by me and I
> > +      have the right to submit it under the open source license
> > +      indicated in the file; or
> > +
> > +  (b) The contribution is based upon previous work that, to the best
> > +      of my knowledge, is covered under an appropriate open source
> > +      license and I have the right under that license to submit that
> > +      work with modifications, whether created in whole or in part
> > +      by me, under the same open source license (unless I am
> > +      permitted to submit under a different license), as indicated
> > +      in the file; or
> > +
> > +  (c) The contribution was provided directly to me by some other
> > +      person who certified (a), (b) or (c) and I have not modified
> > +      it.
> > +
> > +  (d) I understand and agree that this project and the contribution
> > +      are public and that a record of the contribution (including all
> > +      personal information I submit with it, including my sign-off) is
> > +      maintained indefinitely and may be redistributed consistent with
> > +      this project or the open source license(s) involved.
> > +
> > +It is generally expected that the name and email addresses used in one
> > +of the ``Signed-off-by`` lines, matches that of the git commit ``Author``
> > +field. If the person sending the mail is also one of the patch authors,
> > +it is further expected that the mail ``From:`` line name & address match
> > +one of the ``Signed-off-by`` lines.
> 
> Is it? Patches sent via the sr.ht service won't do that, and I'm
> pretty sure we've had a few contributors in the past who send
> patches from different addresses to avoid problems with their
> corporate mail server mangling patches. I think this would be
> better softened to something like a recommendation ("Generally
> you should use the same email addresses ... ").

Yes, I forgot about sr.ht being wierd in this respect, so I'll
take your suggestion.


> > +
> > + * **``Reviewed-by``**: when a QEMU community member reviews a patch
> > +   on the mailing list, if they consider the patch acceptable, they
> > +   should send an email reply containing a ``Reviewed-by`` tag.
> > +
> > +   NB: a subsystem maintainer sending a pull request would replace
> > +   their own ``Reviewed-by`` with another ``Signed-off-by``
> 
> I agree with Philippe here -- you add signed-off-by, you don't
> replace reviewed-by.

Yep, will change that.

> 
> > +
> > + * **``Acked-by``**: when a QEMU subsystem maintainer approves a patch
> > +   that touches their subsystem, but intends to allow a different
> > +   maintainer to queue it and send a pull request, they would send
> > +   a mail containing a ``Acked-by`` tag.
> 
> I would personally also say "Acked-by does not imply a full code
> review of the patch; if the subsystem maintainer has done a full
> review, they should use the Reviewed-by tag instead."
> 
> But I know that there are some differences of opinion on exactly
> what Acked-by: means...

I'll incorporate something along those lines with a little fuzzyness
to give flexibility.

> > +
> > + * **``Tested-by``**: when a QEMU community member has functionally
> > +   tested the behaviour of the patch in some manner, they should
> > +   send an email reply conmtaning a ``Tested-by`` tag.
> > +
> > + * **``Reported-by``**: when a QEMU community member reports a problem
> > +   via the mailing list, or some other informal channel that is not
> > +   the issue tracker, it is good practice to credit them by including
> > +   a ``Reported-by`` tag on any patch fixing the issue. When the
> > +   problem is reported via the GitLab issue tracker, however, it is
> > +   sufficient to just include a link to the issue.
> 
> Maybe we should add a bit of encouraging text here along the lines of:
> 
> Reviewing and testing is something anybody can do -- if you've
> reviewed the code or tested it, feel free to send an email with
> your tag to say you've done that, or to ask questions if there's
> part of the patch you don't understand.
> 
> ? Or perhaps that would be better elsewhere; IDK.

I'll put a little bit in here but want to keep it relatively
concise, since we have other docs about more general contribution
practices.



> > +If the abandoned patch already had a ``Signed-off-by`` from the original
> > +author this **must** be preserved. The new contributor **must** then add
> > +their own ``Signed-off-by`` after the original one if they made any
> > +further changes to it. It is common to include a comment just prior to
> > +the new ``Signed-off-by`` indicating what extra changes were made. For
> > +example::
> > +
> > +  Signed-off-by: Some Person <some.person@example.com>
> > +  [Rebased and added support for 'foo']
> > +  Signed-off-by: New Person <new.person@example.com>
> 
> You might want to use two different email domains in this example;
> an abandoned project picked up by somebody from the same company
> (assuming the usual copyright-belongs-to-company) is a bit different
> from an abandoned project picked up by an entirely unrelated person.

Yes good idea.

> I think in this case it's also worth stating the general principles:
> 
> ===begin===
> The general principles with picking up abandoned work are:
>  * we should continue to credit the first author for their work
>  * we should track the provenance of the code
>  * we should also acknowledge the efforts of the person picking
>    up the work
>  * the commit messages should indicate who is responsible for
>    what parts of the final patch
> 
> In complicated cases or if in doubt, you can always ask on the
> mailing list for advice.
> 
> If the new work you'd need to do to resubmit the patches is
> significant, it's worth dropping the original author a
> friendly email to let them know, in case you might be
> duplicating something the original author is still working on.
> ===endit===
> 
> perhaps ?

I'll incorporate somethnig along these lines.

With regards,
Daniel
-- 
|: https://berrange.com      -o-    https://www.flickr.com/photos/dberrange :|
|: https://libvirt.org         -o-            https://fstop138.berrange.com :|
|: https://entangle-photo.org    -o-    https://www.instagram.com/dberrange :|



^ permalink raw reply	[flat|nested] 57+ messages in thread

* Re: [PATCH 1/2] docs: introduce dedicated page about code provenance / sign-off
  2023-11-23 13:16   ` Kevin Wolf
@ 2023-11-23 17:12     ` Daniel P. Berrangé
  0 siblings, 0 replies; 57+ messages in thread
From: Daniel P. Berrangé @ 2023-11-23 17:12 UTC (permalink / raw)
  To: Kevin Wolf
  Cc: qemu-devel, Richard Henderson, Alexander Graf, Alex Bennée,
	Paolo Bonzini, Michael S. Tsirkin, Markus Armbruster,
	Phil Mathieu-Daudé, Stefan Hajnoczi, Thomas Huth,
	Gerd Hoffmann, Mark Cave-Ayland, Peter Maydell

On Thu, Nov 23, 2023 at 02:16:36PM +0100, Kevin Wolf wrote:
> Am 23.11.2023 um 12:40 hat Daniel P. Berrangé geschrieben:
> > Currently we have a short paragraph saying that patches must include
> > a Signed-off-by line, and merely link to the kernel documentation.
> > The linked kernel docs have alot of content beyond the part about
> > sign-off an thus is misleading/distracting to QEMU contributors.
> > 
> > This introduces a dedicated 'code-provenance' page in QEMU talking
> > about why we require sign-off, explaining the other tags we commonly
> > use, and what to do in some edge cases.
> > 
> > Signed-off-by: Daniel P. Berrangé <berrange@redhat.com>
> > ---
> >  docs/devel/code-provenance.rst    | 197 ++++++++++++++++++++++++++++++
> >  docs/devel/index-process.rst      |   1 +
> >  docs/devel/submitting-a-patch.rst |  18 +--
> >  3 files changed, 201 insertions(+), 15 deletions(-)
> >  create mode 100644 docs/devel/code-provenance.rst
> > 
> > diff --git a/docs/devel/code-provenance.rst b/docs/devel/code-provenance.rst
> > new file mode 100644
> > index 0000000000..b4591a2dec
> > --- /dev/null
> > +++ b/docs/devel/code-provenance.rst
> > @@ -0,0 +1,197 @@
> > +.. _code-provenance:
> > +
> > +Code provenance
> > +===============
> > +
> > +Certifying patch submissions
> > +~~~~~~~~~~~~~~~~~~~~~~~~~~~~
> > +
> > +The QEMU community **mandates** all contributors to certify provenance
> > +of patch submissions they make to the project. To put it another way,
> > +contributors must indicate that they are legally permitted to contribute
> > +to the project.
> > +
> > +Certification is achieved with a low overhead by adding a single line
> > +to the bottom of every git commit::
> > +
> > +   Signed-off-by: YOUR NAME <YOUR@EMAIL>
> > +
> > +This existence of this line asserts that the author of the patch is
> > +contributing in accordance with the `Developer's Certificate of
> > +Origin <https://developercertifcate.org>`__:
> > +
> > +.. _dco:
> > +
> > +::
> > +  Developer's Certificate of Origin 1.1
> > +
> > +  By making a contribution to this project, I certify that:
> > +
> > +  (a) The contribution was created in whole or in part by me and I
> > +      have the right to submit it under the open source license
> > +      indicated in the file; or
> > +
> > +  (b) The contribution is based upon previous work that, to the best
> > +      of my knowledge, is covered under an appropriate open source
> > +      license and I have the right under that license to submit that
> > +      work with modifications, whether created in whole or in part
> > +      by me, under the same open source license (unless I am
> > +      permitted to submit under a different license), as indicated
> > +      in the file; or
> > +
> > +  (c) The contribution was provided directly to me by some other
> > +      person who certified (a), (b) or (c) and I have not modified
> > +      it.
> > +
> > +  (d) I understand and agree that this project and the contribution
> > +      are public and that a record of the contribution (including all
> > +      personal information I submit with it, including my sign-off) is
> > +      maintained indefinitely and may be redistributed consistent with
> > +      this project or the open source license(s) involved.
> > +
> > +It is generally expected that the name and email addresses used in one
> > +of the ``Signed-off-by`` lines, matches that of the git commit ``Author``
> > +field. If the person sending the mail is also one of the patch authors,
> > +it is further expected that the mail ``From:`` line name & address match
> > +one of the ``Signed-off-by`` lines. 
> 
> Isn't the S-o-b expected even if the person sending the mail isn't one
> of the patch authors, i.e. certifying (c) rather than (a) or (b) from
> the DCO? This is essentially the same case as what a subsystem
> maintainer does.

Yes, you are right.


> > +Other commit tags
> > +~~~~~~~~~~~~~~~~~
> > +
> > +While the ``Signed-off-by`` tag is mandatory, there are a number of
> > +other tags that are commonly used during QEMU development
> > +
> > + * **``Reviewed-by``**: when a QEMU community member reviews a patch
> > +   on the mailing list, if they consider the patch acceptable, they
> > +   should send an email reply containing a ``Reviewed-by`` tag.
> > +
> > +   NB: a subsystem maintainer sending a pull request would replace
> > +   their own ``Reviewed-by`` with another ``Signed-off-by``
> 
> As Philippe already mentioned, this isn't necessarily the case. It's a
> common enough practice to add a S-o-b (which technically only certifies
> the DCO) without removing the R-b (which tells that the content was
> actually reviewed in detail - maintainers don't always do that if there
> are already R-bs from trusted community members).

Yes, will change.


With regards,
Daniel
-- 
|: https://berrange.com      -o-    https://www.flickr.com/photos/dberrange :|
|: https://libvirt.org         -o-            https://fstop138.berrange.com :|
|: https://entangle-photo.org    -o-    https://www.instagram.com/dberrange :|



^ permalink raw reply	[flat|nested] 57+ messages in thread

* Re: [PATCH 1/2] docs: introduce dedicated page about code provenance / sign-off
  2023-11-23 14:25   ` Michael S. Tsirkin
@ 2023-11-23 17:16     ` Daniel P. Berrangé
  2023-11-23 17:33       ` Michael S. Tsirkin
  2023-11-24  9:49       ` Kevin Wolf
  0 siblings, 2 replies; 57+ messages in thread
From: Daniel P. Berrangé @ 2023-11-23 17:16 UTC (permalink / raw)
  To: Michael S. Tsirkin
  Cc: qemu-devel, Richard Henderson, Alexander Graf, Alex Bennée,
	Paolo Bonzini, Markus Armbruster, Phil Mathieu-Daudé,
	Stefan Hajnoczi, Thomas Huth, Kevin Wolf, Gerd Hoffmann,
	Mark Cave-Ayland, Peter Maydell

On Thu, Nov 23, 2023 at 09:25:13AM -0500, Michael S. Tsirkin wrote:
> On Thu, Nov 23, 2023 at 11:40:25AM +0000, Daniel P. Berrangé wrote:
> > Currently we have a short paragraph saying that patches must include
> > a Signed-off-by line, and merely link to the kernel documentation.
> > The linked kernel docs have alot of content beyond the part about
> > sign-off an thus is misleading/distracting to QEMU contributors.
> > 
> > This introduces a dedicated 'code-provenance' page in QEMU talking
> > about why we require sign-off, explaining the other tags we commonly
> > use, and what to do in some edge cases.
> > 
> > Signed-off-by: Daniel P. Berrangé <berrange@redhat.com>
> 

> > +  * The non-primary author's contributions were so trivial that
> > +    they can be considered not subject to copyright. In this case
> > +    the secondary authors need not include a ``Signed-off-by``.
> > +
> > +    This case most commonly applies where QEMU reviewers give short
> > +    snippets of code as suggested fixes to a patch. The reviewers
> > +    don't need to have their own ``Signed-off-by`` added unless
> > +    their code suggestion was unusually large.
> 
> It is still a good policy to include attribution, e.g.
> by adding a Suggested-by tag.

Will add this tag.


> > +Other commit tags
> > +~~~~~~~~~~~~~~~~~
> > +
> > +While the ``Signed-off-by`` tag is mandatory, there are a number of
> > +other tags that are commonly used during QEMU development
> > +
> > + * **``Reviewed-by``**: when a QEMU community member reviews a patch
> > +   on the mailing list, if they consider the patch acceptable, they
> > +   should send an email reply containing a ``Reviewed-by`` tag.
> > +
> > +   NB: a subsystem maintainer sending a pull request would replace
> > +   their own ``Reviewed-by`` with another ``Signed-off-by``
> > +
> > + * **``Acked-by``**: when a QEMU subsystem maintainer approves a patch
> > +   that touches their subsystem, but intends to allow a different
> > +   maintainer to queue it and send a pull request, they would send
> > +   a mail containing a ``Acked-by`` tag.
> > +   
> > + * **``Tested-by``**: when a QEMU community member has functionally
> > +   tested the behaviour of the patch in some manner, they should
> > +   send an email reply conmtaning a ``Tested-by`` tag.
> > +
> > + * **``Reported-by``**: when a QEMU community member reports a problem
> > +   via the mailing list, or some other informal channel that is not
> > +   the issue tracker, it is good practice to credit them by including
> > +   a ``Reported-by`` tag on any patch fixing the issue. When the
> > +   problem is reported via the GitLab issue tracker, however, it is
> > +   sufficient to just include a link to the issue.
> 
> 
> Suggested-by is also common.
> 
> As long as we are here, let's document Fixes: and Cc: ?

The submitting-a-patch doc covers more general commit message information.
I think this doc just ought to focus on tags that identify humans involved
in the process.

I've never been sure what the point of the 'Cc' tag is, when you actually
want to use the Cc email header ? 


> > diff --git a/docs/devel/submitting-a-patch.rst b/docs/devel/submitting-a-patch.rst
> > index c641d948f1..ec541b3d15 100644
> > --- a/docs/devel/submitting-a-patch.rst
> > +++ b/docs/devel/submitting-a-patch.rst
> > @@ -322,21 +322,9 @@ Patch emails must include a ``Signed-off-by:`` line
> >  
> >  Your patches **must** include a Signed-off-by: line. This is a hard
> >  requirement because it's how you say "I'm legally okay to contribute
> > -this and happy for it to go into QEMU". The process is modelled after
> > -the `Linux kernel
> > -<http://git.kernel.org/cgit/linux/kernel/git/torvalds/linux.git/tree/Documentation/SubmittingPatches?id=f6f94e2ab1b33f0082ac22d71f66385a60d8157f#n297>`__
> > -policy.
> > -
> > -If you wrote the patch, make sure your "From:" and "Signed-off-by:"
> > -lines use the same spelling. It's okay if you subscribe or contribute to
> > -the list via more than one address, but using multiple addresses in one
> > -commit just confuses things. If someone else wrote the patch, git will
> > -include a "From:" line in the body of the email (different from your
> > -envelope From:) that will give credit to the correct author; but again,
> > -that author's Signed-off-by: line is mandatory, with the same spelling.
> > -
> > -There are various tooling options for automatically adding these tags
> > -include using ``git commit -s`` or ``git format-patch -s``. For more
> > +this and happy for it to go into QEMU". For full guidance, read the
> > +:ref:`code-provenance` documentation.
> > +
> >  information see `SubmittingPatches 1.12
> >  <http://git.kernel.org/cgit/linux/kernel/git/torvalds/linux.git/tree/Documentation/SubmittingPatches?id=f6f94e2ab1b33f0082ac22d71f66385a60d8157f#n297>`__.
> 
> this "information" now looks orphaned or am I confused?

Yes, forgot to cull it.

With regards,
Daniel
-- 
|: https://berrange.com      -o-    https://www.flickr.com/photos/dberrange :|
|: https://libvirt.org         -o-            https://fstop138.berrange.com :|
|: https://entangle-photo.org    -o-    https://www.instagram.com/dberrange :|



^ permalink raw reply	[flat|nested] 57+ messages in thread

* Re: [PATCH 2/2] docs: define policy forbidding use of "AI" / LLM code generators
  2023-11-23 17:06         ` Michael S. Tsirkin
@ 2023-11-23 17:29           ` Michal Suchánek
  2023-11-23 18:05             ` Michael S. Tsirkin
  0 siblings, 1 reply; 57+ messages in thread
From: Michal Suchánek @ 2023-11-23 17:29 UTC (permalink / raw)
  To: Michael S. Tsirkin
  Cc: Philippe Mathieu-Daudé, Manos Pitsidianakis, qemu-devel,
	Daniel P. Berrangé, Richard Henderson, Alexander Graf,
	Alex Benn é e, Paolo Bonzini, Markus Armbruster,
	Stefan Hajnoczi, Thomas Huth, Kevin Wolf, Gerd Hoffmann,
	Mark Cave-Ayland, Peter Maydell

On Thu, Nov 23, 2023 at 12:06:59PM -0500, Michael S. Tsirkin wrote:
> On Thu, Nov 23, 2023 at 04:29:52PM +0100, Philippe Mathieu-Daudé wrote:
> > This document targets all contributors. Contributions can be typo
> > fix, translations, ... and don't have to be technical. Similarly,
> > contributors aren't expected to be technical experts. As a neophyte,
> > "AI" makes sense. "Idempotent code generator" or "LLM" don't :)
> 
> I don't think there's any big deal in using AI for typo fixes.

For how many typos it is still OK, and would not a deterministic
spellchecker be preferred?

There are some edge cases where using AI is OK, the problem is most of
the time it is not clear it is OK to use.

Thanks

Michal


^ permalink raw reply	[flat|nested] 57+ messages in thread

* Re: [PATCH 1/2] docs: introduce dedicated page about code provenance / sign-off
  2023-11-23 17:16     ` Daniel P. Berrangé
@ 2023-11-23 17:33       ` Michael S. Tsirkin
  2023-11-24 11:11         ` Philippe Mathieu-Daudé
  2023-11-24  9:49       ` Kevin Wolf
  1 sibling, 1 reply; 57+ messages in thread
From: Michael S. Tsirkin @ 2023-11-23 17:33 UTC (permalink / raw)
  To: Daniel P. Berrangé
  Cc: qemu-devel, Richard Henderson, Alexander Graf, Alex Bennée,
	Paolo Bonzini, Markus Armbruster, Phil Mathieu-Daudé,
	Stefan Hajnoczi, Thomas Huth, Kevin Wolf, Gerd Hoffmann,
	Mark Cave-Ayland, Peter Maydell

On Thu, Nov 23, 2023 at 05:16:45PM +0000, Daniel P. Berrangé wrote:
> On Thu, Nov 23, 2023 at 09:25:13AM -0500, Michael S. Tsirkin wrote:
> > On Thu, Nov 23, 2023 at 11:40:25AM +0000, Daniel P. Berrangé wrote:
> > > Currently we have a short paragraph saying that patches must include
> > > a Signed-off-by line, and merely link to the kernel documentation.
> > > The linked kernel docs have alot of content beyond the part about
> > > sign-off an thus is misleading/distracting to QEMU contributors.
> > > 
> > > This introduces a dedicated 'code-provenance' page in QEMU talking
> > > about why we require sign-off, explaining the other tags we commonly
> > > use, and what to do in some edge cases.
> > > 
> > > Signed-off-by: Daniel P. Berrangé <berrange@redhat.com>
> > 
> 
> > > +  * The non-primary author's contributions were so trivial that
> > > +    they can be considered not subject to copyright. In this case
> > > +    the secondary authors need not include a ``Signed-off-by``.
> > > +
> > > +    This case most commonly applies where QEMU reviewers give short
> > > +    snippets of code as suggested fixes to a patch. The reviewers
> > > +    don't need to have their own ``Signed-off-by`` added unless
> > > +    their code suggestion was unusually large.
> > 
> > It is still a good policy to include attribution, e.g.
> > by adding a Suggested-by tag.
> 
> Will add this tag.
> 
> 
> > > +Other commit tags
> > > +~~~~~~~~~~~~~~~~~
> > > +
> > > +While the ``Signed-off-by`` tag is mandatory, there are a number of
> > > +other tags that are commonly used during QEMU development
> > > +
> > > + * **``Reviewed-by``**: when a QEMU community member reviews a patch
> > > +   on the mailing list, if they consider the patch acceptable, they
> > > +   should send an email reply containing a ``Reviewed-by`` tag.
> > > +
> > > +   NB: a subsystem maintainer sending a pull request would replace
> > > +   their own ``Reviewed-by`` with another ``Signed-off-by``
> > > +
> > > + * **``Acked-by``**: when a QEMU subsystem maintainer approves a patch
> > > +   that touches their subsystem, but intends to allow a different
> > > +   maintainer to queue it and send a pull request, they would send
> > > +   a mail containing a ``Acked-by`` tag.
> > > +   
> > > + * **``Tested-by``**: when a QEMU community member has functionally
> > > +   tested the behaviour of the patch in some manner, they should
> > > +   send an email reply conmtaning a ``Tested-by`` tag.
> > > +
> > > + * **``Reported-by``**: when a QEMU community member reports a problem
> > > +   via the mailing list, or some other informal channel that is not
> > > +   the issue tracker, it is good practice to credit them by including
> > > +   a ``Reported-by`` tag on any patch fixing the issue. When the
> > > +   problem is reported via the GitLab issue tracker, however, it is
> > > +   sufficient to just include a link to the issue.
> > 
> > 
> > Suggested-by is also common.
> > 
> > As long as we are here, let's document Fixes: and Cc: ?
> 
> The submitting-a-patch doc covers more general commit message information.
> I think this doc just ought to focus on tags that identify humans involved
> in the process.
> 
> I've never been sure what the point of the 'Cc' tag is, when you actually
> want to use the Cc email header ? 
> 

It records the fact that these people have been copied but did not
respond.

> > > diff --git a/docs/devel/submitting-a-patch.rst b/docs/devel/submitting-a-patch.rst
> > > index c641d948f1..ec541b3d15 100644
> > > --- a/docs/devel/submitting-a-patch.rst
> > > +++ b/docs/devel/submitting-a-patch.rst
> > > @@ -322,21 +322,9 @@ Patch emails must include a ``Signed-off-by:`` line
> > >  
> > >  Your patches **must** include a Signed-off-by: line. This is a hard
> > >  requirement because it's how you say "I'm legally okay to contribute
> > > -this and happy for it to go into QEMU". The process is modelled after
> > > -the `Linux kernel
> > > -<http://git.kernel.org/cgit/linux/kernel/git/torvalds/linux.git/tree/Documentation/SubmittingPatches?id=f6f94e2ab1b33f0082ac22d71f66385a60d8157f#n297>`__
> > > -policy.
> > > -
> > > -If you wrote the patch, make sure your "From:" and "Signed-off-by:"
> > > -lines use the same spelling. It's okay if you subscribe or contribute to
> > > -the list via more than one address, but using multiple addresses in one
> > > -commit just confuses things. If someone else wrote the patch, git will
> > > -include a "From:" line in the body of the email (different from your
> > > -envelope From:) that will give credit to the correct author; but again,
> > > -that author's Signed-off-by: line is mandatory, with the same spelling.
> > > -
> > > -There are various tooling options for automatically adding these tags
> > > -include using ``git commit -s`` or ``git format-patch -s``. For more
> > > +this and happy for it to go into QEMU". For full guidance, read the
> > > +:ref:`code-provenance` documentation.
> > > +
> > >  information see `SubmittingPatches 1.12
> > >  <http://git.kernel.org/cgit/linux/kernel/git/torvalds/linux.git/tree/Documentation/SubmittingPatches?id=f6f94e2ab1b33f0082ac22d71f66385a60d8157f#n297>`__.
> > 
> > this "information" now looks orphaned or am I confused?
> 
> Yes, forgot to cull it.
> 
> With regards,
> Daniel
> -- 
> |: https://berrange.com      -o-    https://www.flickr.com/photos/dberrange :|
> |: https://libvirt.org         -o-            https://fstop138.berrange.com :|
> |: https://entangle-photo.org    -o-    https://www.instagram.com/dberrange :|



^ permalink raw reply	[flat|nested] 57+ messages in thread

* Re: [PATCH 2/2] docs: define policy forbidding use of "AI" / LLM code generators
  2023-11-23 12:57   ` Alex Bennée
@ 2023-11-23 17:37     ` Michal Suchánek
  2023-11-23 23:27       ` Michael S. Tsirkin
  2023-11-23 17:46     ` Daniel P. Berrangé
  1 sibling, 1 reply; 57+ messages in thread
From: Michal Suchánek @ 2023-11-23 17:37 UTC (permalink / raw)
  To: Alex Bennée
  Cc: Daniel P. Berrangé, qemu-devel, Richard Henderson,
	Alexander Graf, Paolo Bonzini, Michael S. Tsirkin,
	Markus Armbruster, Phil Mathieu-Daudé, Stefan Hajnoczi,
	Thomas Huth, Kevin Wolf, Gerd Hoffmann, Mark Cave-Ayland,
	Peter Maydell

On Thu, Nov 23, 2023 at 12:57:42PM +0000, Alex Bennée wrote:
> Daniel P. Berrangé <berrange@redhat.com> writes:
> 
> > There has been an explosion of interest in so called "AI" (LLM)
> > code generators in the past year or so. Thus far though, this is
> > has not been matched by a broadly accepted legal interpretation
> > of the licensing implications for code generator outputs. While
> > the vendors may claim there is no problem and a free choice of
> > license is possible, they have an inherent conflict of interest
> > in promoting this interpretation. More broadly there is, as yet,
> > no broad consensus on the licensing implications of code generators
> > trained on inputs under a wide variety of licenses.
> >
> > The DCO requires contributors to assert they have the right to
> > contribute under the designated project license. Given the lack
> > of consensus on the licensing of "AI" (LLM) code generator output,
> > it is not considered credible to assert compliance with the DCO
> > clause (b) or (c) where a patch includes such generated code.
> >
> > This patch thus defines a policy that the QEMU project will not
> > accept contributions where use of "AI" (LLM) code generators is
> > either known, or suspected.
> >
> > Signed-off-by: Daniel P. Berrangé <berrange@redhat.com>
> > ---
> >  docs/devel/code-provenance.rst | 40 ++++++++++++++++++++++++++++++++++
> >  1 file changed, 40 insertions(+)
> >
> > diff --git a/docs/devel/code-provenance.rst b/docs/devel/code-provenance.rst
> > index b4591a2dec..a6e42c6b1b 100644
> > --- a/docs/devel/code-provenance.rst
> > +++ b/docs/devel/code-provenance.rst
> > @@ -195,3 +195,43 @@ example::
> >    Signed-off-by: Some Person <some.person@example.com>
> >    [Rebased and added support for 'foo']
> >    Signed-off-by: New Person <new.person@example.com>
> > +
> > +Use of "AI" (LLM) code generators
> > +~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
> > +
> > +TL;DR:
> > +
> > +  **Current QEMU project policy is to DECLINE any contributions
> > +  which are believed to include or derive from "AI" (LLM)
> > +  generated code.**
> > +
> > +The existence of "AI" (`Large Language Model <https://en.wikipedia.org/wiki/Large_language_model>`__
> > +/ LLM) code generators raises a number of difficult legal questions, a
> > +number of which impact on Open Source projects. As noted earlier, the
> > +QEMU community requires that contributors certify their patch submissions
> > +are made in accordance with the rules of the :ref:`dco` (DCO). When a
> > +patch contains "AI" generated code this raises difficulties with code
> > +provenence and thus DCO compliance.
> 
> I agree this is going to be a field that keeps lawyers well re-numerated
> for the foreseeable future. However I suspect this elides over the main
> use case for LLM generators which is non-novel transformation. One good
> example is generating text fixtures where you write a piece of original
> code and then ask the code completion engine to fill out some unit tests
> to exercise the code. It's boring mechanical work but one an LLM is very
> suited to (even if you might tweak the final result).

It may be suited to produce such code (disputable) but the code is not
suited for inclusion into the project, for legal reasons.

> > +To satisfy the DCO, the patch contributor has to fully understand
> > +the origins and license of code they are contributing to QEMU. The
> > +license terms that should apply to the output of an "AI" code generator
> > +are ill-defined, given that both training data and operation of the
> > +"AI" are typically opaque to the user. Even where the training data
> > +is said to all be open source, it will likely be under a wide variety
> > +of license terms.
> > +
> > +While the vendor's of "AI" code generators may promote the idea that
> > +code output can be taken under a free choice of license, this is not
> > +yet considered to be a generally accepted, nor tested, legal opinion.
> > +
> > +With this in mind, the QEMU maintainers does not consider it is
> > +currently possible to comply with DCO terms (b) or (c) for most "AI"
> > +generated code.
> 
> There is a load of code out that isn't eligible for copyright projection
> because it doesn't demonstrate much originality or creativity. In the
> experimentation I've done so far I've not seen much sign of genuine
> creativity. LLM's benefit from having access to a wide corpus of
> training data and tend to do a better job of inferencing solutions from
> semi-related posts than say for example human manually comparing posts
> having pasted an error message in google.

And license of that corpus of training data is not defined.

If you could erase the copyright on anything by feeding it into a
statistical model and pulling it back out there would be some big
content license holders objecting so it's very unlikely to happen.

Consequently, for all practical purposes the "AI"/LLM output is
derivative work of the input with all legal consequences.

This is, of course, only a problem for *generative* use of AI/LLM where
the putput can contain contain copies of substantial parts of input.

Thanks

Michal


^ permalink raw reply	[flat|nested] 57+ messages in thread

* Re: [PATCH 2/2] docs: define policy forbidding use of "AI" / LLM code generators
  2023-11-23 12:57   ` Alex Bennée
  2023-11-23 17:37     ` Michal Suchánek
@ 2023-11-23 17:46     ` Daniel P. Berrangé
  2023-11-23 23:53       ` Michael S. Tsirkin
  1 sibling, 1 reply; 57+ messages in thread
From: Daniel P. Berrangé @ 2023-11-23 17:46 UTC (permalink / raw)
  To: Alex Bennée
  Cc: qemu-devel, Richard Henderson, Alexander Graf, Paolo Bonzini,
	Michael S. Tsirkin, Markus Armbruster, Phil Mathieu-Daudé,
	Stefan Hajnoczi, Thomas Huth, Kevin Wolf, Gerd Hoffmann,
	Mark Cave-Ayland, Peter Maydell

On Thu, Nov 23, 2023 at 12:57:42PM +0000, Alex Bennée wrote:
> Daniel P. Berrangé <berrange@redhat.com> writes:
> 
> > There has been an explosion of interest in so called "AI" (LLM)
> > code generators in the past year or so. Thus far though, this is
> > has not been matched by a broadly accepted legal interpretation
> > of the licensing implications for code generator outputs. While
> > the vendors may claim there is no problem and a free choice of
> > license is possible, they have an inherent conflict of interest
> > in promoting this interpretation. More broadly there is, as yet,
> > no broad consensus on the licensing implications of code generators
> > trained on inputs under a wide variety of licenses.
> >
> > The DCO requires contributors to assert they have the right to
> > contribute under the designated project license. Given the lack
> > of consensus on the licensing of "AI" (LLM) code generator output,
> > it is not considered credible to assert compliance with the DCO
> > clause (b) or (c) where a patch includes such generated code.
> >
> > This patch thus defines a policy that the QEMU project will not
> > accept contributions where use of "AI" (LLM) code generators is
> > either known, or suspected.
> >
> > Signed-off-by: Daniel P. Berrangé <berrange@redhat.com>
> > ---
> >  docs/devel/code-provenance.rst | 40 ++++++++++++++++++++++++++++++++++
> >  1 file changed, 40 insertions(+)
> >
> > diff --git a/docs/devel/code-provenance.rst b/docs/devel/code-provenance.rst
> > index b4591a2dec..a6e42c6b1b 100644
> > --- a/docs/devel/code-provenance.rst
> > +++ b/docs/devel/code-provenance.rst
> > @@ -195,3 +195,43 @@ example::
> >    Signed-off-by: Some Person <some.person@example.com>
> >    [Rebased and added support for 'foo']
> >    Signed-off-by: New Person <new.person@example.com>
> > +
> > +Use of "AI" (LLM) code generators
> > +~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
> > +
> > +TL;DR:
> > +
> > +  **Current QEMU project policy is to DECLINE any contributions
> > +  which are believed to include or derive from "AI" (LLM)
> > +  generated code.**
> > +
> > +The existence of "AI" (`Large Language Model <https://en.wikipedia.org/wiki/Large_language_model>`__
> > +/ LLM) code generators raises a number of difficult legal questions, a
> > +number of which impact on Open Source projects. As noted earlier, the
> > +QEMU community requires that contributors certify their patch submissions
> > +are made in accordance with the rules of the :ref:`dco` (DCO). When a
> > +patch contains "AI" generated code this raises difficulties with code
> > +provenence and thus DCO compliance.
> 
> I agree this is going to be a field that keeps lawyers well re-numerated
> for the foreseeable future. However I suspect this elides over the main
> use case for LLM generators which is non-novel transformation. One good
> example is generating text fixtures where you write a piece of original
> code and then ask the code completion engine to fill out some unit tests
> to exercise the code. It's boring mechanical work but one an LLM is very
> suited to (even if you might tweak the final result).

Yes, I can see how that is helpful, but I think in many cases the
resulting code will be complex enough to be considered copyrightable,
and so even with the original input code, I feel the licensing of the
output is still ill-defined.

> 
> > +To satisfy the DCO, the patch contributor has to fully understand
> > +the origins and license of code they are contributing to QEMU. The
> > +license terms that should apply to the output of an "AI" code generator
> > +are ill-defined, given that both training data and operation of the
> > +"AI" are typically opaque to the user. Even where the training data
> > +is said to all be open source, it will likely be under a wide variety
> > +of license terms.
> > +
> > +While the vendor's of "AI" code generators may promote the idea that
> > +code output can be taken under a free choice of license, this is not
> > +yet considered to be a generally accepted, nor tested, legal opinion.
> > +
> > +With this in mind, the QEMU maintainers does not consider it is
> > +currently possible to comply with DCO terms (b) or (c) for most "AI"
> > +generated code.
> 
> There is a load of code out that isn't eligible for copyright projection
> because it doesn't demonstrate much originality or creativity. In the
> experimentation I've done so far I've not seen much sign of genuine
> creativity. LLM's benefit from having access to a wide corpus of
> training data and tend to do a better job of inferencing solutions from
> semi-related posts than say for example human manually comparing posts
> having pasted an error message in google.

The boundary between what is considered copyrightable and not, it
itself quite ill-defined, and thus it is hard to express a clear
rule that can be applied.

I think more experience long term contributors end up getting somewhat
of a "gut feeling" about what's ok and what's not, but I'm not sure if
that is true for contibutors in general.

IOW, while there are likely cases where it is possible to safely use
a AI generator, I'm not sure how to best express that in an way that
makes sense.

Perhaps a loosely worded addendum  about possible exception for
"trivial" output

> > +The QEMU maintainers thus require that contributors refrain from using
> > +"AI" code generators on patches intended to be submitted to the project,
> > +and will decline any contribution if use of "AI" is known or suspected.
> > +
> > +Examples of tools impacted by this policy includes both GitHub CoPilot,
> > +and ChatGPT, amongst many others which are less well known.
> 
> What about if you took an LLM and then fine tuned it by using project
> data so it could better help new users in making contributions to the
> project? You would be biasing the model to your own data for the
> purposes of helping developers write better QEMU code?

It is hard to provide an answer to that question, since I think it is
something that would need to be considered case by case. It hinges
around how much does the new QEMU specific training data influence
the model, vs other pre-existing training (if any)

Perhaps we can finish this policy with a general point to solicit
feedback on possible exceptions ?

  "If a contributor believes they can demonstrate that the output of
   a particular tool has deterministic licensing, such that they can
   satisfy the DCO, they should provide such info to the mailing list"

With regards,
Daniel
-- 
|: https://berrange.com      -o-    https://www.flickr.com/photos/dberrange :|
|: https://libvirt.org         -o-            https://fstop138.berrange.com :|
|: https://entangle-photo.org    -o-    https://www.instagram.com/dberrange :|



^ permalink raw reply	[flat|nested] 57+ messages in thread

* Re: [PATCH 2/2] docs: define policy forbidding use of "AI" / LLM code generators
  2023-11-23 14:35   ` Michael S. Tsirkin
  2023-11-23 14:56     ` Manos Pitsidianakis
@ 2023-11-23 17:58     ` Daniel P. Berrangé
  2023-11-23 22:39       ` Michael S. Tsirkin
  1 sibling, 1 reply; 57+ messages in thread
From: Daniel P. Berrangé @ 2023-11-23 17:58 UTC (permalink / raw)
  To: Michael S. Tsirkin
  Cc: qemu-devel, Richard Henderson, Alexander Graf, Alex Bennée,
	Paolo Bonzini, Markus Armbruster, Phil Mathieu-Daudé,
	Stefan Hajnoczi, Thomas Huth, Kevin Wolf, Gerd Hoffmann,
	Mark Cave-Ayland, Peter Maydell

On Thu, Nov 23, 2023 at 09:35:43AM -0500, Michael S. Tsirkin wrote:
> On Thu, Nov 23, 2023 at 11:40:26AM +0000, Daniel P. Berrangé wrote:
> > There has been an explosion of interest in so called "AI" (LLM)
> > code generators in the past year or so. Thus far though, this is
> > has not been matched by a broadly accepted legal interpretation
> > of the licensing implications for code generator outputs. While
> > the vendors may claim there is no problem and a free choice of
> > license is possible, they have an inherent conflict of interest
> > in promoting this interpretation. More broadly there is, as yet,
> > no broad consensus on the licensing implications of code generators
> > trained on inputs under a wide variety of licenses.
> > 
> > The DCO requires contributors to assert they have the right to
> > contribute under the designated project license. Given the lack
> > of consensus on the licensing of "AI" (LLM) code generator output,
> > it is not considered credible to assert compliance with the DCO
> > clause (b) or (c) where a patch includes such generated code.
> > 
> > This patch thus defines a policy that the QEMU project will not
> > accept contributions where use of "AI" (LLM) code generators is
> > either known, or suspected.
> > 
> > Signed-off-by: Daniel P. Berrangé <berrange@redhat.com>
> > ---
> >  docs/devel/code-provenance.rst | 40 ++++++++++++++++++++++++++++++++++
> >  1 file changed, 40 insertions(+)
> > 
> > diff --git a/docs/devel/code-provenance.rst b/docs/devel/code-provenance.rst
> > index b4591a2dec..a6e42c6b1b 100644
> > --- a/docs/devel/code-provenance.rst
> > +++ b/docs/devel/code-provenance.rst
> > @@ -195,3 +195,43 @@ example::
> >    Signed-off-by: Some Person <some.person@example.com>
> >    [Rebased and added support for 'foo']
> >    Signed-off-by: New Person <new.person@example.com>
> > +
> > +Use of "AI" (LLM) code generators
> > +~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
> > +
> > +TL;DR:
> > +
> > +  **Current QEMU project policy is to DECLINE any contributions
> > +  which are believed to include or derive from "AI" (LLM)
> > +  generated code.**
> > +
> > +The existence of "AI" (`Large Language Model <https://en.wikipedia.org/wiki/Large_language_model>`__
> > +/ LLM) code generators raises a number of difficult legal questions, a
> > +number of which impact on Open Source projects. As noted earlier, the
> > +QEMU community requires that contributors certify their patch submissions
> > +are made in accordance with the rules of the :ref:`dco` (DCO). When a
> > +patch contains "AI" generated code this raises difficulties with code
> > +provenence and thus DCO compliance.
> > +
> > +To satisfy the DCO, the patch contributor has to fully understand
> > +the origins and license of code they are contributing to QEMU. The
> > +license terms that should apply to the output of an "AI" code generator
> > +are ill-defined, given that both training data and operation of the
> > +"AI" are typically opaque to the user. Even where the training data
> > +is said to all be open source, it will likely be under a wide variety
> > +of license terms.
> > +
> > +While the vendor's of "AI" code generators may promote the idea that
> > +code output can be taken under a free choice of license, this is not
> > +yet considered to be a generally accepted, nor tested, legal opinion.
> > +
> > +With this in mind, the QEMU maintainers does not consider it is
> > +currently possible to comply with DCO terms (b) or (c) for most "AI"
> > +generated code.
> > +
> > +The QEMU maintainers thus require that contributors refrain from using
> > +"AI" code generators on patches intended to be submitted to the project,
> > +and will decline any contribution if use of "AI" is known or suspected.
> > +
> > +Examples of tools impacted by this policy includes both GitHub CoPilot,
> > +and ChatGPT, amongst many others which are less well known.
> 
> 
> So you called out these two by name, fine, but given "AI" is in scare
> quotes I don't really know what is or is not allowed and I don't know
> how will contributors know.  Is the "AI" that one must not use
> necessarily an LLM?  And how do you define LLM even? Wikipedia says
> "general-purpose language understanding and generation".

I used "AI" in quotes, because I think it can mean different things to
different people. In practical terms it has become a bit of a catch
all term for a wide variety of tools. Thus I think the quote serve to
express this as a loose generalization, rather than a precise definition.

The same for "LLM", I don't want to try to define it, as it has also
become somewhat of a general term. 

> All this seems vague to me.

Delibrately so, as there are a wide variety of tools working in
varying ways, but all with similar caveats around the licensing
of the output "derivative" work.

> However, can't we define a simpler more specific policy?
> For example, isn't it true that *any* automatically generated code
> can only be included if the scripts producing said code
> are also included or otherwise available under GPLv2?

The license of a code generation tool itself is usually considered
to be not a factor in the license of its output.

In most cases the license of the input data will determine the
license of the output data, since the latter is a derivative
work of the former. The person runing the tool will typically
know exact what the input data is, and so have confidence over
the license of the output.

If there are questions about whether the output is a derivative
of the tool's code itself, then the tool author can provide an
disclaimer for this.  Such a disclaimer though, would not erase
the derivative link between input data and output data. One
example is GCC where the output .o/exe is a derivative of the
input .c.  The output, however, may also link the gcc runtime
library, and so GCC has a license exception saying that this
runtime linkage doesn't affect the license of the output
program. This is OK, since the GCC authors who added this
exception owned copyright over the runtime library they're
adding an exception for.

If we apply this to LLMs, the output of the LLM is a derivative
of the training data. The output is not a derivative of the LLM
code. The LLM copyright holders could make this latter point
explicit since they own copyright of the LLM code, but they do
not own copyright of the training data, and neither does the
person using the LLM, hence the legal uncertainty.

With regards,
Daniel
-- 
|: https://berrange.com      -o-    https://www.flickr.com/photos/dberrange :|
|: https://libvirt.org         -o-            https://fstop138.berrange.com :|
|: https://entangle-photo.org    -o-    https://www.instagram.com/dberrange :|



^ permalink raw reply	[flat|nested] 57+ messages in thread

* Re: [PATCH 2/2] docs: define policy forbidding use of "AI" / LLM code generators
  2023-11-23 14:56     ` Manos Pitsidianakis
                         ` (2 preceding siblings ...)
  2023-11-23 15:32       ` Alex Bennée
@ 2023-11-23 18:02       ` Daniel P. Berrangé
  2023-11-23 18:10         ` Peter Maydell
  2023-11-24 10:25       ` Kevin Wolf
  4 siblings, 1 reply; 57+ messages in thread
From: Daniel P. Berrangé @ 2023-11-23 18:02 UTC (permalink / raw)
  To: Manos Pitsidianakis
  Cc: qemu-devel, Michael S. Tsirkin, Richard Henderson, Alexander Graf,
	Alex Benné e, Paolo Bonzini, Markus Armbruster,
	Phil Mathieu-Daudé, Stefan Hajnoczi, Thomas Huth, Kevin Wolf,
	Gerd Hoffmann, Mark Cave-Ayland, Peter Maydell

On Thu, Nov 23, 2023 at 04:56:28PM +0200, Manos Pitsidianakis wrote:
> On Thu, 23 Nov 2023 16:35, "Michael S. Tsirkin" <mst@redhat.com> wrote:
> > On Thu, Nov 23, 2023 at 11:40:26AM +0000, Daniel P. Berrangé wrote:
> > > There has been an explosion of interest in so called "AI" (LLM)
> > > code generators in the past year or so. Thus far though, this is
> > > has not been matched by a broadly accepted legal interpretation
> > > of the licensing implications for code generator outputs. While
> > > the vendors may claim there is no problem and a free choice of
> > > license is possible, they have an inherent conflict of interest
> > > in promoting this interpretation. More broadly there is, as yet,
> > > no broad consensus on the licensing implications of code generators
> > > trained on inputs under a wide variety of licenses.
> > > 
> > > The DCO requires contributors to assert they have the right to
> > > contribute under the designated project license. Given the lack
> > > of consensus on the licensing of "AI" (LLM) code generator output,
> > > it is not considered credible to assert compliance with the DCO
> > > clause (b) or (c) where a patch includes such generated code.
> > > 
> > > This patch thus defines a policy that the QEMU project will not
> > > accept contributions where use of "AI" (LLM) code generators is
> > > either known, or suspected.
> > > 
> > > Signed-off-by: Daniel P. Berrangé <berrange@redhat.com>
> > > ---
> > >  docs/devel/code-provenance.rst | 40 ++++++++++++++++++++++++++++++++++
> > >  1 file changed, 40 insertions(+)
> > > 
> > > diff --git a/docs/devel/code-provenance.rst b/docs/devel/code-provenance.rst
> > > index b4591a2dec..a6e42c6b1b 100644
> > > --- a/docs/devel/code-provenance.rst
> > > +++ b/docs/devel/code-provenance.rst
> > > @@ -195,3 +195,43 @@ example::
> > >    Signed-off-by: Some Person <some.person@example.com>
> > >    [Rebased and added support for 'foo']
> > >    Signed-off-by: New Person <new.person@example.com>
> > > +
> > > +Use of "AI" (LLM) code generators
> > > +~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
> > > +
> > > +TL;DR:
> > > +
> > > +  **Current QEMU project policy is to DECLINE any contributions
> > > +  which are believed to include or derive from "AI" (LLM)
> > > +  generated code.**
> > > +
> > > +The existence of "AI" (`Large Language Model <https://en.wikipedia.org/wiki/Large_language_model>`__
> > > +/ LLM) code generators raises a number of difficult legal questions, a
> > > +number of which impact on Open Source projects. As noted earlier, the
> > > +QEMU community requires that contributors certify their patch submissions
> > > +are made in accordance with the rules of the :ref:`dco` (DCO). When a
> > > +patch contains "AI" generated code this raises difficulties with code
> > > +provenence and thus DCO compliance.
> > > +
> > > +To satisfy the DCO, the patch contributor has to fully understand
> > > +the origins and license of code they are contributing to QEMU. The
> > > +license terms that should apply to the output of an "AI" code generator
> > > +are ill-defined, given that both training data and operation of the
> > > +"AI" are typically opaque to the user. Even where the training data
> > > +is said to all be open source, it will likely be under a wide variety
> > > +of license terms.
> > > +
> > > +While the vendor's of "AI" code generators may promote the idea that
> > > +code output can be taken under a free choice of license, this is not
> > > +yet considered to be a generally accepted, nor tested, legal opinion.
> > > +
> > > +With this in mind, the QEMU maintainers does not consider it is
> > > +currently possible to comply with DCO terms (b) or (c) for most "AI"
> > > +generated code.
> > > +
> > > +The QEMU maintainers thus require that contributors refrain from using
> > > +"AI" code generators on patches intended to be submitted to the project,
> > > +and will decline any contribution if use of "AI" is known or suspected.
> > > +
> > > +Examples of tools impacted by this policy includes both GitHub CoPilot,
> > > +and ChatGPT, amongst many others which are less well known.
> > 
> > 
> > So you called out these two by name, fine, but given "AI" is in scare
> > quotes I don't really know what is or is not allowed and I don't know
> > how will contributors know.  Is the "AI" that one must not use
> > necessarily an LLM?  And how do you define LLM even? Wikipedia says
> > "general-purpose language understanding and generation".
> > 
> > 
> > All this seems vague to me.
> > 
> > 
> > However, can't we define a simpler more specific policy?
> > For example, isn't it true that *any* automatically generated code
> > can only be included if the scripts producing said code
> > are also included or otherwise available under GPLv2?
> 
> The following definition makes sense to me:
> 
> - Automated codegen tool must be idempotent.
> - Automated codegen tool must not use statistical modelling.

As a casual reader, I would find this somewhat unclear to interpet
and relate to.

> I'd remove all AI or LLM references. These are non-specific, colloquial and
> in the case of `AI`, non-technical. This policy should apply the same to a
> Markov chain code generator.

The fact that they are colloaquial is, IMHO, a good thing is it makes
the policy relatable to the casual reader who hears the terms "AI" and
"LLM" in technical press articles/blogs/etc all over the place.

I would have considered "Markov chain code generator" to fall under the
"AI" reference, since "AI" has defacto become a general purpose term
that covers a wierd variety of underlying technologies.

With regards,
Daniel
-- 
|: https://berrange.com      -o-    https://www.flickr.com/photos/dberrange :|
|: https://libvirt.org         -o-            https://fstop138.berrange.com :|
|: https://entangle-photo.org    -o-    https://www.instagram.com/dberrange :|



^ permalink raw reply	[flat|nested] 57+ messages in thread

* Re: [PATCH 2/2] docs: define policy forbidding use of "AI" / LLM code generators
  2023-11-23 17:29           ` Michal Suchánek
@ 2023-11-23 18:05             ` Michael S. Tsirkin
  0 siblings, 0 replies; 57+ messages in thread
From: Michael S. Tsirkin @ 2023-11-23 18:05 UTC (permalink / raw)
  To: Michal Suchánek
  Cc: Philippe Mathieu-Daudé, Manos Pitsidianakis, qemu-devel,
	Daniel P. Berrangé, Richard Henderson, Alexander Graf,
	Alex Benn é e, Paolo Bonzini, Markus Armbruster,
	Stefan Hajnoczi, Thomas Huth, Kevin Wolf, Gerd Hoffmann,
	Mark Cave-Ayland, Peter Maydell

On Thu, Nov 23, 2023 at 06:29:38PM +0100, Michal Suchánek wrote:
> On Thu, Nov 23, 2023 at 12:06:59PM -0500, Michael S. Tsirkin wrote:
> > On Thu, Nov 23, 2023 at 04:29:52PM +0100, Philippe Mathieu-Daudé wrote:
> > > This document targets all contributors. Contributions can be typo
> > > fix, translations, ... and don't have to be technical. Similarly,
> > > contributors aren't expected to be technical experts. As a neophyte,
> > > "AI" makes sense. "Idempotent code generator" or "LLM" don't :)
> > 
> > I don't think there's any big deal in using AI for typo fixes.
> 
> For how many typos it is still OK, and would not a deterministic
> spellchecker be preferred?
> 
> There are some edge cases where using AI is OK, the problem is most of
> the time it is not clear it is OK to use.
> 
> Thanks
> 
> Michal

¯\_(ツ)_/¯ I am not a lawyer, and I don't speak for Red Hat.


My point is however that e.g. even if you are using e.g. a grammar
corrector you better make sure that it is not claiming that its output
is a derivative work.

-- 
MST



^ permalink raw reply	[flat|nested] 57+ messages in thread

* Re: [PATCH 2/2] docs: define policy forbidding use of "AI" / LLM code generators
  2023-11-23 18:02       ` Daniel P. Berrangé
@ 2023-11-23 18:10         ` Peter Maydell
  0 siblings, 0 replies; 57+ messages in thread
From: Peter Maydell @ 2023-11-23 18:10 UTC (permalink / raw)
  To: Daniel P. Berrangé
  Cc: Manos Pitsidianakis, qemu-devel, Michael S. Tsirkin,
	Richard Henderson, Alexander Graf, Alex Benné e,
	Paolo Bonzini, Markus Armbruster, Phil Mathieu-Daudé,
	Stefan Hajnoczi, Thomas Huth, Kevin Wolf, Gerd Hoffmann,
	Mark Cave-Ayland

On Thu, 23 Nov 2023 at 18:02, Daniel P. Berrangé <berrange@redhat.com> wrote:
>
> On Thu, Nov 23, 2023 at 04:56:28PM +0200, Manos Pitsidianakis wrote:
> > On Thu, 23 Nov 2023 16:35, "Michael S. Tsirkin" <mst@redhat.com> wrote:
> > > On Thu, Nov 23, 2023 at 11:40:26AM +0000, Daniel P. Berrangé wrote:
> > > > +Examples of tools impacted by this policy includes both GitHub CoPilot,
> > > > +and ChatGPT, amongst many others which are less well known.
> > >
> > >
> > > So you called out these two by name, fine, but given "AI" is in scare
> > > quotes I don't really know what is or is not allowed and I don't know
> > > how will contributors know.  Is the "AI" that one must not use
> > > necessarily an LLM?  And how do you define LLM even? Wikipedia says
> > > "general-purpose language understanding and generation".
> > >
> > >
> > > All this seems vague to me.
> > >
> > >
> > > However, can't we define a simpler more specific policy?
> > > For example, isn't it true that *any* automatically generated code
> > > can only be included if the scripts producing said code
> > > are also included or otherwise available under GPLv2?
> >
> > The following definition makes sense to me:
> >
> > - Automated codegen tool must be idempotent.
> > - Automated codegen tool must not use statistical modelling.
>
> As a casual reader, I would find this somewhat unclear to interpet
> and relate to.

It's also not really relevant to what we're trying to rule out.
A non-idempotent codegen tool is fine, if the code it generates
is clearly under a license that's compatible with QEMU's.
A codegen tool that uses statistical modelling is also fine,
if (for example) it's only doing statistical modelling of the
data in the single file it's adding code to and doesn't use
any external data set.

> > I'd remove all AI or LLM references. These are non-specific, colloquial and
> > in the case of `AI`, non-technical. This policy should apply the same to a
> > Markov chain code generator.
>
> The fact that they are colloaquial is, IMHO, a good thing is it makes
> the policy relatable to the casual reader who hears the terms "AI" and
> "LLM" in technical press articles/blogs/etc all over the place.

Yes, I think that the most important thing about the wording
of this policy (assuming we agree on it) is that it should be
immediately very clear to anybody reading it that ChatGPT,
Copilot, etc type tools aren't permitted. Because in practice
the most likely case is somebody who wants to use those, and we
don't want to make them have to go through "read an abstract
definition of what isn't permitted and apply that abstract
definition to the concrete tool they're using".

thanks
-- PMM


^ permalink raw reply	[flat|nested] 57+ messages in thread

* Re: [PATCH 2/2] docs: define policy forbidding use of "AI" / LLM code generators
  2023-11-23 17:58     ` Daniel P. Berrangé
@ 2023-11-23 22:39       ` Michael S. Tsirkin
  2023-11-24  9:06         ` Daniel P. Berrangé
  0 siblings, 1 reply; 57+ messages in thread
From: Michael S. Tsirkin @ 2023-11-23 22:39 UTC (permalink / raw)
  To: Daniel P. Berrangé
  Cc: qemu-devel, Richard Henderson, Alexander Graf, Alex Bennée,
	Paolo Bonzini, Markus Armbruster, Phil Mathieu-Daudé,
	Stefan Hajnoczi, Thomas Huth, Kevin Wolf, Gerd Hoffmann,
	Mark Cave-Ayland, Peter Maydell

On Thu, Nov 23, 2023 at 05:58:45PM +0000, Daniel P. Berrangé wrote:
> The license of a code generation tool itself is usually considered
> to be not a factor in the license of its output.

Really? I would find it very surprising if a code generation tool that
is not a language model and so is not understanding the code it's
generating did not include some code snippets going into the output.
It is also possible to unintentionally run afoul of GPL's definition of source
code which is "the preferred form of the work for making modifications to it". 
So even if you have copyright to input, dumping just output and putting
GPL on it might or might not be ok.

-- 
MST

^ permalink raw reply	[flat|nested] 57+ messages in thread

* Re: [PATCH 2/2] docs: define policy forbidding use of "AI" / LLM code generators
  2023-11-23 17:37     ` Michal Suchánek
@ 2023-11-23 23:27       ` Michael S. Tsirkin
  0 siblings, 0 replies; 57+ messages in thread
From: Michael S. Tsirkin @ 2023-11-23 23:27 UTC (permalink / raw)
  To: Michal Suchánek
  Cc: Alex Bennée, Daniel P. Berrangé, qemu-devel,
	Richard Henderson, Alexander Graf, Paolo Bonzini,
	Markus Armbruster, Phil Mathieu-Daudé, Stefan Hajnoczi,
	Thomas Huth, Kevin Wolf, Gerd Hoffmann, Mark Cave-Ayland,
	Peter Maydell

On Thu, Nov 23, 2023 at 06:37:47PM +0100, Michal Suchánek wrote:
> If you could erase the copyright on anything by feeding it into a
> statistical model and pulling it back out there
> Would be some big
> content license holders objecting so it's very unlikely to happen.

I won't venture a guess and I think neither should QEMU.  For now, being
on the safe side and rejecting auto-generated code sounds very
reasonable to me, though, in particular because it's often
quite low quality ;).

Not a lawyer, and I don't speak for Red Hat.
-- 
MST



^ permalink raw reply	[flat|nested] 57+ messages in thread

* Re: [PATCH 2/2] docs: define policy forbidding use of "AI" / LLM code generators
  2023-11-23 17:46     ` Daniel P. Berrangé
@ 2023-11-23 23:53       ` Michael S. Tsirkin
  2023-11-24 10:17         ` Kevin Wolf
  0 siblings, 1 reply; 57+ messages in thread
From: Michael S. Tsirkin @ 2023-11-23 23:53 UTC (permalink / raw)
  To: Daniel P. Berrangé
  Cc: Alex Bennée, qemu-devel, Richard Henderson, Alexander Graf,
	Paolo Bonzini, Markus Armbruster, Phil Mathieu-Daudé,
	Stefan Hajnoczi, Thomas Huth, Kevin Wolf, Gerd Hoffmann,
	Mark Cave-Ayland, Peter Maydell

On Thu, Nov 23, 2023 at 05:46:16PM +0000, Daniel P. Berrangé wrote:
> On Thu, Nov 23, 2023 at 12:57:42PM +0000, Alex Bennée wrote:
> > Daniel P. Berrangé <berrange@redhat.com> writes:
> > 
> > > There has been an explosion of interest in so called "AI" (LLM)
> > > code generators in the past year or so. Thus far though, this is
> > > has not been matched by a broadly accepted legal interpretation
> > > of the licensing implications for code generator outputs. While
> > > the vendors may claim there is no problem and a free choice of
> > > license is possible, they have an inherent conflict of interest
> > > in promoting this interpretation. More broadly there is, as yet,
> > > no broad consensus on the licensing implications of code generators
> > > trained on inputs under a wide variety of licenses.
> > >
> > > The DCO requires contributors to assert they have the right to
> > > contribute under the designated project license. Given the lack
> > > of consensus on the licensing of "AI" (LLM) code generator output,
> > > it is not considered credible to assert compliance with the DCO
> > > clause (b) or (c) where a patch includes such generated code.
> > >
> > > This patch thus defines a policy that the QEMU project will not
> > > accept contributions where use of "AI" (LLM) code generators is
> > > either known, or suspected.
> > >
> > > Signed-off-by: Daniel P. Berrangé <berrange@redhat.com>
> > > ---
> > >  docs/devel/code-provenance.rst | 40 ++++++++++++++++++++++++++++++++++
> > >  1 file changed, 40 insertions(+)
> > >
> > > diff --git a/docs/devel/code-provenance.rst b/docs/devel/code-provenance.rst
> > > index b4591a2dec..a6e42c6b1b 100644
> > > --- a/docs/devel/code-provenance.rst
> > > +++ b/docs/devel/code-provenance.rst
> > > @@ -195,3 +195,43 @@ example::
> > >    Signed-off-by: Some Person <some.person@example.com>
> > >    [Rebased and added support for 'foo']
> > >    Signed-off-by: New Person <new.person@example.com>
> > > +
> > > +Use of "AI" (LLM) code generators
> > > +~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
> > > +
> > > +TL;DR:
> > > +
> > > +  **Current QEMU project policy is to DECLINE any contributions
> > > +  which are believed to include or derive from "AI" (LLM)
> > > +  generated code.**
> > > +
> > > +The existence of "AI" (`Large Language Model <https://en.wikipedia.org/wiki/Large_language_model>`__
> > > +/ LLM) code generators raises a number of difficult legal questions, a
> > > +number of which impact on Open Source projects. As noted earlier, the
> > > +QEMU community requires that contributors certify their patch submissions
> > > +are made in accordance with the rules of the :ref:`dco` (DCO). When a
> > > +patch contains "AI" generated code this raises difficulties with code
> > > +provenence and thus DCO compliance.
> > 
> > I agree this is going to be a field that keeps lawyers well re-numerated
> > for the foreseeable future. However I suspect this elides over the main
> > use case for LLM generators which is non-novel transformation. One good
> > example is generating text fixtures where you write a piece of original
> > code and then ask the code completion engine to fill out some unit tests
> > to exercise the code. It's boring mechanical work but one an LLM is very
> > suited to (even if you might tweak the final result).
> 
> Yes, I can see how that is helpful, but I think in many cases the
> resulting code will be complex enough to be considered copyrightable,
> and so even with the original input code, I feel the licensing of the
> output is still ill-defined.
> 
> > 
> > > +To satisfy the DCO, the patch contributor has to fully understand
> > > +the origins and license of code they are contributing to QEMU. The
> > > +license terms that should apply to the output of an "AI" code generator
> > > +are ill-defined, given that both training data and operation of the
> > > +"AI" are typically opaque to the user. Even where the training data
> > > +is said to all be open source, it will likely be under a wide variety
> > > +of license terms.
> > > +
> > > +While the vendor's of "AI" code generators may promote the idea that
> > > +code output can be taken under a free choice of license, this is not
> > > +yet considered to be a generally accepted, nor tested, legal opinion.
> > > +
> > > +With this in mind, the QEMU maintainers does not consider it is
> > > +currently possible to comply with DCO terms (b) or (c) for most "AI"
> > > +generated code.
> > 
> > There is a load of code out that isn't eligible for copyright projection
> > because it doesn't demonstrate much originality or creativity. In the
> > experimentation I've done so far I've not seen much sign of genuine
> > creativity. LLM's benefit from having access to a wide corpus of
> > training data and tend to do a better job of inferencing solutions from
> > semi-related posts than say for example human manually comparing posts
> > having pasted an error message in google.
> 
> The boundary between what is considered copyrightable and not, it
> itself quite ill-defined, and thus it is hard to express a clear
> rule that can be applied.
> 
> I think more experience long term contributors end up getting somewhat
> of a "gut feeling" about what's ok and what's not, but I'm not sure if
> that is true for contibutors in general.
> 
> IOW, while there are likely cases where it is possible to safely use
> a AI generator, I'm not sure how to best express that in an way that
> makes sense.
> 
> Perhaps a loosely worded addendum  about possible exception for
> "trivial" output
> 
> > > +The QEMU maintainers thus require that contributors refrain from using
> > > +"AI" code generators on patches intended to be submitted to the project,
> > > +and will decline any contribution if use of "AI" is known or suspected.
> > > +
> > > +Examples of tools impacted by this policy includes both GitHub CoPilot,
> > > +and ChatGPT, amongst many others which are less well known.
> > 
> > What about if you took an LLM and then fine tuned it by using project
> > data so it could better help new users in making contributions to the
> > project? You would be biasing the model to your own data for the
> > purposes of helping developers write better QEMU code?
> 
> It is hard to provide an answer to that question, since I think it is
> something that would need to be considered case by case. It hinges
> around how much does the new QEMU specific training data influence
> the model, vs other pre-existing training (if any)
> 
> Perhaps we can finish this policy with a general point to solicit
> feedback on possible exceptions ?
> 
>   "If a contributor believes they can demonstrate that the output of
>    a particular tool has deterministic licensing, such that they can
>    satisfy the DCO, they should provide such info to the mailing list"
> 
> With regards,
> Daniel


But the question is not about what QEMU should accept. We can trust
maintainers to DTRT. The question is the meaning of DCO.  If you want
DCO to mean "this code was not generated by AI" then you better define
"AI" in an unambiguous way otherwise what is it certifying?

Instead, I propose adding simply this:

	Thus, generally, Signed-off-by from *each* person who has written
	a substantial portion of the patch is required.

	If a substantial portion of the patch was not written by any
	human person but was instead generated automatically (e.g. by an AI such
	as ChatGPT, or a decompiler) then you *must* clearly document
	this in the patch commit message. As a matter of policy, and out of an
	abundance of caution, such contributions will generally be rejected.

	When in doubt whether a specific portion is substantial - assume
	that Signed-off-by is required.





-- 
MST



^ permalink raw reply	[flat|nested] 57+ messages in thread

* Re: [PATCH 1/2] docs: introduce dedicated page about code provenance / sign-off
  2023-11-23 17:08     ` Daniel P. Berrangé
@ 2023-11-23 23:56       ` Michael S. Tsirkin
  0 siblings, 0 replies; 57+ messages in thread
From: Michael S. Tsirkin @ 2023-11-23 23:56 UTC (permalink / raw)
  To: Daniel P. Berrangé
  Cc: Philippe Mathieu-Daudé, qemu-devel, Richard Henderson,
	Alexander Graf, Alex Bennée, Paolo Bonzini,
	Markus Armbruster, Stefan Hajnoczi, Thomas Huth, Kevin Wolf,
	Gerd Hoffmann, Mark Cave-Ayland, Peter Maydell

On Thu, Nov 23, 2023 at 05:08:46PM +0000, Daniel P. Berrangé wrote:
> On Thu, Nov 23, 2023 at 12:58:18PM +0100, Philippe Mathieu-Daudé wrote:
> > On 23/11/23 12:40, Daniel P. Berrangé wrote:
> > > Currently we have a short paragraph saying that patches must include
> > > a Signed-off-by line, and merely link to the kernel documentation.
> > > The linked kernel docs have alot of content beyond the part about
> > > sign-off an thus is misleading/distracting to QEMU contributors.
> > > 
> > > This introduces a dedicated 'code-provenance' page in QEMU talking
> > > about why we require sign-off, explaining the other tags we commonly
> > > use, and what to do in some edge cases.
> > > 
> > > Signed-off-by: Daniel P. Berrangé <berrange@redhat.com>
> > > ---
> > >   docs/devel/code-provenance.rst    | 197 ++++++++++++++++++++++++++++++
> > >   docs/devel/index-process.rst      |   1 +
> > >   docs/devel/submitting-a-patch.rst |  18 +--
> > >   3 files changed, 201 insertions(+), 15 deletions(-)
> > >   create mode 100644 docs/devel/code-provenance.rst
> 
> > > +Other commit tags
> > > +~~~~~~~~~~~~~~~~~
> > > +
> > > +While the ``Signed-off-by`` tag is mandatory, there are a number of
> > > +other tags that are commonly used during QEMU development
> > > +
> > > + * **``Reviewed-by``**: when a QEMU community member reviews a patch
> > > +   on the mailing list, if they consider the patch acceptable, they
> > > +   should send an email reply containing a ``Reviewed-by`` tag.
> > > +
> > > +   NB: a subsystem maintainer sending a pull request would replace
> > > +   their own ``Reviewed-by`` with another ``Signed-off-by``
> > 
> > Hmm not sure about replacing, they have different meaning. You can merge
> > patch you haven't reviewed. But as a maintainer you must S-o-b what you
> > end merging (what is mentioned below in "subsystem maintainer").
> 
> I've always taken it as implied that patches I queue are reviewed by me,

Well sometimes I queue patches not in my area that I have seen languish
on list with no replies for too long. I generally do a cursory review
but not to the level that I feel justifies Reviewed-by.


> but replies here suggest I'm in a minority on that.  That shows why it is
> worth documenting this for QEMU explicitly :-)

Absolutely.



^ permalink raw reply	[flat|nested] 57+ messages in thread

* Re: [PATCH 2/2] docs: define policy forbidding use of "AI" / LLM code generators
  2023-11-23 22:39       ` Michael S. Tsirkin
@ 2023-11-24  9:06         ` Daniel P. Berrangé
  2023-11-24  9:27           ` Michael S. Tsirkin
  2023-11-24 10:21           ` Alex Bennée
  0 siblings, 2 replies; 57+ messages in thread
From: Daniel P. Berrangé @ 2023-11-24  9:06 UTC (permalink / raw)
  To: Michael S. Tsirkin
  Cc: qemu-devel, Richard Henderson, Alexander Graf, Alex Bennée,
	Paolo Bonzini, Markus Armbruster, Phil Mathieu-Daudé,
	Stefan Hajnoczi, Thomas Huth, Kevin Wolf, Gerd Hoffmann,
	Mark Cave-Ayland, Peter Maydell

On Thu, Nov 23, 2023 at 05:39:18PM -0500, Michael S. Tsirkin wrote:
> On Thu, Nov 23, 2023 at 05:58:45PM +0000, Daniel P. Berrangé wrote:
> > The license of a code generation tool itself is usually considered
> > to be not a factor in the license of its output.
> 
> Really? I would find it very surprising if a code generation tool that
> is not a language model and so is not understanding the code it's
> generating did not include some code snippets going into the output.
> It is also possible to unintentionally run afoul of GPL's definition of source
> code which is "the preferred form of the work for making modifications to it". 
> So even if you have copyright to input, dumping just output and putting
> GPL on it might or might not be ok.

Consider the C pre-processor. This takes an input .c file, and expands
all the macros, to split out a new .c file.

The license of the output .c file is determined by the license of the
input .c file. The license of the CPP impl (whether OSS or proprietary)
doesn't have any influence on the license of the output file, it cannot
magically force the output file to be proprietary any more than it can
force it to be output file GPL.

With regards,
Daniel
-- 
|: https://berrange.com      -o-    https://www.flickr.com/photos/dberrange :|
|: https://libvirt.org         -o-            https://fstop138.berrange.com :|
|: https://entangle-photo.org    -o-    https://www.instagram.com/dberrange :|



^ permalink raw reply	[flat|nested] 57+ messages in thread

* Re: [PATCH 2/2] docs: define policy forbidding use of "AI" / LLM code generators
  2023-11-24  9:06         ` Daniel P. Berrangé
@ 2023-11-24  9:27           ` Michael S. Tsirkin
  2023-11-24 10:21           ` Alex Bennée
  1 sibling, 0 replies; 57+ messages in thread
From: Michael S. Tsirkin @ 2023-11-24  9:27 UTC (permalink / raw)
  To: Daniel P. Berrangé
  Cc: qemu-devel, Richard Henderson, Alexander Graf, Alex Bennée,
	Paolo Bonzini, Markus Armbruster, Phil Mathieu-Daudé,
	Stefan Hajnoczi, Thomas Huth, Kevin Wolf, Gerd Hoffmann,
	Mark Cave-Ayland, Peter Maydell

On Fri, Nov 24, 2023 at 09:06:29AM +0000, Daniel P. Berrangé wrote:
> On Thu, Nov 23, 2023 at 05:39:18PM -0500, Michael S. Tsirkin wrote:
> > On Thu, Nov 23, 2023 at 05:58:45PM +0000, Daniel P. Berrangé wrote:
> > > The license of a code generation tool itself is usually considered
> > > to be not a factor in the license of its output.
> > 
> > Really? I would find it very surprising if a code generation tool that
> > is not a language model and so is not understanding the code it's
> > generating did not include some code snippets going into the output.
> > It is also possible to unintentionally run afoul of GPL's definition of source
> > code which is "the preferred form of the work for making modifications to it". 
> > So even if you have copyright to input, dumping just output and putting
> > GPL on it might or might not be ok.
> 
> Consider the C pre-processor. This takes an input .c file, and expands
> all the macros, to split out a new .c file.
> 
> The license of the output .c file is determined by the license of the
> input .c file. The license of the CPP impl (whether OSS or proprietary)
> doesn't have any influence on the license of the output file, it cannot
> magically force the output file to be proprietary any more than it can
> force it to be output file GPL.
> 
> With regards,
> Daniel

Sorry I don't get how is C preprocessor relevant here? It does not
generate source code in the GPL sense. We won't accept C preprocessor
output in a patch.

Not being a lawyer I personally am not really interested in discussing
how copyright works, certainly not at this highly abstract and
simplified level.

-- 
MST



^ permalink raw reply	[flat|nested] 57+ messages in thread

* Re: [PATCH 1/2] docs: introduce dedicated page about code provenance / sign-off
  2023-11-23 17:16     ` Daniel P. Berrangé
  2023-11-23 17:33       ` Michael S. Tsirkin
@ 2023-11-24  9:49       ` Kevin Wolf
  1 sibling, 0 replies; 57+ messages in thread
From: Kevin Wolf @ 2023-11-24  9:49 UTC (permalink / raw)
  To: Daniel P. Berrangé
  Cc: Michael S. Tsirkin, qemu-devel, Richard Henderson, Alexander Graf,
	Alex Bennée, Paolo Bonzini, Markus Armbruster,
	Phil Mathieu-Daudé, Stefan Hajnoczi, Thomas Huth,
	Gerd Hoffmann, Mark Cave-Ayland, Peter Maydell

Am 23.11.2023 um 18:16 hat Daniel P. Berrangé geschrieben:
> > Suggested-by is also common.
> > 
> > As long as we are here, let's document Fixes: and Cc: ?
> 
> The submitting-a-patch doc covers more general commit message information.
> I think this doc just ought to focus on tags that identify humans involved
> in the process.
> 
> I've never been sure what the point of the 'Cc' tag is, when you actually
> want to use the Cc email header ? 

By default, git-send-email automatically copies the addresses mentioned
with Cc: in the commit message, so I always assumed that this is what
people intend with these tags.

Of course, in practice many of us have suppresscc = "all" in their
config to avoid downstream accidents, so maybe there is another use?

The only time I use it is for "Cc: qemu-stable@nongnu.org". I'm not sure
if it still works like this, but people used to look for this in the
commit history when preparing stable releases. (It's useful because
sometimes people forget to actually CC the qemu-stable list when sending
the patches.)

Kevin

^ permalink raw reply	[flat|nested] 57+ messages in thread

* Re: [PATCH 2/2] docs: define policy forbidding use of "AI" / LLM code generators
  2023-11-23 23:53       ` Michael S. Tsirkin
@ 2023-11-24 10:17         ` Kevin Wolf
  2023-11-24 10:33           ` Alex Bennée
  0 siblings, 1 reply; 57+ messages in thread
From: Kevin Wolf @ 2023-11-24 10:17 UTC (permalink / raw)
  To: Michael S. Tsirkin
  Cc: Daniel P. Berrangé, Alex Bennée, qemu-devel,
	Richard Henderson, Alexander Graf, Paolo Bonzini,
	Markus Armbruster, Phil Mathieu-Daudé, Stefan Hajnoczi,
	Thomas Huth, Gerd Hoffmann, Mark Cave-Ayland, Peter Maydell

Am 24.11.2023 um 00:53 hat Michael S. Tsirkin geschrieben:
> On Thu, Nov 23, 2023 at 05:46:16PM +0000, Daniel P. Berrangé wrote:
> > On Thu, Nov 23, 2023 at 12:57:42PM +0000, Alex Bennée wrote:
> > > Daniel P. Berrangé <berrange@redhat.com> writes:
> > > 
> > > > There has been an explosion of interest in so called "AI" (LLM)
> > > > code generators in the past year or so. Thus far though, this is
> > > > has not been matched by a broadly accepted legal interpretation
> > > > of the licensing implications for code generator outputs. While
> > > > the vendors may claim there is no problem and a free choice of
> > > > license is possible, they have an inherent conflict of interest
> > > > in promoting this interpretation. More broadly there is, as yet,
> > > > no broad consensus on the licensing implications of code generators
> > > > trained on inputs under a wide variety of licenses.
> > > >
> > > > The DCO requires contributors to assert they have the right to
> > > > contribute under the designated project license. Given the lack
> > > > of consensus on the licensing of "AI" (LLM) code generator output,
> > > > it is not considered credible to assert compliance with the DCO
> > > > clause (b) or (c) where a patch includes such generated code.
> > > >
> > > > This patch thus defines a policy that the QEMU project will not
> > > > accept contributions where use of "AI" (LLM) code generators is
> > > > either known, or suspected.
> > > >
> > > > Signed-off-by: Daniel P. Berrangé <berrange@redhat.com>
> > > > ---
> > > >  docs/devel/code-provenance.rst | 40 ++++++++++++++++++++++++++++++++++
> > > >  1 file changed, 40 insertions(+)
> > > >
> > > > diff --git a/docs/devel/code-provenance.rst b/docs/devel/code-provenance.rst
> > > > index b4591a2dec..a6e42c6b1b 100644
> > > > --- a/docs/devel/code-provenance.rst
> > > > +++ b/docs/devel/code-provenance.rst
> > > > @@ -195,3 +195,43 @@ example::
> > > >    Signed-off-by: Some Person <some.person@example.com>
> > > >    [Rebased and added support for 'foo']
> > > >    Signed-off-by: New Person <new.person@example.com>
> > > > +
> > > > +Use of "AI" (LLM) code generators
> > > > +~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
> > > > +
> > > > +TL;DR:
> > > > +
> > > > +  **Current QEMU project policy is to DECLINE any contributions
> > > > +  which are believed to include or derive from "AI" (LLM)
> > > > +  generated code.**
> > > > +
> > > > +The existence of "AI" (`Large Language Model <https://en.wikipedia.org/wiki/Large_language_model>`__
> > > > +/ LLM) code generators raises a number of difficult legal questions, a
> > > > +number of which impact on Open Source projects. As noted earlier, the
> > > > +QEMU community requires that contributors certify their patch submissions
> > > > +are made in accordance with the rules of the :ref:`dco` (DCO). When a
> > > > +patch contains "AI" generated code this raises difficulties with code
> > > > +provenence and thus DCO compliance.
> > > 
> > > I agree this is going to be a field that keeps lawyers well re-numerated
> > > for the foreseeable future. However I suspect this elides over the main
> > > use case for LLM generators which is non-novel transformation. One good
> > > example is generating text fixtures where you write a piece of original
> > > code and then ask the code completion engine to fill out some unit tests
> > > to exercise the code. It's boring mechanical work but one an LLM is very
> > > suited to (even if you might tweak the final result).
> > 
> > Yes, I can see how that is helpful, but I think in many cases the
> > resulting code will be complex enough to be considered copyrightable,
> > and so even with the original input code, I feel the licensing of the
> > output is still ill-defined.
> > 
> > > 
> > > > +To satisfy the DCO, the patch contributor has to fully understand
> > > > +the origins and license of code they are contributing to QEMU. The
> > > > +license terms that should apply to the output of an "AI" code generator
> > > > +are ill-defined, given that both training data and operation of the
> > > > +"AI" are typically opaque to the user. Even where the training data
> > > > +is said to all be open source, it will likely be under a wide variety
> > > > +of license terms.
> > > > +
> > > > +While the vendor's of "AI" code generators may promote the idea that
> > > > +code output can be taken under a free choice of license, this is not
> > > > +yet considered to be a generally accepted, nor tested, legal opinion.
> > > > +
> > > > +With this in mind, the QEMU maintainers does not consider it is
> > > > +currently possible to comply with DCO terms (b) or (c) for most "AI"
> > > > +generated code.
> > > 
> > > There is a load of code out that isn't eligible for copyright projection
> > > because it doesn't demonstrate much originality or creativity. In the
> > > experimentation I've done so far I've not seen much sign of genuine
> > > creativity. LLM's benefit from having access to a wide corpus of
> > > training data and tend to do a better job of inferencing solutions from
> > > semi-related posts than say for example human manually comparing posts
> > > having pasted an error message in google.
> > 
> > The boundary between what is considered copyrightable and not, it
> > itself quite ill-defined, and thus it is hard to express a clear
> > rule that can be applied.
> > 
> > I think more experience long term contributors end up getting somewhat
> > of a "gut feeling" about what's ok and what's not, but I'm not sure if
> > that is true for contibutors in general.
> > 
> > IOW, while there are likely cases where it is possible to safely use
> > a AI generator, I'm not sure how to best express that in an way that
> > makes sense.
> > 
> > Perhaps a loosely worded addendum  about possible exception for
> > "trivial" output
> > 
> > > > +The QEMU maintainers thus require that contributors refrain from using
> > > > +"AI" code generators on patches intended to be submitted to the project,
> > > > +and will decline any contribution if use of "AI" is known or suspected.
> > > > +
> > > > +Examples of tools impacted by this policy includes both GitHub CoPilot,
> > > > +and ChatGPT, amongst many others which are less well known.
> > > 
> > > What about if you took an LLM and then fine tuned it by using project
> > > data so it could better help new users in making contributions to the
> > > project? You would be biasing the model to your own data for the
> > > purposes of helping developers write better QEMU code?
> > 
> > It is hard to provide an answer to that question, since I think it is
> > something that would need to be considered case by case. It hinges
> > around how much does the new QEMU specific training data influence
> > the model, vs other pre-existing training (if any)

I suspect fine tuning won't be enough because it doesn't make the
unlicensed original training data go away.

If you could make sure that all of the training data consists only of
code for which you have the right to contribute it to QEMU, that would
be a different case.

> > Perhaps we can finish this policy with a general point to solicit
> > feedback on possible exceptions ?
> > 
> >   "If a contributor believes they can demonstrate that the output of
> >    a particular tool has deterministic licensing, such that they can
> >    satisfy the DCO, they should provide such info to the mailing list"
> > 
> > With regards,
> > Daniel
> 
> 
> But the question is not about what QEMU should accept. We can trust
> maintainers to DTRT. The question is the meaning of DCO.  If you want
> DCO to mean "this code was not generated by AI" then you better define
> "AI" in an unambiguous way otherwise what is it certifying?

That you can state confidently that you have the legal right to
contribute this code.

The problem is not AI per se, the problem is incompatibly licensed - or
really, unlicensed (should I call it "pirated" for effect?) - training
input for the AI.

So if you got the code from ChatGPT, I simply won't believe you even if
you claim that you have the right.

> Instead, I propose adding simply this:
> 
> 	Thus, generally, Signed-off-by from *each* person who has written
> 	a substantial portion of the patch is required.
> 
> 	If a substantial portion of the patch was not written by any
> 	human person but was instead generated automatically (e.g. by an AI such
> 	as ChatGPT, or a decompiler) then you *must* clearly document
> 	this in the patch commit message. As a matter of policy, and out of an
> 	abundance of caution, such contributions will generally be rejected.
> 
> 	When in doubt whether a specific portion is substantial - assume
> 	that Signed-off-by is required.

"generated automatically" is going way too far. There is no problem at
all with code changes generated by Coccinelle if you wrote the rules
yourself or received them under a license that allows their inclusion in
QEMU.

The problem with ChatGPT etc. is that there is no licensing information
attached to the generated code. You know it's based on someone else's
work, but you don't know who it is, if they are willing to give you a
license and under which conditions.

And it's not an "abundance of caution" why we reject such patches, but
that you obviously can't actually sign the DCO under such cirumstances
and therefore the S-o-b is wrong.

Kevin



^ permalink raw reply	[flat|nested] 57+ messages in thread

* Re: [PATCH 2/2] docs: define policy forbidding use of "AI" / LLM code generators
  2023-11-24  9:06         ` Daniel P. Berrangé
  2023-11-24  9:27           ` Michael S. Tsirkin
@ 2023-11-24 10:21           ` Alex Bennée
  2023-11-24 10:30             ` Michael S. Tsirkin
  2023-11-24 11:41             ` Daniel P. Berrangé
  1 sibling, 2 replies; 57+ messages in thread
From: Alex Bennée @ 2023-11-24 10:21 UTC (permalink / raw)
  To: Daniel P. Berrangé
  Cc: Michael S. Tsirkin, qemu-devel, Richard Henderson, Alexander Graf,
	Paolo Bonzini, Markus Armbruster, Phil Mathieu-Daudé,
	Stefan Hajnoczi, Thomas Huth, Kevin Wolf, Gerd Hoffmann,
	Mark Cave-Ayland, Peter Maydell

Daniel P. Berrangé <berrange@redhat.com> writes:

> On Thu, Nov 23, 2023 at 05:39:18PM -0500, Michael S. Tsirkin wrote:
>> On Thu, Nov 23, 2023 at 05:58:45PM +0000, Daniel P. Berrangé wrote:
>> > The license of a code generation tool itself is usually considered
>> > to be not a factor in the license of its output.
>> 
>> Really? I would find it very surprising if a code generation tool that
>> is not a language model and so is not understanding the code it's
>> generating did not include some code snippets going into the output.
>> It is also possible to unintentionally run afoul of GPL's definition of source
>> code which is "the preferred form of the work for making modifications to it". 
>> So even if you have copyright to input, dumping just output and putting
>> GPL on it might or might not be ok.
>
> Consider the C pre-processor. This takes an input .c file, and expands
> all the macros, to split out a new .c file.
>
> The license of the output .c file is determined by the license of the
> input .c file. The license of the CPP impl (whether OSS or proprietary)
> doesn't have any influence on the license of the output file, it cannot
> magically force the output file to be proprietary any more than it can
> force it to be output file GPL.

LLM's are just a tool like a compiler (albeit with spookier different
internals). The prompt and the instructions are arguably the more
important part of how to get good results from the LLM transformation.
In fact most of the way I've been using them has been by pasting some
existing code and asking for review or transformation of it.

However I totally get that using the various online LLMs you have very
little transparency about what has gone into their training and therefor
there is a danger of proprietary code being hallucinated out of their
matricies. Conversely what if I use an LLM like OpenLLaMa:

  https://github.com/openlm-research/open_llama

I have fairly exhaustive definitions of what went into the training data
which of most interest is probably the StarCoder dataset (paper):

  https://drive.google.com/file/d/1cN-b9GnWtHzQRoE7M7gAEyivY0kl4BYs/view

where there are tools to detect if generated code has been lifted
directly from the dataset or is indeed a transformation.


>
> With regards,
> Daniel

-- 
Alex Bennée
Virtualisation Tech Lead @ Linaro


^ permalink raw reply	[flat|nested] 57+ messages in thread

* Re: [PATCH 2/2] docs: define policy forbidding use of "AI" / LLM code generators
  2023-11-23 14:56     ` Manos Pitsidianakis
                         ` (3 preceding siblings ...)
  2023-11-23 18:02       ` Daniel P. Berrangé
@ 2023-11-24 10:25       ` Kevin Wolf
  2023-11-24 10:37         ` Michael S. Tsirkin
  2023-11-24 10:42         ` Manos Pitsidianakis
  4 siblings, 2 replies; 57+ messages in thread
From: Kevin Wolf @ 2023-11-24 10:25 UTC (permalink / raw)
  To: Manos Pitsidianakis
  Cc: qemu-devel, Michael S. Tsirkin, Daniel P. Berrangé,
	Richard Henderson, Alexander Graf, Alex Benné e,
	Paolo Bonzini, Markus Armbruster, Phil Mathieu-Daudé,
	Stefan Hajnoczi, Thomas Huth, Gerd Hoffmann, Mark Cave-Ayland,
	Peter Maydell

Am 23.11.2023 um 15:56 hat Manos Pitsidianakis geschrieben:
> On Thu, 23 Nov 2023 16:35, "Michael S. Tsirkin" <mst@redhat.com> wrote:
> > On Thu, Nov 23, 2023 at 11:40:26AM +0000, Daniel P. Berrangé wrote:
> > > There has been an explosion of interest in so called "AI" (LLM)
> > > code generators in the past year or so. Thus far though, this is
> > > has not been matched by a broadly accepted legal interpretation
> > > of the licensing implications for code generator outputs. While
> > > the vendors may claim there is no problem and a free choice of
> > > license is possible, they have an inherent conflict of interest
> > > in promoting this interpretation. More broadly there is, as yet,
> > > no broad consensus on the licensing implications of code generators
> > > trained on inputs under a wide variety of licenses.
> > > 
> > > The DCO requires contributors to assert they have the right to
> > > contribute under the designated project license. Given the lack
> > > of consensus on the licensing of "AI" (LLM) code generator output,
> > > it is not considered credible to assert compliance with the DCO
> > > clause (b) or (c) where a patch includes such generated code.
> > > 
> > > This patch thus defines a policy that the QEMU project will not
> > > accept contributions where use of "AI" (LLM) code generators is
> > > either known, or suspected.
> > > 
> > > Signed-off-by: Daniel P. Berrangé <berrange@redhat.com>
> > > ---
> > >  docs/devel/code-provenance.rst | 40 ++++++++++++++++++++++++++++++++++
> > >  1 file changed, 40 insertions(+)
> > > 
> > > diff --git a/docs/devel/code-provenance.rst b/docs/devel/code-provenance.rst
> > > index b4591a2dec..a6e42c6b1b 100644
> > > --- a/docs/devel/code-provenance.rst
> > > +++ b/docs/devel/code-provenance.rst
> > > @@ -195,3 +195,43 @@ example::
> > >    Signed-off-by: Some Person <some.person@example.com>
> > >    [Rebased and added support for 'foo']
> > >    Signed-off-by: New Person <new.person@example.com>
> > > +
> > > +Use of "AI" (LLM) code generators
> > > +~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
> > > +
> > > +TL;DR:
> > > +
> > > +  **Current QEMU project policy is to DECLINE any contributions
> > > +  which are believed to include or derive from "AI" (LLM)
> > > +  generated code.**
> > > +
> > > +The existence of "AI" (`Large Language Model <https://en.wikipedia.org/wiki/Large_language_model>`__
> > > +/ LLM) code generators raises a number of difficult legal questions, a
> > > +number of which impact on Open Source projects. As noted earlier, the
> > > +QEMU community requires that contributors certify their patch submissions
> > > +are made in accordance with the rules of the :ref:`dco` (DCO). When a
> > > +patch contains "AI" generated code this raises difficulties with code
> > > +provenence and thus DCO compliance.
> > > +
> > > +To satisfy the DCO, the patch contributor has to fully understand
> > > +the origins and license of code they are contributing to QEMU. The
> > > +license terms that should apply to the output of an "AI" code generator
> > > +are ill-defined, given that both training data and operation of the
> > > +"AI" are typically opaque to the user. Even where the training data
> > > +is said to all be open source, it will likely be under a wide variety
> > > +of license terms.
> > > +
> > > +While the vendor's of "AI" code generators may promote the idea that
> > > +code output can be taken under a free choice of license, this is not
> > > +yet considered to be a generally accepted, nor tested, legal opinion.
> > > +
> > > +With this in mind, the QEMU maintainers does not consider it is
> > > +currently possible to comply with DCO terms (b) or (c) for most "AI"
> > > +generated code.
> > > +
> > > +The QEMU maintainers thus require that contributors refrain from using
> > > +"AI" code generators on patches intended to be submitted to the project,
> > > +and will decline any contribution if use of "AI" is known or suspected.
> > > +
> > > +Examples of tools impacted by this policy includes both GitHub CoPilot,
> > > +and ChatGPT, amongst many others which are less well known.
> > 
> > 
> > So you called out these two by name, fine, but given "AI" is in scare
> > quotes I don't really know what is or is not allowed and I don't know
> > how will contributors know.  Is the "AI" that one must not use
> > necessarily an LLM?  And how do you define LLM even? Wikipedia says
> > "general-purpose language understanding and generation".
> > 
> > 
> > All this seems vague to me.
> > 
> > 
> > However, can't we define a simpler more specific policy?
> > For example, isn't it true that *any* automatically generated code
> > can only be included if the scripts producing said code
> > are also included or otherwise available under GPLv2?
> 
> The following definition makes sense to me:
> 
> - Automated codegen tool must be idempotent.
> - Automated codegen tool must not use statistical modelling.

How are these definitions related to your ability to sign the DCO?

Kevin



^ permalink raw reply	[flat|nested] 57+ messages in thread

* Re: [PATCH 2/2] docs: define policy forbidding use of "AI" / LLM code generators
  2023-11-24 10:21           ` Alex Bennée
@ 2023-11-24 10:30             ` Michael S. Tsirkin
  2023-11-24 11:41             ` Daniel P. Berrangé
  1 sibling, 0 replies; 57+ messages in thread
From: Michael S. Tsirkin @ 2023-11-24 10:30 UTC (permalink / raw)
  To: Alex Bennée
  Cc: Daniel P. Berrangé, qemu-devel, Richard Henderson,
	Alexander Graf, Paolo Bonzini, Markus Armbruster,
	Phil Mathieu-Daudé, Stefan Hajnoczi, Thomas Huth, Kevin Wolf,
	Gerd Hoffmann, Mark Cave-Ayland, Peter Maydell

On Fri, Nov 24, 2023 at 10:21:17AM +0000, Alex Bennée wrote:
> LLM's are just a tool like a compiler (albeit with spookier different
> internals).

We already generally don't accept compiler output in patches since
it is not source code by the definition of GPL.

-- 
MST



^ permalink raw reply	[flat|nested] 57+ messages in thread

* Re: [PATCH 2/2] docs: define policy forbidding use of "AI" / LLM code generators
  2023-11-24 10:17         ` Kevin Wolf
@ 2023-11-24 10:33           ` Alex Bennée
  2023-11-24 10:42             ` Michael S. Tsirkin
  0 siblings, 1 reply; 57+ messages in thread
From: Alex Bennée @ 2023-11-24 10:33 UTC (permalink / raw)
  To: Kevin Wolf
  Cc: Michael S. Tsirkin, Daniel P. Berrangé, qemu-devel,
	Richard Henderson, Alexander Graf, Paolo Bonzini,
	Markus Armbruster, Phil Mathieu-Daudé, Stefan Hajnoczi,
	Thomas Huth, Gerd Hoffmann, Mark Cave-Ayland, Peter Maydell

Kevin Wolf <kwolf@redhat.com> writes:

> Am 24.11.2023 um 00:53 hat Michael S. Tsirkin geschrieben:
>> On Thu, Nov 23, 2023 at 05:46:16PM +0000, Daniel P. Berrangé wrote:
>> > On Thu, Nov 23, 2023 at 12:57:42PM +0000, Alex Bennée wrote:
>> > > Daniel P. Berrangé <berrange@redhat.com> writes:
>> > > 
<snip>
>> > > > +The QEMU maintainers thus require that contributors refrain from using
>> > > > +"AI" code generators on patches intended to be submitted to the project,
>> > > > +and will decline any contribution if use of "AI" is known or suspected.
>> > > > +
>> > > > +Examples of tools impacted by this policy includes both GitHub CoPilot,
>> > > > +and ChatGPT, amongst many others which are less well known.
>> > > 
>> > > What about if you took an LLM and then fine tuned it by using project
>> > > data so it could better help new users in making contributions to the
>> > > project? You would be biasing the model to your own data for the
>> > > purposes of helping developers write better QEMU code?
>> > 
>> > It is hard to provide an answer to that question, since I think it is
>> > something that would need to be considered case by case. It hinges
>> > around how much does the new QEMU specific training data influence
>> > the model, vs other pre-existing training (if any)
>
> I suspect fine tuning won't be enough because it doesn't make the
> unlicensed original training data go away.
>
> If you could make sure that all of the training data consists only of
> code for which you have the right to contribute it to QEMU, that would
> be a different case.

That probably means we can never use even open source LLMs to generate
code for QEMU because while the source data is all open source it won't
necessarily be GPL compatible.

-- 
Alex Bennée
Virtualisation Tech Lead @ Linaro


^ permalink raw reply	[flat|nested] 57+ messages in thread

* Re: [PATCH 2/2] docs: define policy forbidding use of "AI" / LLM code generators
  2023-11-24 10:25       ` Kevin Wolf
@ 2023-11-24 10:37         ` Michael S. Tsirkin
  2023-11-24 10:42         ` Manos Pitsidianakis
  1 sibling, 0 replies; 57+ messages in thread
From: Michael S. Tsirkin @ 2023-11-24 10:37 UTC (permalink / raw)
  To: Kevin Wolf
  Cc: Manos Pitsidianakis, qemu-devel, Daniel P. Berrangé,
	Richard Henderson, Alexander Graf, Alex Benné e,
	Paolo Bonzini, Markus Armbruster, Phil Mathieu-Daudé,
	Stefan Hajnoczi, Thomas Huth, Gerd Hoffmann, Mark Cave-Ayland,
	Peter Maydell

On Fri, Nov 24, 2023 at 11:25:55AM +0100, Kevin Wolf wrote:
> > - Automated codegen tool must be idempotent.
> > - Automated codegen tool must not use statistical modelling.
> 
> How are these definitions related to your ability to sign the DCO?

Not only that - while the question of whether code generated e.g. by copilot
would be source code by GPL definition is unclear at least to me,
code generated by an idempotent automated tool seems highly
likely not to satisfy the GPL definition.
Though I am not a lawyer and do not speak for Red Hat.

-- 
MST



^ permalink raw reply	[flat|nested] 57+ messages in thread

* Re: [PATCH 2/2] docs: define policy forbidding use of "AI" / LLM code generators
  2023-11-24 10:33           ` Alex Bennée
@ 2023-11-24 10:42             ` Michael S. Tsirkin
  2023-11-24 10:43               ` Peter Maydell
  0 siblings, 1 reply; 57+ messages in thread
From: Michael S. Tsirkin @ 2023-11-24 10:42 UTC (permalink / raw)
  To: Alex Bennée
  Cc: Kevin Wolf, Daniel P. Berrangé, qemu-devel,
	Richard Henderson, Alexander Graf, Paolo Bonzini,
	Markus Armbruster, Phil Mathieu-Daudé, Stefan Hajnoczi,
	Thomas Huth, Gerd Hoffmann, Mark Cave-Ayland, Peter Maydell

On Fri, Nov 24, 2023 at 10:33:49AM +0000, Alex Bennée wrote:
> That probably means we can never use even open source LLMs to generate
> code for QEMU because while the source data is all open source it won't
> necessarily be GPL compatible.

I would probably wait until the dust settles before we start accepting
LLM generated code. If nothing else, generated code quality
in our niche area is at this point still nowhere near being useful.

-- 
MST



^ permalink raw reply	[flat|nested] 57+ messages in thread

* Re: [PATCH 2/2] docs: define policy forbidding use of "AI" / LLM code generators
  2023-11-24 10:25       ` Kevin Wolf
  2023-11-24 10:37         ` Michael S. Tsirkin
@ 2023-11-24 10:42         ` Manos Pitsidianakis
  1 sibling, 0 replies; 57+ messages in thread
From: Manos Pitsidianakis @ 2023-11-24 10:42 UTC (permalink / raw)
  To: Kevin Wolf
  Cc: qemu-devel, Michael S. Tsirkin, Daniel P. Berrangé ,
	Richard Henderson, Alexander Graf, Alex Benné e,
	Paolo Bonzini, Markus Armbruster, Phil Mathieu-Daudé ,
	Stefan Hajnoczi, Thomas Huth, Gerd Hoffmann, Mark Cave-Ayland,
	Peter Maydell

On Fri, 24 Nov 2023 12:25, Kevin Wolf <kwolf@redhat.com> wrote:
>Am 23.11.2023 um 15:56 hat Manos Pitsidianakis geschrieben:
>> On Thu, 23 Nov 2023 16:35, "Michael S. Tsirkin" <mst@redhat.com> wrote:
>> > On Thu, Nov 23, 2023 at 11:40:26AM +0000, Daniel P. Berrangé wrote:
>> > > There has been an explosion of interest in so called "AI" (LLM)
>> > > code generators in the past year or so. Thus far though, this is
>> > > has not been matched by a broadly accepted legal interpretation
>> > > of the licensing implications for code generator outputs. While
>> > > the vendors may claim there is no problem and a free choice of
>> > > license is possible, they have an inherent conflict of interest
>> > > in promoting this interpretation. More broadly there is, as yet,
>> > > no broad consensus on the licensing implications of code generators
>> > > trained on inputs under a wide variety of licenses.
>> > > 
>> > > The DCO requires contributors to assert they have the right to
>> > > contribute under the designated project license. Given the lack
>> > > of consensus on the licensing of "AI" (LLM) code generator output,
>> > > it is not considered credible to assert compliance with the DCO
>> > > clause (b) or (c) where a patch includes such generated code.
>> > > 
>> > > This patch thus defines a policy that the QEMU project will not
>> > > accept contributions where use of "AI" (LLM) code generators is
>> > > either known, or suspected.
>> > > 
>> > > Signed-off-by: Daniel P. Berrangé <berrange@redhat.com>
>> > > ---
>> > >  docs/devel/code-provenance.rst | 40 ++++++++++++++++++++++++++++++++++
>> > >  1 file changed, 40 insertions(+)
>> > > 
>> > > diff --git a/docs/devel/code-provenance.rst b/docs/devel/code-provenance.rst
>> > > index b4591a2dec..a6e42c6b1b 100644
>> > > --- a/docs/devel/code-provenance.rst
>> > > +++ b/docs/devel/code-provenance.rst
>> > > @@ -195,3 +195,43 @@ example::
>> > >    Signed-off-by: Some Person <some.person@example.com>
>> > >    [Rebased and added support for 'foo']
>> > >    Signed-off-by: New Person <new.person@example.com>
>> > > +
>> > > +Use of "AI" (LLM) code generators
>> > > +~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
>> > > +
>> > > +TL;DR:
>> > > +
>> > > +  **Current QEMU project policy is to DECLINE any contributions
>> > > +  which are believed to include or derive from "AI" (LLM)
>> > > +  generated code.**
>> > > +
>> > > +The existence of "AI" (`Large Language Model <https://en.wikipedia.org/wiki/Large_language_model>`__
>> > > +/ LLM) code generators raises a number of difficult legal questions, a
>> > > +number of which impact on Open Source projects. As noted earlier, the
>> > > +QEMU community requires that contributors certify their patch submissions
>> > > +are made in accordance with the rules of the :ref:`dco` (DCO). When a
>> > > +patch contains "AI" generated code this raises difficulties with code
>> > > +provenence and thus DCO compliance.
>> > > +
>> > > +To satisfy the DCO, the patch contributor has to fully understand
>> > > +the origins and license of code they are contributing to QEMU. The
>> > > +license terms that should apply to the output of an "AI" code generator
>> > > +are ill-defined, given that both training data and operation of the
>> > > +"AI" are typically opaque to the user. Even where the training data
>> > > +is said to all be open source, it will likely be under a wide variety
>> > > +of license terms.
>> > > +
>> > > +While the vendor's of "AI" code generators may promote the idea that
>> > > +code output can be taken under a free choice of license, this is not
>> > > +yet considered to be a generally accepted, nor tested, legal opinion.
>> > > +
>> > > +With this in mind, the QEMU maintainers does not consider it is
>> > > +currently possible to comply with DCO terms (b) or (c) for most "AI"
>> > > +generated code.
>> > > +
>> > > +The QEMU maintainers thus require that contributors refrain from using
>> > > +"AI" code generators on patches intended to be submitted to the project,
>> > > +and will decline any contribution if use of "AI" is known or suspected.
>> > > +
>> > > +Examples of tools impacted by this policy includes both GitHub CoPilot,
>> > > +and ChatGPT, amongst many others which are less well known.
>> > 
>> > 
>> > So you called out these two by name, fine, but given "AI" is in scare
>> > quotes I don't really know what is or is not allowed and I don't know
>> > how will contributors know.  Is the "AI" that one must not use
>> > necessarily an LLM?  And how do you define LLM even? Wikipedia says
>> > "general-purpose language understanding and generation".
>> > 
>> > 
>> > All this seems vague to me.
>> > 
>> > 
>> > However, can't we define a simpler more specific policy?
>> > For example, isn't it true that *any* automatically generated code
>> > can only be included if the scripts producing said code
>> > are also included or otherwise available under GPLv2?
>> 
>> The following definition makes sense to me:
>> 
>> - Automated codegen tool must be idempotent.
>> - Automated codegen tool must not use statistical modelling.
>
>How are these definitions related to your ability to sign the DCO?
>
>Kevin

This was a response to Michael's salient observation that AI and LLM are 
very vague and not clearly defined terms. I did not mention DCO at all.

Manos


^ permalink raw reply	[flat|nested] 57+ messages in thread

* Re: [PATCH 2/2] docs: define policy forbidding use of "AI" / LLM code generators
  2023-11-24 10:42             ` Michael S. Tsirkin
@ 2023-11-24 10:43               ` Peter Maydell
  2023-11-24 11:02                 ` Michael S. Tsirkin
  2023-11-24 11:37                 ` Daniel P. Berrangé
  0 siblings, 2 replies; 57+ messages in thread
From: Peter Maydell @ 2023-11-24 10:43 UTC (permalink / raw)
  To: Michael S. Tsirkin
  Cc: Alex Bennée, Kevin Wolf, Daniel P. Berrangé, qemu-devel,
	Richard Henderson, Alexander Graf, Paolo Bonzini,
	Markus Armbruster, Phil Mathieu-Daudé, Stefan Hajnoczi,
	Thomas Huth, Gerd Hoffmann, Mark Cave-Ayland

On Fri, 24 Nov 2023 at 10:42, Michael S. Tsirkin <mst@redhat.com> wrote:
>
> On Fri, Nov 24, 2023 at 10:33:49AM +0000, Alex Bennée wrote:
> > That probably means we can never use even open source LLMs to generate
> > code for QEMU because while the source data is all open source it won't
> > necessarily be GPL compatible.
>
> I would probably wait until the dust settles before we start accepting
> LLM generated code.

I think that's pretty much my take on what this policy is:
"say no for now; we can always come back later when the legal
situation seems clearer".

-- PMM


^ permalink raw reply	[flat|nested] 57+ messages in thread

* Re: [PATCH 2/2] docs: define policy forbidding use of "AI" / LLM code generators
  2023-11-24 10:43               ` Peter Maydell
@ 2023-11-24 11:02                 ` Michael S. Tsirkin
  2023-11-24 11:37                 ` Daniel P. Berrangé
  1 sibling, 0 replies; 57+ messages in thread
From: Michael S. Tsirkin @ 2023-11-24 11:02 UTC (permalink / raw)
  To: Peter Maydell
  Cc: Alex Bennée, Kevin Wolf, Daniel P. Berrangé, qemu-devel,
	Richard Henderson, Alexander Graf, Paolo Bonzini,
	Markus Armbruster, Phil Mathieu-Daudé, Stefan Hajnoczi,
	Thomas Huth, Gerd Hoffmann, Mark Cave-Ayland

On Fri, Nov 24, 2023 at 10:43:05AM +0000, Peter Maydell wrote:
> On Fri, 24 Nov 2023 at 10:42, Michael S. Tsirkin <mst@redhat.com> wrote:
> >
> > On Fri, Nov 24, 2023 at 10:33:49AM +0000, Alex Bennée wrote:
> > > That probably means we can never use even open source LLMs to generate
> > > code for QEMU because while the source data is all open source it won't
> > > necessarily be GPL compatible.
> >
> > I would probably wait until the dust settles before we start accepting
> > LLM generated code.
> 
> I think that's pretty much my take on what this policy is:
> "say no for now; we can always come back later when the legal
> situation seems clearer".

Absolutely. So I think we should not try and venture into terminology
such as what is ai or try and promote legal copyright theories.
ATM there's no good reason for someone who did not write the code
to put their DCO on the code. If it is not clear who wrote the code
because it was generated and not written then we don't want it.

-- 
MST



^ permalink raw reply	[flat|nested] 57+ messages in thread

* Re: [PATCH 1/2] docs: introduce dedicated page about code provenance / sign-off
  2023-11-23 17:33       ` Michael S. Tsirkin
@ 2023-11-24 11:11         ` Philippe Mathieu-Daudé
  2023-11-24 11:27           ` Michael S. Tsirkin
  0 siblings, 1 reply; 57+ messages in thread
From: Philippe Mathieu-Daudé @ 2023-11-24 11:11 UTC (permalink / raw)
  To: Michael S. Tsirkin, Daniel P. Berrangé
  Cc: qemu-devel, Richard Henderson, Alexander Graf, Alex Bennée,
	Paolo Bonzini, Markus Armbruster, Stefan Hajnoczi, Thomas Huth,
	Kevin Wolf, Gerd Hoffmann, Mark Cave-Ayland, Peter Maydell

On 23/11/23 18:33, Michael S. Tsirkin wrote:
> On Thu, Nov 23, 2023 at 05:16:45PM +0000, Daniel P. Berrangé wrote:
>> On Thu, Nov 23, 2023 at 09:25:13AM -0500, Michael S. Tsirkin wrote:
>>> On Thu, Nov 23, 2023 at 11:40:25AM +0000, Daniel P. Berrangé wrote:
>>>> Currently we have a short paragraph saying that patches must include
>>>> a Signed-off-by line, and merely link to the kernel documentation.
>>>> The linked kernel docs have alot of content beyond the part about
>>>> sign-off an thus is misleading/distracting to QEMU contributors.
>>>>
>>>> This introduces a dedicated 'code-provenance' page in QEMU talking
>>>> about why we require sign-off, explaining the other tags we commonly
>>>> use, and what to do in some edge cases.
>>>>
>>>> Signed-off-by: Daniel P. Berrangé <berrange@redhat.com>
>>>
>>
>>>> +  * The non-primary author's contributions were so trivial that
>>>> +    they can be considered not subject to copyright. In this case
>>>> +    the secondary authors need not include a ``Signed-off-by``.
>>>> +
>>>> +    This case most commonly applies where QEMU reviewers give short
>>>> +    snippets of code as suggested fixes to a patch. The reviewers
>>>> +    don't need to have their own ``Signed-off-by`` added unless
>>>> +    their code suggestion was unusually large.
>>>
>>> It is still a good policy to include attribution, e.g.
>>> by adding a Suggested-by tag.
>>
>> Will add this tag.

Thanks!

>>>> +Other commit tags
>>>> +~~~~~~~~~~~~~~~~~


>>> As long as we are here, let's document Fixes: and Cc: ?
>>
>> The submitting-a-patch doc covers more general commit message information.
>> I think this doc just ought to focus on tags that identify humans involved
>> in the process.
>>
>> I've never been sure what the point of the 'Cc' tag is, when you actually
>> want to use the Cc email header ?
>>
> 
> It records the fact that these people have been copied but did not
> respond.
This might be felt aggressive or forcing. My understanding of this Cc
tag in a commit is "now that it is merged, you can't complain". We can
be absent, sick, on holidays... If I missed a merged patch review I'll
try to kindly ask on the list if it can be reworked, or suggest a patch
to fix what I missed.

Not sure this is really useful to commit that to the repository.

IMHO the only useful Cc tag is for qemu-stable@nongnu.org, as Kevin
mentioned.

If you want to be sure your patch is Cc to a set of developers, you can
add Cc: lines below the '---' patch separator. My 2 cents eh...

Regards,

Phil.


^ permalink raw reply	[flat|nested] 57+ messages in thread

* Re: [PATCH 1/2] docs: introduce dedicated page about code provenance / sign-off
  2023-11-24 11:11         ` Philippe Mathieu-Daudé
@ 2023-11-24 11:27           ` Michael S. Tsirkin
  0 siblings, 0 replies; 57+ messages in thread
From: Michael S. Tsirkin @ 2023-11-24 11:27 UTC (permalink / raw)
  To: Philippe Mathieu-Daudé
  Cc: Daniel P. Berrangé, qemu-devel, Richard Henderson,
	Alexander Graf, Alex Bennée, Paolo Bonzini,
	Markus Armbruster, Stefan Hajnoczi, Thomas Huth, Kevin Wolf,
	Gerd Hoffmann, Mark Cave-Ayland, Peter Maydell

On Fri, Nov 24, 2023 at 12:11:30PM +0100, Philippe Mathieu-Daudé wrote:
> On 23/11/23 18:33, Michael S. Tsirkin wrote:
> > On Thu, Nov 23, 2023 at 05:16:45PM +0000, Daniel P. Berrangé wrote:
> > > On Thu, Nov 23, 2023 at 09:25:13AM -0500, Michael S. Tsirkin wrote:
> > > > On Thu, Nov 23, 2023 at 11:40:25AM +0000, Daniel P. Berrangé wrote:
> > > > > Currently we have a short paragraph saying that patches must include
> > > > > a Signed-off-by line, and merely link to the kernel documentation.
> > > > > The linked kernel docs have alot of content beyond the part about
> > > > > sign-off an thus is misleading/distracting to QEMU contributors.
> > > > > 
> > > > > This introduces a dedicated 'code-provenance' page in QEMU talking
> > > > > about why we require sign-off, explaining the other tags we commonly
> > > > > use, and what to do in some edge cases.
> > > > > 
> > > > > Signed-off-by: Daniel P. Berrangé <berrange@redhat.com>
> > > > 
> > > 
> > > > > +  * The non-primary author's contributions were so trivial that
> > > > > +    they can be considered not subject to copyright. In this case
> > > > > +    the secondary authors need not include a ``Signed-off-by``.
> > > > > +
> > > > > +    This case most commonly applies where QEMU reviewers give short
> > > > > +    snippets of code as suggested fixes to a patch. The reviewers
> > > > > +    don't need to have their own ``Signed-off-by`` added unless
> > > > > +    their code suggestion was unusually large.
> > > > 
> > > > It is still a good policy to include attribution, e.g.
> > > > by adding a Suggested-by tag.
> > > 
> > > Will add this tag.
> 
> Thanks!
> 
> > > > > +Other commit tags
> > > > > +~~~~~~~~~~~~~~~~~
> 
> 
> > > > As long as we are here, let's document Fixes: and Cc: ?
> > > 
> > > The submitting-a-patch doc covers more general commit message information.
> > > I think this doc just ought to focus on tags that identify humans involved
> > > in the process.
> > > 
> > > I've never been sure what the point of the 'Cc' tag is, when you actually
> > > want to use the Cc email header ?
> > > 
> > 
> > It records the fact that these people have been copied but did not
> > respond.
> This might be felt aggressive or forcing.
> My understanding of this Cc
> tag in a commit is "now that it is merged, you can't complain". We can
> be absent, sick, on holidays... If I missed a merged patch review I'll
> try to kindly ask on the list if it can be reworked, or suggest a patch
> to fix what I missed.

> Not sure this is really useful to commit that to the repository.

I don't see it as forcing. Sometimes I do a fly-by review of a patch
that caught my eye not in my area. Later people address my comments
and start copying me but I don't have time to re-review.
Recoding the fact that they copied me seems important.

This info might be helpful in git history for other reasons
- helps looking for someone to help review backports
- to guess at code quality - to help understand whether code had all the needed
  people copied


> 
> IMHO the only useful Cc tag is for qemu-stable@nongnu.org, as Kevin
> mentioned.
> 
> If you want to be sure your patch is Cc to a set of developers, you can
> add Cc: lines below the '---' patch separator. My 2 cents eh...
> 
> Regards,
> 
> Phil.


If people feel threatened by CC I don't have a problem to ask people
to put it in a note so it comes after ---.

-- 
MST



^ permalink raw reply	[flat|nested] 57+ messages in thread

* Re: [PATCH 2/2] docs: define policy forbidding use of "AI" / LLM code generators
  2023-11-24 10:43               ` Peter Maydell
  2023-11-24 11:02                 ` Michael S. Tsirkin
@ 2023-11-24 11:37                 ` Daniel P. Berrangé
  2023-11-24 11:39                   ` Michael S. Tsirkin
  1 sibling, 1 reply; 57+ messages in thread
From: Daniel P. Berrangé @ 2023-11-24 11:37 UTC (permalink / raw)
  To: Peter Maydell
  Cc: Michael S. Tsirkin, Alex Bennée, Kevin Wolf, qemu-devel,
	Richard Henderson, Alexander Graf, Paolo Bonzini,
	Markus Armbruster, Phil Mathieu-Daudé, Stefan Hajnoczi,
	Thomas Huth, Gerd Hoffmann, Mark Cave-Ayland

On Fri, Nov 24, 2023 at 10:43:05AM +0000, Peter Maydell wrote:
> On Fri, 24 Nov 2023 at 10:42, Michael S. Tsirkin <mst@redhat.com> wrote:
> >
> > On Fri, Nov 24, 2023 at 10:33:49AM +0000, Alex Bennée wrote:
> > > That probably means we can never use even open source LLMs to generate
> > > code for QEMU because while the source data is all open source it won't
> > > necessarily be GPL compatible.
> >
> > I would probably wait until the dust settles before we start accepting
> > LLM generated code.
> 
> I think that's pretty much my take on what this policy is:
> "say no for now; we can always come back later when the legal
> situation seems clearer".

Yes, that was my thoughts exactly.

And if anyone comes along with a specific LLM/AI code generator that
they believe can be used in a way compatible with the DCO, they can
ask for an exception to the general policy which we can discuss then.

With regards,
Daniel
-- 
|: https://berrange.com      -o-    https://www.flickr.com/photos/dberrange :|
|: https://libvirt.org         -o-            https://fstop138.berrange.com :|
|: https://entangle-photo.org    -o-    https://www.instagram.com/dberrange :|



^ permalink raw reply	[flat|nested] 57+ messages in thread

* Re: [PATCH 2/2] docs: define policy forbidding use of "AI" / LLM code generators
  2023-11-24 11:37                 ` Daniel P. Berrangé
@ 2023-11-24 11:39                   ` Michael S. Tsirkin
  2023-11-24 11:40                     ` Michael S. Tsirkin
  0 siblings, 1 reply; 57+ messages in thread
From: Michael S. Tsirkin @ 2023-11-24 11:39 UTC (permalink / raw)
  To: Daniel P. Berrangé
  Cc: Peter Maydell, Alex Bennée, Kevin Wolf, qemu-devel,
	Richard Henderson, Alexander Graf, Paolo Bonzini,
	Markus Armbruster, Phil Mathieu-Daudé, Stefan Hajnoczi,
	Thomas Huth, Gerd Hoffmann, Mark Cave-Ayland

On Fri, Nov 24, 2023 at 11:37:15AM +0000, Daniel P. Berrangé wrote:
> On Fri, Nov 24, 2023 at 10:43:05AM +0000, Peter Maydell wrote:
> > On Fri, 24 Nov 2023 at 10:42, Michael S. Tsirkin <mst@redhat.com> wrote:
> > >
> > > On Fri, Nov 24, 2023 at 10:33:49AM +0000, Alex Bennée wrote:
> > > > That probably means we can never use even open source LLMs to generate
> > > > code for QEMU because while the source data is all open source it won't
> > > > necessarily be GPL compatible.
> > >
> > > I would probably wait until the dust settles before we start accepting
> > > LLM generated code.
> > 
> > I think that's pretty much my take on what this policy is:
> > "say no for now; we can always come back later when the legal
> > situation seems clearer".
> 
> Yes, that was my thoughts exactly.
> 
> And if anyone comes along with a specific LLM/AI code generator that
> they believe can be used in a way compatible with the DCO, they can
> ask for an exception to the general policy which we can discuss then.

Yea. But why do you keep worrying about LLM/AI mess?  Are there code
generators whose output do allow? What are these?

-- 
MST



^ permalink raw reply	[flat|nested] 57+ messages in thread

* Re: [PATCH 2/2] docs: define policy forbidding use of "AI" / LLM code generators
  2023-11-24 11:39                   ` Michael S. Tsirkin
@ 2023-11-24 11:40                     ` Michael S. Tsirkin
  0 siblings, 0 replies; 57+ messages in thread
From: Michael S. Tsirkin @ 2023-11-24 11:40 UTC (permalink / raw)
  To: Daniel P. Berrangé
  Cc: Peter Maydell, Alex Bennée, Kevin Wolf, qemu-devel,
	Richard Henderson, Alexander Graf, Paolo Bonzini,
	Markus Armbruster, Phil Mathieu-Daudé, Stefan Hajnoczi,
	Thomas Huth, Gerd Hoffmann, Mark Cave-Ayland

On Fri, Nov 24, 2023 at 06:39:21AM -0500, Michael S. Tsirkin wrote:
> On Fri, Nov 24, 2023 at 11:37:15AM +0000, Daniel P. Berrangé wrote:
> > On Fri, Nov 24, 2023 at 10:43:05AM +0000, Peter Maydell wrote:
> > > On Fri, 24 Nov 2023 at 10:42, Michael S. Tsirkin <mst@redhat.com> wrote:
> > > >
> > > > On Fri, Nov 24, 2023 at 10:33:49AM +0000, Alex Bennée wrote:
> > > > > That probably means we can never use even open source LLMs to generate
> > > > > code for QEMU because while the source data is all open source it won't
> > > > > necessarily be GPL compatible.
> > > >
> > > > I would probably wait until the dust settles before we start accepting
> > > > LLM generated code.
> > > 
> > > I think that's pretty much my take on what this policy is:
> > > "say no for now; we can always come back later when the legal
> > > situation seems clearer".
> > 
> > Yes, that was my thoughts exactly.
> > 
> > And if anyone comes along with a specific LLM/AI code generator that
> > they believe can be used in a way compatible with the DCO, they can
> > ask for an exception to the general policy which we can discuss then.
> 
> Yea. But why do you keep worrying about LLM/AI mess?  Are there code
> generators whose output do allow? What are these?

And to clarify I mean source code in the GPL sense so please do not
say "compiler".

-- 
MST



^ permalink raw reply	[flat|nested] 57+ messages in thread

* Re: [PATCH 2/2] docs: define policy forbidding use of "AI" / LLM code generators
  2023-11-24 10:21           ` Alex Bennée
  2023-11-24 10:30             ` Michael S. Tsirkin
@ 2023-11-24 11:41             ` Daniel P. Berrangé
  1 sibling, 0 replies; 57+ messages in thread
From: Daniel P. Berrangé @ 2023-11-24 11:41 UTC (permalink / raw)
  To: Alex Bennée
  Cc: Michael S. Tsirkin, qemu-devel, Richard Henderson, Alexander Graf,
	Paolo Bonzini, Markus Armbruster, Phil Mathieu-Daudé,
	Stefan Hajnoczi, Thomas Huth, Kevin Wolf, Gerd Hoffmann,
	Mark Cave-Ayland, Peter Maydell

On Fri, Nov 24, 2023 at 10:21:17AM +0000, Alex Bennée wrote:
> Daniel P. Berrangé <berrange@redhat.com> writes:
> 
> > On Thu, Nov 23, 2023 at 05:39:18PM -0500, Michael S. Tsirkin wrote:
> >> On Thu, Nov 23, 2023 at 05:58:45PM +0000, Daniel P. Berrangé wrote:
> >> > The license of a code generation tool itself is usually considered
> >> > to be not a factor in the license of its output.
> >> 
> >> Really? I would find it very surprising if a code generation tool that
> >> is not a language model and so is not understanding the code it's
> >> generating did not include some code snippets going into the output.
> >> It is also possible to unintentionally run afoul of GPL's definition of source
> >> code which is "the preferred form of the work for making modifications to it". 
> >> So even if you have copyright to input, dumping just output and putting
> >> GPL on it might or might not be ok.
> >
> > Consider the C pre-processor. This takes an input .c file, and expands
> > all the macros, to split out a new .c file.
> >
> > The license of the output .c file is determined by the license of the
> > input .c file. The license of the CPP impl (whether OSS or proprietary)
> > doesn't have any influence on the license of the output file, it cannot
> > magically force the output file to be proprietary any more than it can
> > force it to be output file GPL.
> 
> LLM's are just a tool like a compiler (albeit with spookier different
> internals). The prompt and the instructions are arguably the more
> important part of how to get good results from the LLM transformation.
> In fact most of the way I've been using them has been by pasting some
> existing code and asking for review or transformation of it.
> 
> However I totally get that using the various online LLMs you have very
> little transparency about what has gone into their training and therefor
> there is a danger of proprietary code being hallucinated out of their
> matricies. Conversely what if I use an LLM like OpenLLaMa:
> 
>   https://github.com/openlm-research/open_llama
> 
> I have fairly exhaustive definitions of what went into the training data
> which of most interest is probably the StarCoder dataset (paper):
> 
>   https://drive.google.com/file/d/1cN-b9GnWtHzQRoE7M7gAEyivY0kl4BYs/view
> 
> where there are tools to detect if generated code has been lifted
> directly from the dataset or is indeed a transformation.

I've not looked at the links above, but I think if someone can make an
compelling argument that *specific* tools have sufficient transparency
to be compatible with signing the DCO, then I think we could maintain a
list of exceptions in the policy.

With regards,
Daniel
-- 
|: https://berrange.com      -o-    https://www.flickr.com/photos/dberrange :|
|: https://libvirt.org         -o-            https://fstop138.berrange.com :|
|: https://entangle-photo.org    -o-    https://www.instagram.com/dberrange :|



^ permalink raw reply	[flat|nested] 57+ messages in thread

* Re: [PATCH 1/2] docs: introduce dedicated page about code provenance / sign-off
  2023-11-23 11:40 ` [PATCH 1/2] docs: introduce dedicated page about code provenance / sign-off Daniel P. Berrangé
                     ` (4 preceding siblings ...)
  2023-11-23 15:13   ` Stefan Hajnoczi
@ 2024-01-27 14:36   ` Zhao Liu
  2024-01-29  9:31     ` Daniel P. Berrangé
  5 siblings, 1 reply; 57+ messages in thread
From: Zhao Liu @ 2024-01-27 14:36 UTC (permalink / raw)
  To: Daniel P. Berrangé
  Cc: qemu-devel, Richard Henderson, Alexander Graf, Alex Bennée,
	Paolo Bonzini, Michael S. Tsirkin, Markus Armbruster,
	Phil Mathieu-Daudé, Stefan Hajnoczi, Thomas Huth, Kevin Wolf,
	Gerd Hoffmann, Mark Cave-Ayland, Peter Maydell

Hi Daniel,

On Thu, Nov 23, 2023 at 11:40:25AM +0000, Daniel P. Berrangé wrote:
> Date: Thu, 23 Nov 2023 11:40:25 +0000
> From: "Daniel P. Berrangé" <berrange@redhat.com>
> Subject: [PATCH 1/2] docs: introduce dedicated page about code provenance /
>  sign-off
> 
> Currently we have a short paragraph saying that patches must include
> a Signed-off-by line, and merely link to the kernel documentation.
> The linked kernel docs have alot of content beyond the part about
> sign-off an thus is misleading/distracting to QEMU contributors.
> 
> This introduces a dedicated 'code-provenance' page in QEMU talking
> about why we require sign-off, explaining the other tags we commonly
> use, and what to do in some edge cases.
> 
> Signed-off-by: Daniel P. Berrangé <berrange@redhat.com>
> ---
>  docs/devel/code-provenance.rst    | 197 ++++++++++++++++++++++++++++++
>  docs/devel/index-process.rst      |   1 +
>  docs/devel/submitting-a-patch.rst |  18 +--
>  3 files changed, 201 insertions(+), 15 deletions(-)
>  create mode 100644 docs/devel/code-provenance.rst
> 
> diff --git a/docs/devel/code-provenance.rst b/docs/devel/code-provenance.rst
> new file mode 100644
> index 0000000000..b4591a2dec
> --- /dev/null
> +++ b/docs/devel/code-provenance.rst
> @@ -0,0 +1,197 @@
> +.. _code-provenance:
> +
> +Code provenance
> +===============
> +
> +Certifying patch submissions
> +~~~~~~~~~~~~~~~~~~~~~~~~~~~~
> +
> +The QEMU community **mandates** all contributors to certify provenance
> +of patch submissions they make to the project. To put it another way,
> +contributors must indicate that they are legally permitted to contribute
> +to the project.
> +
> +Certification is achieved with a low overhead by adding a single line
> +to the bottom of every git commit::
> +
> +   Signed-off-by: YOUR NAME <YOUR@EMAIL>
> +
> +This existence of this line asserts that the author of the patch is
> +contributing in accordance with the `Developer's Certificate of
> +Origin <https://developercertifcate.org>`__:
> +
> +.. _dco:
> +
> +::
> +  Developer's Certificate of Origin 1.1
> +
> +  By making a contribution to this project, I certify that:
> +
> +  (a) The contribution was created in whole or in part by me and I
> +      have the right to submit it under the open source license
> +      indicated in the file; or
> +
> +  (b) The contribution is based upon previous work that, to the best
> +      of my knowledge, is covered under an appropriate open source
> +      license and I have the right under that license to submit that
> +      work with modifications, whether created in whole or in part
> +      by me, under the same open source license (unless I am
> +      permitted to submit under a different license), as indicated
> +      in the file; or
> +
> +  (c) The contribution was provided directly to me by some other
> +      person who certified (a), (b) or (c) and I have not modified
> +      it.
> +
> +  (d) I understand and agree that this project and the contribution
> +      are public and that a record of the contribution (including all
> +      personal information I submit with it, including my sign-off) is
> +      maintained indefinitely and may be redistributed consistent with
> +      this project or the open source license(s) involved.
> +
> +It is generally expected that the name and email addresses used in one
> +of the ``Signed-off-by`` lines, matches that of the git commit ``Author``
> +field. If the person sending the mail is also one of the patch authors,
> +it is further expected that the mail ``From:`` line name & address match
> +one of the ``Signed-off-by`` lines. 
> +
> +Multiple authorship
> +~~~~~~~~~~~~~~~~~~~
> +
> +It is not uncommon for a patch to have contributions from multiple
> +authors. In such a scenario, a git commit will usually be expected
> +to have a ``Signed-off-by`` line for each contributor involved in
> +creatin of the patch. Some edge cases:
> +
> +  * The non-primary author's contributions were so trivial that
> +    they can be considered not subject to copyright. In this case
> +    the secondary authors need not include a ``Signed-off-by``.
> +
> +    This case most commonly applies where QEMU reviewers give short
> +    snippets of code as suggested fixes to a patch. The reviewers
> +    don't need to have their own ``Signed-off-by`` added unless
> +    their code suggestion was unusually large.
> +
> +  * Both contributors work for the same employer and the employer
> +    requires copyright assignment.
> +
> +    It can be said that in this case a ``Signed-off-by`` is indicating
> +    that the person has permission to contributeo from their employer
> +    who is the copyright holder. 

For this case, maybe it needs the "Co-developed-by"?

> It is none the less still preferrable
> +    to include a ``Signed-off-by`` for each contributor, as in some
> +    countries employees are not able to assign copyright to their
> +    employer, and it also covers any time invested outside working
> +    hours.
> +
> +Other commit tags
> +~~~~~~~~~~~~~~~~~
> +
> +While the ``Signed-off-by`` tag is mandatory, there are a number of
> +other tags that are commonly used during QEMU development
> +
> + * **``Reviewed-by``**: when a QEMU community member reviews a patch
> +   on the mailing list, if they consider the patch acceptable, they
> +   should send an email reply containing a ``Reviewed-by`` tag.

Maybe just a question, the people should drop the Reviewed/ACKed/Tested
tags that have been obtained if he make the any code changes (including
function/variable renaming) as well as commit message changes during
the patch refresh process, am I understand correctly? ;-)

> +
> +   NB: a subsystem maintainer sending a pull request would replace
> +   their own ``Reviewed-by`` with another ``Signed-off-by``
> +
> + * **``Acked-by``**: when a QEMU subsystem maintainer approves a patch
> +   that touches their subsystem, but intends to allow a different
> +   maintainer to queue it and send a pull request, they would send
> +   a mail containing a ``Acked-by`` tag.
> +   
> + * **``Tested-by``**: when a QEMU community member has functionally
> +   tested the behaviour of the patch in some manner, they should
> +   send an email reply conmtaning a ``Tested-by`` tag.

Is there any requirement for the order of tags?

My previous understanding was that if the Reviewed-by/Tested-by tags
were obtained by the author within his company, then those tags should
be placed before the signed-off-by of the author. If the Reviewed-by/
Tested-by were acquired in the community, then they should be placed
after the author's signed-off-by, right?

> +
> + * **``Reported-by``**: when a QEMU community member reports a problem
> +   via the mailing list, or some other informal channel that is not
> +   the issue tracker, it is good practice to credit them by including
> +   a ``Reported-by`` tag on any patch fixing the issue. When the
> +   problem is reported via the GitLab issue tracker, however, it is
> +   sufficient to just include a link to the issue.
> +
> +Subsystem maintainer requirements
> +~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
> +
> +When a subsystem maintainer accepts a patch from a contributor, in
> +addition to the normal code review points, they are expected to validate
> +the presence of suitable ``Signed-off-by`` tags.
> +
> +At the time they queue the patch in their subsystem tree, the maintainer
> +**MUST** also then add their own ``Signed-off-by`` to indicate that they
> +have done the aforementioned validation.
> +
> +The subsystem maintainer submitting a pull request is **NOT** expected to
> +have a ``Reviewed-by`` tag on the patch, since this is implied by their
> +own ``Signed-off-by``.
> +  
> +Tools for adding ``Signed-of-by``
> +~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
> +
> +There are a variety of ways tools can support adding ``Signed-off-by``
> +tags for patches, avoiding the need for contributors to manually
> +type in this repetitive text each time.
> +
> +git commands
> +^^^^^^^^^^^^
> +
> +When creating, or amending, a commit the ``-s`` flag to ``git commit``
> +will append a suitable line matching the configuring git author
> +details.
> +
> +If preparing patches using the ``git format-patch`` tool, the ``-s``
> +flag can be used to append a suitable line in the emails it creates,
> +without modifying the local commits. Alternatively to modify the
> +local commits on a branch en-mass::
> +
> +  git rebase master -x 'git commit --amend --no-edit -s'
> +
> +emacs
> +^^^^^
> +
> +In the file ``$HOME/.emacs.d/abbrev_defs`` add::
> +
> +  (define-abbrev-table 'global-abbrev-table
> +    '(
> +      ("8rev" "Reviewed-by: YOUR NAME <your@email.addr>" nil 1)
> +      ("8ack" "Acked-by: YOUR NAME <your@email.addr>" nil 1)
> +      ("8test" "Tested-by: YOUR NAME <your@email.addr>" nil 1)
> +      ("8sob" "Signed-off-by: YOUR NAME <your@email.addr>" nil 1)
> +     ))
> +
> +with this change, if you type (for example) ``8rev`` followed
> +by ``<space>`` or ``<enter>`` it will expand to the whole phrase. 
> +
> +vim
> +^^^
> +
> +In the file ``$HOME/.vimrc`` add::
> +
> +  iabbrev 8rev Reviewed-by: YOUR NAME <your@email.addr>
> +  iabbrev 8ack Acked-by: YOUR NAME <your@email.addr>
> +  iabbrev 8test Tested-by: YOUR NAME <your@email.addr>
> +  iabbrev 8sob Signed-off-by: YOUR NAME <your@email.addr>
> +
> +with this change, if you type (for example) ``8rev`` followed
> +by ``<space>`` or ``<enter>`` it will expand to the whole phrase. 
> +
> +Re-starting abandoned work
> +~~~~~~~~~~~~~~~~~~~~~~~~~~
> +
> +For a variety of reasons there are some patches that get submitted to
> +QEMU but never merged. An unrelated contributor may decide (months or
> +years later) to continue working from the abandoned patch and re-submit
> +it with extra changes.
> +
> +If the abandoned patch already had a ``Signed-off-by`` from the original
> +author this **must** be preserved.

I find some people added Originally-by, e.g., 8e86851bd6b9.

I guess if the code has been changed very significantly, or if the
original implementation has just been referenced and significantly
refactored, then Originally-by should be preferred instead of
Signed-off-by from the original author, right?

Thanks,
Zhao

> The new contributor **must** then add
> +their own ``Signed-off-by`` after the original one if they made any
> +further changes to it. It is common to include a comment just prior to
> +the new ``Signed-off-by`` indicating what extra changes were made. For
> +example::
> +
> +  Signed-off-by: Some Person <some.person@example.com>
> +  [Rebased and added support for 'foo']
> +  Signed-off-by: New Person <new.person@example.com>
> diff --git a/docs/devel/index-process.rst b/docs/devel/index-process.rst
> index 362f97ee30..b54e58105e 100644
> --- a/docs/devel/index-process.rst
> +++ b/docs/devel/index-process.rst
> @@ -13,6 +13,7 @@ Notes about how to interact with the community and how and where to submit patch
>     maintainers
>     style
>     submitting-a-patch
> +   code-provenance
>     trivial-patches
>     stable-process
>     submitting-a-pull-request
> diff --git a/docs/devel/submitting-a-patch.rst b/docs/devel/submitting-a-patch.rst
> index c641d948f1..ec541b3d15 100644
> --- a/docs/devel/submitting-a-patch.rst
> +++ b/docs/devel/submitting-a-patch.rst
> @@ -322,21 +322,9 @@ Patch emails must include a ``Signed-off-by:`` line
>  
>  Your patches **must** include a Signed-off-by: line. This is a hard
>  requirement because it's how you say "I'm legally okay to contribute
> -this and happy for it to go into QEMU". The process is modelled after
> -the `Linux kernel
> -<http://git.kernel.org/cgit/linux/kernel/git/torvalds/linux.git/tree/Documentation/SubmittingPatches?id=f6f94e2ab1b33f0082ac22d71f66385a60d8157f#n297>`__
> -policy.
> -
> -If you wrote the patch, make sure your "From:" and "Signed-off-by:"
> -lines use the same spelling. It's okay if you subscribe or contribute to
> -the list via more than one address, but using multiple addresses in one
> -commit just confuses things. If someone else wrote the patch, git will
> -include a "From:" line in the body of the email (different from your
> -envelope From:) that will give credit to the correct author; but again,
> -that author's Signed-off-by: line is mandatory, with the same spelling.
> -
> -There are various tooling options for automatically adding these tags
> -include using ``git commit -s`` or ``git format-patch -s``. For more
> +this and happy for it to go into QEMU". For full guidance, read the
> +:ref:`code-provenance` documentation.
> +
>  information see `SubmittingPatches 1.12
>  <http://git.kernel.org/cgit/linux/kernel/git/torvalds/linux.git/tree/Documentation/SubmittingPatches?id=f6f94e2ab1b33f0082ac22d71f66385a60d8157f#n297>`__.
>  
> -- 
> 2.41.0
> 
> 


^ permalink raw reply	[flat|nested] 57+ messages in thread

* Re: [PATCH 1/2] docs: introduce dedicated page about code provenance / sign-off
  2024-01-27 14:36   ` Zhao Liu
@ 2024-01-29  9:31     ` Daniel P. Berrangé
  2024-01-29  9:35       ` Samuel Tardieu
  0 siblings, 1 reply; 57+ messages in thread
From: Daniel P. Berrangé @ 2024-01-29  9:31 UTC (permalink / raw)
  To: Zhao Liu
  Cc: qemu-devel, Richard Henderson, Alexander Graf, Alex Bennée,
	Paolo Bonzini, Michael S. Tsirkin, Markus Armbruster,
	Phil Mathieu-Daudé, Stefan Hajnoczi, Thomas Huth, Kevin Wolf,
	Gerd Hoffmann, Mark Cave-Ayland, Peter Maydell

On Sat, Jan 27, 2024 at 10:36:24PM +0800, Zhao Liu wrote:
> Hi Daniel,
> 
> On Thu, Nov 23, 2023 at 11:40:25AM +0000, Daniel P. Berrangé wrote:
> > +Multiple authorship
> > +~~~~~~~~~~~~~~~~~~~
> > +
> > +It is not uncommon for a patch to have contributions from multiple
> > +authors. In such a scenario, a git commit will usually be expected
> > +to have a ``Signed-off-by`` line for each contributor involved in
> > +creatin of the patch. Some edge cases:
> > +
> > +  * The non-primary author's contributions were so trivial that
> > +    they can be considered not subject to copyright. In this case
> > +    the secondary authors need not include a ``Signed-off-by``.
> > +
> > +    This case most commonly applies where QEMU reviewers give short
> > +    snippets of code as suggested fixes to a patch. The reviewers
> > +    don't need to have their own ``Signed-off-by`` added unless
> > +    their code suggestion was unusually large.
> > +
> > +  * Both contributors work for the same employer and the employer
> > +    requires copyright assignment.
> > +
> > +    It can be said that in this case a ``Signed-off-by`` is indicating
> > +    that the person has permission to contributeo from their employer
> > +    who is the copyright holder. 
> 
> For this case, maybe it needs the "Co-developed-by"?

If you're going to go to the trouble of adding multiple tags
to the commit for each author who participated, then IMHO they
should all be Signed-off-by. IOW, either just have S-o-B from
the main author within a company, or have S-o-B for every
author. Co-developed-by doesn't have value IMHO.

> > It is none the less still preferrable
> > +    to include a ``Signed-off-by`` for each contributor, as in some
> > +    countries employees are not able to assign copyright to their
> > +    employer, and it also covers any time invested outside working
> > +    hours.
> > +
> > +Other commit tags
> > +~~~~~~~~~~~~~~~~~
> > +
> > +While the ``Signed-off-by`` tag is mandatory, there are a number of
> > +other tags that are commonly used during QEMU development
> > +
> > + * **``Reviewed-by``**: when a QEMU community member reviews a patch
> > +   on the mailing list, if they consider the patch acceptable, they
> > +   should send an email reply containing a ``Reviewed-by`` tag.
> 
> Maybe just a question, the people should drop the Reviewed/ACKed/Tested
> tags that have been obtained if he make the any code changes (including
> function/variable renaming) as well as commit message changes during
> the patch refresh process, am I understand correctly? ;-)

It is a judgement call as to whether a Reviewed-by/etc should be
kept or dropped. It depends on the scale of the changes that
were made to the commit since the Reviewed-by/etc was first given.

> > +   NB: a subsystem maintainer sending a pull request would replace
> > +   their own ``Reviewed-by`` with another ``Signed-off-by``
> > +
> > + * **``Acked-by``**: when a QEMU subsystem maintainer approves a patch
> > +   that touches their subsystem, but intends to allow a different
> > +   maintainer to queue it and send a pull request, they would send
> > +   a mail containing a ``Acked-by`` tag.
> > +   
> > + * **``Tested-by``**: when a QEMU community member has functionally
> > +   tested the behaviour of the patch in some manner, they should
> > +   send an email reply conmtaning a ``Tested-by`` tag.
> 
> Is there any requirement for the order of tags?
> 
> My previous understanding was that if the Reviewed-by/Tested-by tags
> were obtained by the author within his company, then those tags should
> be placed before the signed-off-by of the author. If the Reviewed-by/
> Tested-by were acquired in the community, then they should be placed
> after the author's signed-off-by, right?

Common practice is for Signed-off-by tags to be kept in time order
from earliest author to latest author / maintainer. Common case is
2 S-o-B, the first from the patch author, and the last from the
sub-system maintainer who sends the pull request.

For other tags I don't see any broadly acceptable pattern. Some people
add Reviewed-by before the S-o-B, others add Reviewed-by after the
S-o-B. Either is fine IMHO.


> > +Re-starting abandoned work
> > +~~~~~~~~~~~~~~~~~~~~~~~~~~
> > +
> > +For a variety of reasons there are some patches that get submitted to
> > +QEMU but never merged. An unrelated contributor may decide (months or
> > +years later) to continue working from the abandoned patch and re-submit
> > +it with extra changes.
> > +
> > +If the abandoned patch already had a ``Signed-off-by`` from the original
> > +author this **must** be preserved.
> 
> I find some people added Originally-by, e.g., 8e86851bd6b9.
> 
> I guess if the code has been changed very significantly, or if the
> original implementation has just been referenced and significantly
> refactored, then Originally-by should be preferred instead of
> Signed-off-by from the original author, right?

If patch submitted still contains any code that can be considered
copyrightable (ie anything non-trivial) from the original author,
then I would expect the original authors Signed-off-by to be retained.

I think the cases where it is ok to use Originally-by, without a
Signed-off-by, would be exceedingly.

With regards,
Daniel
-- 
|: https://berrange.com      -o-    https://www.flickr.com/photos/dberrange :|
|: https://libvirt.org         -o-            https://fstop138.berrange.com :|
|: https://entangle-photo.org    -o-    https://www.instagram.com/dberrange :|



^ permalink raw reply	[flat|nested] 57+ messages in thread

* Re: [PATCH 1/2] docs: introduce dedicated page about code provenance / sign-off
  2024-01-29  9:31     ` Daniel P. Berrangé
@ 2024-01-29  9:35       ` Samuel Tardieu
  2024-01-29 10:41         ` Peter Maydell
  0 siblings, 1 reply; 57+ messages in thread
From: Samuel Tardieu @ 2024-01-29  9:35 UTC (permalink / raw)
  To: Daniel P. Berrangé
  Cc: Zhao Liu, Richard Henderson, Alexander Graf, Alex Bennée,
	Paolo Bonzini, Michael S. Tsirkin, Markus Armbruster,
	Phil Mathieu-Daudé, Stefan Hajnoczi, Thomas Huth, Kevin Wolf,
	Gerd Hoffmann, Mark Cave-Ayland, Peter Maydell, qemu-devel


Daniel P. Berrangé <berrange@redhat.com> writes:

>> Is there any requirement for the order of tags?
>> 
>> My previous understanding was that if the Reviewed-by/Tested-by 
>> tags
>> were obtained by the author within his company, then those tags 
>> should
>> be placed before the signed-off-by of the author. If the 
>> Reviewed-by/
>> Tested-by were acquired in the community, then they should be 
>> placed
>> after the author's signed-off-by, right?
>
> Common practice is for Signed-off-by tags to be kept in time 
> order
> from earliest author to latest author / maintainer. Common case 
> is
> 2 S-o-B, the first from the patch author, and the last from the
> sub-system maintainer who sends the pull request.
>
> For other tags I don't see any broadly acceptable pattern. Some 
> people
> add Reviewed-by before the S-o-B, others add Reviewed-by after 
> the
> S-o-B. Either is fine IMHO.

From what I've seen in other projects, S-o-B means that you accept 
accountability for everything above. One scenario would be:

- Send original patch, which has been tested inside the company:

  Tested-by: Tester <tester@example.com>
  Signed-off-by: Developper <developper@example.com>

- Get some R-b, but need to make some requested minor changes and 
  resend a new patch series:

  Tested-by: Tester <tester@example.com>
  Reviewed-by: Reviewer <reviewer@othercompany.com>
  Signed-off-by: Developper <developper@example.com>

  This is a way of saying "I guarantee that the R-b still applies 
  after the new changes I made to this series"

- Then reviewed and pulled into their tree by the maintainer:

  Tested-by: Tester <tester@example.com>
  Reviewed-by: Reviewer <reviewer@othercompany.com>
  Signed-off-by: Developper <developper@example.com>
  Reviewed-by: Maintainer <maintainer@org.org>
  Signed-off-by: Maintainer <maintainer@org.org>

If, after being reviewed, the initial patch would not have needed 
any change, the order would have been:

  Tested-by: Tester <tester@example.com>
  Signed-off-by: Developper <developper@example.com>
  Reviewed-by: Reviewer <reviewer@othercompany.com>
  Reviewed-by: Maintainer <maintainer@org.org>
  Signed-off-by: Maintainer <maintainer@org.org>

This is consistent with what software like "b4" do: if the S-o of 
the current user is present, it is moved last, as the current user 
is the one accepting accountability at this point.

However, this is not what QEMU has been using as far as I can see, 
as S-o-b tend to stay in their original positions. I even opened 
an issue on b4 a few weeks ago because of this 
<https://github.com/mricon/b4/issues/16>, and I reverted to using 
git-publish. But if this is ok to use an arbitrary order for 
non-S-o-b headers, I can get back to b4.

  Sam
-- 
Samuel Tardieu


^ permalink raw reply	[flat|nested] 57+ messages in thread

* Re: [PATCH 1/2] docs: introduce dedicated page about code provenance / sign-off
  2024-01-29  9:35       ` Samuel Tardieu
@ 2024-01-29 10:41         ` Peter Maydell
  2024-01-29 11:00           ` Daniel P. Berrangé
  0 siblings, 1 reply; 57+ messages in thread
From: Peter Maydell @ 2024-01-29 10:41 UTC (permalink / raw)
  To: Samuel Tardieu
  Cc: Daniel P. Berrangé, Zhao Liu, Richard Henderson,
	Alexander Graf, Alex Bennée, Paolo Bonzini,
	Michael S. Tsirkin, Markus Armbruster, Phil Mathieu-Daudé,
	Stefan Hajnoczi, Thomas Huth, Kevin Wolf, Gerd Hoffmann,
	Mark Cave-Ayland, qemu-devel

On Mon, 29 Jan 2024 at 09:47, Samuel Tardieu <sam@rfc1149.net> wrote:
> However, this is not what QEMU has been using as far as I can see,
> as S-o-b tend to stay in their original positions. I even opened
> an issue on b4 a few weeks ago because of this
> <https://github.com/mricon/b4/issues/16>, and I reverted to using
> git-publish. But if this is ok to use an arbitrary order for
> non-S-o-b headers, I can get back to b4.

I think QEMU doesn't have a specific existing practice here.
What you see is largely the result of people using whatever
tooling they have and accepting the ordering it gives them.
So I don't think you should stop using b4 just because
the ordering it happens to produce isn't the same as
somebody else's tooling.

I think trying to impose some subtle distinction of meaning
on the ordering of tags is not going to work, because there
are going to be too many cases where people don't adhere
to the ordering distinction because they don't know about
it or don't understand it.

As Daniel says, as long as the Signed-off-by tags are
in basically the right order for developer vs maintainer
that's the only strong ordering constraint we have.

thanks
-- PMM

^ permalink raw reply	[flat|nested] 57+ messages in thread

* Re: [PATCH 1/2] docs: introduce dedicated page about code provenance / sign-off
  2024-01-29 10:41         ` Peter Maydell
@ 2024-01-29 11:00           ` Daniel P. Berrangé
  0 siblings, 0 replies; 57+ messages in thread
From: Daniel P. Berrangé @ 2024-01-29 11:00 UTC (permalink / raw)
  To: Peter Maydell
  Cc: Samuel Tardieu, Zhao Liu, Richard Henderson, Alexander Graf,
	Alex Bennée, Paolo Bonzini, Michael S. Tsirkin,
	Markus Armbruster, Phil Mathieu-Daudé, Stefan Hajnoczi,
	Thomas Huth, Kevin Wolf, Gerd Hoffmann, Mark Cave-Ayland,
	qemu-devel

On Mon, Jan 29, 2024 at 10:41:38AM +0000, Peter Maydell wrote:
> On Mon, 29 Jan 2024 at 09:47, Samuel Tardieu <sam@rfc1149.net> wrote:
> > However, this is not what QEMU has been using as far as I can see,
> > as S-o-b tend to stay in their original positions. I even opened
> > an issue on b4 a few weeks ago because of this
> > <https://github.com/mricon/b4/issues/16>, and I reverted to using
> > git-publish. But if this is ok to use an arbitrary order for
> > non-S-o-b headers, I can get back to b4.
> 
> I think QEMU doesn't have a specific existing practice here.
> What you see is largely the result of people using whatever
> tooling they have and accepting the ordering it gives them.
> So I don't think you should stop using b4 just because
> the ordering it happens to produce isn't the same as
> somebody else's tooling.
> 
> I think trying to impose some subtle distinction of meaning
> on the ordering of tags is not going to work, because there
> are going to be too many cases where people don't adhere
> to the ordering distinction because they don't know about
> it or don't understand it.
> 
> As Daniel says, as long as the Signed-off-by tags are
> in basically the right order for developer vs maintainer
> that's the only strong ordering constraint we have.

To think of it another way....

Signed-off-by is the only tag which has defined legal meaning
in terms of asserting that the people involved have permission
to contribute.

All the other tags (Reviewed/Tested/etc) are merely a historical
record of the development process, and have no legal implications.

This makes Signed-off-by the important one, and the others all
in the "nice to have" category.

With regards,
Daniel
-- 
|: https://berrange.com      -o-    https://www.flickr.com/photos/dberrange :|
|: https://libvirt.org         -o-            https://fstop138.berrange.com :|
|: https://entangle-photo.org    -o-    https://www.instagram.com/dberrange :|



^ permalink raw reply	[flat|nested] 57+ messages in thread

end of thread, other threads:[~2024-01-29 11:01 UTC | newest]

Thread overview: 57+ messages (download: mbox.gz follow: Atom feed
-- links below jump to the message on this page --
2023-11-23 11:40 [PATCH 0/2] docs: define policy forbidding use of "AI" / LLM code generators Daniel P. Berrangé
2023-11-23 11:40 ` [PATCH 1/2] docs: introduce dedicated page about code provenance / sign-off Daniel P. Berrangé
2023-11-23 11:58   ` Philippe Mathieu-Daudé
2023-11-23 17:08     ` Daniel P. Berrangé
2023-11-23 23:56       ` Michael S. Tsirkin
2023-11-23 13:01   ` Peter Maydell
2023-11-23 17:12     ` Daniel P. Berrangé
2023-11-23 13:16   ` Kevin Wolf
2023-11-23 17:12     ` Daniel P. Berrangé
2023-11-23 14:25   ` Michael S. Tsirkin
2023-11-23 17:16     ` Daniel P. Berrangé
2023-11-23 17:33       ` Michael S. Tsirkin
2023-11-24 11:11         ` Philippe Mathieu-Daudé
2023-11-24 11:27           ` Michael S. Tsirkin
2023-11-24  9:49       ` Kevin Wolf
2023-11-23 15:13   ` Stefan Hajnoczi
2024-01-27 14:36   ` Zhao Liu
2024-01-29  9:31     ` Daniel P. Berrangé
2024-01-29  9:35       ` Samuel Tardieu
2024-01-29 10:41         ` Peter Maydell
2024-01-29 11:00           ` Daniel P. Berrangé
2023-11-23 11:40 ` [PATCH 2/2] docs: define policy forbidding use of "AI" / LLM code generators Daniel P. Berrangé
2023-11-23 12:57   ` Alex Bennée
2023-11-23 17:37     ` Michal Suchánek
2023-11-23 23:27       ` Michael S. Tsirkin
2023-11-23 17:46     ` Daniel P. Berrangé
2023-11-23 23:53       ` Michael S. Tsirkin
2023-11-24 10:17         ` Kevin Wolf
2023-11-24 10:33           ` Alex Bennée
2023-11-24 10:42             ` Michael S. Tsirkin
2023-11-24 10:43               ` Peter Maydell
2023-11-24 11:02                 ` Michael S. Tsirkin
2023-11-24 11:37                 ` Daniel P. Berrangé
2023-11-24 11:39                   ` Michael S. Tsirkin
2023-11-24 11:40                     ` Michael S. Tsirkin
2023-11-23 13:20   ` Kevin Wolf
2023-11-23 14:35   ` Michael S. Tsirkin
2023-11-23 14:56     ` Manos Pitsidianakis
2023-11-23 15:13       ` Michael S. Tsirkin
2023-11-23 15:29       ` Philippe Mathieu-Daudé
2023-11-23 17:06         ` Michael S. Tsirkin
2023-11-23 17:29           ` Michal Suchánek
2023-11-23 18:05             ` Michael S. Tsirkin
2023-11-23 15:32       ` Alex Bennée
2023-11-23 18:02       ` Daniel P. Berrangé
2023-11-23 18:10         ` Peter Maydell
2023-11-24 10:25       ` Kevin Wolf
2023-11-24 10:37         ` Michael S. Tsirkin
2023-11-24 10:42         ` Manos Pitsidianakis
2023-11-23 17:58     ` Daniel P. Berrangé
2023-11-23 22:39       ` Michael S. Tsirkin
2023-11-24  9:06         ` Daniel P. Berrangé
2023-11-24  9:27           ` Michael S. Tsirkin
2023-11-24 10:21           ` Alex Bennée
2023-11-24 10:30             ` Michael S. Tsirkin
2023-11-24 11:41             ` Daniel P. Berrangé
2023-11-23 15:22   ` Stefan Hajnoczi

This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox;
as well as URLs for NNTP newsgroup(s).