[PATCH v2 0/3] docs: define policy forbidding use of "AI" / LLM code generators

* [PATCH v2 0/3] docs: define policy forbidding use of "AI" / LLM code generators
@ 2024-05-16 16:22 Daniel P. Berrangé
  2024-05-16 16:22 ` [PATCH v2 1/3] docs: introduce dedicated page about code provenance / sign-off Daniel P. Berrangé
                   ` (5 more replies)
  0 siblings, 6 replies; 23+ messages in thread
From: Daniel P. Berrangé @ 2024-05-16 16:22 UTC (permalink / raw)
  To: qemu-devel
  Cc: Thomas Huth, Alex Bennée, Michael S. Tsirkin, Gerd Hoffmann,
	Mark Cave-Ayland, Philippe Mathieu-Daudé, Kevin Wolf,
	Daniel P. Berrangé, Stefan Hajnoczi, Alexander Graf,
	Paolo Bonzini, Richard Henderson, Peter Maydell,
	Markus Armbruster

This patch kicks the hornet's nest of AI / LLM code generators.

With the increasing interest in code generators in recent times,
it is inevitable that QEMU contributions will include AI generated
code. Thus far we have remained silent on the matter. Given that
everyone knows these tools exist, our current position has to be
considered tacit acceptance of the use of AI generated code in QEMU.

The question for the project is whether that is a good position for
QEMU to take or not?

IANAL, but I like to think I'm reasonably proficient at understanding
open source licensing. I am not inherently against the use of AI tools,
rather I am anti-risk. I also want to see OSS licenses respected and
complied with.

AFAICT at its current state of (im)maturity the question of licensing
of AI code generator output does not have a broadly accepted / settled
legal position. There is an inherent bias/self-interest from the vendors
promoting their usage, who tend to minimize/dismiss the legal questions.
From my POV, this puts such tools in a position of elevated legal risk.

Given the fuzziness over the legal position of generated code from
such tools, I don't consider it credible (today) for a contributor
to assert compliance with the DCO terms (b) or (c) (which is a stated
pre-requisite for QEMU accepting patches) when a patch includes (or is
derived from) AI generated code.

By implication, I think that QEMU must (for now) explicitly decline
to (knowingly) accept AI generated code.

Perhaps a few years down the line the legal uncertainty will have
reduced and we can re-evaluate this policy.

Discuss...

Changes in v2:

 * Fix a huge number of typos in docs
 * Clarify that maintainers should still add R-b where relevant, even
   if they are already adding their own S-oB.
 * Clarify situation when contributor re-starts previously abandoned
   work from another contributor.
 * Add info about Suggested-by tag
 * Add new docs section dealing with the broad topic of "generated
   files" (whether code generators or compilers)
 * Simplify the section related to prohibition of AI generated files
   and give further examples of tools considered covered
 * Remove repeated references to "LLM" as a specific technology, just
   use the broad "AI" term, except for one use of LLM as an example.
 * Add note that the policy may evolve if the legal clarity improves
 * Add note that exceptions can be requested on case-by-case basis
   if contributor thinks they can demonstrate a credible copyright
   and licensing status

Daniel P. Berrangé (3):
  docs: introduce dedicated page about code provenance / sign-off
  docs: define policy limiting the inclusion of generated files
  docs: define policy forbidding use of AI code generators

 docs/devel/code-provenance.rst    | 315 ++++++++++++++++++++++++++++++
 docs/devel/index-process.rst      |   1 +
 docs/devel/submitting-a-patch.rst |  19 +-
 3 files changed, 318 insertions(+), 17 deletions(-)
 create mode 100644 docs/devel/code-provenance.rst

-- 
2.43.0


* [PATCH v2 1/3] docs: introduce dedicated page about code provenance / sign-off
  2024-05-16 16:22 [PATCH v2 0/3] docs: define policy forbidding use of "AI" / LLM code generators Daniel P. Berrangé
@ 2024-05-16 16:22 ` Daniel P. Berrangé
  2024-05-16 17:29   ` Peter Maydell
                     ` (2 more replies)
  2024-05-16 16:22 ` [PATCH v2 2/3] docs: define policy limiting the inclusion of generated files Daniel P. Berrangé
                   ` (4 subsequent siblings)
  5 siblings, 3 replies; 23+ messages in thread
From: Daniel P. Berrangé @ 2024-05-16 16:22 UTC (permalink / raw)
  To: qemu-devel
  Cc: Thomas Huth, Alex Bennée, Michael S. Tsirkin, Gerd Hoffmann,
	Mark Cave-Ayland, Philippe Mathieu-Daudé, Kevin Wolf,
	Daniel P. Berrangé, Stefan Hajnoczi, Alexander Graf,
	Paolo Bonzini, Richard Henderson, Peter Maydell,
	Markus Armbruster

Currently we have a short paragraph saying that patches must include
a Signed-off-by line, and merely link to the kernel documentation.
The linked kernel docs have a lot of content beyond the part about
sign-off and thus are misleading/distracting to QEMU contributors.

This introduces a dedicated 'code-provenance' page in QEMU talking
about why we require sign-off, explaining the other tags we commonly
use, and what to do in some edge cases.

Signed-off-by: Daniel P. Berrangé <berrange@redhat.com>
---
 docs/devel/code-provenance.rst    | 212 ++++++++++++++++++++++++++++++
 docs/devel/index-process.rst      |   1 +
 docs/devel/submitting-a-patch.rst |  19 +--
 3 files changed, 215 insertions(+), 17 deletions(-)
 create mode 100644 docs/devel/code-provenance.rst

diff --git a/docs/devel/code-provenance.rst b/docs/devel/code-provenance.rst
new file mode 100644
index 0000000000..7c42fae571
--- /dev/null
+++ b/docs/devel/code-provenance.rst
@@ -0,0 +1,212 @@
+.. _code-provenance:
+
+Code provenance
+===============
+
+Certifying patch submissions
+~~~~~~~~~~~~~~~~~~~~~~~~~~~~
+
+The QEMU community **mandates** all contributors to certify provenance of
+patch submissions they make to the project. To put it another way,
+contributors must indicate that they are legally permitted to contribute to
+the project.
+
+Certification is achieved with a low overhead by adding a single line to the
+bottom of every git commit::
+
+   Signed-off-by: YOUR NAME <YOUR@EMAIL>
+
+The addition of this line asserts that the author of the patch is contributing
+in accordance with the clauses specified in the
+`Developer's Certificate of Origin <https://developercertificate.org>`__:
+
+.. _dco:
+
+::
+  Developer's Certificate of Origin 1.1
+
+  By making a contribution to this project, I certify that:
+
+  (a) The contribution was created in whole or in part by me and I
+      have the right to submit it under the open source license
+      indicated in the file; or
+
+  (b) The contribution is based upon previous work that, to the best
+      of my knowledge, is covered under an appropriate open source
+      license and I have the right under that license to submit that
+      work with modifications, whether created in whole or in part
+      by me, under the same open source license (unless I am
+      permitted to submit under a different license), as indicated
+      in the file; or
+
+  (c) The contribution was provided directly to me by some other
+      person who certified (a), (b) or (c) and I have not modified
+      it.
+
+  (d) I understand and agree that this project and the contribution
+      are public and that a record of the contribution (including all
+      personal information I submit with it, including my sign-off) is
+      maintained indefinitely and may be redistributed consistent with
+      this project or the open source license(s) involved.
+
+It is generally expected that the name and email address used in one of the
+``Signed-off-by`` lines match those of the git commit ``Author`` field.
+
+If the person sending the mail is not one of the patch authors, they are none
+the less expected to add their own ``Signed-off-by`` to comply with the DCO
+clause (c).
+
+Multiple authorship
+~~~~~~~~~~~~~~~~~~~
+
+It is not uncommon for a patch to have contributions from multiple authors. In
+this scenario, git commits will usually be expected to have a ``Signed-off-by``
+line for each contributor involved in creation of the patch. Some edge cases:
+
+  * The non-primary author's contributions were so trivial that they can be
+    considered not subject to copyright. In this case the secondary authors
+    need not include a ``Signed-off-by``.
+
+    This case most commonly applies where QEMU reviewers give short snippets
+    of code as suggested fixes to a patch. The reviewers don't need to have
+    their own ``Signed-off-by`` added unless their code suggestion was
+    unusually large, but it is common to add ``Suggested-by`` as a credit
+    for non-trivial code.
+
+  * Both contributors work for the same employer and the employer requires
+    copyright assignment.
+
+    It can be said that in this case a ``Signed-off-by`` is indicating that
+    the person has permission to contribute from their employer who is the
+    copyright holder. It is none the less still preferable to include a
+    ``Signed-off-by`` for each contributor, as in some countries employees are
+    not able to assign copyright to their employer, and it also covers any
+    time invested outside working hours.
+
+When multiple ``Signed-off-by`` tags are present, they should be strictly kept
+in order of authorship, from oldest to newest.
+
+Other commit tags
+~~~~~~~~~~~~~~~~~
+
+While the ``Signed-off-by`` tag is mandatory, there are a number of other tags
+that are commonly used during QEMU development:
+
+ * **Reviewed-by**: when a QEMU community member reviews a patch on the
+   mailing list, if they consider the patch acceptable, they should send an
+   email reply containing a ``Reviewed-by`` tag. Subsystem maintainers who
+   review a patch should add this even if they are also adding their
+   ``Signed-off-by`` to the same commit.
+
+ * **Acked-by**: when a QEMU subsystem maintainer approves a patch that
+   touches their subsystem, but intends to allow a different maintainer to
+   queue it and send a pull request, they would send a mail containing an
+   ``Acked-by`` tag. Where a patch touches multiple subsystems, ``Acked-by``
+   only implies review of the maintainers' own areas of responsibility. If a
+   maintainer wants to indicate they have done a full review they should use
+   a ``Reviewed-by`` tag.
+
+ * **Tested-by**: when a QEMU community member has functionally tested the
+   behaviour of the patch in some manner, they should send an email reply
+   containing a ``Tested-by`` tag.
+
+ * **Reported-by**: when a QEMU community member reports a problem via the
+   mailing list, or some other informal channel that is not the issue tracker,
+   it is good practice to credit them by including a ``Reported-by`` tag on
+   any patch fixing the issue. When the problem is reported via the GitLab
+   issue tracker, however, it is sufficient to just include a link to the
+   issue.
+
+ * **Suggested-by**: when a reviewer or other 3rd party makes non-trivial
+   suggestions for how to change a patch, it is good practice to credit them
+   by including a ``Suggested-by`` tag.
+
+Subsystem maintainer requirements
+~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
+
+When a subsystem maintainer accepts a patch from a contributor, in addition to
+the normal code review points, they are expected to validate the presence of
+suitable ``Signed-off-by`` tags.
+
+At the time they queue the patch in their subsystem tree, the maintainer
+**must** also then add their own ``Signed-off-by`` to indicate that they have
+done the aforementioned validation. This is in addition to any of their own
+``Reviewed-by`` tags the subsystem maintainer may wish to include.
+
+Tools for adding ``Signed-off-by``
+~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
+
+There are a variety of ways tools can support adding ``Signed-off-by`` tags
+for patches, avoiding the need for contributors to manually type in this
+repetitive text each time.
+
+git commands
+^^^^^^^^^^^^
+
+When creating, or amending, a commit the ``-s`` flag to ``git commit`` will
+append a suitable line matching the configured git author details.
+
+If preparing patches using the ``git format-patch`` tool, the ``-s`` flag can
+be used to append a suitable line in the emails it creates, without modifying
+the local commits. Alternatively, to modify all the local commits on a branch::
+
+  git rebase master -x 'git commit --amend --no-edit -s'
+
+emacs
+^^^^^
+
+In the file ``$HOME/.emacs.d/abbrev_defs`` add::
+
+  (define-abbrev-table 'global-abbrev-table
+    '(
+      ("8rev" "Reviewed-by: YOUR NAME <your@email.addr>" nil 1)
+      ("8ack" "Acked-by: YOUR NAME <your@email.addr>" nil 1)
+      ("8test" "Tested-by: YOUR NAME <your@email.addr>" nil 1)
+      ("8sob" "Signed-off-by: YOUR NAME <your@email.addr>" nil 1)
+     ))
+
+With this change, if you type (for example) ``8rev`` followed by ``<space>``
+or ``<enter>`` it will expand to the whole phrase.
+
+vim
+^^^
+
+In the file ``$HOME/.vimrc`` add::
+
+  iabbrev 8rev Reviewed-by: YOUR NAME <your@email.addr>
+  iabbrev 8ack Acked-by: YOUR NAME <your@email.addr>
+  iabbrev 8test Tested-by: YOUR NAME <your@email.addr>
+  iabbrev 8sob Signed-off-by: YOUR NAME <your@email.addr>
+
+With this change, if you type (for example) ``8rev`` followed by ``<space>``
+or ``<enter>`` it will expand to the whole phrase.
+
+Re-starting abandoned work
+~~~~~~~~~~~~~~~~~~~~~~~~~~
+
+For a variety of reasons there are some patches that get submitted to QEMU but
+never merged. An unrelated contributor may decide (months or years later) to
+continue working from the abandoned patch and re-submit it with extra changes.
+
+The general principles when picking up abandoned work are:
+
+ * Continue to credit the original author for their work, by maintaining their
+   original ``Signed-off-by``
+ * Indicate where the original patch was obtained from (mailing list, bug
+   tracker, author's git repo, etc) when sending it for review
+ * Acknowledge the extra work of the new contributor by including their
+   ``Signed-off-by`` in the patch in addition to the original author's
+ * Indicate who is responsible for what parts of the patch. This is typically
+   done via a note in the commit message, just prior to the new contributor's
+   ``Signed-off-by``::
+
+    Signed-off-by: Some Person <some.person@example.com>
+    [Rebased and added support for 'foo']
+    Signed-off-by: New Person <new.person@mycorp.test>
+
+In complicated cases, or if otherwise unsure, ask for advice on the project
+mailing list.
+
+It is also recommended to attempt to contact the original author to let them
+know you are interested in taking over their work, in case they still intended
+to return to the work, or had any suggestions about the best way to continue.
diff --git a/docs/devel/index-process.rst b/docs/devel/index-process.rst
index 362f97ee30..b54e58105e 100644
--- a/docs/devel/index-process.rst
+++ b/docs/devel/index-process.rst
@@ -13,6 +13,7 @@ Notes about how to interact with the community and how and where to submit patch
    maintainers
    style
    submitting-a-patch
+   code-provenance
    trivial-patches
    stable-process
    submitting-a-pull-request
diff --git a/docs/devel/submitting-a-patch.rst b/docs/devel/submitting-a-patch.rst
index 83e9092b8c..2cc4d53ff6 100644
--- a/docs/devel/submitting-a-patch.rst
+++ b/docs/devel/submitting-a-patch.rst
@@ -322,23 +322,8 @@ Patch emails must include a ``Signed-off-by:`` line
 
 Your patches **must** include a Signed-off-by: line. This is a hard
 requirement because it's how you say "I'm legally okay to contribute
-this and happy for it to go into QEMU". The process is modelled after
-the `Linux kernel
-<http://git.kernel.org/cgit/linux/kernel/git/torvalds/linux.git/tree/Documentation/SubmittingPatches?id=f6f94e2ab1b33f0082ac22d71f66385a60d8157f#n297>`__
-policy.
-
-If you wrote the patch, make sure your "From:" and "Signed-off-by:"
-lines use the same spelling. It's okay if you subscribe or contribute to
-the list via more than one address, but using multiple addresses in one
-commit just confuses things. If someone else wrote the patch, git will
-include a "From:" line in the body of the email (different from your
-envelope From:) that will give credit to the correct author; but again,
-that author's Signed-off-by: line is mandatory, with the same spelling.
-
-There are various tooling options for automatically adding these tags
-include using ``git commit -s`` or ``git format-patch -s``. For more
-information see `SubmittingPatches 1.12
-<http://git.kernel.org/cgit/linux/kernel/git/torvalds/linux.git/tree/Documentation/SubmittingPatches?id=f6f94e2ab1b33f0082ac22d71f66385a60d8157f#n297>`__.
+this and happy for it to go into QEMU". For full guidance, read the
+:ref:`code-provenance` documentation.
 
 .. _include_a_meaningful_cover_letter:
 
-- 
2.43.0
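
The sign-off checks that the patch above asks subsystem maintainers to
perform (a ``Signed-off-by`` line is present, and one of them matches the
commit author) could be sketched in shell roughly as follows. This is only
an illustration: the function names `has_signoff` and
`signoff_matches_author` are hypothetical and not part of QEMU's tooling,
and the regular expression is a loose approximation of "NAME <EMAIL>", not
a validator for email addresses.

```shell
#!/bin/sh
# Sketch only: these helper names are hypothetical, not QEMU tooling.

# Succeeds if the commit message on stdin contains at least one line of
# the form "Signed-off-by: NAME <EMAIL>".
has_signoff() {
    grep -Eq '^Signed-off-by: .+ <[^<> ]+@[^<> ]+>$'
}

# Succeeds if the commit message on stdin contains a Signed-off-by line
# that exactly matches the identity passed as $1 ("NAME <EMAIL>").
# -F treats the pattern as a fixed string, -x requires a whole-line match.
signoff_matches_author() {
    grep -Fxq "Signed-off-by: $1"
}

# Possible usage against the most recent commit of a git checkout:
#   git log -1 --format=%B | has_signoff
#   git log -1 --format=%B |
#       signoff_matches_author "$(git log -1 --format='%an <%ae>')"
```

Both helpers read the message from stdin, so they can be tried on a saved
patch or pasted text as well as on `git log` output. They do not check the
ordering of multiple sign-offs or any of the edge cases discussed in the
document.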




* Re: [PATCH v2 1/3] docs: introduce dedicated page about code provenance / sign-off
  2024-05-16 16:22 ` [PATCH v2 1/3] docs: introduce dedicated page about code provenance / sign-off Daniel P. Berrangé
@ 2024-05-16 17:29   ` Peter Maydell
  2024-05-16 17:34     ` Michael S. Tsirkin
  2024-05-16 17:33   ` Michael S. Tsirkin
  2024-05-17 18:08   ` Alex Bennée
  2 siblings, 1 reply; 23+ messages in thread
From: Peter Maydell @ 2024-05-16 17:29 UTC (permalink / raw)
  To: Daniel P. Berrangé
  Cc: qemu-devel, Thomas Huth, Alex Bennée, Michael S. Tsirkin,
	Gerd Hoffmann, Mark Cave-Ayland, Philippe Mathieu-Daudé,
	Kevin Wolf, Stefan Hajnoczi, Alexander Graf, Paolo Bonzini,
	Richard Henderson, Markus Armbruster

On Thu, 16 May 2024 at 17:22, Daniel P. Berrangé <berrange@redhat.com> wrote:
>
> Currently we have a short paragraph saying that patches must include
> a Signed-off-by line, and merely link to the kernel documentation.
> The linked kernel docs have a lot of content beyond the part about
> sign-off and thus are misleading/distracting to QEMU contributors.

Thanks for this -- I've felt for ages that it was a bit awkward
that we didn't have a good place to link people to for the fuller
explanation of this.

> This introduces a dedicated 'code-provenance' page in QEMU talking
> about why we require sign-off, explaining the other tags we commonly
> use, and what to do in some edge cases.

The version of the kernel SubmittingPatches we used to link to
includes the text "sorry, no pseudonyms or anonymous contributions".
This new documentation doesn't say anything either way about
our approach to pseudonyms. I think we should probably say
something, but I don't know if we have an in-practice consensus
there, so maybe we should approach that as a separate change on
top of this patch.

So for this patch:

Reviewed-by: Peter Maydell <peter.maydell@linaro.org>

thanks
-- PMM



* Re: [PATCH v2 1/3] docs: introduce dedicated page about code provenance / sign-off
  2024-05-16 17:29   ` Peter Maydell
@ 2024-05-16 17:34     ` Michael S. Tsirkin
  2024-05-16 17:43       ` Peter Maydell
  0 siblings, 1 reply; 23+ messages in thread
From: Michael S. Tsirkin @ 2024-05-16 17:34 UTC (permalink / raw)
  To: Peter Maydell
  Cc: Daniel P. Berrangé, qemu-devel, Thomas Huth,
	Alex Bennée, Gerd Hoffmann, Mark Cave-Ayland,
	Philippe Mathieu-Daudé, Kevin Wolf, Stefan Hajnoczi,
	Alexander Graf, Paolo Bonzini, Richard Henderson,
	Markus Armbruster

On Thu, May 16, 2024 at 06:29:39PM +0100, Peter Maydell wrote:
> On Thu, 16 May 2024 at 17:22, Daniel P. Berrangé <berrange@redhat.com> wrote:
> >
> > Currently we have a short paragraph saying that patches must include
> > a Signed-off-by line, and merely link to the kernel documentation.
> > The linked kernel docs have a lot of content beyond the part about
> > sign-off and thus are misleading/distracting to QEMU contributors.
> 
> Thanks for this -- I've felt for ages that it was a bit awkward
> that we didn't have a good place to link people to for the fuller
> explanation of this.
> 
> > This introduces a dedicated 'code-provenance' page in QEMU talking
> > about why we require sign-off, explaining the other tags we commonly
> > use, and what to do in some edge cases.
> 
> The version of the kernel SubmittingPatches we used to link to
> includes the text "sorry, no pseudonyms or anonymous contributions".
> This new documentation doesn't say anything either way about
> our approach to pseudonyms. I think we should probably say
> something, but I don't know if we have an in-practice consensus
> there, so maybe we should approach that as a separate change on
> top of this patch.


Well given we referred to kernel previously then I guess that's
the consensus, no?


> So for this patch:
> 
> Reviewed-by: Peter Maydell <peter.maydell@linaro.org>
> 
> thanks
> -- PMM




* Re: [PATCH v2 1/3] docs: introduce dedicated page about code provenance / sign-off
  2024-05-16 17:34     ` Michael S. Tsirkin
@ 2024-05-16 17:43       ` Peter Maydell
  2024-05-17  5:05         ` Thomas Huth
  0 siblings, 1 reply; 23+ messages in thread
From: Peter Maydell @ 2024-05-16 17:43 UTC (permalink / raw)
  To: Michael S. Tsirkin
  Cc: Daniel P. Berrangé, qemu-devel, Thomas Huth,
	Alex Bennée, Gerd Hoffmann, Mark Cave-Ayland,
	Philippe Mathieu-Daudé, Kevin Wolf, Stefan Hajnoczi,
	Alexander Graf, Paolo Bonzini, Richard Henderson,
	Markus Armbruster

On Thu, 16 May 2024 at 18:34, Michael S. Tsirkin <mst@redhat.com> wrote:
>
> On Thu, May 16, 2024 at 06:29:39PM +0100, Peter Maydell wrote:
> > On Thu, 16 May 2024 at 17:22, Daniel P. Berrangé <berrange@redhat.com> wrote:
> > >
> > > Currently we have a short paragraph saying that patches must include
> > > a Signed-off-by line, and merely link to the kernel documentation.
> > > The linked kernel docs have a lot of content beyond the part about
> > > sign-off and thus are misleading/distracting to QEMU contributors.
> >
> > Thanks for this -- I've felt for ages that it was a bit awkward
> > that we didn't have a good place to link people to for the fuller
> > explanation of this.
> >
> > > This introduces a dedicated 'code-provenance' page in QEMU talking
> > > about why we require sign-off, explaining the other tags we commonly
> > > use, and what to do in some edge cases.
> >
> > The version of the kernel SubmittingPatches we used to link to
> > includes the text "sorry, no pseudonyms or anonymous contributions".
> > This new documentation doesn't say anything either way about
> > our approach to pseudonyms. I think we should probably say
> > something, but I don't know if we have an in-practice consensus
> > there, so maybe we should approach that as a separate change on
> > top of this patch.
>
>
> Well given we referred to kernel previously then I guess that's
> the consensus, no?

AIUI the kernel devs have changed their point of view on the
pseudonym question, so it's a question of whether we were
deliberately referring to that specific revision of the kernel's
practice because we agreed with it or just by chance...

https://git.kernel.org/pub/scm/linux/kernel/git/next/linux-next.git/commit/?id=d4563201f33a022fc0353033d9dfeb1606a88330

is where the kernel changed to saying merely "no anonymous
contributions", dropping the 'pseudonyms' part.

-- PMM



* Re: [PATCH v2 1/3] docs: introduce dedicated page about code provenance / sign-off
  2024-05-16 17:43       ` Peter Maydell
@ 2024-05-17  5:05         ` Thomas Huth
  2024-05-17 10:03           ` Daniel P. Berrangé
  0 siblings, 1 reply; 23+ messages in thread
From: Thomas Huth @ 2024-05-17  5:05 UTC (permalink / raw)
  To: Peter Maydell, Michael S. Tsirkin, Daniel P. Berrangé
  Cc: qemu-devel, Alex Bennée, Gerd Hoffmann, Mark Cave-Ayland,
	Philippe Mathieu-Daudé, Kevin Wolf, Stefan Hajnoczi,
	Alexander Graf, Paolo Bonzini, Richard Henderson,
	Markus Armbruster

On 16/05/2024 19.43, Peter Maydell wrote:
> On Thu, 16 May 2024 at 18:34, Michael S. Tsirkin <mst@redhat.com> wrote:
>>
>> On Thu, May 16, 2024 at 06:29:39PM +0100, Peter Maydell wrote:
>>> On Thu, 16 May 2024 at 17:22, Daniel P. Berrangé <berrange@redhat.com> wrote:
>>>>
>>>> Currently we have a short paragraph saying that patches must include
>>>> a Signed-off-by line, and merely link to the kernel documentation.
>>>> The linked kernel docs have a lot of content beyond the part about
>>>> sign-off and thus are misleading/distracting to QEMU contributors.
>>>
>>> Thanks for this -- I've felt for ages that it was a bit awkward
>>> that we didn't have a good place to link people to for the fuller
>>> explanation of this.
>>>
>>>> This introduces a dedicated 'code-provenance' page in QEMU talking
>>>> about why we require sign-off, explaining the other tags we commonly
>>>> use, and what to do in some edge cases.
>>>
>>> The version of the kernel SubmittingPatches we used to link to
>>> includes the text "sorry, no pseudonyms or anonymous contributions".
>>> This new documentation doesn't say anything either way about
>>> our approach to pseudonyms. I think we should probably say
>>> something, but I don't know if we have an in-practice consensus
>>> there, so maybe we should approach that as a separate change on
>>> top of this patch.
>>
>>
>> Well given we referred to kernel previously then I guess that's
>> the consensus, no?
> 
> AIUI the kernel devs have changed their point of view on the
> pseudonym question, so it's a question of whether we were
> deliberately referring to that specific revision of the kernel's
> practice because we agreed with it or just by chance...
> 
> https://git.kernel.org/pub/scm/linux/kernel/git/next/linux-next.git/commit/?id=d4563201f33a022fc0353033d9dfeb1606a88330
> 
> is where the kernel changed to saying merely "no anonymous
> contributions", dropping the 'pseudonyms' part.

FWIW, we had a clear statement in our document in the past:

https://gitlab.com/qemu-project/qemu/-/commit/ca127fe96ddb827f3ea153610c1e8f6e374708e2#9620a1442f724c9d8bfd5408e4611ba1839fcb8a_315_321

Quoting: "Please use your real name to sign a patch (not an alias or acronym)."

But it got lost in that rework, I assume by accident?

So IMHO we had a consensus once to not allow anonymous contributions. I'm in 
favor of adding such a sentence back here now.

  Thomas




* Re: [PATCH v2 1/3] docs: introduce dedicated page about code provenance / sign-off
  2024-05-17  5:05         ` Thomas Huth
@ 2024-05-17 10:03           ` Daniel P. Berrangé
  0 siblings, 0 replies; 23+ messages in thread
From: Daniel P. Berrangé @ 2024-05-17 10:03 UTC (permalink / raw)
  To: Thomas Huth
  Cc: Peter Maydell, Michael S. Tsirkin, qemu-devel, Alex Bennée,
	Gerd Hoffmann, Mark Cave-Ayland, Philippe Mathieu-Daudé,
	Kevin Wolf, Stefan Hajnoczi, Alexander Graf, Paolo Bonzini,
	Richard Henderson, Markus Armbruster

On Fri, May 17, 2024 at 07:05:05AM +0200, Thomas Huth wrote:
> On 16/05/2024 19.43, Peter Maydell wrote:
> > On Thu, 16 May 2024 at 18:34, Michael S. Tsirkin <mst@redhat.com> wrote:
> > > 
> > > On Thu, May 16, 2024 at 06:29:39PM +0100, Peter Maydell wrote:
> > > > On Thu, 16 May 2024 at 17:22, Daniel P. Berrangé <berrange@redhat.com> wrote:
> > > > > 
> > > > > Currently we have a short paragraph saying that patches must include
> > > > > a Signed-off-by line, and merely link to the kernel documentation.
> > > > > The linked kernel docs have a lot of content beyond the part about
> > > > > sign-off and thus are misleading/distracting to QEMU contributors.
> > > > 
> > > > Thanks for this -- I've felt for ages that it was a bit awkward
> > > > that we didn't have a good place to link people to for the fuller
> > > > explanation of this.
> > > > 
> > > > > This introduces a dedicated 'code-provenance' page in QEMU talking
> > > > > about why we require sign-off, explaining the other tags we commonly
> > > > > use, and what to do in some edge cases.
> > > > 
> > > > The version of the kernel SubmittingPatches we used to link to
> > > > includes the text "sorry, no pseudonyms or anonymous contributions".
> > > > This new documentation doesn't say anything either way about
> > > > our approach to pseudonyms. I think we should probably say
> > > > something, but I don't know if we have an in-practice consensus
> > > > there, so maybe we should approach that as a separate change on
> > > > top of this patch.
> > > 
> > > 
> > > Well given we referred to kernel previously then I guess that's
> > > the consensus, no?
> > 
> > AIUI the kernel devs have changed their point of view on the
> > pseudonym question, so it's a question of whether we were
> > deliberately referring to that specific revision of the kernel's
> > practice because we agreed with it or just by chance...
> > 
> > https://git.kernel.org/pub/scm/linux/kernel/git/next/linux-next.git/commit/?id=d4563201f33a022fc0353033d9dfeb1606a88330
> > 
> > is where the kernel changed to saying merely "no anonymous
> > contributions", dropping the 'pseudonyms' part.
> 
> FWIW, we had a clear statement in our document in the past:
> 
> https://gitlab.com/qemu-project/qemu/-/commit/ca127fe96ddb827f3ea153610c1e8f6e374708e2#9620a1442f724c9d8bfd5408e4611ba1839fcb8a_315_321
> 
> Quoting: "Please use your real name to sign a patch (not an alias or acronym)."
> 
> But it got lost in that rework, I assume by accident?

Yeah, probably an oversight.

> So IMHO we had a consensus once to not allow anonymous contributions. I'm in
> favor of adding such a sentence back here now.

That text has been in the submitting-a-patch file since day 1, but that
content was originally a copy of the old wiki page, and the wiki edits
never had any formal peer review, so we should be wary of claiming too
much about a consensus.

Going back in history we can see the specific wording arrived with
this change:

  https://wiki.qemu.org/index.php?title=Contribute%2FSubmitAPatch&type=revision&diff=2173&oldid=2094

This may have been an informally held opinion amongst at least some
of those in the community at the time, but I don't recall there being a
specific debate about allowing pseudonyms, etc.



I have traditionally been in favour of requiring real names, which I
had pretty much interpreted to imply a person's legal name. That was
mostly because I was following what I (apparently incorrectly) thought
was the kernel's intent in this respect.

Looking at the kernel commit above, I have sympathy with the view that
interpreting "real name" too strictly as a "legal name" is exclusionary.

Thus I'd be in favour of following the kernel's clarified intent, which
broadly aligns with the CNCF explanatory text, that "real name" can be
loosely interpreted to be "a commonly known identity in the community".

With regards,
Daniel
-- 
|: https://berrange.com      -o-    https://www.flickr.com/photos/dberrange :|
|: https://libvirt.org         -o-            https://fstop138.berrange.com :|
|: https://entangle-photo.org    -o-    https://www.instagram.com/dberrange :|




* Re: [PATCH v2 1/3] docs: introduce dedicated page about code provenance / sign-off
  2024-05-16 16:22 ` [PATCH v2 1/3] docs: introduce dedicated page about code provenance / sign-off Daniel P. Berrangé
  2024-05-16 17:29   ` Peter Maydell
@ 2024-05-16 17:33   ` Michael S. Tsirkin
  2024-05-17 11:09     ` Daniel P. Berrangé
  2024-05-17 18:08   ` Alex Bennée
  2 siblings, 1 reply; 23+ messages in thread
From: Michael S. Tsirkin @ 2024-05-16 17:33 UTC (permalink / raw)
  To: Daniel P. Berrangé
  Cc: qemu-devel, Thomas Huth, Alex Bennée, Gerd Hoffmann,
	Mark Cave-Ayland, Philippe Mathieu-Daudé, Kevin Wolf,
	Stefan Hajnoczi, Alexander Graf, Paolo Bonzini, Richard Henderson,
	Peter Maydell, Markus Armbruster

On Thu, May 16, 2024 at 05:22:28PM +0100, Daniel P. Berrangé wrote:
> Currently we have a short paragraph saying that patches must include
> a Signed-off-by line, and merely link to the kernel documentation.
> The linked kernel docs have a lot of content beyond the part about
> sign-off and thus are misleading/distracting to QEMU contributors.
> 
> This introduces a dedicated 'code-provenance' page in QEMU talking
> about why we require sign-off, explaining the other tags we commonly
> use, and what to do in some edge cases.
> 
> Signed-off-by: Daniel P. Berrangé <berrange@redhat.com>
> ---
>  docs/devel/code-provenance.rst    | 212 ++++++++++++++++++++++++++++++
>  docs/devel/index-process.rst      |   1 +
>  docs/devel/submitting-a-patch.rst |  19 +--
>  3 files changed, 215 insertions(+), 17 deletions(-)
>  create mode 100644 docs/devel/code-provenance.rst
> 
> diff --git a/docs/devel/code-provenance.rst b/docs/devel/code-provenance.rst
> new file mode 100644
> index 0000000000..7c42fae571
> --- /dev/null
> +++ b/docs/devel/code-provenance.rst
> @@ -0,0 +1,212 @@
> +.. _code-provenance:
> +
> +Code provenance
> +===============
> +
> +Certifying patch submissions
> +~~~~~~~~~~~~~~~~~~~~~~~~~~~~
> +
> +The QEMU community **mandates** all contributors to certify provenance of
> +patch submissions they make to the project. To put it another way,
> +contributors must indicate that they are legally permitted to contribute to
> +the project.
> +
> +Certification is achieved with a low overhead by adding a single line to the
> +bottom of every git commit::
> +
> +   Signed-off-by: YOUR NAME <YOUR@EMAIL>
> +
> +The addition of this line asserts that the author of the patch is contributing
> +in accordance with the clauses specified in the
> +`Developer's Certificate of Origin <https://developercertificate.org>`__:

Why are you linking to this one?
It's slightly different from the kernel's, with a copyright notice and a
prohibition on changing it.

There's also a bit more text in the kernel, e.g. the rule against
anonymous contributions.



> +.. _dco:
> +
> +::
> +  Developer's Certificate of Origin 1.1
> +  By making a contribution to this project, I certify that:
> +
> +  (a) The contribution was created in whole or in part by me and I
> +      have the right to submit it under the open source license
> +      indicated in the file; or
> +
> +  (b) The contribution is based upon previous work that, to the best
> +      of my knowledge, is covered under an appropriate open source
> +      license and I have the right under that license to submit that
> +      work with modifications, whether created in whole or in part
> +      by me, under the same open source license (unless I am
> +      permitted to submit under a different license), as indicated
> +      in the file; or
> +
> +  (c) The contribution was provided directly to me by some other
> +      person who certified (a), (b) or (c) and I have not modified
> +      it.
> +
> +  (d) I understand and agree that this project and the contribution
> +      are public and that a record of the contribution (including all
> +      personal information I submit with it, including my sign-off) is
> +      maintained indefinitely and may be redistributed consistent with
> +      this project or the open source license(s) involved.
> +
> +It is generally expected that the name and email addresses used in one of the
> +``Signed-off-by`` lines, matches that of the git commit ``Author`` field.
> +
> +If the person sending the mail is not one of the patch authors, they are none
> +the less expected to add their own ``Signed-off-by`` to comply with the DCO
> +clause (c).
> +
> +Multiple authorship
> +~~~~~~~~~~~~~~~~~~~
> +
> +It is not uncommon for a patch to have contributions from multiple authors. In
> +this scenario, git commits will usually be expected to have a ``Signed-off-by``
> +line for each contributor involved in creation of the patch. Some edge cases:
> +
> +  * The non-primary author's contributions were so trivial that they can be
> +    considered not subject to copyright. In this case the secondary authors
> +    need not include a ``Signed-off-by``.
> +
> +    This case most commonly applies where QEMU reviewers give short snippets
> +    of code as suggested fixes to a patch. The reviewers don't need to have
> +    their own ``Signed-off-by`` added unless their code suggestion was
> +    unusually large, but it is common to add ``Suggested-by`` as a credit
> +    for non-trivial code.
> +
> +  * Both contributors work for the same employer and the employer requires
> +    copyright assignment.
> +
> +    It can be said that in this case a ``Signed-off-by`` is indicating that
> +    the person has permission to contribute from their employer who is the
> +    copyright holder. It is none the less still preferable to include a
> +    ``Signed-off-by`` for each contributor, as in some countries employees are
> +    not able to assign copyright to their employer, and it also covers any
> +    time invested outside working hours.
> +
> +When multiple ``Signed-off-by`` tags are present, they should be strictly kept
> +in order of authorship, from oldest to newest.
> +
> +Other commit tags
> +~~~~~~~~~~~~~~~~~
> +
> +While the ``Signed-off-by`` tag is mandatory, there are a number of other tags
> +that are commonly used during QEMU development:
> +
> + * **``Reviewed-by``**: when a QEMU community member reviews a patch on the
> +   mailing list, if they consider the patch acceptable, they should send an
> +   email reply containing a ``Reviewed-by`` tag. Subsystem maintainers who
> +   review a patch should add this even if they are also adding their
> +   ``Signed-off-by`` to the same commit.
> +
> + * **``Acked-by``**: when a QEMU subsystem maintainer approves a patch that
> +   touches their subsystem, but intends to allow a different maintainer to
> +   queue it and send a pull request, they would send a mail containing a
> +   ``Acked-by`` tag. Where a patch touches multiple subsystems, ``Acked-by``
> +   only implies review of the maintainers' own areas of responsibility. If a
> +   maintainer wants to indicate they have done a full review they should use
> +   a ``Reviewed-by`` tag.
> +
> + * **``Tested-by``**: when a QEMU community member has functionally tested the
> +   behaviour of the patch in some manner, they should send an email reply
> +   containing a ``Tested-by`` tag.
> +
> + * **``Reported-by``**: when a QEMU community member reports a problem via the
> +   mailing list, or some other informal channel that is not the issue tracker,
> +   it is good practice to credit them by including a ``Reported-by`` tag on
> +   any patch fixing the issue. When the problem is reported via the GitLab
> +   issue tracker, however, it is sufficient to just include a link to the
> +   issue.
> +
> + * **``Suggested-by``**: when a reviewer or other 3rd party makes non-trivial
> +   suggestions for how to change a patch, it is good practice to credit them
> +   by including a ``Suggested-by`` tag.
> +
> +Subsystem maintainer requirements
> +~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
> +
> +When a subsystem maintainer accepts a patch from a contributor, in addition to
> +the normal code review points, they are expected to validate the presence of
> +suitable ``Signed-off-by`` tags.
> +
> +At the time they queue the patch in their subsystem tree, the maintainer
> +**must** also then add their own ``Signed-off-by`` to indicate that they have
> +done the aforementioned validation. This is in addition to any of their own
> +``Reviewed-by`` tags the subsystem maintainer may wish to include.
> +
> +Tools for adding ``Signed-off-by``
> +~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
> +
> +There are a variety of ways tools can support adding ``Signed-off-by`` tags
> +for patches, avoiding the need for contributors to manually type in this
> +repetitive text each time.
> +
> +git commands
> +^^^^^^^^^^^^
> +
> +When creating, or amending, a commit the ``-s`` flag to ``git commit`` will
> +append a suitable line matching the configured git author details.
> +
> +If preparing patches using the ``git format-patch`` tool, the ``-s`` flag can
> +be used to append a suitable line in the emails it creates, without modifying
> +the local commits. Alternatively to modify all the local commits on a branch::
> +
> +  git rebase master -x 'git commit --amend --no-edit -s'
> +
> +emacs
> +^^^^^
> +
> +In the file ``$HOME/.emacs.d/abbrev_defs`` add::
> +
> +  (define-abbrev-table 'global-abbrev-table
> +    '(
> +      ("8rev" "Reviewed-by: YOUR NAME <your@email.addr>" nil 1)
> +      ("8ack" "Acked-by: YOUR NAME <your@email.addr>" nil 1)
> +      ("8test" "Tested-by: YOUR NAME <your@email.addr>" nil 1)
> +      ("8sob" "Signed-off-by: YOUR NAME <your@email.addr>" nil 1)
> +     ))
> +
> +with this change, if you type (for example) ``8rev`` followed by ``<space>``
> +or ``<enter>`` it will expand to the whole phrase.
> +
> +vim
> +^^^
> +
> +In the file ``$HOME/.vimrc`` add::
> +
> +  iabbrev 8rev Reviewed-by: YOUR NAME <your@email.addr>
> +  iabbrev 8ack Acked-by: YOUR NAME <your@email.addr>
> +  iabbrev 8test Tested-by: YOUR NAME <your@email.addr>
> +  iabbrev 8sob Signed-off-by: YOUR NAME <your@email.addr>
> +
> +with this change, if you type (for example) ``8rev`` followed by ``<space>``
> +or ``<enter>`` it will expand to the whole phrase.
> +
> +Re-starting abandoned work
> +~~~~~~~~~~~~~~~~~~~~~~~~~~
> +
> +For a variety of reasons there are some patches that get submitted to QEMU but
> +never merged. An unrelated contributor may decide (months or years later) to
> +continue working from the abandoned patch and re-submit it with extra changes.
> +
> +The general principles when picking up abandoned work are:
> +
> + * Continue to credit the original author for their work, by maintaining their
> +   original ``Signed-off-by``
> + * Indicate where the original patch was obtained from (mailing list, bug
> +   tracker, author's git repo, etc) when sending it for review
> + * Acknowledge the extra work of the new contributor by including their
> +   ``Signed-off-by`` in the patch in addition to the original author's
> + * Indicate who is responsible for what parts of the patch. This is typically
> +   done via a note in the commit message, just prior to the new contributor's
> +   ``Signed-off-by``::
> +
> +    Signed-off-by: Some Person <some.person@example.com>
> +    [Rebased and added support for 'foo']
> +    Signed-off-by: New Person <new.person@mycorp.test>
> +
> +In complicated cases, or if otherwise unsure, ask for advice on the project
> +mailing list.
> +
> +It is also recommended to attempt to contact the original author to let them
> +know you are interested in taking over their work, in case they still intended
> +to return to the work, or had any suggestions about the best way to continue.
> diff --git a/docs/devel/index-process.rst b/docs/devel/index-process.rst
> index 362f97ee30..b54e58105e 100644
> --- a/docs/devel/index-process.rst
> +++ b/docs/devel/index-process.rst
> @@ -13,6 +13,7 @@ Notes about how to interact with the community and how and where to submit patch
>     maintainers
>     style
>     submitting-a-patch
> +   code-provenance
>     trivial-patches
>     stable-process
>     submitting-a-pull-request
> diff --git a/docs/devel/submitting-a-patch.rst b/docs/devel/submitting-a-patch.rst
> index 83e9092b8c..2cc4d53ff6 100644
> --- a/docs/devel/submitting-a-patch.rst
> +++ b/docs/devel/submitting-a-patch.rst
> @@ -322,23 +322,8 @@ Patch emails must include a ``Signed-off-by:`` line
>  
>  Your patches **must** include a Signed-off-by: line. This is a hard
>  requirement because it's how you say "I'm legally okay to contribute
> -this and happy for it to go into QEMU". The process is modelled after
> -the `Linux kernel
> -<http://git.kernel.org/cgit/linux/kernel/git/torvalds/linux.git/tree/Documentation/SubmittingPatches?id=f6f94e2ab1b33f0082ac22d71f66385a60d8157f#n297>`__
> -policy.
> -
> -If you wrote the patch, make sure your "From:" and "Signed-off-by:"
> -lines use the same spelling. It's okay if you subscribe or contribute to
> -the list via more than one address, but using multiple addresses in one
> -commit just confuses things.


I gather you no longer see value in discussing this use-case?
Maybe mention why in the commit log.

> If someone else wrote the patch, git will
> -include a "From:" line in the body of the email (different from your
> -envelope From:) that will give credit to the correct author; but again,
> -that author's Signed-off-by: line is mandatory, with the same spelling.
> -
> -There are various tooling options for automatically adding these tags
> -include using ``git commit -s`` or ``git format-patch -s``. For more
> -information see `SubmittingPatches 1.12
> -<http://git.kernel.org/cgit/linux/kernel/git/torvalds/linux.git/tree/Documentation/SubmittingPatches?id=f6f94e2ab1b33f0082ac22d71f66385a60d8157f#n297>`__.
> +this and happy for it to go into QEMU". For full guidance, read the
> +:ref:`code-provenance` documentation.
>  
>  .. _include_a_meaningful_cover_letter:
>  
> -- 
> 2.43.0




* Re: [PATCH v2 1/3] docs: introduce dedicated page about code provenance / sign-off
  2024-05-16 17:33   ` Michael S. Tsirkin
@ 2024-05-17 11:09     ` Daniel P. Berrangé
  0 siblings, 0 replies; 23+ messages in thread
From: Daniel P. Berrangé @ 2024-05-17 11:09 UTC (permalink / raw)
  To: Michael S. Tsirkin
  Cc: qemu-devel, Thomas Huth, Alex Bennée, Gerd Hoffmann,
	Mark Cave-Ayland, Philippe Mathieu-Daudé, Kevin Wolf,
	Stefan Hajnoczi, Alexander Graf, Paolo Bonzini, Richard Henderson,
	Peter Maydell, Markus Armbruster

On Thu, May 16, 2024 at 01:33:01PM -0400, Michael S. Tsirkin wrote:
> On Thu, May 16, 2024 at 05:22:28PM +0100, Daniel P. Berrangé wrote:
> > Currently we have a short paragraph saying that patches must include
> > a Signed-off-by line, and merely link to the kernel documentation.
> > The linked kernel docs have a lot of content beyond the part about
> > sign-off and thus are misleading/distracting to QEMU contributors.
> > 
> > This introduces a dedicated 'code-provenance' page in QEMU talking
> > about why we require sign-off, explaining the other tags we commonly
> > use, and what to do in some edge cases.
> > 
> > Signed-off-by: Daniel P. Berrangé <berrange@redhat.com>
> > ---
> >  docs/devel/code-provenance.rst    | 212 ++++++++++++++++++++++++++++++
> >  docs/devel/index-process.rst      |   1 +
> >  docs/devel/submitting-a-patch.rst |  19 +--
> >  3 files changed, 215 insertions(+), 17 deletions(-)
> >  create mode 100644 docs/devel/code-provenance.rst
> > 
> > diff --git a/docs/devel/code-provenance.rst b/docs/devel/code-provenance.rst
> > new file mode 100644
> > index 0000000000..7c42fae571
> > --- /dev/null
> > +++ b/docs/devel/code-provenance.rst
> > @@ -0,0 +1,212 @@
> > +.. _code-provenance:
> > +
> > +Code provenance
> > +===============
> > +
> > +Certifying patch submissions
> > +~~~~~~~~~~~~~~~~~~~~~~~~~~~~
> > +
> > +The QEMU community **mandates** all contributors to certify provenance of
> > +patch submissions they make to the project. To put it another way,
> > +contributors must indicate that they are legally permitted to contribute to
> > +the project.
> > +
> > +Certification is achieved with a low overhead by adding a single line to the
> > +bottom of every git commit::
> > +
> > +   Signed-off-by: YOUR NAME <YOUR@EMAIL>
> > +
> > +The addition of this line asserts that the author of the patch is contributing
> > +in accordance with the clauses specified in the
> > +`Developer's Certificate of Origin <https://developercertificate.org>`__:
> 
> Why are you linking to this one?

The kernel doesn't have a standalone copy of the text; it is just
inline in the middle of their huge SubmittingPatches document.
We don't want to mislead people into thinking we're following
the kernel's patch submission rules in general; instead we want to
define our own clear policy.


> It's slightly different from the kernel's, with a copyright notice and a prohibition on changing it.

That difference is not of any consequence. The prohibition
against changing it makes sense: it protects the value of the
"Developer Certificate of Origin" term by giving it a fixed
meaning.

The 4 clauses that you must certify against are all identical
to the kernel's copy, which is what matters.

> There's also a bit more text in the kernel, e.g. the rule against
> anonymous contributions.

Yes, we should clarify our intent in this respect, per the other
part of this thread around what we interpret "real name" to
mean for QEMU.


> > diff --git a/docs/devel/submitting-a-patch.rst b/docs/devel/submitting-a-patch.rst
> > index 83e9092b8c..2cc4d53ff6 100644
> > --- a/docs/devel/submitting-a-patch.rst
> > +++ b/docs/devel/submitting-a-patch.rst
> > @@ -322,23 +322,8 @@ Patch emails must include a ``Signed-off-by:`` line
> >  
> >  Your patches **must** include a Signed-off-by: line. This is a hard
> >  requirement because it's how you say "I'm legally okay to contribute
> > -this and happy for it to go into QEMU". The process is modelled after
> > -the `Linux kernel
> > -<http://git.kernel.org/cgit/linux/kernel/git/torvalds/linux.git/tree/Documentation/SubmittingPatches?id=f6f94e2ab1b33f0082ac22d71f66385a60d8157f#n297>`__
> > -policy.
> > -
> > -If you wrote the patch, make sure your "From:" and "Signed-off-by:"
> > -lines use the same spelling. It's okay if you subscribe or contribute to
> > -the list via more than one address, but using multiple addresses in one
> > -commit just confuses things.
> 
> 
> I gather you no longer see value in discussing this use-case?
> Maybe mention why in the commit log.

I should have preserved this phrase in the new doc.
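
As an illustration of the rule being discussed (the author identity
should appear, with the same spelling, in one of the Signed-off-by
lines), here is a minimal sketch in Python. The function names are
hypothetical and this is not part of any existing QEMU tooling; it
only assumes the usual "Signed-off-by: NAME <EMAIL>" trailer format.

```python
import re

# One "Signed-off-by: NAME <EMAIL>" trailer per line.
SOB_RE = re.compile(
    r"^Signed-off-by:[ \t]*(.+?)[ \t]*<([^<>]+)>[ \t]*$",
    re.MULTILINE,
)


def signoff_identities(message):
    """Return (name, email) pairs for every Signed-off-by line, in order."""
    return SOB_RE.findall(message)


def author_matches_signoff(author_name, author_email, message):
    """True if the author appears, spelled identically, in a sign-off."""
    return (author_name, author_email) in signoff_identities(message)


if __name__ == "__main__":
    msg = (
        "docs: example commit\n"
        "\n"
        "Signed-off-by: Some Person <some.person@example.com>\n"
        "[Rebased and added support for 'foo']\n"
        "Signed-off-by: New Person <new.person@mycorp.test>\n"
    )
    print(author_matches_signoff("Some Person", "some.person@example.com", msg))
```

The comparison is deliberately exact, so a contributor who uses a
different address in the author field than in their sign-off would be
flagged, which is the confusion the original paragraph warned about.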

With regards,
Daniel
-- 
|: https://berrange.com      -o-    https://www.flickr.com/photos/dberrange :|
|: https://libvirt.org         -o-            https://fstop138.berrange.com :|
|: https://entangle-photo.org    -o-    https://www.instagram.com/dberrange :|




* Re: [PATCH v2 1/3] docs: introduce dedicated page about code provenance / sign-off
  2024-05-16 16:22 ` [PATCH v2 1/3] docs: introduce dedicated page about code provenance / sign-off Daniel P. Berrangé
  2024-05-16 17:29   ` Peter Maydell
  2024-05-16 17:33   ` Michael S. Tsirkin
@ 2024-05-17 18:08   ` Alex Bennée
  2 siblings, 0 replies; 23+ messages in thread
From: Alex Bennée @ 2024-05-17 18:08 UTC (permalink / raw)
  To: Daniel P. Berrangé
  Cc: qemu-devel, Thomas Huth, Michael S. Tsirkin, Gerd Hoffmann,
	Mark Cave-Ayland, Philippe Mathieu-Daudé, Kevin Wolf,
	Stefan Hajnoczi, Alexander Graf, Paolo Bonzini, Richard Henderson,
	Peter Maydell, Markus Armbruster

Daniel P. Berrangé <berrange@redhat.com> writes:

> Currently we have a short paragraph saying that patches must include
> a Signed-off-by line, and merely link to the kernel documentation.
> The linked kernel docs have a lot of content beyond the part about
> sign-off and thus are misleading/distracting to QEMU contributors.
>
> This introduces a dedicated 'code-provenance' page in QEMU talking
> about why we require sign-off, explaining the other tags we commonly
> use, and what to do in some edge cases.
>
<snip>
> +
> +Other commit tags
> +~~~~~~~~~~~~~~~~~
> +
> +While the ``Signed-off-by`` tag is mandatory, there are a number of other tags
> +that are commonly used during QEMU development:
> +
> + * **``Reviewed-by``**: when a QEMU community member reviews a patch on the
> +   mailing list, if they consider the patch acceptable, they should send an
> +   email reply containing a ``Reviewed-by`` tag. Subsystem maintainers who
> +   review a patch should add this even if they are also adding their
> +   ``Signed-off-by`` to the same commit.
> +
> + * **``Acked-by``**: when a QEMU subsystem maintainer approves a patch that
> +   touches their subsystem, but intends to allow a different maintainer to
> +   queue it and send a pull request, they would send a mail containing a
> +   ``Acked-by`` tag. Where a patch touches multiple subsystems, ``Acked-by``
> +   only implies review of the maintainers' own areas of responsibility. If a
> +   maintainer wants to indicate they have done a full review they should use
> +   a ``Reviewed-by`` tag.
> +
> + * **``Tested-by``**: when a QEMU community member has functionally tested the
> +   behaviour of the patch in some manner, they should send an email reply
> +   containing a ``Tested-by`` tag.
> +
> + * **``Reported-by``**: when a QEMU community member reports a problem via the
> +   mailing list, or some other informal channel that is not the issue tracker,
> +   it is good practice to credit them by including a ``Reported-by`` tag on
> +   any patch fixing the issue. When the problem is reported via the GitLab
> +   issue tracker, however, it is sufficient to just include a link to the
> +   issue.
> +
> + * **``Suggested-by``**: when a reviewer or other 3rd party makes non-trivial
> +   suggestions for how to change a patch, it is good practice to credit them
> +   by including a ``Suggested-by`` tag.

Should we mention our use of Message-Id, insofar as the informal good
practice is that we keep the Message-Id of the last time a patch was
posted, and potentially the Message-Ids of previous posters?

But this is definitely an improvement on what we had before, so:

Reviewed-by: Alex Bennée <alex.bennee@linaro.org>


> +
> +Subsystem maintainer requirements
> +~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
> +
> +When a subsystem maintainer accepts a patch from a contributor, in addition to
> +the normal code review points, they are expected to validate the presence of
> +suitable ``Signed-off-by`` tags.
> +
> +At the time they queue the patch in their subsystem tree, the maintainer
> +**must** also then add their own ``Signed-off-by`` to indicate that they have
> +done the aforementioned validation. This is in addition to any of their own
> +``Reviewed-by`` tags the subsystem maintainer may wish to include.
> +
> +Tools for adding ``Signed-off-by``
> +~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
> +
> +There are a variety of ways tools can support adding ``Signed-off-by`` tags
> +for patches, avoiding the need for contributors to manually type in this
> +repetitive text each time.
> +
> +git commands
> +^^^^^^^^^^^^
> +
> +When creating, or amending, a commit the ``-s`` flag to ``git commit`` will
> +append a suitable line matching the configured git author details.
> +
> +If preparing patches using the ``git format-patch`` tool, the ``-s`` flag can
> +be used to append a suitable line in the emails it creates, without modifying
> +the local commits. Alternatively to modify all the local commits on a branch::
> +
> +  git rebase master -x 'git commit --amend --no-edit -s'
> +
> +emacs
> +^^^^^
> +
> +In the file ``$HOME/.emacs.d/abbrev_defs`` add::
> +
> +  (define-abbrev-table 'global-abbrev-table
> +    '(
> +      ("8rev" "Reviewed-by: YOUR NAME <your@email.addr>" nil 1)
> +      ("8ack" "Acked-by: YOUR NAME <your@email.addr>" nil 1)
> +      ("8test" "Tested-by: YOUR NAME <your@email.addr>" nil 1)
> +      ("8sob" "Signed-off-by: YOUR NAME <your@email.addr>" nil 1)
> +     ))
> +
> +with this change, if you type (for example) ``8rev`` followed by ``<space>``
> +or ``<enter>`` it will expand to the whole phrase.
> +
> +vim
> +^^^
> +
> +In the file ``$HOME/.vimrc`` add::
> +
> +  iabbrev 8rev Reviewed-by: YOUR NAME <your@email.addr>
> +  iabbrev 8ack Acked-by: YOUR NAME <your@email.addr>
> +  iabbrev 8test Tested-by: YOUR NAME <your@email.addr>
> +  iabbrev 8sob Signed-off-by: YOUR NAME <your@email.addr>
> +
> +with this change, if you type (for example) ``8rev`` followed by ``<space>``
> +or ``<enter>`` it will expand to the whole phrase.
> +
> +Re-starting abandoned work
> +~~~~~~~~~~~~~~~~~~~~~~~~~~
> +
> +For a variety of reasons there are some patches that get submitted to QEMU but
> +never merged. An unrelated contributor may decide (months or years later) to
> +continue working from the abandoned patch and re-submit it with extra changes.
> +
> +The general principles when picking up abandoned work are:
> +
> + * Continue to credit the original author for their work, by maintaining their
> +   original ``Signed-off-by``
> + * Indicate where the original patch was obtained from (mailing list, bug
> +   tracker, author's git repo, etc) when sending it for review
> + * Acknowledge the extra work of the new contributor by including their
> +   ``Signed-off-by`` in the patch in addition to the original author's
> + * Indicate who is responsible for what parts of the patch. This is typically
> +   done via a note in the commit message, just prior to the new contributor's
> +   ``Signed-off-by``::
> +
> +    Signed-off-by: Some Person <some.person@example.com>
> +    [Rebased and added support for 'foo']
> +    Signed-off-by: New Person <new.person@mycorp.test>
> +
> +In complicated cases, or if otherwise unsure, ask for advice on the project
> +mailing list.
> +
> +It is also recommended to attempt to contact the original author to let them
> +know you are interested in taking over their work, in case they still intended
> +to return to the work, or had any suggestions about the best way to continue.
> diff --git a/docs/devel/index-process.rst b/docs/devel/index-process.rst
> index 362f97ee30..b54e58105e 100644
> --- a/docs/devel/index-process.rst
> +++ b/docs/devel/index-process.rst
> @@ -13,6 +13,7 @@ Notes about how to interact with the community and how and where to submit patch
>     maintainers
>     style
>     submitting-a-patch
> +   code-provenance
>     trivial-patches
>     stable-process
>     submitting-a-pull-request
> diff --git a/docs/devel/submitting-a-patch.rst b/docs/devel/submitting-a-patch.rst
> index 83e9092b8c..2cc4d53ff6 100644
> --- a/docs/devel/submitting-a-patch.rst
> +++ b/docs/devel/submitting-a-patch.rst
> @@ -322,23 +322,8 @@ Patch emails must include a ``Signed-off-by:`` line
>  
>  Your patches **must** include a Signed-off-by: line. This is a hard
>  requirement because it's how you say "I'm legally okay to contribute
> -this and happy for it to go into QEMU". The process is modelled after
> -the `Linux kernel
> -<http://git.kernel.org/cgit/linux/kernel/git/torvalds/linux.git/tree/Documentation/SubmittingPatches?id=f6f94e2ab1b33f0082ac22d71f66385a60d8157f#n297>`__
> -policy.
> -
> -If you wrote the patch, make sure your "From:" and "Signed-off-by:"
> -lines use the same spelling. It's okay if you subscribe or contribute to
> -the list via more than one address, but using multiple addresses in one
> -commit just confuses things. If someone else wrote the patch, git will
> -include a "From:" line in the body of the email (different from your
> -envelope From:) that will give credit to the correct author; but again,
> -that author's Signed-off-by: line is mandatory, with the same spelling.
> -
> -There are various tooling options for automatically adding these tags
> -include using ``git commit -s`` or ``git format-patch -s``. For more
> -information see `SubmittingPatches 1.12
> -<http://git.kernel.org/cgit/linux/kernel/git/torvalds/linux.git/tree/Documentation/SubmittingPatches?id=f6f94e2ab1b33f0082ac22d71f66385a60d8157f#n297>`__.
> +this and happy for it to go into QEMU". For full guidance, read the
> +:ref:`code-provenance` documentation.
>  
>  .. _include_a_meaningful_cover_letter:

-- 
Alex Bennée
Virtualisation Tech Lead @ Linaro



* [PATCH v2 2/3] docs: define policy limiting the inclusion of generated files
  2024-05-16 16:22 [PATCH v2 0/3] docs: define policy forbidding use of "AI" / LLM code generators Daniel P. Berrangé
  2024-05-16 16:22 ` [PATCH v2 1/3] docs: introduce dedicated page about code provenance / sign-off Daniel P. Berrangé
@ 2024-05-16 16:22 ` Daniel P. Berrangé
  2024-05-16 17:04   ` Michael S. Tsirkin
                     ` (2 more replies)
  2024-05-16 16:22 ` [PATCH v2 3/3] docs: define policy forbidding use of AI code generators Daniel P. Berrangé
                   ` (3 subsequent siblings)
  5 siblings, 3 replies; 23+ messages in thread
From: Daniel P. Berrangé @ 2024-05-16 16:22 UTC (permalink / raw)
  To: qemu-devel
  Cc: Thomas Huth, Alex Bennée, Michael S. Tsirkin, Gerd Hoffmann,
	Mark Cave-Ayland, Philippe Mathieu-Daudé, Kevin Wolf,
	Daniel P. Berrangé, Stefan Hajnoczi, Alexander Graf,
	Paolo Bonzini, Richard Henderson, Peter Maydell,
	Markus Armbruster

Files contributed to QEMU are generally expected to be provided in the
preferred format for manipulation. IOW, we generally don't expect to
have generated / compiled code included in the tree, rather, we expect
to run the code generator / compiler as part of the build process.

There are some obvious exceptions to this in our existing tree, the
biggest one being the inclusion of many binary firmware ROMs. More
niche examples are the inclusion of a generated eBPF program and the CI
dockerfiles, which are mostly auto-generated. In these cases, however,
the preferred format source code is still required to be included,
alongside the generated output.

Tools which perform user-defined algorithmic transformations on code are
not considered to be "code generators", i.e. we permit use of coccinelle,
spell checkers, and sed/awk/etc to manipulate code. Such use of automated
manipulation should still be declared in the commit message.

One-off generators which create a boilerplate file that the author then
fills in are acceptable if their output has clear copyright and license
status. An example would be a contributor writing a throwaway python
script to automate creation of some mundane piece of code.

Signed-off-by: Daniel P. Berrangé <berrange@redhat.com>
---
 docs/devel/code-provenance.rst | 55 ++++++++++++++++++++++++++++++++++
 1 file changed, 55 insertions(+)

diff --git a/docs/devel/code-provenance.rst b/docs/devel/code-provenance.rst
index 7c42fae571..eabb3e7c08 100644
--- a/docs/devel/code-provenance.rst
+++ b/docs/devel/code-provenance.rst
@@ -210,3 +210,58 @@ mailing list.
 It is also recommended to attempt to contact the original author to let them
 know you are interested in taking over their work, in case they still intended
 to return to the work, or had any suggestions about the best way to continue.
+
+Inclusion of generated files
+~~~~~~~~~~~~~~~~~~~~~~~~~~~~
+
+Files in patches contributed to QEMU are generally expected to be provided
+only in the preferred format for making modifications. The implication of
+this is that the output of code generators or compilers is usually not
+appropriate to contribute to QEMU.
+
+For reasons of practicality there are some exceptions to this rule, where
+generated code is permitted, provided it is also accompanied by the
+corresponding preferred source format. This is done where it is impractical
+to expect those building QEMU to run the code generation or compilation
+process. A non-exhaustive list of examples is:
+
+ * Images: where a bitmap image is created from a vector file it is common
+   to include the rendered bitmaps at desired resolution(s), since subtle
+   changes in the rasterization process / tools may affect quality. The
+   original vector file is expected to accompany any generated bitmaps.
+
+ * Firmware: QEMU includes pre-compiled binary ROMs for a variety of guest
+   firmwares. When such binary ROMs are contributed, the corresponding source
+   must also be provided, either directly, or through a git submodule link.
+
+ * Dockerfiles: the majority of the dockerfiles are automatically generated
+   from a canonical list of build dependencies maintained in tree, together
+   with the libvirt-ci git submodule link. The generated dockerfiles are
+   included in tree because it is desirable to be able to directly build
+   container images from a clean git checkout.
+
+ * EBPF: QEMU includes some generated EBPF machine code, since the required
+   eBPF compilation tools are not broadly available on all targeted OS
+   distributions. The corresponding eBPF C code for the binary is also
+   provided. This is a time limited exception until the eBPF toolchain is
+   sufficiently broadly available in distros.
+
+In all cases above, the existence of generated files must be acknowledged
+and justified in the commit that introduces them.
+
+Tools which perform changes to existing code with deterministic algorithmic
+manipulation, driven by user specified inputs, are not generally considered
+to be "generators".
+
+IOW, using coccinelle to convert code from one pattern to another pattern, or
+fixing docs typos with a spell checker, or transforming code using sed / awk /
+etc, are not considered to be acts of code generation. Where an automated
+manipulation is performed on code, however, this should be declared in the
+commit message.
+
+At times contributors may use or create scripts/tools to generate an initial
+boilerplate code template which is then filled in to produce the final patch.
+The output of such a tool would still be considered the "preferred format",
+since it is intended to be a foundation for further human authored changes.
+Such tools are acceptable to use, provided they follow a deterministic process
+and there is clearly defined copyright and licensing for their output.
-- 
2.43.0



^ permalink raw reply related	[flat|nested] 23+ messages in thread

* Re: [PATCH v2 2/3] docs: define policy limiting the inclusion of generated files
  2024-05-16 16:22 ` [PATCH v2 2/3] docs: define policy limiting the inclusion of generated files Daniel P. Berrangé
@ 2024-05-16 17:04   ` Michael S. Tsirkin
  2024-05-17 10:51     ` Daniel P. Berrangé
  2024-05-17 18:23   ` Alex Bennée
  2024-05-28 15:41   ` Kevin Wolf
  2 siblings, 1 reply; 23+ messages in thread
From: Michael S. Tsirkin @ 2024-05-16 17:04 UTC (permalink / raw)
  To: Daniel P. Berrangé
  Cc: qemu-devel, Thomas Huth, Alex Bennée, Gerd Hoffmann,
	Mark Cave-Ayland, Philippe Mathieu-Daudé, Kevin Wolf,
	Stefan Hajnoczi, Alexander Graf, Paolo Bonzini, Richard Henderson,
	Peter Maydell, Markus Armbruster

On Thu, May 16, 2024 at 05:22:29PM +0100, Daniel P. Berrangé wrote:
> Files contributed to QEMU are generally expected to be provided in the
> preferred format for manipulation. IOW, we generally don't expect to
> have generated / compiled code included in the tree, rather, we expect
> to run the code generator / compiler as part of the build process.
> 
> There are some obvious exceptions to this seen in our existing tree, the
> biggest one being the inclusion of many binary firmware ROMs. A more
> niche example is the inclusion of a generated eBPF program. Or the CI
> dockerfiles which are mostly auto-generated. In these cases, however,
> the preferred format source code is still required to be included,
> alongside the generated output.
> 
> Tools which perform user defined algorithmic transformations on code are
> not considered to be "code generators". ie, we permit use of coccinelle,
> spell checkers, and sed/awk/etc to manipulate code. Such use of automated
> manipulation should still be declared in the commit message.
> 
> One off generators which create a boilerplate file which the author then
> fills in, are acceptable if their output has clear copyright and license
> status. This could be where a contributor writes a throwaway python
> script to automate creation of some mundane piece of code for example.
> 
> Signed-off-by: Daniel P. Berrangé <berrange@redhat.com>
> ---
>  docs/devel/code-provenance.rst | 55 ++++++++++++++++++++++++++++++++++
>  1 file changed, 55 insertions(+)
> 
> diff --git a/docs/devel/code-provenance.rst b/docs/devel/code-provenance.rst
> index 7c42fae571..eabb3e7c08 100644
> --- a/docs/devel/code-provenance.rst
> +++ b/docs/devel/code-provenance.rst
> @@ -210,3 +210,58 @@ mailing list.
>  It is also recommended to attempt to contact the original author to let them
>  know you are interested in taking over their work, in case they still intended
>  to return to the work, or had any suggestions about the best way to continue.
> +
> +Inclusion of generated files
> +~~~~~~~~~~~~~~~~~~~~~~~~~~~~
> +
> +Files in patches contributed to QEMU are generally expected to be provided
> +only in the preferred format for making modifications. The implication of
> +this is that the output of code generators or compilers is usually not
> +appropriate to contribute to QEMU.
> +
> +For reasons of practicality there are some exceptions to this rule, where
> +generated code is permitted, provided it is also accompanied by the
> +corresponding preferred source format. This is done where it is impractical
> +to expect those building QEMU to run the code generation or compilation
> +process. A non-exhaustive list of examples is:
> +
> + * Images: where a bitmap image is created from a vector file it is common
> +   to include the rendered bitmaps at desired resolution(s), since subtle
> +   changes in the rasterization process / tools may affect quality. The
> +   original vector file is expected to accompany any generated bitmaps.
> +
> + * Firmware: QEMU includes pre-compiled binary ROMs for a variety of guest
> +   firmwares. When such binary ROMs are contributed, the corresponding source
> +   must also be provided, either directly, or through a git submodule link.
> +
> + * Dockerfiles: the majority of the dockerfiles are automatically generated
> +   from a canonical list of build dependencies maintained in tree, together
> +   with the libvirt-ci git submodule link. The generated dockerfiles are
> +   included in tree because it is desirable to be able to directly build
> +   container images from a clean git checkout.
> +
> + * EBPF: QEMU includes some generated EBPF machine code, since the required
> +   eBPF compilation tools are not broadly available on all targeted OS
> +   distributions. The corresponding eBPF C code for the binary is also
> +   provided. This is a time limited exception until the eBPF toolchain is
> +   sufficiently broadly available in distros.
> +
> +In all cases above, the existence of generated files must be acknowledged
> +and justified in the commit that introduces them.
> +
> +Tools which perform changes to existing code with deterministic algorithmic
> +manipulation, driven by user specified inputs, are not generally considered
> +to be "generators".
> +
> +IOW, using coccinelle to convert code from one pattern to another pattern, or
> +fixing docs typos with a spell checker, or transforming code using sed / awk /
> +etc, are not considered to be acts of code generation. Where an automated
> +manipulation is performed on code, however, this should be declared in the
> +commit message.
> +
> +At times contributors may use or create scripts/tools to generate an initial
> +boilerplate code template which is then filled in to produce the final patch.
> +The output of such a tool would still be considered the "preferred format",
> +since it is intended to be a foundation for further human authored changes.
> +Such tools are acceptable to use, provided they follow a deterministic process
> +and there is clearly defined copyright and licensing for their output.

GPL seems sufficiently clear on the matter:
"The source code for a work means the preferred form of the work for making modifications to it."

Do we really need to play lawyer?

> -- 
> 2.43.0



^ permalink raw reply	[flat|nested] 23+ messages in thread

* Re: [PATCH v2 2/3] docs: define policy limiting the inclusion of generated files
  2024-05-16 17:04   ` Michael S. Tsirkin
@ 2024-05-17 10:51     ` Daniel P. Berrangé
  0 siblings, 0 replies; 23+ messages in thread
From: Daniel P. Berrangé @ 2024-05-17 10:51 UTC (permalink / raw)
  To: Michael S. Tsirkin
  Cc: qemu-devel, Thomas Huth, Alex Bennée, Gerd Hoffmann,
	Mark Cave-Ayland, Philippe Mathieu-Daudé, Kevin Wolf,
	Stefan Hajnoczi, Alexander Graf, Paolo Bonzini, Richard Henderson,
	Peter Maydell, Markus Armbruster

On Thu, May 16, 2024 at 01:04:42PM -0400, Michael S. Tsirkin wrote:
> On Thu, May 16, 2024 at 05:22:29PM +0100, Daniel P. Berrangé wrote:
> > Files contributed to QEMU are generally expected to be provided in the
> > preferred format for manipulation. IOW, we generally don't expect to
> > have generated / compiled code included in the tree, rather, we expect
> > to run the code generator / compiler as part of the build process.
> > 
> > There are some obvious exceptions to this seen in our existing tree, the
> > biggest one being the inclusion of many binary firmware ROMs. A more
> > niche example is the inclusion of a generated eBPF program. Or the CI
> > dockerfiles which are mostly auto-generated. In these cases, however,
> > the preferred format source code is still required to be included,
> > alongside the generated output.
> > 
> > Tools which perform user defined algorithmic transformations on code are
> > not considered to be "code generators". ie, we permit use of coccinelle,
> > spell checkers, and sed/awk/etc to manipulate code. Such use of automated
> > manipulation should still be declared in the commit message.
> > 
> > One off generators which create a boilerplate file which the author then
> > fills in, are acceptable if their output has clear copyright and license
> > status. This could be where a contributor writes a throwaway python
> > script to automate creation of some mundane piece of code for example.
> > 
> > Signed-off-by: Daniel P. Berrangé <berrange@redhat.com>
> > ---
> >  docs/devel/code-provenance.rst | 55 ++++++++++++++++++++++++++++++++++
> >  1 file changed, 55 insertions(+)
> > 
> > diff --git a/docs/devel/code-provenance.rst b/docs/devel/code-provenance.rst
> > index 7c42fae571..eabb3e7c08 100644
> > --- a/docs/devel/code-provenance.rst
> > +++ b/docs/devel/code-provenance.rst
> > @@ -210,3 +210,58 @@ mailing list.
> >  It is also recommended to attempt to contact the original author to let them
> >  know you are interested in taking over their work, in case they still intended
> >  to return to the work, or had any suggestions about the best way to continue.
> > +
> > +Inclusion of generated files
> > +~~~~~~~~~~~~~~~~~~~~~~~~~~~~
> > +
> > +Files in patches contributed to QEMU are generally expected to be provided
> > +only in the preferred format for making modifications. The implication of
> > +this is that the output of code generators or compilers is usually not
> > +appropriate to contribute to QEMU.
> > +
> > +For reasons of practicality there are some exceptions to this rule, where
> > +generated code is permitted, provided it is also accompanied by the
> > +corresponding preferred source format. This is done where it is impractical
> > +to expect those building QEMU to run the code generation or compilation
> > +process. A non-exhaustive list of examples is:
> > +
> > + * Images: where a bitmap image is created from a vector file it is common
> > +   to include the rendered bitmaps at desired resolution(s), since subtle
> > +   changes in the rasterization process / tools may affect quality. The
> > +   original vector file is expected to accompany any generated bitmaps.
> > +
> > + * Firmware: QEMU includes pre-compiled binary ROMs for a variety of guest
> > +   firmwares. When such binary ROMs are contributed, the corresponding source
> > +   must also be provided, either directly, or through a git submodule link.
> > +
> > + * Dockerfiles: the majority of the dockerfiles are automatically generated
> > +   from a canonical list of build dependencies maintained in tree, together
> > +   with the libvirt-ci git submodule link. The generated dockerfiles are
> > +   included in tree because it is desirable to be able to directly build
> > +   container images from a clean git checkout.
> > +
> > + * EBPF: QEMU includes some generated EBPF machine code, since the required
> > +   eBPF compilation tools are not broadly available on all targeted OS
> > +   distributions. The corresponding eBPF C code for the binary is also
> > +   provided. This is a time limited exception until the eBPF toolchain is
> > +   sufficiently broadly available in distros.
> > +
> > +In all cases above, the existence of generated files must be acknowledged
> > +and justified in the commit that introduces them.
> > +
> > +Tools which perform changes to existing code with deterministic algorithmic
> > +manipulation, driven by user specified inputs, are not generally considered
> > +to be "generators".
> > +
> > +IOW, using coccinelle to convert code from one pattern to another pattern, or
> > +fixing docs typos with a spell checker, or transforming code using sed / awk /
> > +etc, are not considered to be acts of code generation. Where an automated
> > +manipulation is performed on code, however, this should be declared in the
> > +commit message.
> > +
> > +At times contributors may use or create scripts/tools to generate an initial
> > +boilerplate code template which is then filled in to produce the final patch.
> > +The output of such a tool would still be considered the "preferred format",
> > +since it is intended to be a foundation for further human authored changes.
> > +Such tools are acceptable to use, provided they follow a deterministic process
> > +and there is clearly defined copyright and licensing for their output.
> 
> GPL seems sufficiently clear on the matter:
> The source code for a work means the preferred form of the work for making modifications to it.

That's a different scenario.

That GPL clause applies to someone distributing the QEMU binaries. They
must make the corresponding source available in the "preferred form",
which would imply the form that the QEMU project released originally
in its source tarballs.

This doesn't say that QEMU maintainers have to choose a particular type
of source as the preferred one. We could easily choose the output of
"bison" as our preferred form of a given source file if we so wished, and
the GPL has no opinion on that matter. Just that downstream distributors
have to stick with the preferred source that QEMU originally chose.

This doc is about guiding our own contributors on what QEMU should pick
as the preferred form for source we release.

With regards,
Daniel
-- 
|: https://berrange.com      -o-    https://www.flickr.com/photos/dberrange :|
|: https://libvirt.org         -o-            https://fstop138.berrange.com :|
|: https://entangle-photo.org    -o-    https://www.instagram.com/dberrange :|



^ permalink raw reply	[flat|nested] 23+ messages in thread

* Re: [PATCH v2 2/3] docs: define policy limiting the inclusion of generated files
  2024-05-16 16:22 ` [PATCH v2 2/3] docs: define policy limiting the inclusion of generated files Daniel P. Berrangé
  2024-05-16 17:04   ` Michael S. Tsirkin
@ 2024-05-17 18:23   ` Alex Bennée
  2024-05-28 15:41   ` Kevin Wolf
  2 siblings, 0 replies; 23+ messages in thread
From: Alex Bennée @ 2024-05-17 18:23 UTC (permalink / raw)
  To: Daniel P. Berrangé
  Cc: qemu-devel, Thomas Huth, Michael S. Tsirkin, Gerd Hoffmann,
	Mark Cave-Ayland, Philippe Mathieu-Daudé, Kevin Wolf,
	Stefan Hajnoczi, Alexander Graf, Paolo Bonzini, Richard Henderson,
	Peter Maydell, Markus Armbruster

Daniel P. Berrangé <berrange@redhat.com> writes:

<snip>
> +
> +IOW, using coccinelle to convert code from one pattern to another pattern, or
> +fixing docs typos with a spell checker, or transforming code using sed / awk /
> +etc, are not considered to be acts of code generation. Where an automated
> +manipulation is performed on code, however, this should be declared in the
> +commit message.

Let's avoid IRC speak in documents (s/IOW/In other words/), otherwise:

Reviewed-by: Alex Bennée <alex.bennee@linaro.org>


> +
> +At times contributors may use or create scripts/tools to generate an initial
> +boilerplate code template which is then filled in to produce the final patch.
> +The output of such a tool would still be considered the "preferred format",
> +since it is intended to be a foundation for further human authored changes.
> +Such tools are acceptable to use, provided they follow a deterministic process
> +and there is clearly defined copyright and licensing for their output.

-- 
Alex Bennée
Virtualisation Tech Lead @ Linaro


^ permalink raw reply	[flat|nested] 23+ messages in thread

* Re: [PATCH v2 2/3] docs: define policy limiting the inclusion of generated files
  2024-05-16 16:22 ` [PATCH v2 2/3] docs: define policy limiting the inclusion of generated files Daniel P. Berrangé
  2024-05-16 17:04   ` Michael S. Tsirkin
  2024-05-17 18:23   ` Alex Bennée
@ 2024-05-28 15:41   ` Kevin Wolf
  2 siblings, 0 replies; 23+ messages in thread
From: Kevin Wolf @ 2024-05-28 15:41 UTC (permalink / raw)
  To: Daniel P. Berrangé
  Cc: qemu-devel, Thomas Huth, Alex Bennée, Michael S. Tsirkin,
	Gerd Hoffmann, Mark Cave-Ayland, Philippe Mathieu-Daudé,
	Stefan Hajnoczi, Alexander Graf, Paolo Bonzini, Richard Henderson,
	Peter Maydell, Markus Armbruster

Am 16.05.2024 um 18:22 hat Daniel P. Berrangé geschrieben:
> Files contributed to QEMU are generally expected to be provided in the
> preferred format for manipulation. IOW, we generally don't expect to
> have generated / compiled code included in the tree, rather, we expect
> to run the code generator / compiler as part of the build process.
> 
> There are some obvious exceptions to this seen in our existing tree, the
> biggest one being the inclusion of many binary firmware ROMs. A more
> niche example is the inclusion of a generated eBPF program. Or the CI
> dockerfiles which are mostly auto-generated. In these cases, however,
> the preferred format source code is still required to be included,
> alongside the generated output.
> 
> Tools which perform user defined algorithmic transformations on code are
> not considered to be "code generators". ie, we permit use of coccinelle,
> spell checkers, and sed/awk/etc to manipulate code. Such use of automated
> manipulation should still be declared in the commit message.
> 
> One off generators which create a boilerplate file which the author then
> fills in, are acceptable if their output has clear copyright and license
> status. This could be where a contributor writes a throwaway python
> script to automate creation of some mundane piece of code for example.
> 
> Signed-off-by: Daniel P. Berrangé <berrange@redhat.com>
> ---
>  docs/devel/code-provenance.rst | 55 ++++++++++++++++++++++++++++++++++
>  1 file changed, 55 insertions(+)
> 
> diff --git a/docs/devel/code-provenance.rst b/docs/devel/code-provenance.rst
> index 7c42fae571..eabb3e7c08 100644
> --- a/docs/devel/code-provenance.rst
> +++ b/docs/devel/code-provenance.rst
> @@ -210,3 +210,58 @@ mailing list.
>  It is also recommended to attempt to contact the original author to let them
>  know you are interested in taking over their work, in case they still intended
>  to return to the work, or had any suggestions about the best way to continue.
> +
> +Inclusion of generated files
> +~~~~~~~~~~~~~~~~~~~~~~~~~~~~
> +
> +Files in patches contributed to QEMU are generally expected to be provided
> +only in the preferred format for making modifications. The implication of
> +this is that the output of code generators or compilers is usually not
> +appropriate to contribute to QEMU.
> +
> +For reasons of practicality there are some exceptions to this rule, where
> +generated code is permitted, provided it is also accompanied by the
> +corresponding preferred source format. This is done where it is impractical
> +to expect those building QEMU to run the code generation or compilation
> +process. A non-exhaustive list of examples is:
> +
> + * Images: where a bitmap image is created from a vector file it is common
> +   to include the rendered bitmaps at desired resolution(s), since subtle
> +   changes in the rasterization process / tools may affect quality. The
> +   original vector file is expected to accompany any generated bitmaps.
> +
> + * Firmware: QEMU includes pre-compiled binary ROMs for a variety of guest
> +   firmwares. When such binary ROMs are contributed, the corresponding source
> +   must also be provided, either directly, or through a git submodule link.
> +
> + * Dockerfiles: the majority of the dockerfiles are automatically generated
> +   from a canonical list of build dependencies maintained in tree, together
> +   with the libvirt-ci git submodule link. The generated dockerfiles are
> +   included in tree because it is desirable to be able to directly build
> +   container images from a clean git checkout.
> +
> + * EBPF: QEMU includes some generated EBPF machine code, since the required
> +   eBPF compilation tools are not broadly available on all targeted OS
> +   distributions. The corresponding eBPF C code for the binary is also
> +   provided. This is a time limited exception until the eBPF toolchain is
> +   sufficiently broadly available in distros.

This paragraph is inconsistent with the spelling of "EBPF"/"eBPF".

Kevin



^ permalink raw reply	[flat|nested] 23+ messages in thread

* [PATCH v2 3/3] docs: define policy forbidding use of AI code generators
  2024-05-16 16:22 [PATCH v2 0/3] docs: define policy forbidding use of "AI" / LLM code generators Daniel P. Berrangé
  2024-05-16 16:22 ` [PATCH v2 1/3] docs: introduce dedicated page about code provenance / sign-off Daniel P. Berrangé
  2024-05-16 16:22 ` [PATCH v2 2/3] docs: define policy limiting the inclusion of generated files Daniel P. Berrangé
@ 2024-05-16 16:22 ` Daniel P. Berrangé
  2024-05-16 17:11   ` Michael S. Tsirkin
  2024-05-16 17:20 ` [PATCH v2 0/3] docs: define policy forbidding use of "AI" / LLM " Michael S. Tsirkin
                   ` (2 subsequent siblings)
  5 siblings, 1 reply; 23+ messages in thread
From: Daniel P. Berrangé @ 2024-05-16 16:22 UTC (permalink / raw)
  To: qemu-devel
  Cc: Thomas Huth, Alex Bennée, Michael S. Tsirkin, Gerd Hoffmann,
	Mark Cave-Ayland, Philippe Mathieu-Daudé, Kevin Wolf,
	Daniel P. Berrangé, Stefan Hajnoczi, Alexander Graf,
	Paolo Bonzini, Richard Henderson, Peter Maydell,
	Markus Armbruster

There has been an explosion of interest in so called AI code generators
in the past year or two. Thus far though, this has not been matched
by a broadly accepted legal interpretation of the licensing implications
for code generator outputs. While the vendors may claim there is no
problem and a free choice of license is possible, they have an inherent
conflict of interest in promoting this interpretation. More broadly
there is, as yet, no broad consensus on the licensing implications of
code generators trained on inputs under a wide variety of licenses.

The DCO requires contributors to assert they have the right to
contribute under the designated project license. Given the lack of
consensus on the licensing of AI code generator output, it is not
considered credible to assert compliance with the DCO clause (b) or (c)
where a patch includes such generated code.

This patch thus defines a policy that the QEMU project will currently
not accept contributions where use of AI code generators is either
known, or suspected.

This merely reflects the current uncertainty of the field, and should
this situation change, the policy is of course subject to future
relaxation. Meanwhile requests for exceptions can also be considered on
a case by case basis.

Signed-off-by: Daniel P. Berrangé <berrange@redhat.com>
---
 docs/devel/code-provenance.rst | 50 +++++++++++++++++++++++++++++++++-
 1 file changed, 49 insertions(+), 1 deletion(-)

diff --git a/docs/devel/code-provenance.rst b/docs/devel/code-provenance.rst
index eabb3e7c08..846dda9a35 100644
--- a/docs/devel/code-provenance.rst
+++ b/docs/devel/code-provenance.rst
@@ -264,4 +264,52 @@ boilerplate code template which is then filled in to produce the final patch.
 The output of such a tool would still be considered the "preferred format",
 since it is intended to be a foundation for further human authored changes.
 Such tools are acceptable to use, provided they follow a deterministic process
-and there is clearly defined copyright and licensing for their output.
+and there is clearly defined copyright and licensing for their output. Note
+in particular the caveats applying to AI code generators below.
+
+Use of AI code generators
+~~~~~~~~~~~~~~~~~~~~~~~~~
+
+TL;DR:
+
+  **Current QEMU project policy is to DECLINE any contributions which are
+  believed to include or derive from AI generated code. This includes ChatGPT,
+  CoPilot, Llama and similar tools**
+
+The increasing prevalence of AI code generators, most notably, but not limited
+to, `Large Language Models <https://en.wikipedia.org/wiki/Large_language_model>`__
+(LLMs), results in a number of difficult legal questions and risks for software
+projects, including QEMU.
+
+The QEMU community requires that contributors certify their patch submissions
+are made in accordance with the rules of the :ref:`dco` (DCO).
+
+To satisfy the DCO, the patch contributor has to fully understand the
+copyright and license status of code they are contributing to QEMU. With AI
+code generators, the copyright and license status of the output is ill-defined
+with no generally accepted, settled legal foundation.
+
+Where the training material is known, it is common for it to include large
+volumes of material under restrictive licensing/copyright terms. Even where
+the training material is all known to be under open source licenses, it is
+likely to be under a variety of terms, not all of which will be compatible
+with QEMU's licensing requirements.
+
+With this in mind, the QEMU project does not consider it is currently possible
+for contributors to comply with DCO terms (b) or (c) for the output of commonly
+available AI code generators.
+
+The QEMU maintainers thus require that contributors refrain from using AI code
+generators on patches intended to be submitted to the project, and will
+decline any contribution if use of AI is either known or suspected.
+
+Examples of tools impacted by this policy include GitHub's CoPilot,
+OpenAI's ChatGPT, and Meta's Code Llama, amongst many others which are less
+well known.
+
+This policy may evolve as the legal situation is clarified. In the meanwhile,
+requests for exceptions to this policy will be evaluated by the QEMU project
+on a case by case basis. To be granted an exception, a contributor will need
+to demonstrate clarity of the license and copyright status for the tool's
+output in relation to its training model and code, to the satisfaction of the
+project maintainers.
-- 
2.43.0



^ permalink raw reply related	[flat|nested] 23+ messages in thread

* Re: [PATCH v2 3/3] docs: define policy forbidding use of AI code generators
  2024-05-16 16:22 ` [PATCH v2 3/3] docs: define policy forbidding use of AI code generators Daniel P. Berrangé
@ 2024-05-16 17:11   ` Michael S. Tsirkin
  2024-05-17 10:57     ` Daniel P. Berrangé
  0 siblings, 1 reply; 23+ messages in thread
From: Michael S. Tsirkin @ 2024-05-16 17:11 UTC (permalink / raw)
  To: Daniel P. Berrangé
  Cc: qemu-devel, Thomas Huth, Alex Bennée, Gerd Hoffmann,
	Mark Cave-Ayland, Philippe Mathieu-Daudé, Kevin Wolf,
	Stefan Hajnoczi, Alexander Graf, Paolo Bonzini, Richard Henderson,
	Peter Maydell, Markus Armbruster

On Thu, May 16, 2024 at 05:22:30PM +0100, Daniel P. Berrangé wrote:
> There has been an explosion of interest in so called AI code generators
> in the past year or two. Thus far though, this has not been matched
> by a broadly accepted legal interpretation of the licensing implications
> for code generator outputs. While the vendors may claim there is no
> problem and a free choice of license is possible, they have an inherent
> conflict of interest in promoting this interpretation. More broadly
> there is, as yet, no broad consensus on the licensing implications of
> code generators trained on inputs under a wide variety of licenses.
> 
> The DCO requires contributors to assert they have the right to
> contribute under the designated project license. Given the lack of
> consensus on the licensing of AI code generator output, it is not
> considered credible to assert compliance with the DCO clause (b) or (c)
> where a patch includes such generated code.
> 
> This patch thus defines a policy that the QEMU project will currently
> not accept contributions where use of AI code generators is either
> known, or suspected.
> 
> This merely reflects the current uncertainty of the field, and should
> this situation change, the policy is of course subject to future
> relaxation. Meanwhile requests for exceptions can also be considered on
> a case by case basis.
> 
> Signed-off-by: Daniel P. Berrangé <berrange@redhat.com>
> ---
>  docs/devel/code-provenance.rst | 50 +++++++++++++++++++++++++++++++++-
>  1 file changed, 49 insertions(+), 1 deletion(-)
> 
> diff --git a/docs/devel/code-provenance.rst b/docs/devel/code-provenance.rst
> index eabb3e7c08..846dda9a35 100644
> --- a/docs/devel/code-provenance.rst
> +++ b/docs/devel/code-provenance.rst
> @@ -264,4 +264,52 @@ boilerplate code template which is then filled in to produce the final patch.
>  The output of such a tool would still be considered the "preferred format",
>  since it is intended to be a foundation for further human authored changes.
>  Such tools are acceptable to use, provided they follow a deterministic process
> -and there is clearly defined copyright and licensing for their output.
> +and there is clearly defined copyright and licensing for their output. Note
> +in particular the caveats applying to AI code generators below.
> +
> +Use of AI code generators
> +~~~~~~~~~~~~~~~~~~~~~~~~~
> +
> +TL;DR:
> +
> +  **Current QEMU project policy is to DECLINE any contributions which are
> +  believed to include or derive from AI generated code. This includes ChatGPT,
> +  CoPilot, Llama and similar tools**
> +
> +The increasing prevalence of AI code generators, most notably but not limited
> +to, `Large Language Models <https://en.wikipedia.org/wiki/Large_language_model>`__
> +(LLMs) results in a number of difficult legal questions and risks for software
> +projects, including QEMU.
> +
> +The QEMU community requires that contributors certify their patch submissions
> +are made in accordance with the rules of the :ref:`dco` (DCO).
> +
> +To satisfy the DCO, the patch contributor has to fully understand the
> +copyright and license status of code they are contributing to QEMU. With AI
> +code generators, the copyright and license status of the output is ill-defined
> +with no generally accepted, settled legal foundation.
> +
> +Where the training material is known, it is common for it to include large
> +volumes of material under restrictive licensing/copyright terms. Even where
> +the training material is all known to be under open source licenses, it is
> +likely to be under a variety of terms, not all of which will be compatible
> +with QEMU's licensing requirements.
> +
> +With this in mind, the QEMU project does not consider it is currently possible
> +for contributors to comply with DCO terms (b) or (c) for the output of commonly
> +available AI code generators.
> +
> +The QEMU maintainers thus require that contributors refrain from using AI code
> +generators on patches intended to be submitted to the project, and will
> +decline any contribution if use of AI is either known or suspected.
> +
> +Examples of tools impacted by this policy include GitHub's CoPilot,
> +OpenAI's ChatGPT, and Meta's Code Llama, amongst many others which are less
> +well known.
> +
> +This policy may evolve as the legal situation is clarified. In the meanwhile,
> +requests for exceptions to this policy will be evaluated by the QEMU project
> +on a case by case basis. To be granted an exception, a contributor will need
> +to demonstrate clarity of the license and copyright status for the tool's
> +output in relation to its training model and code, to the satisfaction of the
> +project maintainers.

I would definitely want more contributors to pass their
comments and commit logs through a grammar checker.
It's unclear to me whether the contributors would
be required to know whether the checker in question is
considered "AI" or not.




> -- 
> 2.43.0




* Re: [PATCH v2 3/3] docs: define policy forbidding use of AI code generators
  2024-05-16 17:11   ` Michael S. Tsirkin
@ 2024-05-17 10:57     ` Daniel P. Berrangé
  0 siblings, 0 replies; 23+ messages in thread
From: Daniel P. Berrangé @ 2024-05-17 10:57 UTC (permalink / raw)
  To: Michael S. Tsirkin
  Cc: qemu-devel, Thomas Huth, Alex Bennée, Gerd Hoffmann,
	Mark Cave-Ayland, Philippe Mathieu-Daudé, Kevin Wolf,
	Stefan Hajnoczi, Alexander Graf, Paolo Bonzini, Richard Henderson,
	Peter Maydell, Markus Armbruster

On Thu, May 16, 2024 at 01:11:26PM -0400, Michael S. Tsirkin wrote:
> On Thu, May 16, 2024 at 05:22:30PM +0100, Daniel P. Berrangé wrote:
> > There has been an explosion of interest in so called AI code generators
> > in the past year or two. Thus far though, this is has not been matched
> > by a broadly accepted legal interpretation of the licensing implications
> > for code generator outputs. While the vendors may claim there is no
> > problem and a free choice of license is possible, they have an inherent
> > conflict of interest in promoting this interpretation. More broadly
> > there is, as yet, no broad consensus on the licensing implications of
> > code generators trained on inputs under a wide variety of licenses
> > 
> > The DCO requires contributors to assert they have the right to
> > contribute under the designated project license. Given the lack of
> > consensus on the licensing of AI code generator output, it is not
> > considered credible to assert compliance with the DCO clause (b) or (c)
> > where a patch includes such generated code.
> > 
> > This patch thus defines a policy that the QEMU project will currently
> > not accept contributions where use of AI code generators is either
> > known, or suspected.
> > 
> > This merely reflects the current uncertainty of the field, and should
> > this situation change, the policy is of course subject to future
> > relaxation. Meanwhile requests for exceptions can also be considered on
> > a case by case basis.
> > 
> > Signed-off-by: Daniel P. Berrangé <berrange@redhat.com>
> > ---
> >  docs/devel/code-provenance.rst | 50 +++++++++++++++++++++++++++++++++-
> >  1 file changed, 49 insertions(+), 1 deletion(-)
> > 
> > diff --git a/docs/devel/code-provenance.rst b/docs/devel/code-provenance.rst
> > index eabb3e7c08..846dda9a35 100644
> > --- a/docs/devel/code-provenance.rst
> > +++ b/docs/devel/code-provenance.rst
> > @@ -264,4 +264,52 @@ boilerplate code template which is then filled in to produce the final patch.
> >  The output of such a tool would still be considered the "preferred format",
> >  since it is intended to be a foundation for further human authored changes.
> >  Such tools are acceptable to use, provided they follow a deterministic process
> > -and there is clearly defined copyright and licensing for their output.
> > +and there is clearly defined copyright and licensing for their output. Note
> > +in particular the caveats applying to AI code generators below.
> > +
> > +Use of AI code generators
> > +~~~~~~~~~~~~~~~~~~~~~~~~~
> > +
> > +TL;DR:
> > +
> > +  **Current QEMU project policy is to DECLINE any contributions which are
> > +  believed to include or derive from AI generated code. This includes ChatGPT,
> > +  CoPilot, Llama and similar tools**
> > +
> > +The increasing prevalence of AI code generators, most notably but not limited
> > +to, `Large Language Models <https://en.wikipedia.org/wiki/Large_language_model>`__
> > +(LLMs) results in a number of difficult legal questions and risks for software
> > +projects, including QEMU.
> > +
> > +The QEMU community requires that contributors certify their patch submissions
> > +are made in accordance with the rules of the :ref:`dco` (DCO).
> > +
> > +To satisfy the DCO, the patch contributor has to fully understand the
> > +copyright and license status of code they are contributing to QEMU. With AI
> > +code generators, the copyright and license status of the output is ill-defined
> > +with no generally accepted, settled legal foundation.
> > +
> > +Where the training material is known, it is common for it to include large
> > +volumes of material under restrictive licensing/copyright terms. Even where
> > +the training material is all known to be under open source licenses, it is
> > +likely to be under a variety of terms, not all of which will be compatible
> > +with QEMU's licensing requirements.
> > +
> > +With this in mind, the QEMU project does not consider it is currently possible
> > +for contributors to comply with DCO terms (b) or (c) for the output of commonly
> > +available AI code generators.
> > +
> > +The QEMU maintainers thus require that contributors refrain from using AI code
> > +generators on patches intended to be submitted to the project, and will
> > +decline any contribution if use of AI is either known or suspected.
> > +
> > +Examples of tools impacted by this policy includes both GitHub's CoPilot,
> > +OpenAI's ChatGPT, and Meta's Code Llama, amongst many others which are less
> > +well known.
> > +
> > +This policy may evolve as the legal situation is clarifed. In the meanwhile,
> > +requests for exceptions to this policy will be evaluated by the QEMU project
> > +on a case by case basis. To be granted an exception, a contributor will need
> > +to demonstrate clarity of the license and copyright status for the tool's
> > +output in relation to its training model and code, to the satisfaction of the
> > +project maintainers.
> 
> I would definitely want more contributors to pass their
> comments and commit logs though a grammar checker.

It's a double-edged sword. If someone's grammar is sufficiently bad to
need correcting, will a machine actually suggest an alternative phrasing
that accurately represents the author's intent, or will it result in
something more misleading, and will they realize the difference?

Each to their own, but I'd prefer to see the author's own original words,
and query them myself if unclear.

> It's unclear to me whether the contributors would
> be required to know whether the checker in question is
> considered "AI" or not.

I don't think we're expecting users to go searching out fine details of
the internals of tools. They should just use their best judgement based
on easily identifiable information.

IOW, if something openly advertises itself as being "AI" driven, that's
a bad choice to use for QEMU, but if something is secretly using AI
internally without being advertised, we wouldn't/shouldn't blame users
if they aren't aware of this fact.

With regards,
Daniel
-- 
|: https://berrange.com      -o-    https://www.flickr.com/photos/dberrange :|
|: https://libvirt.org         -o-            https://fstop138.berrange.com :|
|: https://entangle-photo.org    -o-    https://www.instagram.com/dberrange :|




* Re: [PATCH v2 0/3] docs: define policy forbidding use of "AI" / LLM code generators
  2024-05-16 16:22 [PATCH v2 0/3] docs: define policy forbidding use of "AI" / LLM code generators Daniel P. Berrangé
                   ` (2 preceding siblings ...)
  2024-05-16 16:22 ` [PATCH v2 3/3] docs: define policy forbidding use of AI code generators Daniel P. Berrangé
@ 2024-05-16 17:20 ` Michael S. Tsirkin
  2024-05-16 17:34   ` Peter Maydell
  2024-05-21 14:27 ` Stefan Hajnoczi
  2024-05-28 15:41 ` Kevin Wolf
  5 siblings, 1 reply; 23+ messages in thread
From: Michael S. Tsirkin @ 2024-05-16 17:20 UTC (permalink / raw)
  To: Daniel P. Berrangé
  Cc: qemu-devel, Thomas Huth, Alex Bennée, Gerd Hoffmann,
	Mark Cave-Ayland, Philippe Mathieu-Daudé, Kevin Wolf,
	Stefan Hajnoczi, Alexander Graf, Paolo Bonzini, Richard Henderson,
	Peter Maydell, Markus Armbruster

On Thu, May 16, 2024 at 05:22:27PM +0100, Daniel P. Berrangé wrote:
> This patch kicks the hornet's nest of AI / LLM code generators.
> 
> With the increasing interest in code generators in recent times,
> it is inevitable that QEMU contributions will include AI generated
> code. Thus far we have remained silent on the matter. Given that
> everyone knows these tools exist, our current position has to be
> considered tacit acceptance of the use of AI generated code in QEMU.
> 
> The question for the project is whether that is a good position for
> QEMU to take or not ?
> 
> IANAL, but I like to think I'm reasonably proficient at understanding
> open source licensing. I am not inherantly against the use of AI tools,
> rather I am anti-risk. I also want to see OSS licenses respected and
> complied with.
> 
> AFAICT at its current state of (im)maturity the question of licensing
> of AI code generator output does not have a broadly accepted / settled
> legal position. This is an inherant bias/self-interest from the vendors
> promoting their usage, who tend to minimize/dismiss the legal questions.
> From my POV, this puts such tools in a position of elevated legal risk.
> 
> Given the fuzziness over the legal position of generated code from
> such tools, I don't consider it credible (today) for a contributor
> to assert compliance with the DCO terms (b) or (c) (which is a stated
> pre-requisite for QEMU accepting patches) when a patch includes (or is
> derived from) AI generated code.
> 
> By implication, I think that QEMU must (for now) explicitly decline
> to (knowingly) accept AI generated code.
> 
> Perhaps a few years down the line the legal uncertainty will have
> reduced and we can re-evaluate this policy.
> 
> Discuss...

At this juncture, the code generated by these tools is of such
quality that I really wouldn't expect it to pass even cursory code
review.


So for now, I propose adding a single paragraph:

 If you wrote the patch, make sure your "From:" and "Signed-off-by:"
 lines use the same spelling. It's okay if you subscribe or contribute to
 the list via more than one address, but using multiple addresses in one
 commit just confuses things. If someone else wrote the patch, git will
 include a "From:" line in the body of the email (different from your
 envelope From:) that will give credit to the correct author; but again,
 that author's Signed-off-by: line is mandatory, with the same spelling.

+Q: I prompted ChatGPT/Copilot/Llama and it wrote
+   the patch for me. Can I submit it and how do I sign it?
+A: Your patch is likely trash or trivial. Please write your own code.
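
The "From:" / "Signed-off-by:" spelling rule in the paragraph above can be
checked mechanically. What follows is only a minimal sketch, not an existing
QEMU tool: the function name author_signoff_matches is hypothetical, and it
assumes the patch is available as plain email text and that the last line
starting with "From:" names the author (which holds when git adds an in-body
"From:" line, but would be fooled by a commit message that itself contains
such a line).

```python
def author_signoff_matches(patch_text: str) -> bool:
    """Return True if the patch author has an identically spelled sign-off.

    Hypothetical helper for illustration only. Assumption: the last line
    starting with "From:" identifies the author, so an in-body "From:"
    overrides the envelope "From:" header.
    """
    lines = patch_text.splitlines()
    # Collect every "From:" value; the in-body one, if present, comes last.
    authors = [l[len("From:"):].strip() for l in lines if l.startswith("From:")]
    # Collect every sign-off exactly as written.
    signoffs = [
        l[len("Signed-off-by:"):].strip()
        for l in lines
        if l.startswith("Signed-off-by:")
    ]
    if not authors:
        return False
    # The comparison is an exact string match: the rule asks for the same
    # spelling, so no case folding or whitespace normalisation is applied.
    return authors[-1] in signoffs


if __name__ == "__main__":
    example = (
        "From: Jane Doe <jane@example.org>\n"
        "Subject: [PATCH] fix typo\n"
        "\n"
        "Fix a typo in the docs.\n"
        "\n"
        "Signed-off-by: Jane Doe <jane@example.org>\n"
    )
    print(author_signoff_matches(example))
```

Such a check only confirms that the spelling is consistent; it says nothing
about whether the sign-off is a credible DCO assertion, which is the question
the proposed policy is about.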






> Changes in v2:
> 
>  * Fix a huge number of typos in docs
>  * Clarify that maintainers should still add R-b where relevant, even
>    if they are already adding their own S-oB.
>  * Clarify situation when contributor re-starts previously abandoned
>    work from another contributor.
>  * Add info about Suggested-by tag
>  * Add new docs section dealing with the broad topic of "generated
>    files" (whether code generators or compilers)
>  * Simplify the section related to prohibition of AI generated files
>    and give further examples of tools considered covered
>  * Remove repeated references to "LLM" as a specific technology, just
>    use the broad "AI" term, except for one use of LLM as an example.
>  * Add note that the policy may evolve if the legal clarity improves
>  * Add note that exceptions can be requested on case-by-case basis
>    if contributor thinks they can demonstrate a credible copyright
>    and licensing status
> 
> Daniel P. Berrangé (3):
>   docs: introduce dedicated page about code provenance / sign-off
>   docs: define policy limiting the inclusion of generated files
>   docs: define policy forbidding use of AI code generators
> 
>  docs/devel/code-provenance.rst    | 315 ++++++++++++++++++++++++++++++
>  docs/devel/index-process.rst      |   1 +
>  docs/devel/submitting-a-patch.rst |  19 +-
>  3 files changed, 318 insertions(+), 17 deletions(-)
>  create mode 100644 docs/devel/code-provenance.rst
> 
> -- 
> 2.43.0




* Re: [PATCH v2 0/3] docs: define policy forbidding use of "AI" / LLM code generators
  2024-05-16 17:20 ` [PATCH v2 0/3] docs: define policy forbidding use of "AI" / LLM " Michael S. Tsirkin
@ 2024-05-16 17:34   ` Peter Maydell
  2024-05-16 17:36     ` Michael S. Tsirkin
  0 siblings, 1 reply; 23+ messages in thread
From: Peter Maydell @ 2024-05-16 17:34 UTC (permalink / raw)
  To: Michael S. Tsirkin
  Cc: Daniel P. Berrangé, qemu-devel, Thomas Huth,
	Alex Bennée, Gerd Hoffmann, Mark Cave-Ayland,
	Philippe Mathieu-Daudé, Kevin Wolf, Stefan Hajnoczi,
	Alexander Graf, Paolo Bonzini, Richard Henderson,
	Markus Armbruster

On Thu, 16 May 2024 at 18:20, Michael S. Tsirkin <mst@redhat.com> wrote:
>
> On Thu, May 16, 2024 at 05:22:27PM +0100, Daniel P. Berrangé wrote:
> > AFAICT at its current state of (im)maturity the question of licensing
> > of AI code generator output does not have a broadly accepted / settled
> > legal position. This is an inherant bias/self-interest from the vendors
> > promoting their usage, who tend to minimize/dismiss the legal questions.
> > From my POV, this puts such tools in a position of elevated legal risk.
> >
> > Given the fuzziness over the legal position of generated code from
> > such tools, I don't consider it credible (today) for a contributor
> > to assert compliance with the DCO terms (b) or (c) (which is a stated
> > pre-requisite for QEMU accepting patches) when a patch includes (or is
> > derived from) AI generated code.
> >
> > By implication, I think that QEMU must (for now) explicitly decline
> > to (knowingly) accept AI generated code.
> >
> > Perhaps a few years down the line the legal uncertainty will have
> > reduced and we can re-evaluate this policy.

> At this junction, the code generated by these tools is of such
> quality that I really won't expect it to pass even cursory code
> review.

I disagree, I think that in at least some cases they can
produce code that would pass our quality bar, especially with
human supervision and editing after the fact. If the problem
was merely "LLMs tend to produce lousy output" then we wouldn't
need to write anything new -- we already have a process for
dealing with bad patches, which is to say we do code review and
suggest changes or simply reject the patches. What we *don't* have
is any process to handle the legal uncertainties that Dan outlines
above.

-- PMM



* Re: [PATCH v2 0/3] docs: define policy forbidding use of "AI" / LLM code generators
  2024-05-16 17:34   ` Peter Maydell
@ 2024-05-16 17:36     ` Michael S. Tsirkin
  0 siblings, 0 replies; 23+ messages in thread
From: Michael S. Tsirkin @ 2024-05-16 17:36 UTC (permalink / raw)
  To: Peter Maydell
  Cc: Daniel P. Berrangé, qemu-devel, Thomas Huth,
	Alex Bennée, Gerd Hoffmann, Mark Cave-Ayland,
	Philippe Mathieu-Daudé, Kevin Wolf, Stefan Hajnoczi,
	Alexander Graf, Paolo Bonzini, Richard Henderson,
	Markus Armbruster

On Thu, May 16, 2024 at 06:34:13PM +0100, Peter Maydell wrote:
> On Thu, 16 May 2024 at 18:20, Michael S. Tsirkin <mst@redhat.com> wrote:
> >
> > On Thu, May 16, 2024 at 05:22:27PM +0100, Daniel P. Berrangé wrote:
> > > AFAICT at its current state of (im)maturity the question of licensing
> > > of AI code generator output does not have a broadly accepted / settled
> > > legal position. This is an inherant bias/self-interest from the vendors
> > > promoting their usage, who tend to minimize/dismiss the legal questions.
> > > From my POV, this puts such tools in a position of elevated legal risk.
> > >
> > > Given the fuzziness over the legal position of generated code from
> > > such tools, I don't consider it credible (today) for a contributor
> > > to assert compliance with the DCO terms (b) or (c) (which is a stated
> > > pre-requisite for QEMU accepting patches) when a patch includes (or is
> > > derived from) AI generated code.
> > >
> > > By implication, I think that QEMU must (for now) explicitly decline
> > > to (knowingly) accept AI generated code.
> > >
> > > Perhaps a few years down the line the legal uncertainty will have
> > > reduced and we can re-evaluate this policy.
> 
> > At this junction, the code generated by these tools is of such
> > quality that I really won't expect it to pass even cursory code
> > review.
> 
> I disagree, I think that in at least some cases they can
> produce code that would pass our quality bar, especially with
> human supervision and editing after the fact. If the problem
> was merely "LLMs tend to produce lousy output" then we wouldn't
> need to write anything new -- we already have a process for
> dealing with bad patches, which is to say we do code review and
> suggest changes or simply reject the patches. What we *don't* have
> any process to handle is the legal uncertainties that Dan outlines
> above.
> 
> -- PMM


Maybe I'm bad at prompting ;)

-- 
MST




* Re: [PATCH v2 0/3] docs: define policy forbidding use of "AI" / LLM code generators
  2024-05-16 16:22 [PATCH v2 0/3] docs: define policy forbidding use of "AI" / LLM code generators Daniel P. Berrangé
                   ` (3 preceding siblings ...)
  2024-05-16 17:20 ` [PATCH v2 0/3] docs: define policy forbidding use of "AI" / LLM " Michael S. Tsirkin
@ 2024-05-21 14:27 ` Stefan Hajnoczi
  2024-05-28 15:41 ` Kevin Wolf
  5 siblings, 0 replies; 23+ messages in thread
From: Stefan Hajnoczi @ 2024-05-21 14:27 UTC (permalink / raw)
  To: Daniel P. Berrangé
  Cc: qemu-devel, Thomas Huth, Alex Bennée, Michael S. Tsirkin,
	Gerd Hoffmann, Mark Cave-Ayland, Philippe Mathieu-Daudé,
	Kevin Wolf, Stefan Hajnoczi, Alexander Graf, Paolo Bonzini,
	Richard Henderson, Peter Maydell, Markus Armbruster

On Thu, 16 May 2024 at 12:23, Daniel P. Berrangé <berrange@redhat.com> wrote:
>
> This patch kicks the hornet's nest of AI / LLM code generators.
>
> With the increasing interest in code generators in recent times,
> it is inevitable that QEMU contributions will include AI generated
> code. Thus far we have remained silent on the matter. Given that
> everyone knows these tools exist, our current position has to be
> considered tacit acceptance of the use of AI generated code in QEMU.
>
> The question for the project is whether that is a good position for
> QEMU to take or not ?
>
> IANAL, but I like to think I'm reasonably proficient at understanding
> open source licensing. I am not inherantly against the use of AI tools,
> rather I am anti-risk. I also want to see OSS licenses respected and
> complied with.
>
> AFAICT at its current state of (im)maturity the question of licensing
> of AI code generator output does not have a broadly accepted / settled
> legal position. This is an inherant bias/self-interest from the vendors
> promoting their usage, who tend to minimize/dismiss the legal questions.
> From my POV, this puts such tools in a position of elevated legal risk.
>
> Given the fuzziness over the legal position of generated code from
> such tools, I don't consider it credible (today) for a contributor
> to assert compliance with the DCO terms (b) or (c) (which is a stated
> pre-requisite for QEMU accepting patches) when a patch includes (or is
> derived from) AI generated code.
>
> By implication, I think that QEMU must (for now) explicitly decline
> to (knowingly) accept AI generated code.
>
> Perhaps a few years down the line the legal uncertainty will have
> reduced and we can re-evaluate this policy.
>
> Discuss...

Although this policy is unenforceable, I think it's a valid position
to take until the legal situation becomes clear.

Acked-by: Stefan Hajnoczi <stefanha@gmail.com>



* Re: [PATCH v2 0/3] docs: define policy forbidding use of "AI" / LLM code generators
  2024-05-16 16:22 [PATCH v2 0/3] docs: define policy forbidding use of "AI" / LLM code generators Daniel P. Berrangé
                   ` (4 preceding siblings ...)
  2024-05-21 14:27 ` Stefan Hajnoczi
@ 2024-05-28 15:41 ` Kevin Wolf
  5 siblings, 0 replies; 23+ messages in thread
From: Kevin Wolf @ 2024-05-28 15:41 UTC (permalink / raw)
  To: Daniel P. Berrangé
  Cc: qemu-devel, Thomas Huth, Alex Bennée, Michael S. Tsirkin,
	Gerd Hoffmann, Mark Cave-Ayland, Philippe Mathieu-Daudé,
	Stefan Hajnoczi, Alexander Graf, Paolo Bonzini, Richard Henderson,
	Peter Maydell, Markus Armbruster

On 16.05.2024 at 18:22, Daniel P. Berrangé wrote:
> This patch kicks the hornet's nest of AI / LLM code generators.
> 
> With the increasing interest in code generators in recent times,
> it is inevitable that QEMU contributions will include AI generated
> code. Thus far we have remained silent on the matter. Given that
> everyone knows these tools exist, our current position has to be
> considered tacit acceptance of the use of AI generated code in QEMU.
> 
> The question for the project is whether that is a good position for
> QEMU to take or not ?
> 
> IANAL, but I like to think I'm reasonably proficient at understanding
> open source licensing. I am not inherantly against the use of AI tools,
> rather I am anti-risk. I also want to see OSS licenses respected and
> complied with.
> 
> AFAICT at its current state of (im)maturity the question of licensing
> of AI code generator output does not have a broadly accepted / settled
> legal position. This is an inherant bias/self-interest from the vendors
> promoting their usage, who tend to minimize/dismiss the legal questions.
> From my POV, this puts such tools in a position of elevated legal risk.
> 
> Given the fuzziness over the legal position of generated code from
> such tools, I don't consider it credible (today) for a contributor
> to assert compliance with the DCO terms (b) or (c) (which is a stated
> pre-requisite for QEMU accepting patches) when a patch includes (or is
> derived from) AI generated code.
> 
> By implication, I think that QEMU must (for now) explicitly decline
> to (knowingly) accept AI generated code.
> 
> Perhaps a few years down the line the legal uncertainty will have
> reduced and we can re-evaluate this policy.
> 
> Discuss...
> 
> Changes in v2:
> 
>  * Fix a huge number of typos in docs
>  * Clarify that maintainers should still add R-b where relevant, even
>    if they are already adding their own S-oB.
>  * Clarify situation when contributor re-starts previously abandoned
>    work from another contributor.
>  * Add info about Suggested-by tag
>  * Add new docs section dealing with the broad topic of "generated
>    files" (whether code generators or compilers)
>  * Simplify the section related to prohibition of AI generated files
>    and give further examples of tools considered covered
>  * Remove repeated references to "LLM" as a specific technology, just
>    use the broad "AI" term, except for one use of LLM as an example.
>  * Add note that the policy may evolve if the legal clarity improves
>  * Add note that exceptions can be requested on case-by-case basis
>    if contributor thinks they can demonstrate a credible copyright
>    and licensing status
> 
> Daniel P. Berrangé (3):
>   docs: introduce dedicated page about code provenance / sign-off
>   docs: define policy limiting the inclusion of generated files
>   docs: define policy forbidding use of AI code generators
> 
>  docs/devel/code-provenance.rst    | 315 ++++++++++++++++++++++++++++++
>  docs/devel/index-process.rst      |   1 +
>  docs/devel/submitting-a-patch.rst |  19 +-
>  3 files changed, 318 insertions(+), 17 deletions(-)
>  create mode 100644 docs/devel/code-provenance.rst

Reviewed-by: Kevin Wolf <kwolf@redhat.com>




