* [PATCH 0/2] docs: define policy forbidding use of "AI" / LLM code generators
@ 2023-11-23 11:40 Daniel P. Berrangé
  2023-11-23 11:40 ` [PATCH 1/2] docs: introduce dedicated page about code provenance / sign-off Daniel P. Berrangé
  2023-11-23 11:40 ` [PATCH 2/2] docs: define policy forbidding use of "AI" / LLM code generators Daniel P. Berrangé
  0 siblings, 2 replies; 57+ messages in thread
From: Daniel P. Berrangé @ 2023-11-23 11:40 UTC
To: qemu-devel
Cc: Richard Henderson, Alexander Graf, Alex Bennée, Paolo Bonzini,
    Michael S. Tsirkin, Markus Armbruster, Phil Mathieu-Daudé,
    Stefan Hajnoczi, Thomas Huth, Kevin Wolf, Gerd Hoffmann,
    Mark Cave-Ayland, Peter Maydell, Daniel P. Berrangé

This patch kicks the hornet's nest of AI / LLM code generators.

With the increasing interest in code generators in recent times, it is
inevitable that QEMU contributions will include AI generated code. Thus
far we have remained silent on the matter. Given that everyone knows
these tools exist, our current position has to be considered tacit
acceptance of the use of AI generated code in QEMU.

The question for the project is whether that is a good position for
QEMU to take or not?

IANAL, but I like to think I'm reasonably proficient at understanding
open source licensing. I am not inherently against the use of AI tools,
rather I am anti-risk. I also want to see OSS licenses respected and
complied with.

AFAICT, at its current state of (im)maturity, the question of licensing
of AI code generator output does not have a broadly accepted / settled
legal position. There is an inherent bias/self-interest from the vendors
promoting their usage, who tend to minimize/dismiss the legal questions.
From my POV, this puts such tools in a position of elevated legal risk.

Given the fuzziness over the legal position of generated code from such
tools, I don't consider it credible (today) for a contributor to assert
compliance with the DCO terms (b) or (c) (which is a stated prerequisite
for QEMU accepting patches) when a patch includes (or is derived from)
AI generated code.

By implication, I think that QEMU must (for now) explicitly decline to
(knowingly) accept AI generated code.

Perhaps a few years down the line the legal uncertainty will have
reduced and we can re-evaluate this policy.

NB I say "knowingly" because as reviewers we do ultimately have to trust
what contributors tell us about their patch origins, and this has always
been the case. Our policies and the use of the DCO serve to shift legal
risk/exposure away from the project. They let us as a project demonstrate
that we took steps to set out our expectations / requirements, and thus
any contravention is the responsibility of the contributor involved, not
the project.

Discuss...

Daniel P. Berrangé (2):
  docs: introduce dedicated page about code provenance / sign-off
  docs: define policy forbidding use of "AI" / LLM code generators

 docs/devel/code-provenance.rst    | 237 ++++++++++++++++++++++++++++++
 docs/devel/index-process.rst      |   1 +
 docs/devel/submitting-a-patch.rst |  18 +--
 3 files changed, 241 insertions(+), 15 deletions(-)
 create mode 100644 docs/devel/code-provenance.rst

-- 
2.41.0

^ permalink raw reply [flat|nested] 57+ messages in thread
* [PATCH 1/2] docs: introduce dedicated page about code provenance / sign-off 2023-11-23 11:40 [PATCH 0/2] docs: define policy forbidding use of "AI" / LLM code generators Daniel P. Berrangé @ 2023-11-23 11:40 ` Daniel P. Berrangé 2023-11-23 11:58 ` Philippe Mathieu-Daudé ` (5 more replies) 2023-11-23 11:40 ` [PATCH 2/2] docs: define policy forbidding use of "AI" / LLM code generators Daniel P. Berrangé 1 sibling, 6 replies; 57+ messages in thread From: Daniel P. Berrangé @ 2023-11-23 11:40 UTC (permalink / raw) To: qemu-devel Cc: Richard Henderson, Alexander Graf, Alex Bennée, Paolo Bonzini, Michael S. Tsirkin, Markus Armbruster, Phil Mathieu-Daudé, Stefan Hajnoczi, Thomas Huth, Kevin Wolf, Gerd Hoffmann, Mark Cave-Ayland, Peter Maydell, Daniel P. Berrangé Currently we have a short paragraph saying that patches must include a Signed-off-by line, and merely link to the kernel documentation. The linked kernel docs have alot of content beyond the part about sign-off an thus is misleading/distracting to QEMU contributors. This introduces a dedicated 'code-provenance' page in QEMU talking about why we require sign-off, explaining the other tags we commonly use, and what to do in some edge cases. Signed-off-by: Daniel P. Berrangé <berrange@redhat.com> --- docs/devel/code-provenance.rst | 197 ++++++++++++++++++++++++++++++ docs/devel/index-process.rst | 1 + docs/devel/submitting-a-patch.rst | 18 +-- 3 files changed, 201 insertions(+), 15 deletions(-) create mode 100644 docs/devel/code-provenance.rst diff --git a/docs/devel/code-provenance.rst b/docs/devel/code-provenance.rst new file mode 100644 index 0000000000..b4591a2dec --- /dev/null +++ b/docs/devel/code-provenance.rst @@ -0,0 +1,197 @@ +.. _code-provenance: + +Code provenance +=============== + +Certifying patch submissions +~~~~~~~~~~~~~~~~~~~~~~~~~~~~ + +The QEMU community **mandates** all contributors to certify provenance +of patch submissions they make to the project. 
To put it another way, +contributors must indicate that they are legally permitted to contribute +to the project. + +Certification is achieved with a low overhead by adding a single line +to the bottom of every git commit:: + + Signed-off-by: YOUR NAME <YOUR@EMAIL> + +This existence of this line asserts that the author of the patch is +contributing in accordance with the `Developer's Certificate of +Origin <https://developercertifcate.org>`__: + +.. _dco: + +:: + Developer's Certificate of Origin 1.1 + + By making a contribution to this project, I certify that: + + (a) The contribution was created in whole or in part by me and I + have the right to submit it under the open source license + indicated in the file; or + + (b) The contribution is based upon previous work that, to the best + of my knowledge, is covered under an appropriate open source + license and I have the right under that license to submit that + work with modifications, whether created in whole or in part + by me, under the same open source license (unless I am + permitted to submit under a different license), as indicated + in the file; or + + (c) The contribution was provided directly to me by some other + person who certified (a), (b) or (c) and I have not modified + it. + + (d) I understand and agree that this project and the contribution + are public and that a record of the contribution (including all + personal information I submit with it, including my sign-off) is + maintained indefinitely and may be redistributed consistent with + this project or the open source license(s) involved. + +It is generally expected that the name and email addresses used in one +of the ``Signed-off-by`` lines, matches that of the git commit ``Author`` +field. If the person sending the mail is also one of the patch authors, +it is further expected that the mail ``From:`` line name & address match +one of the ``Signed-off-by`` lines. 
+ +Multiple authorship +~~~~~~~~~~~~~~~~~~~ + +It is not uncommon for a patch to have contributions from multiple +authors. In such a scenario, a git commit will usually be expected +to have a ``Signed-off-by`` line for each contributor involved in +creatin of the patch. Some edge cases: + + * The non-primary author's contributions were so trivial that + they can be considered not subject to copyright. In this case + the secondary authors need not include a ``Signed-off-by``. + + This case most commonly applies where QEMU reviewers give short + snippets of code as suggested fixes to a patch. The reviewers + don't need to have their own ``Signed-off-by`` added unless + their code suggestion was unusually large. + + * Both contributors work for the same employer and the employer + requires copyright assignment. + + It can be said that in this case a ``Signed-off-by`` is indicating + that the person has permission to contributeo from their employer + who is the copyright holder. It is none the less still preferrable + to include a ``Signed-off-by`` for each contributor, as in some + countries employees are not able to assign copyright to their + employer, and it also covers any time invested outside working + hours. + +Other commit tags +~~~~~~~~~~~~~~~~~ + +While the ``Signed-off-by`` tag is mandatory, there are a number of +other tags that are commonly used during QEMU development + + * **``Reviewed-by``**: when a QEMU community member reviews a patch + on the mailing list, if they consider the patch acceptable, they + should send an email reply containing a ``Reviewed-by`` tag. + + NB: a subsystem maintainer sending a pull request would replace + their own ``Reviewed-by`` with another ``Signed-off-by`` + + * **``Acked-by``**: when a QEMU subsystem maintainer approves a patch + that touches their subsystem, but intends to allow a different + maintainer to queue it and send a pull request, they would send + a mail containing a ``Acked-by`` tag. 
+ + * **``Tested-by``**: when a QEMU community member has functionally + tested the behaviour of the patch in some manner, they should + send an email reply conmtaning a ``Tested-by`` tag. + + * **``Reported-by``**: when a QEMU community member reports a problem + via the mailing list, or some other informal channel that is not + the issue tracker, it is good practice to credit them by including + a ``Reported-by`` tag on any patch fixing the issue. When the + problem is reported via the GitLab issue tracker, however, it is + sufficient to just include a link to the issue. + +Subsystem maintainer requirements +~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~ + +When a subsystem maintainer accepts a patch from a contributor, in +addition to the normal code review points, they are expected to validate +the presence of suitable ``Signed-off-by`` tags. + +At the time they queue the patch in their subsystem tree, the maintainer +**MUST** also then add their own ``Signed-off-by`` to indicate that they +have done the aforementioned validation. + +The subsystem maintainer submitting a pull request is **NOT** expected to +have a ``Reviewed-by`` tag on the patch, since this is implied by their +own ``Signed-off-by``. + +Tools for adding ``Signed-of-by`` +~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~ + +There are a variety of ways tools can support adding ``Signed-off-by`` +tags for patches, avoiding the need for contributors to manually +type in this repetitive text each time. + +git commands +^^^^^^^^^^^^ + +When creating, or amending, a commit the ``-s`` flag to ``git commit`` +will append a suitable line matching the configuring git author +details. + +If preparing patches using the ``git format-patch`` tool, the ``-s`` +flag can be used to append a suitable line in the emails it creates, +without modifying the local commits. 
Alternatively to modify the +local commits on a branch en-mass:: + + git rebase master -x 'git commit --amend --no-edit -s' + +emacs +^^^^^ + +In the file ``$HOME/.emacs.d/abbrev_defs`` add:: + + (define-abbrev-table 'global-abbrev-table + '( + ("8rev" "Reviewed-by: YOUR NAME <your@email.addr>" nil 1) + ("8ack" "Acked-by: YOUR NAME <your@email.addr>" nil 1) + ("8test" "Tested-by: YOUR NAME <your@email.addr>" nil 1) + ("8sob" "Signed-off-by: YOUR NAME <your@email.addr>" nil 1) + )) + +with this change, if you type (for example) ``8rev`` followed +by ``<space>`` or ``<enter>`` it will expand to the whole phrase. + +vim +^^^ + +In the file ``$HOME/.vimrc`` add:: + + iabbrev 8rev Reviewed-by: YOUR NAME <your@email.addr> + iabbrev 8ack Acked-by: YOUR NAME <your@email.addr> + iabbrev 8test Tested-by: YOUR NAME <your@email.addr> + iabbrev 8sob Signed-off-by: YOUR NAME <your@email.addr> + +with this change, if you type (for example) ``8rev`` followed +by ``<space>`` or ``<enter>`` it will expand to the whole phrase. + +Re-starting abandoned work +~~~~~~~~~~~~~~~~~~~~~~~~~~ + +For a variety of reasons there are some patches that get submitted to +QEMU but never merged. An unrelated contributor may decide (months or +years later) to continue working from the abandoned patch and re-submit +it with extra changes. + +If the abandoned patch already had a ``Signed-off-by`` from the original +author this **must** be preserved. The new contributor **must** then add +their own ``Signed-off-by`` after the original one if they made any +further changes to it. It is common to include a comment just prior to +the new ``Signed-off-by`` indicating what extra changes were made. 
For +example:: + + Signed-off-by: Some Person <some.person@example.com> + [Rebased and added support for 'foo'] + Signed-off-by: New Person <new.person@example.com> diff --git a/docs/devel/index-process.rst b/docs/devel/index-process.rst index 362f97ee30..b54e58105e 100644 --- a/docs/devel/index-process.rst +++ b/docs/devel/index-process.rst @@ -13,6 +13,7 @@ Notes about how to interact with the community and how and where to submit patch maintainers style submitting-a-patch + code-provenance trivial-patches stable-process submitting-a-pull-request diff --git a/docs/devel/submitting-a-patch.rst b/docs/devel/submitting-a-patch.rst index c641d948f1..ec541b3d15 100644 --- a/docs/devel/submitting-a-patch.rst +++ b/docs/devel/submitting-a-patch.rst @@ -322,21 +322,9 @@ Patch emails must include a ``Signed-off-by:`` line Your patches **must** include a Signed-off-by: line. This is a hard requirement because it's how you say "I'm legally okay to contribute -this and happy for it to go into QEMU". The process is modelled after -the `Linux kernel -<http://git.kernel.org/cgit/linux/kernel/git/torvalds/linux.git/tree/Documentation/SubmittingPatches?id=f6f94e2ab1b33f0082ac22d71f66385a60d8157f#n297>`__ -policy. - -If you wrote the patch, make sure your "From:" and "Signed-off-by:" -lines use the same spelling. It's okay if you subscribe or contribute to -the list via more than one address, but using multiple addresses in one -commit just confuses things. If someone else wrote the patch, git will -include a "From:" line in the body of the email (different from your -envelope From:) that will give credit to the correct author; but again, -that author's Signed-off-by: line is mandatory, with the same spelling. - -There are various tooling options for automatically adding these tags -include using ``git commit -s`` or ``git format-patch -s``. For more +this and happy for it to go into QEMU". For full guidance, read the +:ref:`code-provenance` documentation. 
+ information see `SubmittingPatches 1.12 <http://git.kernel.org/cgit/linux/kernel/git/torvalds/linux.git/tree/Documentation/SubmittingPatches?id=f6f94e2ab1b33f0082ac22d71f66385a60d8157f#n297>`__. -- 2.41.0
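The sign-off helpers described under "git commands" in patch 1/2 can be tried without touching a real tree. The script below is a minimal sketch, assuming a POSIX shell and a git where `git rebase -x` runs its command for each commit even when the branch is already based on the given upstream. The identity `YOUR NAME <your@email.addr>`, the branch names `master`/`feature` and the file name are placeholders. It checks the two claims the patch makes: `git format-patch -s` signs the generated emails without modifying the local commits, and the `git commit --amend --no-edit -s` rebase one-liner signs every local commit on the branch.

```shell
#!/bin/sh
# Sketch: exercise the sign-off helpers in a throwaway repository.
# Identity, branch names and file names are placeholders.
set -eu

repo=$(mktemp -d)
out=$(mktemp -d)
cd "$repo"
git init -q .
git symbolic-ref HEAD refs/heads/master   # predictable branch name
git config user.name "YOUR NAME"
git config user.email "your@email.addr"
git config commit.gpgsign false           # independent of global config

# 1. 'git commit -s' appends a Signed-off-by for the configured identity.
echo one > file.txt
git add file.txt
git commit -q -s -m "Add file"

# 2. Two commits on a topic branch, deliberately made without sign-off.
git checkout -q -b feature
echo two >> file.txt
git commit -q -a -m "Extend file"
echo three >> file.txt
git commit -q -a -m "Extend file again"

# 3. 'git format-patch -s' adds the trailer to the generated emails only;
#    the local commits keep their original messages.
git format-patch -q -s -o "$out" master
mail_sobs=$(cat "$out"/*.patch | grep -c '^Signed-off-by: YOUR NAME <your@email.addr>$' || true)
local_sobs_before=$(git log --format=%B master..feature | grep -c '^Signed-off-by:' || true)

# 4. Sign off every local commit on the branch en masse.
git rebase -q -x 'git commit --amend --no-edit -s' master
local_sobs_after=$(git log --format=%B master..feature | grep -c '^Signed-off-by:' || true)

echo "signed emails: $mail_sobs"
echo "signed local commits before rebase: $local_sobs_before"
echo "signed local commits after rebase: $local_sobs_after"
```

Everything happens inside directories created by `mktemp -d`, so nothing else is modified. Keep in mind that the rebase step rewrites commit IDs, so it is best applied to local branches that have not been shared yet.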
* Re: [PATCH 1/2] docs: introduce dedicated page about code provenance / sign-off 2023-11-23 11:40 ` [PATCH 1/2] docs: introduce dedicated page about code provenance / sign-off Daniel P. Berrangé @ 2023-11-23 11:58 ` Philippe Mathieu-Daudé 2023-11-23 17:08 ` Daniel P. Berrangé 2023-11-23 13:01 ` Peter Maydell ` (4 subsequent siblings) 5 siblings, 1 reply; 57+ messages in thread From: Philippe Mathieu-Daudé @ 2023-11-23 11:58 UTC (permalink / raw) To: Daniel P. Berrangé, qemu-devel Cc: Richard Henderson, Alexander Graf, Alex Bennée, Paolo Bonzini, Michael S. Tsirkin, Markus Armbruster, Stefan Hajnoczi, Thomas Huth, Kevin Wolf, Gerd Hoffmann, Mark Cave-Ayland, Peter Maydell On 23/11/23 12:40, Daniel P. Berrangé wrote: > Currently we have a short paragraph saying that patches must include > a Signed-off-by line, and merely link to the kernel documentation. > The linked kernel docs have alot of content beyond the part about > sign-off an thus is misleading/distracting to QEMU contributors. > > This introduces a dedicated 'code-provenance' page in QEMU talking > about why we require sign-off, explaining the other tags we commonly > use, and what to do in some edge cases. > > Signed-off-by: Daniel P. Berrangé <berrange@redhat.com> > --- > docs/devel/code-provenance.rst | 197 ++++++++++++++++++++++++++++++ > docs/devel/index-process.rst | 1 + > docs/devel/submitting-a-patch.rst | 18 +-- > 3 files changed, 201 insertions(+), 15 deletions(-) > create mode 100644 docs/devel/code-provenance.rst > > diff --git a/docs/devel/code-provenance.rst b/docs/devel/code-provenance.rst > new file mode 100644 > index 0000000000..b4591a2dec > --- /dev/null > +++ b/docs/devel/code-provenance.rst > @@ -0,0 +1,197 @@ > +.. _code-provenance: > + > +Code provenance > +=============== > + > +Certifying patch submissions > +~~~~~~~~~~~~~~~~~~~~~~~~~~~~ > + > +The QEMU community **mandates** all contributors to certify provenance > +of patch submissions they make to the project. 
To put it another way, > +contributors must indicate that they are legally permitted to contribute > +to the project. > + > +Certification is achieved with a low overhead by adding a single line > +to the bottom of every git commit:: > + > + Signed-off-by: YOUR NAME <YOUR@EMAIL> > + > +This existence of this line asserts that the author of the patch is > +contributing in accordance with the `Developer's Certificate of > +Origin <https://developercertifcate.org>`__: Typo: https://developercertificate.org/ > + > +.. _dco: > + > +:: > + Developer's Certificate of Origin 1.1 > + > + By making a contribution to this project, I certify that: > + > + (a) The contribution was created in whole or in part by me and I > + have the right to submit it under the open source license > + indicated in the file; or > + > + (b) The contribution is based upon previous work that, to the best > + of my knowledge, is covered under an appropriate open source > + license and I have the right under that license to submit that > + work with modifications, whether created in whole or in part > + by me, under the same open source license (unless I am > + permitted to submit under a different license), as indicated > + in the file; or > + > + (c) The contribution was provided directly to me by some other > + person who certified (a), (b) or (c) and I have not modified > + it. > + > + (d) I understand and agree that this project and the contribution > + are public and that a record of the contribution (including all > + personal information I submit with it, including my sign-off) is > + maintained indefinitely and may be redistributed consistent with > + this project or the open source license(s) involved. > + > +It is generally expected that the name and email addresses used in one > +of the ``Signed-off-by`` lines, matches that of the git commit ``Author`` > +field. 
If the person sending the mail is also one of the patch authors, > +it is further expected that the mail ``From:`` line name & address match > +one of the ``Signed-off-by`` lines. > + > +Multiple authorship > +~~~~~~~~~~~~~~~~~~~ > + > +It is not uncommon for a patch to have contributions from multiple > +authors. In such a scenario, a git commit will usually be expected > +to have a ``Signed-off-by`` line for each contributor involved in > +creatin of the patch. Some edge cases: "creating" > + > + * The non-primary author's contributions were so trivial that > + they can be considered not subject to copyright. In this case > + the secondary authors need not include a ``Signed-off-by``. > + > + This case most commonly applies where QEMU reviewers give short > + snippets of code as suggested fixes to a patch. The reviewers > + don't need to have their own ``Signed-off-by`` added unless > + their code suggestion was unusually large. > + > + * Both contributors work for the same employer and the employer > + requires copyright assignment. > + > + It can be said that in this case a ``Signed-off-by`` is indicating > + that the person has permission to contributeo from their employer "contribute" > + who is the copyright holder. It is none the less still preferrable "preferable" > + to include a ``Signed-off-by`` for each contributor, as in some > + countries employees are not able to assign copyright to their > + employer, and it also covers any time invested outside working > + hours. > + > +Other commit tags > +~~~~~~~~~~~~~~~~~ > + > +While the ``Signed-off-by`` tag is mandatory, there are a number of > +other tags that are commonly used during QEMU development > + > + * **``Reviewed-by``**: when a QEMU community member reviews a patch > + on the mailing list, if they consider the patch acceptable, they > + should send an email reply containing a ``Reviewed-by`` tag. 
> + > + NB: a subsystem maintainer sending a pull request would replace > + their own ``Reviewed-by`` with another ``Signed-off-by`` Hmm not sure about replacing, they have different meaning. You can merge patch you haven't reviewed. But as a maintainer you must S-o-b what you end merging (what is mentioned below in "subsystem maintainer"). > + > + * **``Acked-by``**: when a QEMU subsystem maintainer approves a patch > + that touches their subsystem, but intends to allow a different > + maintainer to queue it and send a pull request, they would send > + a mail containing a ``Acked-by`` tag. > + > + * **``Tested-by``**: when a QEMU community member has functionally > + tested the behaviour of the patch in some manner, they should > + send an email reply conmtaning a ``Tested-by`` tag. "containing" > + > + * **``Reported-by``**: when a QEMU community member reports a problem > + via the mailing list, or some other informal channel that is not > + the issue tracker, it is good practice to credit them by including > + a ``Reported-by`` tag on any patch fixing the issue. When the > + problem is reported via the GitLab issue tracker, however, it is > + sufficient to just include a link to the issue. Hmm isn't related to the "Resolves:" tag? > + > +Subsystem maintainer requirements > +~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~ > + > +When a subsystem maintainer accepts a patch from a contributor, in > +addition to the normal code review points, they are expected to validate > +the presence of suitable ``Signed-off-by`` tags. > + > +At the time they queue the patch in their subsystem tree, the maintainer > +**MUST** also then add their own ``Signed-off-by`` to indicate that they > +have done the aforementioned validation. > + > +The subsystem maintainer submitting a pull request is **NOT** expected to > +have a ``Reviewed-by`` tag on the patch, since this is implied by their > +own ``Signed-off-by``. 
> + > +Tools for adding ``Signed-of-by`` "Signed-off-by" > +~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~ > + > +There are a variety of ways tools can support adding ``Signed-off-by`` > +tags for patches, avoiding the need for contributors to manually > +type in this repetitive text each time. > + > +git commands > +^^^^^^^^^^^^ > + > +When creating, or amending, a commit the ``-s`` flag to ``git commit`` > +will append a suitable line matching the configuring git author > +details. > + > +If preparing patches using the ``git format-patch`` tool, the ``-s`` > +flag can be used to append a suitable line in the emails it creates, > +without modifying the local commits. Alternatively to modify the > +local commits on a branch en-mass:: > + > + git rebase master -x 'git commit --amend --no-edit -s' > + > +emacs > +^^^^^ > + > +In the file ``$HOME/.emacs.d/abbrev_defs`` add:: > + > + (define-abbrev-table 'global-abbrev-table > + '( > + ("8rev" "Reviewed-by: YOUR NAME <your@email.addr>" nil 1) > + ("8ack" "Acked-by: YOUR NAME <your@email.addr>" nil 1) > + ("8test" "Tested-by: YOUR NAME <your@email.addr>" nil 1) > + ("8sob" "Signed-off-by: YOUR NAME <your@email.addr>" nil 1) > + )) > + > +with this change, if you type (for example) ``8rev`` followed > +by ``<space>`` or ``<enter>`` it will expand to the whole phrase. > + > +vim > +^^^ > + > +In the file ``$HOME/.vimrc`` add:: > + > + iabbrev 8rev Reviewed-by: YOUR NAME <your@email.addr> > + iabbrev 8ack Acked-by: YOUR NAME <your@email.addr> > + iabbrev 8test Tested-by: YOUR NAME <your@email.addr> > + iabbrev 8sob Signed-off-by: YOUR NAME <your@email.addr> > + > +with this change, if you type (for example) ``8rev`` followed > +by ``<space>`` or ``<enter>`` it will expand to the whole phrase. > + > +Re-starting abandoned work > +~~~~~~~~~~~~~~~~~~~~~~~~~~ > + > +For a variety of reasons there are some patches that get submitted to > +QEMU but never merged. 
An unrelated contributor may decide (months or > +years later) to continue working from the abandoned patch and re-submit > +it with extra changes. > + > +If the abandoned patch already had a ``Signed-off-by`` from the original > +author this **must** be preserved. The new contributor **must** then add > +their own ``Signed-off-by`` after the original one if they made any > +further changes to it. It is common to include a comment just prior to > +the new ``Signed-off-by`` indicating what extra changes were made. For > +example:: > + > + Signed-off-by: Some Person <some.person@example.com> > + [Rebased and added support for 'foo'] > + Signed-off-by: New Person <new.person@example.com> > diff --git a/docs/devel/index-process.rst b/docs/devel/index-process.rst > index 362f97ee30..b54e58105e 100644 > --- a/docs/devel/index-process.rst > +++ b/docs/devel/index-process.rst > @@ -13,6 +13,7 @@ Notes about how to interact with the community and how and where to submit patch > maintainers > style > submitting-a-patch > + code-provenance > trivial-patches > stable-process > submitting-a-pull-request > diff --git a/docs/devel/submitting-a-patch.rst b/docs/devel/submitting-a-patch.rst > index c641d948f1..ec541b3d15 100644 > --- a/docs/devel/submitting-a-patch.rst > +++ b/docs/devel/submitting-a-patch.rst > @@ -322,21 +322,9 @@ Patch emails must include a ``Signed-off-by:`` line > > Your patches **must** include a Signed-off-by: line. This is a hard > requirement because it's how you say "I'm legally okay to contribute > -this and happy for it to go into QEMU". The process is modelled after > -the `Linux kernel > -<http://git.kernel.org/cgit/linux/kernel/git/torvalds/linux.git/tree/Documentation/SubmittingPatches?id=f6f94e2ab1b33f0082ac22d71f66385a60d8157f#n297>`__ > -policy. > - > -If you wrote the patch, make sure your "From:" and "Signed-off-by:" > -lines use the same spelling. 
It's okay if you subscribe or contribute to > -the list via more than one address, but using multiple addresses in one > -commit just confuses things. If someone else wrote the patch, git will > -include a "From:" line in the body of the email (different from your > -envelope From:) that will give credit to the correct author; but again, > -that author's Signed-off-by: line is mandatory, with the same spelling. > - > -There are various tooling options for automatically adding these tags > -include using ``git commit -s`` or ``git format-patch -s``. For more > +this and happy for it to go into QEMU". For full guidance, read the > +:ref:`code-provenance` documentation. > + > information see `SubmittingPatches 1.12 > <http://git.kernel.org/cgit/linux/kernel/git/torvalds/linux.git/tree/Documentation/SubmittingPatches?id=f6f94e2ab1b33f0082ac22d71f66385a60d8157f#n297>`__. >
* Re: [PATCH 1/2] docs: introduce dedicated page about code provenance / sign-off 2023-11-23 11:58 ` Philippe Mathieu-Daudé @ 2023-11-23 17:08 ` Daniel P. Berrangé 2023-11-23 23:56 ` Michael S. Tsirkin 0 siblings, 1 reply; 57+ messages in thread From: Daniel P. Berrangé @ 2023-11-23 17:08 UTC (permalink / raw) To: Philippe Mathieu-Daudé Cc: qemu-devel, Richard Henderson, Alexander Graf, Alex Bennée, Paolo Bonzini, Michael S. Tsirkin, Markus Armbruster, Stefan Hajnoczi, Thomas Huth, Kevin Wolf, Gerd Hoffmann, Mark Cave-Ayland, Peter Maydell On Thu, Nov 23, 2023 at 12:58:18PM +0100, Philippe Mathieu-Daudé wrote: > On 23/11/23 12:40, Daniel P. Berrangé wrote: > > Currently we have a short paragraph saying that patches must include > > a Signed-off-by line, and merely link to the kernel documentation. > > The linked kernel docs have alot of content beyond the part about > > sign-off an thus is misleading/distracting to QEMU contributors. > > > > This introduces a dedicated 'code-provenance' page in QEMU talking > > about why we require sign-off, explaining the other tags we commonly > > use, and what to do in some edge cases. > > > > Signed-off-by: Daniel P. Berrangé <berrange@redhat.com> > > --- > > docs/devel/code-provenance.rst | 197 ++++++++++++++++++++++++++++++ > > docs/devel/index-process.rst | 1 + > > docs/devel/submitting-a-patch.rst | 18 +-- > > 3 files changed, 201 insertions(+), 15 deletions(-) > > create mode 100644 docs/devel/code-provenance.rst > > +Other commit tags > > +~~~~~~~~~~~~~~~~~ > > + > > +While the ``Signed-off-by`` tag is mandatory, there are a number of > > +other tags that are commonly used during QEMU development > > + > > + * **``Reviewed-by``**: when a QEMU community member reviews a patch > > + on the mailing list, if they consider the patch acceptable, they > > + should send an email reply containing a ``Reviewed-by`` tag. 
> > + > > + NB: a subsystem maintainer sending a pull request would replace > > + their own ``Reviewed-by`` with another ``Signed-off-by`` > > Hmm not sure about replacing, they have different meaning. You can merge > patch you haven't reviewed. But as a maintainer you must S-o-b what you > end merging (what is mentioned below in "subsystem maintainer"). I've always taken it as implied that patches I queue are reviewed by me, but replies here suggest I'm in a minority on that. That shows why it is worth documenting this for QEMU explicitly :-) > > + * **``Reported-by``**: when a QEMU community member reports a problem > > + via the mailing list, or some other informal channel that is not > > + the issue tracker, it is good practice to credit them by including > > + a ``Reported-by`` tag on any patch fixing the issue. When the > > + problem is reported via the GitLab issue tracker, however, it is > > + sufficient to just include a link to the issue. > > Hmm isn't related to the "Resolves:" tag? GitLab supports a huge variety of these: resolves/fixes/closes/etc. I don't think this wants to turn into a full guide on what info to include in a commit message, as we already have that in the submitting-a-patch doc, explaining the bug link syntax. So I'll stick to just the tags that explicitly credit humans. With regards, Daniel -- |: https://berrange.com -o- https://www.flickr.com/photos/dberrange :| |: https://libvirt.org -o- https://fstop138.berrange.com :| |: https://entangle-photo.org -o- https://www.instagram.com/dberrange :|
* Re: [PATCH 1/2] docs: introduce dedicated page about code provenance / sign-off 2023-11-23 17:08 ` Daniel P. Berrangé @ 2023-11-23 23:56 ` Michael S. Tsirkin 0 siblings, 0 replies; 57+ messages in thread From: Michael S. Tsirkin @ 2023-11-23 23:56 UTC (permalink / raw) To: Daniel P. Berrangé Cc: Philippe Mathieu-Daudé, qemu-devel, Richard Henderson, Alexander Graf, Alex Bennée, Paolo Bonzini, Markus Armbruster, Stefan Hajnoczi, Thomas Huth, Kevin Wolf, Gerd Hoffmann, Mark Cave-Ayland, Peter Maydell On Thu, Nov 23, 2023 at 05:08:46PM +0000, Daniel P. Berrangé wrote: > On Thu, Nov 23, 2023 at 12:58:18PM +0100, Philippe Mathieu-Daudé wrote: > > On 23/11/23 12:40, Daniel P. Berrangé wrote: > > > Currently we have a short paragraph saying that patches must include > > > a Signed-off-by line, and merely link to the kernel documentation. > > > The linked kernel docs have alot of content beyond the part about > > > sign-off an thus is misleading/distracting to QEMU contributors. > > > > > > This introduces a dedicated 'code-provenance' page in QEMU talking > > > about why we require sign-off, explaining the other tags we commonly > > > use, and what to do in some edge cases. > > > > > > Signed-off-by: Daniel P. Berrangé <berrange@redhat.com> > > > --- > > > docs/devel/code-provenance.rst | 197 ++++++++++++++++++++++++++++++ > > > docs/devel/index-process.rst | 1 + > > > docs/devel/submitting-a-patch.rst | 18 +-- > > > 3 files changed, 201 insertions(+), 15 deletions(-) > > > create mode 100644 docs/devel/code-provenance.rst > > > > +Other commit tags > > > +~~~~~~~~~~~~~~~~~ > > > + > > > +While the ``Signed-off-by`` tag is mandatory, there are a number of > > > +other tags that are commonly used during QEMU development > > > + > > > + * **``Reviewed-by``**: when a QEMU community member reviews a patch > > > + on the mailing list, if they consider the patch acceptable, they > > > + should send an email reply containing a ``Reviewed-by`` tag. 
> > > + > > > + NB: a subsystem maintainer sending a pull request would replace > > > + their own ``Reviewed-by`` with another ``Signed-off-by`` > > > > Hmm not sure about replacing, they have different meaning. You can merge > > patch you haven't reviewed. But as a maintainer you must S-o-b what you > > end merging (what is mentioned below in "subsystem maintainer"). > > I've always taken it as implied that patches I queue are reviewed by me, Well sometimes I queue patches not in my area that I have seen languish on list with no replies for too long. I generally do a cursory review but not to the level that I feel justifies Reviewed-by. > but replies here suggest I'm in a minority on that. That shows why it is > worth documenting this for QEMU explicitly :-) Absolutely.
* Re: [PATCH 1/2] docs: introduce dedicated page about code provenance / sign-off From: Peter Maydell @ 2023-11-23 13:01 UTC To: Daniel P. Berrangé Cc: qemu-devel, Richard Henderson, Alexander Graf, Alex Bennée, Paolo Bonzini, Michael S. Tsirkin, Markus Armbruster, Phil Mathieu-Daudé, Stefan Hajnoczi, Thomas Huth, Kevin Wolf, Gerd Hoffmann, Mark Cave-Ayland On Thu, 23 Nov 2023 at 11:40, Daniel P. Berrangé <berrange@redhat.com> wrote: > > Currently we have a short paragraph saying that patches must include > a Signed-off-by line, and merely link to the kernel documentation. > The linked kernel docs have alot of content beyond the part about "a lot" > sign-off an thus is misleading/distracting to QEMU contributors. "and thus are" > > This introduces a dedicated 'code-provenance' page in QEMU talking > about why we require sign-off, explaining the other tags we commonly > use, and what to do in some edge cases. Good idea; I've felt for a while now that it was a little awkward to have to point people at that big kernel doc page. > Signed-off-by: Daniel P. Berrangé <berrange@redhat.com> > --- > docs/devel/code-provenance.rst | 197 ++++++++++++++++++++++++++++++ > docs/devel/index-process.rst | 1 + > docs/devel/submitting-a-patch.rst | 18 +-- > 3 files changed, 201 insertions(+), 15 deletions(-) > create mode 100644 docs/devel/code-provenance.rst > > diff --git a/docs/devel/code-provenance.rst b/docs/devel/code-provenance.rst > new file mode 100644 > index 0000000000..b4591a2dec > --- /dev/null > +++ b/docs/devel/code-provenance.rst > @@ -0,0 +1,197 @@ > +..
_code-provenance: > + > +Code provenance > +=============== > + > +Certifying patch submissions > +~~~~~~~~~~~~~~~~~~~~~~~~~~~~ > + > +The QEMU community **mandates** all contributors to certify provenance > +of patch submissions they make to the project. To put it another way, > +contributors must indicate that they are legally permitted to contribute > +to the project. > + > +Certification is achieved with a low overhead by adding a single line > +to the bottom of every git commit:: > + > + Signed-off-by: YOUR NAME <YOUR@EMAIL> > + > +This existence of this line asserts that the author of the patch is > +contributing in accordance with the `Developer's Certificate of > +Origin <https://developercertifcate.org>`__: > + > +.. _dco: > + > +:: > + Developer's Certificate of Origin 1.1 > + > + By making a contribution to this project, I certify that: > + > + (a) The contribution was created in whole or in part by me and I > + have the right to submit it under the open source license > + indicated in the file; or > + > + (b) The contribution is based upon previous work that, to the best > + of my knowledge, is covered under an appropriate open source > + license and I have the right under that license to submit that > + work with modifications, whether created in whole or in part > + by me, under the same open source license (unless I am > + permitted to submit under a different license), as indicated > + in the file; or > + > + (c) The contribution was provided directly to me by some other > + person who certified (a), (b) or (c) and I have not modified > + it. > + > + (d) I understand and agree that this project and the contribution > + are public and that a record of the contribution (including all > + personal information I submit with it, including my sign-off) is > + maintained indefinitely and may be redistributed consistent with > + this project or the open source license(s) involved. 
> + > +It is generally expected that the name and email addresses used in one > +of the ``Signed-off-by`` lines, matches that of the git commit ``Author`` > +field. If the person sending the mail is also one of the patch authors, > +it is further expected that the mail ``From:`` line name & address match > +one of the ``Signed-off-by`` lines. Is it? Patches sent via the sr.ht service won't do that, and I'm pretty sure we've had a few contributors in the past who send patches from different addresses to avoid problems with their corporate mail server mangling patches. I think this would be better softened to something like a recommendation ("Generally you should use the same email addresses ... "). > +Multiple authorship > +~~~~~~~~~~~~~~~~~~~ > + > +It is not uncommon for a patch to have contributions from multiple > +authors. In such a scenario, a git commit will usually be expected > +to have a ``Signed-off-by`` line for each contributor involved in > +creatin of the patch. Some edge cases: "creation" (not "creating") > + > + * The non-primary author's contributions were so trivial that > + they can be considered not subject to copyright. In this case > + the secondary authors need not include a ``Signed-off-by``. > + > + This case most commonly applies where QEMU reviewers give short > + snippets of code as suggested fixes to a patch. The reviewers > + don't need to have their own ``Signed-off-by`` added unless > + their code suggestion was unusually large. > + > + * Both contributors work for the same employer and the employer > + requires copyright assignment. > + > + It can be said that in this case a ``Signed-off-by`` is indicating > + that the person has permission to contributeo from their employer > + who is the copyright holder. 
It is none the less still preferrable > + to include a ``Signed-off-by`` for each contributor, as in some > + countries employees are not able to assign copyright to their > + employer, and it also covers any time invested outside working > + hours. > + > +Other commit tags > +~~~~~~~~~~~~~~~~~ > + > +While the ``Signed-off-by`` tag is mandatory, there are a number of > +other tags that are commonly used during QEMU development missing '.' (or perhaps ':'). > + > + * **``Reviewed-by``**: when a QEMU community member reviews a patch > + on the mailing list, if they consider the patch acceptable, they > + should send an email reply containing a ``Reviewed-by`` tag. > + > + NB: a subsystem maintainer sending a pull request would replace > + their own ``Reviewed-by`` with another ``Signed-off-by`` I agree with Philippe here -- you add signed-off-by, you don't replace reviewed-by. > + > + * **``Acked-by``**: when a QEMU subsystem maintainer approves a patch > + that touches their subsystem, but intends to allow a different > + maintainer to queue it and send a pull request, they would send > + a mail containing a ``Acked-by`` tag. I would personally also say "Acked-by does not imply a full code review of the patch; if the subsystem maintainer has done a full review, they should use the Reviewed-by tag instead." But I know that there are some differences of opinion on exactly what Acked-by: means... > + > + * **``Tested-by``**: when a QEMU community member has functionally > + tested the behaviour of the patch in some manner, they should > + send an email reply conmtaning a ``Tested-by`` tag. > + > + * **``Reported-by``**: when a QEMU community member reports a problem > + via the mailing list, or some other informal channel that is not > + the issue tracker, it is good practice to credit them by including > + a ``Reported-by`` tag on any patch fixing the issue. 
When the > + problem is reported via the GitLab issue tracker, however, it is > + sufficient to just include a link to the issue. Maybe we should add a bit of encouraging text here along the lines of: Reviewing and testing is something anybody can do -- if you've reviewed the code or tested it, feel free to send an email with your tag to say you've done that, or to ask questions if there's part of the patch you don't understand. ? Or perhaps that would be better elsewhere; IDK. > + > +Subsystem maintainer requirements > +~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~ > + > +When a subsystem maintainer accepts a patch from a contributor, in > +addition to the normal code review points, they are expected to validate > +the presence of suitable ``Signed-off-by`` tags. > + > +At the time they queue the patch in their subsystem tree, the maintainer > +**MUST** also then add their own ``Signed-off-by`` to indicate that they > +have done the aforementioned validation. > + > +The subsystem maintainer submitting a pull request is **NOT** expected to > +have a ``Reviewed-by`` tag on the patch, since this is implied by their > +own ``Signed-off-by``. As above, Signed-off-by doesn't imply Reviewed-by. If the submaintainer has reviewed the patch, they add the R-by, but if they haven't done that, then they only add the S-o-by. > + > +Tools for adding ``Signed-of-by`` > +~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~ > + > +There are a variety of ways tools can support adding ``Signed-off-by`` > +tags for patches, avoiding the need for contributors to manually > +type in this repetitive text each time. > + > +git commands > +^^^^^^^^^^^^ > + > +When creating, or amending, a commit the ``-s`` flag to ``git commit`` > +will append a suitable line matching the configuring git author > +details. > + > +If preparing patches using the ``git format-patch`` tool, the ``-s`` > +flag can be used to append a suitable line in the emails it creates, > +without modifying the local commits. 
Alternatively to modify the > +local commits on a branch en-mass:: > + > + git rebase master -x 'git commit --amend --no-edit -s' > + > +emacs > +^^^^^ > + > +In the file ``$HOME/.emacs.d/abbrev_defs`` add:: > + > + (define-abbrev-table 'global-abbrev-table > + '( > + ("8rev" "Reviewed-by: YOUR NAME <your@email.addr>" nil 1) > + ("8ack" "Acked-by: YOUR NAME <your@email.addr>" nil 1) > + ("8test" "Tested-by: YOUR NAME <your@email.addr>" nil 1) > + ("8sob" "Signed-off-by: YOUR NAME <your@email.addr>" nil 1) > + )) > + > +with this change, if you type (for example) ``8rev`` followed > +by ``<space>`` or ``<enter>`` it will expand to the whole phrase. > + > +vim > +^^^ > + > +In the file ``$HOME/.vimrc`` add:: > + > + iabbrev 8rev Reviewed-by: YOUR NAME <your@email.addr> > + iabbrev 8ack Acked-by: YOUR NAME <your@email.addr> > + iabbrev 8test Tested-by: YOUR NAME <your@email.addr> > + iabbrev 8sob Signed-off-by: YOUR NAME <your@email.addr> > + > +with this change, if you type (for example) ``8rev`` followed > +by ``<space>`` or ``<enter>`` it will expand to the whole phrase. > + > +Re-starting abandoned work > +~~~~~~~~~~~~~~~~~~~~~~~~~~ > + > +For a variety of reasons there are some patches that get submitted to > +QEMU but never merged. An unrelated contributor may decide (months or > +years later) to continue working from the abandoned patch and re-submit > +it with extra changes. > + > +If the abandoned patch already had a ``Signed-off-by`` from the original > +author this **must** be preserved. The new contributor **must** then add > +their own ``Signed-off-by`` after the original one if they made any > +further changes to it. It is common to include a comment just prior to > +the new ``Signed-off-by`` indicating what extra changes were made. 
For > +example:: > + > + Signed-off-by: Some Person <some.person@example.com> > + [Rebased and added support for 'foo'] > + Signed-off-by: New Person <new.person@example.com> You might want to use two different email domains in this example; an abandoned project picked up by somebody from the same company (assuming the usual copyright-belongs-to-company) is a bit different from an abandoned project picked up by an entirely unrelated person. I think in this case it's also worth stating the general principles: ===begin=== The general principles with picking up abandoned work are: * we should continue to credit the first author for their work * we should track the provenance of the code * we should also acknowledge the efforts of the person picking up the work * the commit messages should indicate who is responsible for what parts of the final patch In complicated cases or if in doubt, you can always ask on the mailing list for advice. If the new work you'd need to do to resubmit the patches is significant, it's worth dropping the original author a friendly email to let them know, in case you might be duplicating something the original author is still working on. ===endit=== perhaps ? thanks -- PMM
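Peter and Philippe make a point about maintainers: when a patch is queued, a ``Signed-off-by`` is appended and any existing ``Reviewed-by`` stays. The minimal Python sketch below illustrates that rule. It is not how ``git commit -s`` is implemented. The function name ``append_signoff`` is hypothetical, and so is the heuristic for spotting an existing trailer block: it treats a final line that looks like ``Token: value``, or a ``[...]`` note, as a trailer.

```python
import re

# Heuristic (an assumption of this sketch): a line such as "Reviewed-by: ..."
# or a "[Rebased ...]" note is treated as part of the trailer block.
TRAILER_LINE = re.compile(r"^([A-Za-z0-9-]+: |\[)")


def append_signoff(message: str, name: str, email: str) -> str:
    """Append "Signed-off-by: name <email>" to a commit message.

    Existing trailers (Reviewed-by, Acked-by, earlier Signed-off-by lines)
    are never removed or reordered; the new line always goes at the end.
    The append is skipped only when the identical line is already the last
    line, so applying this twice in a row is harmless.
    """
    sob = f"Signed-off-by: {name} <{email}>"
    lines = message.rstrip("\n").split("\n")
    if lines[-1] == sob:
        return "\n".join(lines) + "\n"
    if not TRAILER_LINE.match(lines[-1]):
        # No trailer block yet: separate the new one with a blank line.
        lines.append("")
    lines.append(sob)
    return "\n".join(lines) + "\n"
```

The sketch skips the append only when the identical line is already last. A maintainer who re-queues the same patch therefore does not pile up duplicates, while the ``Reviewed-by`` tags added by other people are carried through unchanged.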
* Re: [PATCH 1/2] docs: introduce dedicated page about code provenance / sign-off From: Daniel P. Berrangé @ 2023-11-23 17:12 UTC To: Peter Maydell Cc: qemu-devel, Richard Henderson, Alexander Graf, Alex Bennée, Paolo Bonzini, Michael S. Tsirkin, Markus Armbruster, Phil Mathieu-Daudé, Stefan Hajnoczi, Thomas Huth, Kevin Wolf, Gerd Hoffmann, Mark Cave-Ayland On Thu, Nov 23, 2023 at 01:01:00PM +0000, Peter Maydell wrote: > On Thu, 23 Nov 2023 at 11:40, Daniel P. Berrangé <berrange@redhat.com> wrote: > > > > Currently we have a short paragraph saying that patches must include > > a Signed-off-by line, and merely link to the kernel documentation. > > The linked kernel docs have alot of content beyond the part about > > "a lot" > > > sign-off an thus is misleading/distracting to QEMU contributors. > > "and thus are" > > > > > This introduces a dedicated 'code-provenance' page in QEMU talking > > about why we require sign-off, explaining the other tags we commonly > > use, and what to do in some edge cases. > > Good idea; I've felt for a while now that it was a little awkward > to have to point people at that big kernel doc page. > > > > Signed-off-by: Daniel P. Berrangé <berrange@redhat.com> > > --- > > docs/devel/code-provenance.rst | 197 ++++++++++++++++++++++++++++++ > > docs/devel/index-process.rst | 1 + > > docs/devel/submitting-a-patch.rst | 18 +-- > > 3 files changed, 201 insertions(+), 15 deletions(-) > > create mode 100644 docs/devel/code-provenance.rst > > > > diff --git a/docs/devel/code-provenance.rst b/docs/devel/code-provenance.rst > > new file mode 100644 > > index 0000000000..b4591a2dec > > --- /dev/null > > +++ b/docs/devel/code-provenance.rst > > @@ -0,0 +1,197 @@ > > +..
_code-provenance: > > + > > +Code provenance > > +=============== > > + > > +Certifying patch submissions > > +~~~~~~~~~~~~~~~~~~~~~~~~~~~~ > > + > > +The QEMU community **mandates** all contributors to certify provenance > > +of patch submissions they make to the project. To put it another way, > > +contributors must indicate that they are legally permitted to contribute > > +to the project. > > + > > +Certification is achieved with a low overhead by adding a single line > > +to the bottom of every git commit:: > > + > > + Signed-off-by: YOUR NAME <YOUR@EMAIL> > > + > > +This existence of this line asserts that the author of the patch is > > +contributing in accordance with the `Developer's Certificate of > > +Origin <https://developercertifcate.org>`__: > > + > > +.. _dco: > > + > > +:: > > + Developer's Certificate of Origin 1.1 > > + > > + By making a contribution to this project, I certify that: > > + > > + (a) The contribution was created in whole or in part by me and I > > + have the right to submit it under the open source license > > + indicated in the file; or > > + > > + (b) The contribution is based upon previous work that, to the best > > + of my knowledge, is covered under an appropriate open source > > + license and I have the right under that license to submit that > > + work with modifications, whether created in whole or in part > > + by me, under the same open source license (unless I am > > + permitted to submit under a different license), as indicated > > + in the file; or > > + > > + (c) The contribution was provided directly to me by some other > > + person who certified (a), (b) or (c) and I have not modified > > + it. 
> > + > > + (d) I understand and agree that this project and the contribution > > + are public and that a record of the contribution (including all > > + personal information I submit with it, including my sign-off) is > > + maintained indefinitely and may be redistributed consistent with > > + this project or the open source license(s) involved. > > + > > +It is generally expected that the name and email addresses used in one > > +of the ``Signed-off-by`` lines, matches that of the git commit ``Author`` > > +field. If the person sending the mail is also one of the patch authors, > > +it is further expected that the mail ``From:`` line name & address match > > +one of the ``Signed-off-by`` lines. > > Is it? Patches sent via the sr.ht service won't do that, and I'm > pretty sure we've had a few contributors in the past who send > patches from different addresses to avoid problems with their > corporate mail server mangling patches. I think this would be > better softened to something like a recommendation ("Generally > you should use the same email addresses ... "). Yes, I forgot about sr.ht being weird in this respect, so I'll take your suggestion. > > + > > + * **``Reviewed-by``**: when a QEMU community member reviews a patch > > + on the mailing list, if they consider the patch acceptable, they > > + should send an email reply containing a ``Reviewed-by`` tag. > > + > > + NB: a subsystem maintainer sending a pull request would replace > > + their own ``Reviewed-by`` with another ``Signed-off-by`` > > I agree with Philippe here -- you add signed-off-by, you don't > replace reviewed-by. Yep, will change that. > > > + > > + * **``Acked-by``**: when a QEMU subsystem maintainer approves a patch > > + that touches their subsystem, but intends to allow a different > > + maintainer to queue it and send a pull request, they would send > > + a mail containing a ``Acked-by`` tag.
> > I would personally also say "Acked-by does not imply a full code > review of the patch; if the subsystem maintainer has done a full > review, they should use the Reviewed-by tag instead." > > But I know that there are some differences of opinion on exactly > what Acked-by: means... I'll incorporate something along those lines with a little fuzziness to give flexibility. > > + > > + * **``Tested-by``**: when a QEMU community member has functionally > > + tested the behaviour of the patch in some manner, they should > > + send an email reply conmtaning a ``Tested-by`` tag. > > + > > + * **``Reported-by``**: when a QEMU community member reports a problem > > + via the mailing list, or some other informal channel that is not > > + the issue tracker, it is good practice to credit them by including > > + a ``Reported-by`` tag on any patch fixing the issue. When the > > + problem is reported via the GitLab issue tracker, however, it is > > + sufficient to just include a link to the issue. > > Maybe we should add a bit of encouraging text here along the lines of: > > Reviewing and testing is something anybody can do -- if you've > reviewed the code or tested it, feel free to send an email with > your tag to say you've done that, or to ask questions if there's > part of the patch you don't understand. > > ? Or perhaps that would be better elsewhere; IDK. I'll put a little bit in here but want to keep it relatively concise, since we have other docs about more general contribution practices. > > +If the abandoned patch already had a ``Signed-off-by`` from the original > > +author this **must** be preserved. The new contributor **must** then add > > +their own ``Signed-off-by`` after the original one if they made any > > +further changes to it. It is common to include a comment just prior to > > +the new ``Signed-off-by`` indicating what extra changes were made.
For > > +example:: > > + > > + Signed-off-by: Some Person <some.person@example.com> > > + [Rebased and added support for 'foo'] > > + Signed-off-by: New Person <new.person@example.com> > > You might want to use two different email domains in this example; > an abandoned project picked up by somebody from the same company > (assuming the usual copyright-belongs-to-company) is a bit different > from an abandoned project picked up by an entirely unrelated person. Yes, good idea. > I think in this case it's also worth stating the general principles: > > ===begin=== > The general principles with picking up abandoned work are: > * we should continue to credit the first author for their work > * we should track the provenance of the code > * we should also acknowledge the efforts of the person picking > up the work > * the commit messages should indicate who is responsible for > what parts of the final patch > > In complicated cases or if in doubt, you can always ask on the > mailing list for advice. > > If the new work you'd need to do to resubmit the patches is > significant, it's worth dropping the original author a > friendly email to let them know, in case you might be > duplicating something the original author is still working on. > ===endit=== > > perhaps ? I'll incorporate something along these lines. With regards, Daniel
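The "re-starting abandoned work" rule can be checked mechanically, at least for the sign-off chain: keep the original ``Signed-off-by``, and add the new one after it, optionally preceded by a bracketed note. Below is a minimal Python sketch. ``signoffs`` and ``preserves_provenance`` are hypothetical helper names, not part of any QEMU tooling. The two example identities use different domains, as Peter suggests, to model an unrelated person picking up the work.

```python
SOB_PREFIX = "Signed-off-by: "


def signoffs(message: str) -> list[str]:
    """Return the identities from every Signed-off-by line, in order."""
    return [line[len(SOB_PREFIX):].strip()
            for line in message.splitlines()
            if line.startswith(SOB_PREFIX)]


def preserves_provenance(original: str, resubmitted: str, new_ident: str) -> bool:
    """Check a resubmitted patch against the abandoned-work rules.

    Every Signed-off-by from the original patch must still be present, in
    the same order and ahead of any new ones, and the new contributor's
    sign-off must appear after them.  Free-form notes such as
    "[Rebased and added support for 'foo']" are ignored.
    """
    old = signoffs(original)
    new = signoffs(resubmitted)
    if new[:len(old)] != old:
        return False  # an original sign-off was dropped, altered or moved
    return new_ident in new[len(old):]
```

The check compares identities only. It cannot tell whether the bracketed note accurately describes the extra changes, so that remains a judgment for the reviewer.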
* Re: [PATCH 1/2] docs: introduce dedicated page about code provenance / sign-off From: Kevin Wolf @ 2023-11-23 13:16 UTC To: Daniel P. Berrangé Cc: qemu-devel, Richard Henderson, Alexander Graf, Alex Bennée, Paolo Bonzini, Michael S. Tsirkin, Markus Armbruster, Phil Mathieu-Daudé, Stefan Hajnoczi, Thomas Huth, Gerd Hoffmann, Mark Cave-Ayland, Peter Maydell On 23.11.2023 at 12:40, Daniel P. Berrangé wrote: > Currently we have a short paragraph saying that patches must include > a Signed-off-by line, and merely link to the kernel documentation. > The linked kernel docs have alot of content beyond the part about > sign-off an thus is misleading/distracting to QEMU contributors. > > This introduces a dedicated 'code-provenance' page in QEMU talking > about why we require sign-off, explaining the other tags we commonly > use, and what to do in some edge cases. > > Signed-off-by: Daniel P. Berrangé <berrange@redhat.com> > --- > docs/devel/code-provenance.rst | 197 ++++++++++++++++++++++++++++++ > docs/devel/index-process.rst | 1 + > docs/devel/submitting-a-patch.rst | 18 +-- > 3 files changed, 201 insertions(+), 15 deletions(-) > create mode 100644 docs/devel/code-provenance.rst > > diff --git a/docs/devel/code-provenance.rst b/docs/devel/code-provenance.rst > new file mode 100644 > index 0000000000..b4591a2dec > --- /dev/null > +++ b/docs/devel/code-provenance.rst > @@ -0,0 +1,197 @@ > +..
_code-provenance: > + > +Code provenance > +=============== > + > +Certifying patch submissions > +~~~~~~~~~~~~~~~~~~~~~~~~~~~~ > + > +The QEMU community **mandates** all contributors to certify provenance > +of patch submissions they make to the project. To put it another way, > +contributors must indicate that they are legally permitted to contribute > +to the project. > + > +Certification is achieved with a low overhead by adding a single line > +to the bottom of every git commit:: > + > + Signed-off-by: YOUR NAME <YOUR@EMAIL> > + > +This existence of this line asserts that the author of the patch is > +contributing in accordance with the `Developer's Certificate of > +Origin <https://developercertifcate.org>`__: > + > +.. _dco: > + > +:: > + Developer's Certificate of Origin 1.1 > + > + By making a contribution to this project, I certify that: > + > + (a) The contribution was created in whole or in part by me and I > + have the right to submit it under the open source license > + indicated in the file; or > + > + (b) The contribution is based upon previous work that, to the best > + of my knowledge, is covered under an appropriate open source > + license and I have the right under that license to submit that > + work with modifications, whether created in whole or in part > + by me, under the same open source license (unless I am > + permitted to submit under a different license), as indicated > + in the file; or > + > + (c) The contribution was provided directly to me by some other > + person who certified (a), (b) or (c) and I have not modified > + it. > + > + (d) I understand and agree that this project and the contribution > + are public and that a record of the contribution (including all > + personal information I submit with it, including my sign-off) is > + maintained indefinitely and may be redistributed consistent with > + this project or the open source license(s) involved. 
> + > +It is generally expected that the name and email addresses used in one > +of the ``Signed-off-by`` lines, matches that of the git commit ``Author`` > +field. If the person sending the mail is also one of the patch authors, > +it is further expected that the mail ``From:`` line name & address match > +one of the ``Signed-off-by`` lines. Isn't the S-o-b expected even if the person sending the mail isn't one of the patch authors, i.e. certifying (c) rather than (a) or (b) from the DCO? This is essentially the same case as what a subsystem maintainer does. > +Multiple authorship > +~~~~~~~~~~~~~~~~~~~ > + > +It is not uncommon for a patch to have contributions from multiple > +authors. In such a scenario, a git commit will usually be expected > +to have a ``Signed-off-by`` line for each contributor involved in > +creatin of the patch. Some edge cases: > + > + * The non-primary author's contributions were so trivial that > + they can be considered not subject to copyright. In this case > + the secondary authors need not include a ``Signed-off-by``. > + > + This case most commonly applies where QEMU reviewers give short > + snippets of code as suggested fixes to a patch. The reviewers > + don't need to have their own ``Signed-off-by`` added unless > + their code suggestion was unusually large. > + > + * Both contributors work for the same employer and the employer > + requires copyright assignment. > + > + It can be said that in this case a ``Signed-off-by`` is indicating > + that the person has permission to contributeo from their employer > + who is the copyright holder. It is none the less still preferrable > + to include a ``Signed-off-by`` for each contributor, as in some > + countries employees are not able to assign copyright to their > + employer, and it also covers any time invested outside working > + hours. 
> + > +Other commit tags > +~~~~~~~~~~~~~~~~~ > + > +While the ``Signed-off-by`` tag is mandatory, there are a number of > +other tags that are commonly used during QEMU development > + > + * **``Reviewed-by``**: when a QEMU community member reviews a patch > + on the mailing list, if they consider the patch acceptable, they > + should send an email reply containing a ``Reviewed-by`` tag. > + > + NB: a subsystem maintainer sending a pull request would replace > + their own ``Reviewed-by`` with another ``Signed-off-by`` As Philippe already mentioned, this isn't necessarily the case. It's a common enough practice to add a S-o-b (which technically only certifies the DCO) without removing the R-b (which tells that the content was actually reviewed in detail - maintainers don't always do that if there are already R-bs from trusted community members). > + * **``Acked-by``**: when a QEMU subsystem maintainer approves a patch > + that touches their subsystem, but intends to allow a different > + maintainer to queue it and send a pull request, they would send > + a mail containing a ``Acked-by`` tag. > + Trailing whitespace? > + * **``Tested-by``**: when a QEMU community member has functionally > + tested the behaviour of the patch in some manner, they should > + send an email reply conmtaning a ``Tested-by`` tag. > + > + * **``Reported-by``**: when a QEMU community member reports a problem > + via the mailing list, or some other informal channel that is not > + the issue tracker, it is good practice to credit them by including > + a ``Reported-by`` tag on any patch fixing the issue. When the > + problem is reported via the GitLab issue tracker, however, it is > + sufficient to just include a link to the issue. 
> + > +Subsystem maintainer requirements > +~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~ > + > +When a subsystem maintainer accepts a patch from a contributor, in > +addition to the normal code review points, they are expected to validate > +the presence of suitable ``Signed-off-by`` tags. > + > +At the time they queue the patch in their subsystem tree, the maintainer > +**MUST** also then add their own ``Signed-off-by`` to indicate that they > +have done the aforementioned validation. > + > +The subsystem maintainer submitting a pull request is **NOT** expected to > +have a ``Reviewed-by`` tag on the patch, since this is implied by their > +own ``Signed-off-by``. Considering the above, I would remove this last paragraph. Kevin
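Kevin's "Trailing whitespace?" remark is the kind of nit that is easy to catch before sending. QEMU contributors are normally pointed at ``scripts/checkpatch.pl`` for checks like this. The Python sketch below only illustrates the underlying idea of scanning the lines a patch adds. The function name ``trailing_whitespace`` is hypothetical, and skipping the ``+++`` file header by prefix is a simplifying assumption.

```python
def trailing_whitespace(diff_text: str) -> list[tuple[int, str]]:
    """Return (line number, line) for each added line ending in whitespace.

    Only lines the patch adds ("+" lines) are inspected.  The "+++ b/file"
    header is skipped by its "+++ " prefix, which is a heuristic: an added
    line whose own content starts with "++ " would be skipped too.
    Line numbers are 1-based positions within diff_text.
    """
    hits = []
    for lineno, line in enumerate(diff_text.split("\n"), start=1):
        if line.startswith("+") and not line.startswith("+++ "):
            if line != line.rstrip():
                hits.append((lineno, line))
    return hits
```

A bare ``+`` line, which adds an empty line, is not flagged. A ``+`` followed by spaces, as in the case Kevin spotted, is flagged.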
* Re: [PATCH 1/2] docs: introduce dedicated page about code provenance / sign-off From: Daniel P. Berrangé @ 2023-11-23 17:12 UTC To: Kevin Wolf Cc: qemu-devel, Richard Henderson, Alexander Graf, Alex Bennée, Paolo Bonzini, Michael S. Tsirkin, Markus Armbruster, Phil Mathieu-Daudé, Stefan Hajnoczi, Thomas Huth, Gerd Hoffmann, Mark Cave-Ayland, Peter Maydell On Thu, Nov 23, 2023 at 02:16:36PM +0100, Kevin Wolf wrote: > On 23.11.2023 at 12:40, Daniel P. Berrangé wrote: > > Currently we have a short paragraph saying that patches must include > > a Signed-off-by line, and merely link to the kernel documentation. > > The linked kernel docs have alot of content beyond the part about > > sign-off an thus is misleading/distracting to QEMU contributors. > > > > This introduces a dedicated 'code-provenance' page in QEMU talking > > about why we require sign-off, explaining the other tags we commonly > > use, and what to do in some edge cases. > > > > Signed-off-by: Daniel P. Berrangé <berrange@redhat.com> > > --- > > docs/devel/code-provenance.rst | 197 ++++++++++++++++++++++++++++++ > > docs/devel/index-process.rst | 1 + > > docs/devel/submitting-a-patch.rst | 18 +-- > > 3 files changed, 201 insertions(+), 15 deletions(-) > > create mode 100644 docs/devel/code-provenance.rst > > > > diff --git a/docs/devel/code-provenance.rst b/docs/devel/code-provenance.rst > > new file mode 100644 > > index 0000000000..b4591a2dec > > --- /dev/null > > +++ b/docs/devel/code-provenance.rst > > @@ -0,0 +1,197 @@ > > +.. _code-provenance: > > + > > +Code provenance > > +=============== > > + > > +Certifying patch submissions > > +~~~~~~~~~~~~~~~~~~~~~~~~~~~~ > > + > > +The QEMU community **mandates** all contributors to certify provenance > > +of patch submissions they make to the project.
To put it another way, > > +contributors must indicate that they are legally permitted to contribute > > +to the project. > > + > > +Certification is achieved with a low overhead by adding a single line > > +to the bottom of every git commit:: > > + > > + Signed-off-by: YOUR NAME <YOUR@EMAIL> > > + > > +This existence of this line asserts that the author of the patch is > > +contributing in accordance with the `Developer's Certificate of > > +Origin <https://developercertifcate.org>`__: > > + > > +.. _dco: > > + > > +:: > > + Developer's Certificate of Origin 1.1 > > + > > + By making a contribution to this project, I certify that: > > + > > + (a) The contribution was created in whole or in part by me and I > > + have the right to submit it under the open source license > > + indicated in the file; or > > + > > + (b) The contribution is based upon previous work that, to the best > > + of my knowledge, is covered under an appropriate open source > > + license and I have the right under that license to submit that > > + work with modifications, whether created in whole or in part > > + by me, under the same open source license (unless I am > > + permitted to submit under a different license), as indicated > > + in the file; or > > + > > + (c) The contribution was provided directly to me by some other > > + person who certified (a), (b) or (c) and I have not modified > > + it. > > + > > + (d) I understand and agree that this project and the contribution > > + are public and that a record of the contribution (including all > > + personal information I submit with it, including my sign-off) is > > + maintained indefinitely and may be redistributed consistent with > > + this project or the open source license(s) involved. > > + > > +It is generally expected that the name and email addresses used in one > > +of the ``Signed-off-by`` lines, matches that of the git commit ``Author`` > > +field. 
If the person sending the mail is also one of the patch authors, > > +it is further expected that the mail ``From:`` line name & address match > > +one of the ``Signed-off-by`` lines. > > Isn't the S-o-b expected even if the person sending the mail isn't one > of the patch authors, i.e. certifying (c) rather than (a) or (b) from > the DCO? This is essentially the same case as what a subsystem > maintainer does. Yes, you are right. > > +Other commit tags > > +~~~~~~~~~~~~~~~~~ > > + > > +While the ``Signed-off-by`` tag is mandatory, there are a number of > > +other tags that are commonly used during QEMU development > > + > > + * **``Reviewed-by``**: when a QEMU community member reviews a patch > > + on the mailing list, if they consider the patch acceptable, they > > + should send an email reply containing a ``Reviewed-by`` tag. > > + > > + NB: a subsystem maintainer sending a pull request would replace > > + their own ``Reviewed-by`` with another ``Signed-off-by`` > > As Philippe already mentioned, this isn't necessarily the case. It's a > common enough practice to add a S-o-b (which technically only certifies > the DCO) without removing the R-b (which tells that the content was > actually reviewed in detail - maintainers don't always do that if there > are already R-bs from trusted community members). Yes, will change. With regards, Daniel -- |: https://berrange.com -o- https://www.flickr.com/photos/dberrange :| |: https://libvirt.org -o- https://fstop138.berrange.com :| |: https://entangle-photo.org -o- https://www.instagram.com/dberrange :| ^ permalink raw reply [flat|nested] 57+ messages in thread
* Re: [PATCH 1/2] docs: introduce dedicated page about code provenance / sign-off
From: Michael S. Tsirkin @ 2023-11-23 14:25 UTC
To: Daniel P. Berrangé
Cc: qemu-devel, Richard Henderson, Alexander Graf, Alex Bennée, Paolo Bonzini, Markus Armbruster, Phil Mathieu-Daudé, Stefan Hajnoczi, Thomas Huth, Kevin Wolf, Gerd Hoffmann, Mark Cave-Ayland, Peter Maydell

On Thu, Nov 23, 2023 at 11:40:25AM +0000, Daniel P. Berrangé wrote:
> Currently we have a short paragraph saying that patches must include
> a Signed-off-by line, and merely link to the kernel documentation.
> The linked kernel docs have alot of content beyond the part about
> sign-off an thus is misleading/distracting to QEMU contributors.
>
> This introduces a dedicated 'code-provenance' page in QEMU talking
> about why we require sign-off, explaining the other tags we commonly
> use, and what to do in some edge cases.
>
> Signed-off-by: Daniel P. Berrangé <berrange@redhat.com>

Great initiative! I think we needed this for a while now.

> ---
> docs/devel/code-provenance.rst | 197 ++++++++++++++++++++++++++++++
> docs/devel/index-process.rst | 1 +
> docs/devel/submitting-a-patch.rst | 18 +--
> 3 files changed, 201 insertions(+), 15 deletions(-)
> create mode 100644 docs/devel/code-provenance.rst
>
> diff --git a/docs/devel/code-provenance.rst b/docs/devel/code-provenance.rst
> new file mode 100644
> index 0000000000..b4591a2dec
> --- /dev/null
> +++ b/docs/devel/code-provenance.rst
> @@ -0,0 +1,197 @@
> +.. _code-provenance:
> +
> +Code provenance
> +===============
> +
> +Certifying patch submissions
> +~~~~~~~~~~~~~~~~~~~~~~~~~~~~
> +
> +The QEMU community **mandates** all contributors to certify provenance
> +of patch submissions they make to the project. To put it another way,
> +contributors must indicate that they are legally permitted to contribute
> +to the project.
> +
> +Certification is achieved with a low overhead by adding a single line
> +to the bottom of every git commit::
> +
> + Signed-off-by: YOUR NAME <YOUR@EMAIL>
> +
> +This existence of this line asserts that the author of the patch is

The existence?

> +contributing in accordance with the `Developer's Certificate of
> +Origin <https://developercertifcate.org>`__:
> +
> +.. _dco:
> +
> +::
> + Developer's Certificate of Origin 1.1
> +
> + By making a contribution to this project, I certify that:
> +
> + (a) The contribution was created in whole or in part by me and I
> + have the right to submit it under the open source license
> + indicated in the file; or
> +
> + (b) The contribution is based upon previous work that, to the best
> + of my knowledge, is covered under an appropriate open source
> + license and I have the right under that license to submit that
> + work with modifications, whether created in whole or in part
> + by me, under the same open source license (unless I am
> + permitted to submit under a different license), as indicated
> + in the file; or
> +
> + (c) The contribution was provided directly to me by some other
> + person who certified (a), (b) or (c) and I have not modified
> + it.
> +
> + (d) I understand and agree that this project and the contribution
> + are public and that a record of the contribution (including all
> + personal information I submit with it, including my sign-off) is
> + maintained indefinitely and may be redistributed consistent with
> + this project or the open source license(s) involved.
> +
> +It is generally expected that the name and email addresses used in one
> +of the ``Signed-off-by`` lines, matches that of the git commit ``Author``
> +field. If the person sending the mail is also one of the patch authors,
> +it is further expected that the mail ``From:`` line name & address match
> +one of the ``Signed-off-by`` lines.
> +
> +Multiple authorship
> +~~~~~~~~~~~~~~~~~~~
> +
> +It is not uncommon for a patch to have contributions from multiple
> +authors. In such a scenario, a git commit will usually be expected
> +to have a ``Signed-off-by`` line for each contributor involved in
> +creatin of the patch. Some edge cases:

creation

> +
> + * The non-primary author's contributions were so trivial that
> + they can be considered not subject to copyright. In this case
> + the secondary authors need not include a ``Signed-off-by``.
> +
> + This case most commonly applies where QEMU reviewers give short
> + snippets of code as suggested fixes to a patch. The reviewers
> + don't need to have their own ``Signed-off-by`` added unless
> + their code suggestion was unusually large.

It is still a good policy to include attribution, e.g.
by adding a Suggested-by tag.

> +
> + * Both contributors work for the same employer and the employer
> + requires copyright assignment.
> +
> + It can be said that in this case a ``Signed-off-by`` is indicating
> + that the person has permission to contributeo from their employer

contribute

> + who is the copyright holder. It is none the less still preferrable
> + to include a ``Signed-off-by`` for each contributor, as in some
> + countries employees are not able to assign copyright to their
> + employer, and it also covers any time invested outside working
> + hours.
> +
> +Other commit tags
> +~~~~~~~~~~~~~~~~~
> +
> +While the ``Signed-off-by`` tag is mandatory, there are a number of
> +other tags that are commonly used during QEMU development
> +
> + * **``Reviewed-by``**: when a QEMU community member reviews a patch
> + on the mailing list, if they consider the patch acceptable, they
> + should send an email reply containing a ``Reviewed-by`` tag.
> +
> + NB: a subsystem maintainer sending a pull request would replace
> + their own ``Reviewed-by`` with another ``Signed-off-by``
> +
> + * **``Acked-by``**: when a QEMU subsystem maintainer approves a patch
> + that touches their subsystem, but intends to allow a different
> + maintainer to queue it and send a pull request, they would send
> + a mail containing a ``Acked-by`` tag.
> +
> + * **``Tested-by``**: when a QEMU community member has functionally
> + tested the behaviour of the patch in some manner, they should
> + send an email reply conmtaning a ``Tested-by`` tag.
> +
> + * **``Reported-by``**: when a QEMU community member reports a problem
> + via the mailing list, or some other informal channel that is not
> + the issue tracker, it is good practice to credit them by including
> + a ``Reported-by`` tag on any patch fixing the issue. When the
> + problem is reported via the GitLab issue tracker, however, it is
> + sufficient to just include a link to the issue.

Suggested-by is also common.

As long as we are here, let's document Fixes: and Cc: ?

> +Subsystem maintainer requirements
> +~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
> +
> +When a subsystem maintainer accepts a patch from a contributor, in
> +addition to the normal code review points, they are expected to validate
> +the presence of suitable ``Signed-off-by`` tags.
> +
> +At the time they queue the patch in their subsystem tree, the maintainer
> +**MUST** also then add their own ``Signed-off-by`` to indicate that they
> +have done the aforementioned validation.

Below you say **must** - I think that is better, no need to shout.

> +
> +The subsystem maintainer submitting a pull request is **NOT** expected to
> +have a ``Reviewed-by`` tag on the patch, since this is implied by their
> +own ``Signed-off-by``.
> +
> +Tools for adding ``Signed-of-by``

Signed-off-by

> +~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
> +
> +There are a variety of ways tools can support adding ``Signed-off-by``
> +tags for patches, avoiding the need for contributors to manually
> +type in this repetitive text each time.
> +
> +git commands
> +^^^^^^^^^^^^
> +
> +When creating, or amending, a commit the ``-s`` flag to ``git commit``
> +will append a suitable line matching the configuring git author
> +details.
> +
> +If preparing patches using the ``git format-patch`` tool, the ``-s``
> +flag can be used to append a suitable line in the emails it creates,
> +without modifying the local commits. Alternatively to modify the
> +local commits on a branch en-mass::
> +
> + git rebase master -x 'git commit --amend --no-edit -s'
> +
> +emacs
> +^^^^^
> +
> +In the file ``$HOME/.emacs.d/abbrev_defs`` add::
> +
> + (define-abbrev-table 'global-abbrev-table
> + '(
> + ("8rev" "Reviewed-by: YOUR NAME <your@email.addr>" nil 1)
> + ("8ack" "Acked-by: YOUR NAME <your@email.addr>" nil 1)
> + ("8test" "Tested-by: YOUR NAME <your@email.addr>" nil 1)
> + ("8sob" "Signed-off-by: YOUR NAME <your@email.addr>" nil 1)
> + ))
> +
> +with this change, if you type (for example) ``8rev`` followed
> +by ``<space>`` or ``<enter>`` it will expand to the whole phrase.
> +
> +vim
> +^^^
> +
> +In the file ``$HOME/.vimrc`` add::
> +
> + iabbrev 8rev Reviewed-by: YOUR NAME <your@email.addr>
> + iabbrev 8ack Acked-by: YOUR NAME <your@email.addr>
> + iabbrev 8test Tested-by: YOUR NAME <your@email.addr>
> + iabbrev 8sob Signed-off-by: YOUR NAME <your@email.addr>
> +
> +with this change, if you type (for example) ``8rev`` followed
> +by ``<space>`` or ``<enter>`` it will expand to the whole phrase.
> +
> +Re-starting abandoned work
> +~~~~~~~~~~~~~~~~~~~~~~~~~~
> +
> +For a variety of reasons there are some patches that get submitted to
> +QEMU but never merged. An unrelated contributor may decide (months or
> +years later) to continue working from the abandoned patch and re-submit
> +it with extra changes.
> +
> +If the abandoned patch already had a ``Signed-off-by`` from the original
> +author this **must** be preserved. The new contributor **must** then add
> +their own ``Signed-off-by`` after the original one if they made any
> +further changes to it. It is common to include a comment just prior to
> +the new ``Signed-off-by`` indicating what extra changes were made. For
> +example::
> +
> + Signed-off-by: Some Person <some.person@example.com>
> + [Rebased and added support for 'foo']
> + Signed-off-by: New Person <new.person@example.com>
> diff --git a/docs/devel/index-process.rst b/docs/devel/index-process.rst
> index 362f97ee30..b54e58105e 100644
> --- a/docs/devel/index-process.rst
> +++ b/docs/devel/index-process.rst
> @@ -13,6 +13,7 @@ Notes about how to interact with the community and how and where to submit patch
> maintainers
> style
> submitting-a-patch
> + code-provenance
> trivial-patches
> stable-process
> submitting-a-pull-request
> diff --git a/docs/devel/submitting-a-patch.rst b/docs/devel/submitting-a-patch.rst
> index c641d948f1..ec541b3d15 100644
> --- a/docs/devel/submitting-a-patch.rst
> +++ b/docs/devel/submitting-a-patch.rst
> @@ -322,21 +322,9 @@ Patch emails must include a ``Signed-off-by:`` line
>
> Your patches **must** include a Signed-off-by: line. This is a hard
> requirement because it's how you say "I'm legally okay to contribute
> -this and happy for it to go into QEMU". The process is modelled after
> -the `Linux kernel
> -<http://git.kernel.org/cgit/linux/kernel/git/torvalds/linux.git/tree/Documentation/SubmittingPatches?id=f6f94e2ab1b33f0082ac22d71f66385a60d8157f#n297>`__
> -policy.
> -
> -If you wrote the patch, make sure your "From:" and "Signed-off-by:"
> -lines use the same spelling. It's okay if you subscribe or contribute to
> -the list via more than one address, but using multiple addresses in one
> -commit just confuses things. If someone else wrote the patch, git will
> -include a "From:" line in the body of the email (different from your
> -envelope From:) that will give credit to the correct author; but again,
> -that author's Signed-off-by: line is mandatory, with the same spelling.
> -
> -There are various tooling options for automatically adding these tags
> -include using ``git commit -s`` or ``git format-patch -s``. For more
> +this and happy for it to go into QEMU". For full guidance, read the
> +:ref:`code-provenance` documentation.
> +
> information see `SubmittingPatches 1.12
> <http://git.kernel.org/cgit/linux/kernel/git/torvalds/linux.git/tree/Documentation/SubmittingPatches?id=f6f94e2ab1b33f0082ac22d71f66385a60d8157f#n297>`__.

this "information" now looks orphaned or am I confused?
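The "Re-starting abandoned work" example quoted above keeps the original author's ``Signed-off-by`` and appends the new contributor's after a bracketed note. A minimal Python sketch of that append step follows; ``append_signoff`` is a hypothetical helper, not a git or QEMU API, and it assumes the message already ends with a trailer block (for the plain case without a note, ``git commit -s`` does the job).

```python
def append_signoff(message, name, email, note=None):
    """Append a Signed-off-by trailer, optionally preceded by a [note] line.

    Assumes the message already ends with a trailer block, so the new
    lines are appended directly after it.
    """
    sob = f"Signed-off-by: {name} <{email}>"
    lines = message.rstrip("\n").split("\n")
    if lines[-1] == sob:
        # Already the most recent sign-off: do not add a duplicate.
        return message
    if note:
        # e.g. "[Rebased and added support for 'foo']"
        lines.append(f"[{note}]")
    lines.append(sob)
    return "\n".join(lines) + "\n"
```

The original ``Signed-off-by`` is never touched, which matches the requirement that it **must** be preserved.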
* Re: [PATCH 1/2] docs: introduce dedicated page about code provenance / sign-off
From: Daniel P. Berrangé @ 2023-11-23 17:16 UTC
To: Michael S. Tsirkin
Cc: qemu-devel, Richard Henderson, Alexander Graf, Alex Bennée, Paolo Bonzini, Markus Armbruster, Phil Mathieu-Daudé, Stefan Hajnoczi, Thomas Huth, Kevin Wolf, Gerd Hoffmann, Mark Cave-Ayland, Peter Maydell

On Thu, Nov 23, 2023 at 09:25:13AM -0500, Michael S. Tsirkin wrote:
> On Thu, Nov 23, 2023 at 11:40:25AM +0000, Daniel P. Berrangé wrote:
> > Currently we have a short paragraph saying that patches must include
> > a Signed-off-by line, and merely link to the kernel documentation.
> > The linked kernel docs have alot of content beyond the part about
> > sign-off an thus is misleading/distracting to QEMU contributors.
> >
> > This introduces a dedicated 'code-provenance' page in QEMU talking
> > about why we require sign-off, explaining the other tags we commonly
> > use, and what to do in some edge cases.
> >
> > Signed-off-by: Daniel P. Berrangé <berrange@redhat.com>
>
> > + * The non-primary author's contributions were so trivial that
> > + they can be considered not subject to copyright. In this case
> > + the secondary authors need not include a ``Signed-off-by``.
> > +
> > + This case most commonly applies where QEMU reviewers give short
> > + snippets of code as suggested fixes to a patch. The reviewers
> > + don't need to have their own ``Signed-off-by`` added unless
> > + their code suggestion was unusually large.
>
> It is still a good policy to include attribution, e.g.
> by adding a Suggested-by tag.

Will add this tag.

> > +Other commit tags
> > +~~~~~~~~~~~~~~~~~
> > +
> > +While the ``Signed-off-by`` tag is mandatory, there are a number of
> > +other tags that are commonly used during QEMU development
> > +
> > + * **``Reviewed-by``**: when a QEMU community member reviews a patch
> > + on the mailing list, if they consider the patch acceptable, they
> > + should send an email reply containing a ``Reviewed-by`` tag.
> > +
> > + NB: a subsystem maintainer sending a pull request would replace
> > + their own ``Reviewed-by`` with another ``Signed-off-by``
> > +
> > + * **``Acked-by``**: when a QEMU subsystem maintainer approves a patch
> > + that touches their subsystem, but intends to allow a different
> > + maintainer to queue it and send a pull request, they would send
> > + a mail containing a ``Acked-by`` tag.
> > +
> > + * **``Tested-by``**: when a QEMU community member has functionally
> > + tested the behaviour of the patch in some manner, they should
> > + send an email reply conmtaning a ``Tested-by`` tag.
> > +
> > + * **``Reported-by``**: when a QEMU community member reports a problem
> > + via the mailing list, or some other informal channel that is not
> > + the issue tracker, it is good practice to credit them by including
> > + a ``Reported-by`` tag on any patch fixing the issue. When the
> > + problem is reported via the GitLab issue tracker, however, it is
> > + sufficient to just include a link to the issue.
>
>
> Suggested-by is also common.
>
> As long as we are here, let's document Fixes: and Cc: ?

The submitting-a-patch doc covers more general commit message information.
I think this doc just ought to focus on tags that identify humans involved
in the process.

I've never been sure what the point of the 'Cc' tag is, when you actually
want to use the Cc email header ?

> > diff --git a/docs/devel/submitting-a-patch.rst b/docs/devel/submitting-a-patch.rst
> > index c641d948f1..ec541b3d15 100644
> > --- a/docs/devel/submitting-a-patch.rst
> > +++ b/docs/devel/submitting-a-patch.rst
> > @@ -322,21 +322,9 @@ Patch emails must include a ``Signed-off-by:`` line
> >
> > Your patches **must** include a Signed-off-by: line. This is a hard
> > requirement because it's how you say "I'm legally okay to contribute
> > -this and happy for it to go into QEMU". The process is modelled after
> > -the `Linux kernel
> > -<http://git.kernel.org/cgit/linux/kernel/git/torvalds/linux.git/tree/Documentation/SubmittingPatches?id=f6f94e2ab1b33f0082ac22d71f66385a60d8157f#n297>`__
> > -policy.
> > -
> > -If you wrote the patch, make sure your "From:" and "Signed-off-by:"
> > -lines use the same spelling. It's okay if you subscribe or contribute to
> > -the list via more than one address, but using multiple addresses in one
> > -commit just confuses things. If someone else wrote the patch, git will
> > -include a "From:" line in the body of the email (different from your
> > -envelope From:) that will give credit to the correct author; but again,
> > -that author's Signed-off-by: line is mandatory, with the same spelling.
> > -
> > -There are various tooling options for automatically adding these tags
> > -include using ``git commit -s`` or ``git format-patch -s``. For more
> > +this and happy for it to go into QEMU". For full guidance, read the
> > +:ref:`code-provenance` documentation.
> > +
> > information see `SubmittingPatches 1.12
> > <http://git.kernel.org/cgit/linux/kernel/git/torvalds/linux.git/tree/Documentation/SubmittingPatches?id=f6f94e2ab1b33f0082ac22d71f66385a60d8157f#n297>`__.
>
> this "information" now looks orphaned or am I confused?

Yes, forgot to cull it.

With regards,
Daniel
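Daniel's scoping limits the new page to tags that identify the people involved in a patch, with ``Suggested-by`` now added to the list. As an illustration only, here is a hypothetical Python helper that collects those identities from a commit message and deliberately ignores other trailers such as ``Fixes:``. The tag list mirrors the one discussed in this thread; everything else is an assumption.

```python
import re
from collections import defaultdict

# Tags that name a person involved in the patch; other trailers
# (for example "Fixes:") are deliberately ignored.
PEOPLE_TAGS = ("Signed-off-by", "Reviewed-by", "Acked-by",
               "Tested-by", "Reported-by", "Suggested-by")

TAG_RE = re.compile(r"^(%s):[ \t]*(.+?)[ \t]*$" % "|".join(PEOPLE_TAGS),
                    re.MULTILINE)


def people_by_tag(message):
    """Map each people-identifying tag to the identities it names, in order."""
    found = defaultdict(list)
    for tag, who in TAG_RE.findall(message):
        found[tag].append(who)
    return dict(found)
```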
* Re: [PATCH 1/2] docs: introduce dedicated page about code provenance / sign-off
From: Michael S. Tsirkin @ 2023-11-23 17:33 UTC
To: Daniel P. Berrangé
Cc: qemu-devel, Richard Henderson, Alexander Graf, Alex Bennée, Paolo Bonzini, Markus Armbruster, Phil Mathieu-Daudé, Stefan Hajnoczi, Thomas Huth, Kevin Wolf, Gerd Hoffmann, Mark Cave-Ayland, Peter Maydell

On Thu, Nov 23, 2023 at 05:16:45PM +0000, Daniel P. Berrangé wrote:
> On Thu, Nov 23, 2023 at 09:25:13AM -0500, Michael S. Tsirkin wrote:
> > On Thu, Nov 23, 2023 at 11:40:25AM +0000, Daniel P. Berrangé wrote:
> > > Currently we have a short paragraph saying that patches must include
> > > a Signed-off-by line, and merely link to the kernel documentation.
> > > The linked kernel docs have alot of content beyond the part about
> > > sign-off an thus is misleading/distracting to QEMU contributors.
> > >
> > > This introduces a dedicated 'code-provenance' page in QEMU talking
> > > about why we require sign-off, explaining the other tags we commonly
> > > use, and what to do in some edge cases.
> > >
> > > Signed-off-by: Daniel P. Berrangé <berrange@redhat.com>
> >
>
> > > + * The non-primary author's contributions were so trivial that
> > > + they can be considered not subject to copyright. In this case
> > > + the secondary authors need not include a ``Signed-off-by``.
> > > +
> > > + This case most commonly applies where QEMU reviewers give short
> > > + snippets of code as suggested fixes to a patch. The reviewers
> > > + don't need to have their own ``Signed-off-by`` added unless
> > > + their code suggestion was unusually large.
> >
> > It is still a good policy to include attribution, e.g.
> > by adding a Suggested-by tag.
>
> Will add this tag.
> >
> > > +Other commit tags
> > > +~~~~~~~~~~~~~~~~~
> > > +
> > > +While the ``Signed-off-by`` tag is mandatory, there are a number of
> > > +other tags that are commonly used during QEMU development
> > > +
> > > + * **``Reviewed-by``**: when a QEMU community member reviews a patch
> > > + on the mailing list, if they consider the patch acceptable, they
> > > + should send an email reply containing a ``Reviewed-by`` tag.
> > > +
> > > + NB: a subsystem maintainer sending a pull request would replace
> > > + their own ``Reviewed-by`` with another ``Signed-off-by``
> > > +
> > > + * **``Acked-by``**: when a QEMU subsystem maintainer approves a patch
> > > + that touches their subsystem, but intends to allow a different
> > > + maintainer to queue it and send a pull request, they would send
> > > + a mail containing a ``Acked-by`` tag.
> > > +
> > > + * **``Tested-by``**: when a QEMU community member has functionally
> > > + tested the behaviour of the patch in some manner, they should
> > > + send an email reply conmtaning a ``Tested-by`` tag.
> > > +
> > > + * **``Reported-by``**: when a QEMU community member reports a problem
> > > + via the mailing list, or some other informal channel that is not
> > > + the issue tracker, it is good practice to credit them by including
> > > + a ``Reported-by`` tag on any patch fixing the issue. When the
> > > + problem is reported via the GitLab issue tracker, however, it is
> > > + sufficient to just include a link to the issue.
> >
> >
> > Suggested-by is also common.
> >
> > As long as we are here, let's document Fixes: and Cc: ?
>
> The submitting-a-patch doc covers more general commit message information.
> I think this doc just ought to focus on tags that identify humans involved
> in the process.
>
> I've never been sure what the point of the 'Cc' tag is, when you actually
> want to use the Cc email header ?
>

It records the fact that these people have been copied but did not
respond.

> > > diff --git a/docs/devel/submitting-a-patch.rst b/docs/devel/submitting-a-patch.rst
> > > index c641d948f1..ec541b3d15 100644
> > > --- a/docs/devel/submitting-a-patch.rst
> > > +++ b/docs/devel/submitting-a-patch.rst
> > > @@ -322,21 +322,9 @@ Patch emails must include a ``Signed-off-by:`` line
> > >
> > > Your patches **must** include a Signed-off-by: line. This is a hard
> > > requirement because it's how you say "I'm legally okay to contribute
> > > -this and happy for it to go into QEMU". The process is modelled after
> > > -the `Linux kernel
> > > -<http://git.kernel.org/cgit/linux/kernel/git/torvalds/linux.git/tree/Documentation/SubmittingPatches?id=f6f94e2ab1b33f0082ac22d71f66385a60d8157f#n297>`__
> > > -policy.
> > > -
> > > -If you wrote the patch, make sure your "From:" and "Signed-off-by:"
> > > -lines use the same spelling. It's okay if you subscribe or contribute to
> > > -the list via more than one address, but using multiple addresses in one
> > > -commit just confuses things. If someone else wrote the patch, git will
> > > -include a "From:" line in the body of the email (different from your
> > > -envelope From:) that will give credit to the correct author; but again,
> > > -that author's Signed-off-by: line is mandatory, with the same spelling.
> > > -
> > > -There are various tooling options for automatically adding these tags
> > > -include using ``git commit -s`` or ``git format-patch -s``. For more
> > > +this and happy for it to go into QEMU". For full guidance, read the
> > > +:ref:`code-provenance` documentation.
> > > +
> > > information see `SubmittingPatches 1.12
> > > <http://git.kernel.org/cgit/linux/kernel/git/torvalds/linux.git/tree/Documentation/SubmittingPatches?id=f6f94e2ab1b33f0082ac22d71f66385a60d8157f#n297>`__.
> >
> > this "information" now looks orphaned or am I confused?
>
> Yes, forgot to cull it.
* Re: [PATCH 1/2] docs: introduce dedicated page about code provenance / sign-off
From: Philippe Mathieu-Daudé @ 2023-11-24 11:11 UTC
To: Michael S. Tsirkin, Daniel P. Berrangé
Cc: qemu-devel, Richard Henderson, Alexander Graf, Alex Bennée, Paolo Bonzini, Markus Armbruster, Stefan Hajnoczi, Thomas Huth, Kevin Wolf, Gerd Hoffmann, Mark Cave-Ayland, Peter Maydell

On 23/11/23 18:33, Michael S. Tsirkin wrote:
> On Thu, Nov 23, 2023 at 05:16:45PM +0000, Daniel P. Berrangé wrote:
>> On Thu, Nov 23, 2023 at 09:25:13AM -0500, Michael S. Tsirkin wrote:
>>> On Thu, Nov 23, 2023 at 11:40:25AM +0000, Daniel P. Berrangé wrote:
>>>> Currently we have a short paragraph saying that patches must include
>>>> a Signed-off-by line, and merely link to the kernel documentation.
>>>> The linked kernel docs have alot of content beyond the part about
>>>> sign-off an thus is misleading/distracting to QEMU contributors.
>>>>
>>>> This introduces a dedicated 'code-provenance' page in QEMU talking
>>>> about why we require sign-off, explaining the other tags we commonly
>>>> use, and what to do in some edge cases.
>>>>
>>>> Signed-off-by: Daniel P. Berrangé <berrange@redhat.com>
>>>
>>
>>>> + * The non-primary author's contributions were so trivial that
>>>> + they can be considered not subject to copyright. In this case
>>>> + the secondary authors need not include a ``Signed-off-by``.
>>>> +
>>>> + This case most commonly applies where QEMU reviewers give short
>>>> + snippets of code as suggested fixes to a patch. The reviewers
>>>> + don't need to have their own ``Signed-off-by`` added unless
>>>> + their code suggestion was unusually large.
>>>
>>> It is still a good policy to include attribution, e.g.
>>> by adding a Suggested-by tag.
>>
>> Will add this tag.

Thanks!

>>>> +Other commit tags
>>>> +~~~~~~~~~~~~~~~~~
>>> As long as we are here, let's document Fixes: and Cc: ?
>>
>> The submitting-a-patch doc covers more general commit message information.
>> I think this doc just ought to focus on tags that identify humans involved
>> in the process.
>>
>> I've never been sure what the point of the 'Cc' tag is, when you actually
>> want to use the Cc email header ?
>>
>
> It records the fact that these people have been copied but did not
> respond.

This might be felt aggressive or forcing. My understanding of this Cc
tag in a commit is "now that it is merged, you can't complain". We can
be absent, sick, on holidays... If I missed a merged patch review I'll
try to kindly ask on the list if it can be reworked, or suggest a patch
to fix what I missed.

Not sure this is really useful to commit that to the repository.

IMHO the only useful Cc tag is for qemu-stable@nongnu.org, as Kevin
mentioned.

If you want to be sure your patch is Cc to a set of developers, you can
add Cc: lines below the '---' patch separator. My 2 cents eh...

Regards,

Phil.
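Phil's suggestion relies on the fact that when a patch mail is applied with ``git am``, text after the ``---`` separator line is not kept in the commit message, so Cc: lines placed there reach the recipients but not the repository. A simplified Python sketch of that split follows, assuming only a bare ``---`` line marks the separator (git's own mail parser recognises a few more patterns); ``split_patch_mail`` is a hypothetical name.

```python
def split_patch_mail(body):
    """Split a patch mail body into (commit message, text after '---').

    Simplified: only a line consisting of '---' (plus optional trailing
    whitespace) is treated as the separator.
    """
    lines = body.split("\n")
    for i, line in enumerate(lines):
        if line.rstrip() == "---":
            # Everything before the separator becomes the commit message;
            # notes, extra Cc: lines and the diffstat come after it.
            return "\n".join(lines[:i]), "\n".join(lines[i + 1:])
    return body, ""
```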
* Re: [PATCH 1/2] docs: introduce dedicated page about code provenance / sign-off
From: Michael S. Tsirkin @ 2023-11-24 11:27 UTC
To: Philippe Mathieu-Daudé
Cc: Daniel P. Berrangé, qemu-devel, Richard Henderson, Alexander Graf, Alex Bennée, Paolo Bonzini, Markus Armbruster, Stefan Hajnoczi, Thomas Huth, Kevin Wolf, Gerd Hoffmann, Mark Cave-Ayland, Peter Maydell

On Fri, Nov 24, 2023 at 12:11:30PM +0100, Philippe Mathieu-Daudé wrote:
> On 23/11/23 18:33, Michael S. Tsirkin wrote:
> > On Thu, Nov 23, 2023 at 05:16:45PM +0000, Daniel P. Berrangé wrote:
> > > On Thu, Nov 23, 2023 at 09:25:13AM -0500, Michael S. Tsirkin wrote:
> > > > On Thu, Nov 23, 2023 at 11:40:25AM +0000, Daniel P. Berrangé wrote:
> > > > > Currently we have a short paragraph saying that patches must include
> > > > > a Signed-off-by line, and merely link to the kernel documentation.
> > > > > The linked kernel docs have alot of content beyond the part about
> > > > > sign-off an thus is misleading/distracting to QEMU contributors.
> > > > >
> > > > > This introduces a dedicated 'code-provenance' page in QEMU talking
> > > > > about why we require sign-off, explaining the other tags we commonly
> > > > > use, and what to do in some edge cases.
> > > > >
> > > > > Signed-off-by: Daniel P. Berrangé <berrange@redhat.com>
> > > >
> > >
> > > > > + * The non-primary author's contributions were so trivial that
> > > > > + they can be considered not subject to copyright. In this case
> > > > > + the secondary authors need not include a ``Signed-off-by``.
> > > > > +
> > > > > + This case most commonly applies where QEMU reviewers give short
> > > > > + snippets of code as suggested fixes to a patch. The reviewers
> > > > > + don't need to have their own ``Signed-off-by`` added unless
> > > > > + their code suggestion was unusually large.
> > > > > > > > It is still a good policy to include attribution, e.g. > > > > by adding a Suggested-by tag. > > > > > > Will add this tag. > > Thanks! > > > > > > +Other commit tags > > > > > +~~~~~~~~~~~~~~~~~ > > > > > > As long as we are here, let's document Fixes: and Cc: ? > > > > > > The submitting-a-patch doc covers more general commit message information. > > > I think this doc just ought to focus on tags that identify humans involved > > > in the process. > > > > > > I've never been sure what the point of the 'Cc' tag is, when you actually > > > want to use the Cc email header ? > > > > > > > It records the fact that these people have been copied but did not > > respond. > This might be felt aggressive or forcing. > My understanding of this Cc > tag in a commit is "now that it is merged, you can't complain". We can > be absent, sick, on holidays... If I missed a merged patch review I'll > try to kindly ask on the list if it can be reworked, or suggest a patch > to fix what I missed. > Not sure this is really useful to commit that to the repository. I don't see it as forcing. Sometimes I do a fly-by review of a patch that caught my eye not in my area. Later people address my comments and start copying me but I don't have time to re-review. Recording the fact that they copied me seems important. This info might be helpful in git history for other reasons - helps when looking for someone to help review backports - to guess at code quality - to help understand whether code had all the needed people copied > > IMHO the only useful Cc tag is for qemu-stable@nongnu.org, as Kevin > mentioned. > > If you want to be sure your patch is Cc to a set of developers, you can > add Cc: lines below the '---' patch separator. My 2 cents eh... > > Regards, > > Phil. If people feel threatened by CC I don't have a problem asking people to put it in a note so it comes after ---. -- MST
* Re: [PATCH 1/2] docs: introduce dedicated page about code provenance / sign-off 2023-11-23 17:16 ` Daniel P. Berrangé 2023-11-23 17:33 ` Michael S. Tsirkin @ 2023-11-24 9:49 ` Kevin Wolf 1 sibling, 0 replies; 57+ messages in thread From: Kevin Wolf @ 2023-11-24 9:49 UTC (permalink / raw) To: Daniel P. Berrangé Cc: Michael S. Tsirkin, qemu-devel, Richard Henderson, Alexander Graf, Alex Bennée, Paolo Bonzini, Markus Armbruster, Phil Mathieu-Daudé, Stefan Hajnoczi, Thomas Huth, Gerd Hoffmann, Mark Cave-Ayland, Peter Maydell On 23.11.2023 at 18:16, Daniel P. Berrangé wrote: > > Suggested-by is also common. > > > > As long as we are here, let's document Fixes: and Cc: ? > > The submitting-a-patch doc covers more general commit message information. > I think this doc just ought to focus on tags that identify humans involved > in the process. > > I've never been sure what the point of the 'Cc' tag is, when you actually > want to use the Cc email header ? By default, git-send-email automatically copies the addresses mentioned with Cc: in the commit message, so I always assumed that this is what people intend with these tags. Of course, in practice many of us have suppresscc = "all" in their config to avoid downstream accidents, so maybe there is another use? The only time I use it is for "Cc: qemu-stable@nongnu.org". I'm not sure if it still works like this, but people used to look for this in the commit history when preparing stable releases. (It's useful because sometimes people forget to actually CC the qemu-stable list when sending the patches.) Kevin
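Kevin mentions that people preparing stable releases used to search the commit history for the `Cc: qemu-stable@nongnu.org` trailer. Below is a minimal Python sketch of such a search, assuming the commit messages have already been extracted (for example from `git log`) into a mapping of commit id to message. The function name `stable_candidates`, the commit ids and the sample messages are hypothetical; this is not the stable team's actual tooling.

```python
import re

# Matches a Cc trailer addressed to the stable list, e.g.
#   Cc: qemu-stable@nongnu.org
#   Cc: QEMU Stable <qemu-stable@nongnu.org>
STABLE_CC = re.compile(
    r"^Cc:.*\bqemu-stable@nongnu\.org\b", re.IGNORECASE | re.MULTILINE
)


def stable_candidates(commits: dict[str, str]) -> list[str]:
    """Return the ids of commits whose message carries a stable Cc trailer."""
    return [cid for cid, message in commits.items() if STABLE_CC.search(message)]


commits = {
    "aaa111": "block: fix crash on resize\n\nCc: qemu-stable@nongnu.org\n"
              "Signed-off-by: Some Person <some.person@example.com>\n",
    "bbb222": "docs: fix typo\n\n"
              "Signed-off-by: Some Person <some.person@example.com>\n",
}
print(stable_candidates(commits))  # ['aaa111']
```

Only lines that start with `Cc:` are considered, so a message that merely mentions the list address in prose is not picked up.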
* Re: [PATCH 1/2] docs: introduce dedicated page about code provenance / sign-off 2023-11-23 11:40 ` [PATCH 1/2] docs: introduce dedicated page about code provenance / sign-off Daniel P. Berrangé ` (3 preceding siblings ...) 2023-11-23 14:25 ` Michael S. Tsirkin @ 2023-11-23 15:13 ` Stefan Hajnoczi 2024-01-27 14:36 ` Zhao Liu 5 siblings, 0 replies; 57+ messages in thread From: Stefan Hajnoczi @ 2023-11-23 15:13 UTC (permalink / raw) To: Daniel P. Berrangé Cc: qemu-devel, Richard Henderson, Alexander Graf, Alex Bennée, Paolo Bonzini, Michael S. Tsirkin, Markus Armbruster, Phil Mathieu-Daudé, Thomas Huth, Kevin Wolf, Gerd Hoffmann, Mark Cave-Ayland, Peter Maydell On Thu, Nov 23, 2023 at 11:40:25AM +0000, Daniel P. Berrangé wrote: > Currently we have a short paragraph saying that patches must include > a Signed-off-by line, and merely link to the kernel documentation. > The linked kernel docs have alot of content beyond the part about > sign-off an thus is misleading/distracting to QEMU contributors. > > This introduces a dedicated 'code-provenance' page in QEMU talking > about why we require sign-off, explaining the other tags we commonly > use, and what to do in some edge cases. > > Signed-off-by: Daniel P. Berrangé <berrange@redhat.com> > --- > docs/devel/code-provenance.rst | 197 ++++++++++++++++++++++++++++++ > docs/devel/index-process.rst | 1 + > docs/devel/submitting-a-patch.rst | 18 +-- > 3 files changed, 201 insertions(+), 15 deletions(-) > create mode 100644 docs/devel/code-provenance.rst > > diff --git a/docs/devel/code-provenance.rst b/docs/devel/code-provenance.rst > new file mode 100644 > index 0000000000..b4591a2dec > --- /dev/null > +++ b/docs/devel/code-provenance.rst > @@ -0,0 +1,197 @@ > +.. 
_code-provenance: > + > +Code provenance > +=============== > + > +Certifying patch submissions > +~~~~~~~~~~~~~~~~~~~~~~~~~~~~ > + > +The QEMU community **mandates** all contributors to certify provenance > +of patch submissions they make to the project. To put it another way, > +contributors must indicate that they are legally permitted to contribute > +to the project. > + > +Certification is achieved with a low overhead by adding a single line > +to the bottom of every git commit:: > + > + Signed-off-by: YOUR NAME <YOUR@EMAIL> > + > +This existence of this line asserts that the author of the patch is > +contributing in accordance with the `Developer's Certificate of > +Origin <https://developercertifcate.org>`__: > + > +.. _dco: > + > +:: > + Developer's Certificate of Origin 1.1 > + > + By making a contribution to this project, I certify that: > + > + (a) The contribution was created in whole or in part by me and I > + have the right to submit it under the open source license > + indicated in the file; or > + > + (b) The contribution is based upon previous work that, to the best > + of my knowledge, is covered under an appropriate open source > + license and I have the right under that license to submit that > + work with modifications, whether created in whole or in part > + by me, under the same open source license (unless I am > + permitted to submit under a different license), as indicated > + in the file; or > + > + (c) The contribution was provided directly to me by some other > + person who certified (a), (b) or (c) and I have not modified > + it. > + > + (d) I understand and agree that this project and the contribution > + are public and that a record of the contribution (including all > + personal information I submit with it, including my sign-off) is > + maintained indefinitely and may be redistributed consistent with > + this project or the open source license(s) involved. 
> + > +It is generally expected that the name and email addresses used in one > +of the ``Signed-off-by`` lines, matches that of the git commit ``Author`` > +field. If the person sending the mail is also one of the patch authors, > +it is further expected that the mail ``From:`` line name & address match > +one of the ``Signed-off-by`` lines. > + > +Multiple authorship > +~~~~~~~~~~~~~~~~~~~ > + > +It is not uncommon for a patch to have contributions from multiple > +authors. In such a scenario, a git commit will usually be expected > +to have a ``Signed-off-by`` line for each contributor involved in > +creatin of the patch. Some edge cases: > + > + * The non-primary author's contributions were so trivial that > + they can be considered not subject to copyright. In this case > + the secondary authors need not include a ``Signed-off-by``. > + > + This case most commonly applies where QEMU reviewers give short > + snippets of code as suggested fixes to a patch. The reviewers > + don't need to have their own ``Signed-off-by`` added unless > + their code suggestion was unusually large. > + > + * Both contributors work for the same employer and the employer > + requires copyright assignment. > + > + It can be said that in this case a ``Signed-off-by`` is indicating > + that the person has permission to contributeo from their employer s/contributeo/contribute/ > + who is the copyright holder. It is none the less still preferrable > + to include a ``Signed-off-by`` for each contributor, as in some > + countries employees are not able to assign copyright to their > + employer, and it also covers any time invested outside working > + hours. 
> + > +Other commit tags > +~~~~~~~~~~~~~~~~~ > + > +While the ``Signed-off-by`` tag is mandatory, there are a number of > +other tags that are commonly used during QEMU development > + > + * **``Reviewed-by``**: when a QEMU community member reviews a patch > + on the mailing list, if they consider the patch acceptable, they > + should send an email reply containing a ``Reviewed-by`` tag. > + > + NB: a subsystem maintainer sending a pull request would replace > + their own ``Reviewed-by`` with another ``Signed-off-by`` > + > + * **``Acked-by``**: when a QEMU subsystem maintainer approves a patch > + that touches their subsystem, but intends to allow a different > + maintainer to queue it and send a pull request, they would send > + a mail containing a ``Acked-by`` tag. > + > + * **``Tested-by``**: when a QEMU community member has functionally > + tested the behaviour of the patch in some manner, they should > + send an email reply conmtaning a ``Tested-by`` tag. s/conmtaning/containing/ > + > + * **``Reported-by``**: when a QEMU community member reports a problem > + via the mailing list, or some other informal channel that is not > + the issue tracker, it is good practice to credit them by including > + a ``Reported-by`` tag on any patch fixing the issue. When the > + problem is reported via the GitLab issue tracker, however, it is > + sufficient to just include a link to the issue. > + > +Subsystem maintainer requirements > +~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~ > + > +When a subsystem maintainer accepts a patch from a contributor, in > +addition to the normal code review points, they are expected to validate > +the presence of suitable ``Signed-off-by`` tags. > + > +At the time they queue the patch in their subsystem tree, the maintainer > +**MUST** also then add their own ``Signed-off-by`` to indicate that they > +have done the aforementioned validation. 
> + > +The subsystem maintainer submitting a pull request is **NOT** expected to > +have a ``Reviewed-by`` tag on the patch, since this is implied by their > +own ``Signed-off-by``. > + > +Tools for adding ``Signed-of-by`` s/Signed-of-by/Signed-off-by/ > +~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~ > + > +There are a variety of ways tools can support adding ``Signed-off-by`` > +tags for patches, avoiding the need for contributors to manually > +type in this repetitive text each time. > + > +git commands > +^^^^^^^^^^^^ > + > +When creating, or amending, a commit the ``-s`` flag to ``git commit`` > +will append a suitable line matching the configuring git author > +details. > + > +If preparing patches using the ``git format-patch`` tool, the ``-s`` > +flag can be used to append a suitable line in the emails it creates, > +without modifying the local commits. Alternatively to modify the > +local commits on a branch en-mass:: > + > + git rebase master -x 'git commit --amend --no-edit -s' > + > +emacs > +^^^^^ > + > +In the file ``$HOME/.emacs.d/abbrev_defs`` add:: > + > + (define-abbrev-table 'global-abbrev-table > + '( > + ("8rev" "Reviewed-by: YOUR NAME <your@email.addr>" nil 1) > + ("8ack" "Acked-by: YOUR NAME <your@email.addr>" nil 1) > + ("8test" "Tested-by: YOUR NAME <your@email.addr>" nil 1) > + ("8sob" "Signed-off-by: YOUR NAME <your@email.addr>" nil 1) > + )) > + > +with this change, if you type (for example) ``8rev`` followed > +by ``<space>`` or ``<enter>`` it will expand to the whole phrase. > + > +vim > +^^^ > + > +In the file ``$HOME/.vimrc`` add:: > + > + iabbrev 8rev Reviewed-by: YOUR NAME <your@email.addr> > + iabbrev 8ack Acked-by: YOUR NAME <your@email.addr> > + iabbrev 8test Tested-by: YOUR NAME <your@email.addr> > + iabbrev 8sob Signed-off-by: YOUR NAME <your@email.addr> > + > +with this change, if you type (for example) ``8rev`` followed > +by ``<space>`` or ``<enter>`` it will expand to the whole phrase. 
> + > +Re-starting abandoned work > +~~~~~~~~~~~~~~~~~~~~~~~~~~ > + > +For a variety of reasons there are some patches that get submitted to > +QEMU but never merged. An unrelated contributor may decide (months or > +years later) to continue working from the abandoned patch and re-submit > +it with extra changes. > + > +If the abandoned patch already had a ``Signed-off-by`` from the original > +author this **must** be preserved. The new contributor **must** then add > +their own ``Signed-off-by`` after the original one if they made any > +further changes to it. It is common to include a comment just prior to > +the new ``Signed-off-by`` indicating what extra changes were made. For > +example:: > + > + Signed-off-by: Some Person <some.person@example.com> > + [Rebased and added support for 'foo'] > + Signed-off-by: New Person <new.person@example.com> > diff --git a/docs/devel/index-process.rst b/docs/devel/index-process.rst > index 362f97ee30..b54e58105e 100644 > --- a/docs/devel/index-process.rst > +++ b/docs/devel/index-process.rst > @@ -13,6 +13,7 @@ Notes about how to interact with the community and how and where to submit patch > maintainers > style > submitting-a-patch > + code-provenance > trivial-patches > stable-process > submitting-a-pull-request > diff --git a/docs/devel/submitting-a-patch.rst b/docs/devel/submitting-a-patch.rst > index c641d948f1..ec541b3d15 100644 > --- a/docs/devel/submitting-a-patch.rst > +++ b/docs/devel/submitting-a-patch.rst > @@ -322,21 +322,9 @@ Patch emails must include a ``Signed-off-by:`` line > > Your patches **must** include a Signed-off-by: line. This is a hard > requirement because it's how you say "I'm legally okay to contribute > -this and happy for it to go into QEMU". The process is modelled after > -the `Linux kernel > -<http://git.kernel.org/cgit/linux/kernel/git/torvalds/linux.git/tree/Documentation/SubmittingPatches?id=f6f94e2ab1b33f0082ac22d71f66385a60d8157f#n297>`__ > -policy. 
> - > -If you wrote the patch, make sure your "From:" and "Signed-off-by:" > -lines use the same spelling. It's okay if you subscribe or contribute to > -the list via more than one address, but using multiple addresses in one > -commit just confuses things. If someone else wrote the patch, git will > -include a "From:" line in the body of the email (different from your > -envelope From:) that will give credit to the correct author; but again, > -that author's Signed-off-by: line is mandatory, with the same spelling. > - > -There are various tooling options for automatically adding these tags > -include using ``git commit -s`` or ``git format-patch -s``. For more > +this and happy for it to go into QEMU". For full guidance, read the > +:ref:`code-provenance` documentation. > + > information see `SubmittingPatches 1.12 > <http://git.kernel.org/cgit/linux/kernel/git/torvalds/linux.git/tree/Documentation/SubmittingPatches?id=f6f94e2ab1b33f0082ac22d71f66385a60d8157f#n297>`__. > > -- > 2.41.0 >
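The quoted patch expects the name and email address in one of the `Signed-off-by` lines to match the git commit `Author` field. As an illustration only, here is a minimal Python sketch of such a check. It assumes the author is given in git's usual `Name <email>` form and that the commit message is available as a string; the helper names and sample message are hypothetical, and it compares name and address exactly.

```python
import re

# One trailer per line: "Signed-off-by: Full Name <address>"
SOB_RE = re.compile(r"^Signed-off-by:\s*(.+?)\s*<([^>]+)>\s*$", re.MULTILINE)
IDENT_RE = re.compile(r"\s*(.+?)\s*<([^>]+)>\s*")


def parse_ident(ident: str) -> tuple[str, str]:
    """Split 'Name <email>' into (name, email)."""
    m = IDENT_RE.fullmatch(ident)
    if m is None:
        raise ValueError(f"not in 'Name <email>' form: {ident!r}")
    return m.group(1), m.group(2)


def signoffs(message: str) -> list[tuple[str, str]]:
    """Return every Signed-off-by (name, email) pair, in message order."""
    return [(m.group(1), m.group(2)) for m in SOB_RE.finditer(message)]


def author_signed_off(author: str, message: str) -> bool:
    """True if one Signed-off-by line matches the commit author exactly."""
    return parse_ident(author) in signoffs(message)


message = """docs: example change

Signed-off-by: Some Person <some.person@example.com>
[Rebased and added support for 'foo']
Signed-off-by: New Person <new.person@example.com>
"""
print(author_signed_off("Some Person <some.person@example.com>", message))  # True
print(author_signed_off("Other Person <other@example.com>", message))       # False
```

Non-trailer lines such as the bracketed `[Rebased ...]` note are simply ignored by the trailer pattern.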
* Re: [PATCH 1/2] docs: introduce dedicated page about code provenance / sign-off 2023-11-23 11:40 ` [PATCH 1/2] docs: introduce dedicated page about code provenance / sign-off Daniel P. Berrangé ` (4 preceding siblings ...) 2023-11-23 15:13 ` Stefan Hajnoczi @ 2024-01-27 14:36 ` Zhao Liu 2024-01-29 9:31 ` Daniel P. Berrangé 5 siblings, 1 reply; 57+ messages in thread From: Zhao Liu @ 2024-01-27 14:36 UTC (permalink / raw) To: Daniel P. Berrangé Cc: qemu-devel, Richard Henderson, Alexander Graf, Alex Bennée, Paolo Bonzini, Michael S. Tsirkin, Markus Armbruster, Phil Mathieu-Daudé, Stefan Hajnoczi, Thomas Huth, Kevin Wolf, Gerd Hoffmann, Mark Cave-Ayland, Peter Maydell Hi Daniel, On Thu, Nov 23, 2023 at 11:40:25AM +0000, Daniel P. Berrangé wrote: > Date: Thu, 23 Nov 2023 11:40:25 +0000 > From: "Daniel P. Berrangé" <berrange@redhat.com> > Subject: [PATCH 1/2] docs: introduce dedicated page about code provenance / > sign-off > > Currently we have a short paragraph saying that patches must include > a Signed-off-by line, and merely link to the kernel documentation. > The linked kernel docs have alot of content beyond the part about > sign-off an thus is misleading/distracting to QEMU contributors. > > This introduces a dedicated 'code-provenance' page in QEMU talking > about why we require sign-off, explaining the other tags we commonly > use, and what to do in some edge cases. > > Signed-off-by: Daniel P. Berrangé <berrange@redhat.com> > --- > docs/devel/code-provenance.rst | 197 ++++++++++++++++++++++++++++++ > docs/devel/index-process.rst | 1 + > docs/devel/submitting-a-patch.rst | 18 +-- > 3 files changed, 201 insertions(+), 15 deletions(-) > create mode 100644 docs/devel/code-provenance.rst > > diff --git a/docs/devel/code-provenance.rst b/docs/devel/code-provenance.rst > new file mode 100644 > index 0000000000..b4591a2dec > --- /dev/null > +++ b/docs/devel/code-provenance.rst > @@ -0,0 +1,197 @@ > +.. 
_code-provenance: > + > +Code provenance > +=============== > + > +Certifying patch submissions > +~~~~~~~~~~~~~~~~~~~~~~~~~~~~ > + > +The QEMU community **mandates** all contributors to certify provenance > +of patch submissions they make to the project. To put it another way, > +contributors must indicate that they are legally permitted to contribute > +to the project. > + > +Certification is achieved with a low overhead by adding a single line > +to the bottom of every git commit:: > + > + Signed-off-by: YOUR NAME <YOUR@EMAIL> > + > +This existence of this line asserts that the author of the patch is > +contributing in accordance with the `Developer's Certificate of > +Origin <https://developercertifcate.org>`__: > + > +.. _dco: > + > +:: > + Developer's Certificate of Origin 1.1 > + > + By making a contribution to this project, I certify that: > + > + (a) The contribution was created in whole or in part by me and I > + have the right to submit it under the open source license > + indicated in the file; or > + > + (b) The contribution is based upon previous work that, to the best > + of my knowledge, is covered under an appropriate open source > + license and I have the right under that license to submit that > + work with modifications, whether created in whole or in part > + by me, under the same open source license (unless I am > + permitted to submit under a different license), as indicated > + in the file; or > + > + (c) The contribution was provided directly to me by some other > + person who certified (a), (b) or (c) and I have not modified > + it. > + > + (d) I understand and agree that this project and the contribution > + are public and that a record of the contribution (including all > + personal information I submit with it, including my sign-off) is > + maintained indefinitely and may be redistributed consistent with > + this project or the open source license(s) involved. 
> + > +It is generally expected that the name and email addresses used in one > +of the ``Signed-off-by`` lines, matches that of the git commit ``Author`` > +field. If the person sending the mail is also one of the patch authors, > +it is further expected that the mail ``From:`` line name & address match > +one of the ``Signed-off-by`` lines. > + > +Multiple authorship > +~~~~~~~~~~~~~~~~~~~ > + > +It is not uncommon for a patch to have contributions from multiple > +authors. In such a scenario, a git commit will usually be expected > +to have a ``Signed-off-by`` line for each contributor involved in > +creatin of the patch. Some edge cases: > + > + * The non-primary author's contributions were so trivial that > + they can be considered not subject to copyright. In this case > + the secondary authors need not include a ``Signed-off-by``. > + > + This case most commonly applies where QEMU reviewers give short > + snippets of code as suggested fixes to a patch. The reviewers > + don't need to have their own ``Signed-off-by`` added unless > + their code suggestion was unusually large. > + > + * Both contributors work for the same employer and the employer > + requires copyright assignment. > + > + It can be said that in this case a ``Signed-off-by`` is indicating > + that the person has permission to contributeo from their employer > + who is the copyright holder. For this case, maybe it needs the "Co-developed-by"? > It is none the less still preferrable > + to include a ``Signed-off-by`` for each contributor, as in some > + countries employees are not able to assign copyright to their > + employer, and it also covers any time invested outside working > + hours. 
> + > +Other commit tags > +~~~~~~~~~~~~~~~~~ > + > +While the ``Signed-off-by`` tag is mandatory, there are a number of > +other tags that are commonly used during QEMU development > + > + * **``Reviewed-by``**: when a QEMU community member reviews a patch > + on the mailing list, if they consider the patch acceptable, they > + should send an email reply containing a ``Reviewed-by`` tag. Maybe just a question, the people should drop the Reviewed/ACKed/Tested tags that have been obtained if he make the any code changes (including function/variable renaming) as well as commit message changes during the patch refresh process, am I understand correctly? ;-) > + > + NB: a subsystem maintainer sending a pull request would replace > + their own ``Reviewed-by`` with another ``Signed-off-by`` > + > + * **``Acked-by``**: when a QEMU subsystem maintainer approves a patch > + that touches their subsystem, but intends to allow a different > + maintainer to queue it and send a pull request, they would send > + a mail containing a ``Acked-by`` tag. > + > + * **``Tested-by``**: when a QEMU community member has functionally > + tested the behaviour of the patch in some manner, they should > + send an email reply conmtaning a ``Tested-by`` tag. Is there any requirement for the order of tags? My previous understanding was that if the Reviewed-by/Tested-by tags were obtained by the author within his company, then those tags should be placed before the signed-off-by of the author. If the Reviewed-by/ Tested-by were acquired in the community, then they should be placed after the author's signed-off-by, right? > + > + * **``Reported-by``**: when a QEMU community member reports a problem > + via the mailing list, or some other informal channel that is not > + the issue tracker, it is good practice to credit them by including > + a ``Reported-by`` tag on any patch fixing the issue. 
When the > + problem is reported via the GitLab issue tracker, however, it is > + sufficient to just include a link to the issue. > + > +Subsystem maintainer requirements > +~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~ > + > +When a subsystem maintainer accepts a patch from a contributor, in > +addition to the normal code review points, they are expected to validate > +the presence of suitable ``Signed-off-by`` tags. > + > +At the time they queue the patch in their subsystem tree, the maintainer > +**MUST** also then add their own ``Signed-off-by`` to indicate that they > +have done the aforementioned validation. > + > +The subsystem maintainer submitting a pull request is **NOT** expected to > +have a ``Reviewed-by`` tag on the patch, since this is implied by their > +own ``Signed-off-by``. > + > +Tools for adding ``Signed-of-by`` > +~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~ > + > +There are a variety of ways tools can support adding ``Signed-off-by`` > +tags for patches, avoiding the need for contributors to manually > +type in this repetitive text each time. > + > +git commands > +^^^^^^^^^^^^ > + > +When creating, or amending, a commit the ``-s`` flag to ``git commit`` > +will append a suitable line matching the configuring git author > +details. > + > +If preparing patches using the ``git format-patch`` tool, the ``-s`` > +flag can be used to append a suitable line in the emails it creates, > +without modifying the local commits. 
Alternatively to modify the > +local commits on a branch en-mass:: > + > + git rebase master -x 'git commit --amend --no-edit -s' > + > +emacs > +^^^^^ > + > +In the file ``$HOME/.emacs.d/abbrev_defs`` add:: > + > + (define-abbrev-table 'global-abbrev-table > + '( > + ("8rev" "Reviewed-by: YOUR NAME <your@email.addr>" nil 1) > + ("8ack" "Acked-by: YOUR NAME <your@email.addr>" nil 1) > + ("8test" "Tested-by: YOUR NAME <your@email.addr>" nil 1) > + ("8sob" "Signed-off-by: YOUR NAME <your@email.addr>" nil 1) > + )) > + > +with this change, if you type (for example) ``8rev`` followed > +by ``<space>`` or ``<enter>`` it will expand to the whole phrase. > + > +vim > +^^^ > + > +In the file ``$HOME/.vimrc`` add:: > + > + iabbrev 8rev Reviewed-by: YOUR NAME <your@email.addr> > + iabbrev 8ack Acked-by: YOUR NAME <your@email.addr> > + iabbrev 8test Tested-by: YOUR NAME <your@email.addr> > + iabbrev 8sob Signed-off-by: YOUR NAME <your@email.addr> > + > +with this change, if you type (for example) ``8rev`` followed > +by ``<space>`` or ``<enter>`` it will expand to the whole phrase. > + > +Re-starting abandoned work > +~~~~~~~~~~~~~~~~~~~~~~~~~~ > + > +For a variety of reasons there are some patches that get submitted to > +QEMU but never merged. An unrelated contributor may decide (months or > +years later) to continue working from the abandoned patch and re-submit > +it with extra changes. > + > +If the abandoned patch already had a ``Signed-off-by`` from the original > +author this **must** be preserved. I find some people added Originally-by, e.g., 8e86851bd6b9. I guess if the code has been changed very significantly, or if the original implementation has just been referenced and significantly refactored, then Originally-by should be preferred instead of Signed-off-by from the original author, right? Thanks, Zhao > The new contributor **must** then add > +their own ``Signed-off-by`` after the original one if they made any > +further changes to it. 
It is common to include a comment just prior to > +the new ``Signed-off-by`` indicating what extra changes were made. For > +example:: > + > + Signed-off-by: Some Person <some.person@example.com> > + [Rebased and added support for 'foo'] > + Signed-off-by: New Person <new.person@example.com> > diff --git a/docs/devel/index-process.rst b/docs/devel/index-process.rst > index 362f97ee30..b54e58105e 100644 > --- a/docs/devel/index-process.rst > +++ b/docs/devel/index-process.rst > @@ -13,6 +13,7 @@ Notes about how to interact with the community and how and where to submit patch > maintainers > style > submitting-a-patch > + code-provenance > trivial-patches > stable-process > submitting-a-pull-request > diff --git a/docs/devel/submitting-a-patch.rst b/docs/devel/submitting-a-patch.rst > index c641d948f1..ec541b3d15 100644 > --- a/docs/devel/submitting-a-patch.rst > +++ b/docs/devel/submitting-a-patch.rst > @@ -322,21 +322,9 @@ Patch emails must include a ``Signed-off-by:`` line > > Your patches **must** include a Signed-off-by: line. This is a hard > requirement because it's how you say "I'm legally okay to contribute > -this and happy for it to go into QEMU". The process is modelled after > -the `Linux kernel > -<http://git.kernel.org/cgit/linux/kernel/git/torvalds/linux.git/tree/Documentation/SubmittingPatches?id=f6f94e2ab1b33f0082ac22d71f66385a60d8157f#n297>`__ > -policy. > - > -If you wrote the patch, make sure your "From:" and "Signed-off-by:" > -lines use the same spelling. It's okay if you subscribe or contribute to > -the list via more than one address, but using multiple addresses in one > -commit just confuses things. If someone else wrote the patch, git will > -include a "From:" line in the body of the email (different from your > -envelope From:) that will give credit to the correct author; but again, > -that author's Signed-off-by: line is mandatory, with the same spelling. 
> - > -There are various tooling options for automatically adding these tags > -include using ``git commit -s`` or ``git format-patch -s``. For more > +this and happy for it to go into QEMU". For full guidance, read the > +:ref:`code-provenance` documentation. > + > information see `SubmittingPatches 1.12 > <http://git.kernel.org/cgit/linux/kernel/git/torvalds/linux.git/tree/Documentation/SubmittingPatches?id=f6f94e2ab1b33f0082ac22d71f66385a60d8157f#n297>`__. > > -- > 2.41.0 > >
* Re: [PATCH 1/2] docs: introduce dedicated page about code provenance / sign-off 2024-01-27 14:36 ` Zhao Liu @ 2024-01-29 9:31 ` Daniel P. Berrangé 2024-01-29 9:35 ` Samuel Tardieu 0 siblings, 1 reply; 57+ messages in thread From: Daniel P. Berrangé @ 2024-01-29 9:31 UTC (permalink / raw) To: Zhao Liu Cc: qemu-devel, Richard Henderson, Alexander Graf, Alex Bennée, Paolo Bonzini, Michael S. Tsirkin, Markus Armbruster, Phil Mathieu-Daudé, Stefan Hajnoczi, Thomas Huth, Kevin Wolf, Gerd Hoffmann, Mark Cave-Ayland, Peter Maydell On Sat, Jan 27, 2024 at 10:36:24PM +0800, Zhao Liu wrote: > Hi Daniel, > > On Thu, Nov 23, 2023 at 11:40:25AM +0000, Daniel P. Berrangé wrote: > > +Multiple authorship > > +~~~~~~~~~~~~~~~~~~~ > > + > > +It is not uncommon for a patch to have contributions from multiple > > +authors. In such a scenario, a git commit will usually be expected > > +to have a ``Signed-off-by`` line for each contributor involved in > > +creatin of the patch. Some edge cases: > > + > > + * The non-primary author's contributions were so trivial that > > + they can be considered not subject to copyright. In this case > > + the secondary authors need not include a ``Signed-off-by``. > > + > > + This case most commonly applies where QEMU reviewers give short > > + snippets of code as suggested fixes to a patch. The reviewers > > + don't need to have their own ``Signed-off-by`` added unless > > + their code suggestion was unusually large. > > + > > + * Both contributors work for the same employer and the employer > > + requires copyright assignment. > > + > > + It can be said that in this case a ``Signed-off-by`` is indicating > > + that the person has permission to contributeo from their employer > > + who is the copyright holder. > > For this case, maybe it needs the "Co-developed-by"? If you're going to go to the trouble of adding multiple tags to the commit for each author who participated, then IMHO they should all be Signed-off-by. 
IOW, either just have S-o-B from the main author within a company, or have S-o-B for every author. Co-developed-by doesn't have value IMHO. > > It is none the less still preferrable > > + to include a ``Signed-off-by`` for each contributor, as in some > > + countries employees are not able to assign copyright to their > > + employer, and it also covers any time invested outside working > > + hours. > > + > > +Other commit tags > > +~~~~~~~~~~~~~~~~~ > > + > > +While the ``Signed-off-by`` tag is mandatory, there are a number of > > +other tags that are commonly used during QEMU development > > + > > + * **``Reviewed-by``**: when a QEMU community member reviews a patch > > + on the mailing list, if they consider the patch acceptable, they > > + should send an email reply containing a ``Reviewed-by`` tag. > > Maybe just a question, the people should drop the Reviewed/ACKed/Tested > tags that have been obtained if he make the any code changes (including > function/variable renaming) as well as commit message changes during > the patch refresh process, am I understand correctly? ;-) It is a judgement call as to whether a Reviewed-by/etc should be kept or dropped. It depends on the scale of the changes that were made to the commit since the Reviewed-by/etc was first given. > > + NB: a subsystem maintainer sending a pull request would replace > > + their own ``Reviewed-by`` with another ``Signed-off-by`` > > + > > + * **``Acked-by``**: when a QEMU subsystem maintainer approves a patch > > + that touches their subsystem, but intends to allow a different > > + maintainer to queue it and send a pull request, they would send > > + a mail containing a ``Acked-by`` tag. > > + > > + * **``Tested-by``**: when a QEMU community member has functionally > > + tested the behaviour of the patch in some manner, they should > > + send an email reply conmtaning a ``Tested-by`` tag. > > Is there any requirement for the order of tags? 
> > My previous understanding was that if the Reviewed-by/Tested-by tags > were obtained by the author within his company, then those tags should > be placed before the signed-off-by of the author. If the Reviewed-by/ > Tested-by were acquired in the community, then they should be placed > after the author's signed-off-by, right? Common practice is for Signed-off-by tags to be kept in time order from earliest author to latest author / maintainer. The common case is two S-o-B lines, the first from the patch author, and the last from the subsystem maintainer who sends the pull request. For other tags I don't see any broadly accepted pattern. Some people add Reviewed-by before the S-o-B, others add Reviewed-by after the S-o-B. Either is fine IMHO. > > +Re-starting abandoned work > > +~~~~~~~~~~~~~~~~~~~~~~~~~~ > > + > > +For a variety of reasons there are some patches that get submitted to > > +QEMU but never merged. An unrelated contributor may decide (months or > > +years later) to continue working from the abandoned patch and re-submit > > +it with extra changes. > > + > > +If the abandoned patch already had a ``Signed-off-by`` from the original > > +author this **must** be preserved. > I find some people added Originally-by, e.g., 8e86851bd6b9. > > I guess if the code has been changed very significantly, or if the > original implementation has just been referenced and significantly > refactored, then Originally-by should be preferred instead of > Signed-off-by from the original author, right? If the submitted patch still contains any code from the original author that can be considered copyrightable (i.e. anything non-trivial), then I would expect the original author's Signed-off-by to be retained. I think the cases where it is ok to use Originally-by, without a Signed-off-by, would be exceedingly rare.
With regards, Daniel -- |: https://berrange.com -o- https://www.flickr.com/photos/dberrange :| |: https://libvirt.org -o- https://fstop138.berrange.com :| |: https://entangle-photo.org -o- https://www.instagram.com/dberrange :|
* Re: [PATCH 1/2] docs: introduce dedicated page about code provenance / sign-off 2024-01-29 9:31 ` Daniel P. Berrangé @ 2024-01-29 9:35 ` Samuel Tardieu 2024-01-29 10:41 ` Peter Maydell 0 siblings, 1 reply; 57+ messages in thread From: Samuel Tardieu @ 2024-01-29 9:35 UTC (permalink / raw) To: Daniel P. Berrangé Cc: Zhao Liu, Richard Henderson, Alexander Graf, Alex Bennée, Paolo Bonzini, Michael S. Tsirkin, Markus Armbruster, Phil Mathieu-Daudé, Stefan Hajnoczi, Thomas Huth, Kevin Wolf, Gerd Hoffmann, Mark Cave-Ayland, Peter Maydell, qemu-devel Daniel P. Berrangé <berrange@redhat.com> writes: >> Is there any requirement for the order of tags? >> >> My previous understanding was that if the Reviewed-by/Tested-by >> tags >> were obtained by the author within his company, then those tags >> should >> be placed before the signed-off-by of the author. If the >> Reviewed-by/ >> Tested-by were acquired in the community, then they should be >> placed >> after the author's signed-off-by, right? > > Common practice is for Signed-off-by tags to be kept in time > order > from earliest author to latest author / maintainer. Common case > is > 2 S-o-B, the first from the patch author, and the last from the > sub-system maintainer who sends the pull request. > > For other tags I don't see any broadly acceptable pattern. Some > people > add Reviewed-by before the S-o-B, others add Reviewed-by after > the > S-o-B. Either is fine IMHO. From what I've seen in other projects, S-o-B means that you accept accountability for everything above. 
One scenario would be:

- Send the original patch, which has been tested inside the company:

    Tested-by: Tester <tester@example.com>
    Signed-off-by: Developer <developer@example.com>

- Get some R-b, but need to make some requested minor changes and resend a new patch series:

    Tested-by: Tester <tester@example.com>
    Reviewed-by: Reviewer <reviewer@othercompany.com>
    Signed-off-by: Developer <developer@example.com>

  This is a way of saying "I guarantee that the R-b still applies after the new changes I made to this series".

- Then reviewed and pulled into their tree by the maintainer:

    Tested-by: Tester <tester@example.com>
    Reviewed-by: Reviewer <reviewer@othercompany.com>
    Signed-off-by: Developer <developer@example.com>
    Reviewed-by: Maintainer <maintainer@org.org>
    Signed-off-by: Maintainer <maintainer@org.org>

If, after being reviewed, the initial patch had not needed any change, the order would have been:

    Tested-by: Tester <tester@example.com>
    Signed-off-by: Developer <developer@example.com>
    Reviewed-by: Reviewer <reviewer@othercompany.com>
    Reviewed-by: Maintainer <maintainer@org.org>
    Signed-off-by: Maintainer <maintainer@org.org>

This is consistent with what software like "b4" does: if the S-o-b of the current user is present, it is moved last, as the current user is the one accepting accountability at this point.

However, this is not what QEMU has been using as far as I can see, as S-o-b tend to stay in their original positions. I even opened an issue on b4 a few weeks ago because of this <https://github.com/mricon/b4/issues/16>, and I reverted to using git-publish. But if this is ok to use an arbitrary order for non-S-o-b headers, I can get back to b4.

Sam

-- 
Samuel Tardieu
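The reordering rule in the scenario above can be modelled in a few lines of Python. Existing trailers keep their order, newly collected trailers are appended, and the current user's Signed-off-by is moved to the end. This is a minimal sketch for illustration only. `reorder_trailers` is a hypothetical helper, and trailers are treated as plain strings. It is not how b4 itself is implemented.

```python
from typing import List


def reorder_trailers(existing: List[str], collected: List[str], my_sob: str) -> List[str]:
    """Merge newly collected trailers into a commit's trailer block and
    move the current user's Signed-off-by line to the end.

    Hypothetical helper: a simplified model of the "S-o-b of the current
    user is moved last" behaviour, not b4's real implementation.
    """
    # Keep existing trailers in their original order, minus our own S-o-B.
    merged = [t for t in existing if t != my_sob]
    # Append new trailers (e.g. Reviewed-by) that are not already present.
    for t in collected:
        if t not in merged and t != my_sob:
            merged.append(t)
    # Whoever is (re)sending the patch signs off last.
    merged.append(my_sob)
    return merged


if __name__ == "__main__":
    tested = "Tested-by: Tester <tester@example.com>"
    dev_sob = "Signed-off-by: Developer <developer@example.com>"
    reviewed = "Reviewed-by: Reviewer <reviewer@othercompany.com>"
    # The developer resends v2 after receiving a Reviewed-by.
    print("\n".join(reorder_trailers([tested, dev_sob], [reviewed], dev_sob)))
```

Applying the same helper a second time with the maintainer's Signed-off-by reproduces the "pulled into their tree" ordering from the scenario.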
* Re: [PATCH 1/2] docs: introduce dedicated page about code provenance / sign-off 2024-01-29 9:35 ` Samuel Tardieu @ 2024-01-29 10:41 ` Peter Maydell 2024-01-29 11:00 ` Daniel P. Berrangé 0 siblings, 1 reply; 57+ messages in thread From: Peter Maydell @ 2024-01-29 10:41 UTC (permalink / raw) To: Samuel Tardieu Cc: Daniel P. Berrangé, Zhao Liu, Richard Henderson, Alexander Graf, Alex Bennée, Paolo Bonzini, Michael S. Tsirkin, Markus Armbruster, Phil Mathieu-Daudé, Stefan Hajnoczi, Thomas Huth, Kevin Wolf, Gerd Hoffmann, Mark Cave-Ayland, qemu-devel On Mon, 29 Jan 2024 at 09:47, Samuel Tardieu <sam@rfc1149.net> wrote: > However, this is not what QEMU has been using as far as I can see, > as S-o-b tend to stay in their original positions. I even opened > an issue on b4 a few weeks ago because of this > <https://github.com/mricon/b4/issues/16>, and I reverted to using > git-publish. But if this is ok to use an arbitrary order for > non-S-o-b headers, I can get back to b4. I think QEMU doesn't have a specific existing practice here. What you see is largely the result of people using whatever tooling they have and accepting the ordering it gives them. So I don't think you should stop using b4 just because the ordering it happens to produce isn't the same as somebody else's tooling. I think trying to impose some subtle distinction of meaning on the ordering of tags is not going to work, because there are going to be too many cases where people don't adhere to the ordering distinction because they don't know about it or don't understand it. As Daniel says, as long as the Signed-off-by tags are in basically the right order for developer vs maintainer that's the only strong ordering constraint we have. thanks -- PMM ^ permalink raw reply [flat|nested] 57+ messages in thread
* Re: [PATCH 1/2] docs: introduce dedicated page about code provenance / sign-off 2024-01-29 10:41 ` Peter Maydell @ 2024-01-29 11:00 ` Daniel P. Berrangé 0 siblings, 0 replies; 57+ messages in thread From: Daniel P. Berrangé @ 2024-01-29 11:00 UTC (permalink / raw) To: Peter Maydell Cc: Samuel Tardieu, Zhao Liu, Richard Henderson, Alexander Graf, Alex Bennée, Paolo Bonzini, Michael S. Tsirkin, Markus Armbruster, Phil Mathieu-Daudé, Stefan Hajnoczi, Thomas Huth, Kevin Wolf, Gerd Hoffmann, Mark Cave-Ayland, qemu-devel On Mon, Jan 29, 2024 at 10:41:38AM +0000, Peter Maydell wrote: > On Mon, 29 Jan 2024 at 09:47, Samuel Tardieu <sam@rfc1149.net> wrote: > > However, this is not what QEMU has been using as far as I can see, > > as S-o-b tend to stay in their original positions. I even opened > > an issue on b4 a few weeks ago because of this > > <https://github.com/mricon/b4/issues/16>, and I reverted to using > > git-publish. But if this is ok to use an arbitrary order for > > non-S-o-b headers, I can get back to b4. > > I think QEMU doesn't have a specific existing practice here. > What you see is largely the result of people using whatever > tooling they have and accepting the ordering it gives them. > So I don't think you should stop using b4 just because > the ordering it happens to produce isn't the same as > somebody else's tooling. > > I think trying to impose some subtle distinction of meaning > on the ordering of tags is not going to work, because there > are going to be too many cases where people don't adhere > to the ordering distinction because they don't know about > it or don't understand it. > > As Daniel says, as long as the Signed-off-by tags are > in basically the right order for developer vs maintainer > that's the only strong ordering constraint we have. To think of it another way.... Signed-off-by is the only tag which has defined legal meaning in terms of asserting that the people involved have permission to contribute. 
All the other tags (Reviewed/Tested/etc) are merely a historical record of the development process, and have no legal implications. This makes Signed-off-by the important one, and the others all in the "nice to have" category. With regards, Daniel
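The thread states only a few hard constraints. There must be at least one Signed-off-by, the patch author signs off first, and the maintainer who sends the pull request signs off last. Reviewed-by, Tested-by and similar tags may appear anywhere. A checker for these constraints might look like the following. It is a hypothetical Python helper, not part of QEMU's tooling. It assumes trailers are plain `Tag-by: Name <email>` lines and compares identities by exact string match.

```python
import re
from typing import List, Tuple

# Matches "Tag-name-by: value" trailer lines, e.g. "Signed-off-by: A <a@x>".
TRAILER_RE = re.compile(r"^([A-Za-z][A-Za-z-]*-by):\s*(.+)$")


def parse_trailers(message: str) -> List[Tuple[str, str]]:
    """Return (tag, value) pairs for every "*-by:" line in a commit message."""
    trailers = []
    for line in message.splitlines():
        m = TRAILER_RE.match(line.strip())
        if m:
            trailers.append((m.group(1), m.group(2).strip()))
    return trailers


def check_signoffs(message: str, author: str, maintainer: str) -> List[str]:
    """Check only the Signed-off-by constraints; other tags are ignored
    because they carry no legal meaning and have no agreed ordering."""
    sobs = [v for tag, v in parse_trailers(message) if tag == "Signed-off-by"]
    if not sobs:
        return ["missing Signed-off-by"]
    problems = []
    if sobs[0] != author:
        problems.append(f"first Signed-off-by is {sobs[0]!r}, expected author {author!r}")
    if sobs[-1] != maintainer:
        problems.append(f"last Signed-off-by is {sobs[-1]!r}, expected maintainer {maintainer!r}")
    return problems
```

An empty list means the commit satisfies the constraints. The position of Reviewed-by relative to the author's Signed-off-by is deliberately not checked, matching the "either is fine" position above.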
* [PATCH 2/2] docs: define policy forbidding use of "AI" / LLM code generators 2023-11-23 11:40 [PATCH 0/2] docs: define policy forbidding use of "AI" / LLM code generators Daniel P. Berrangé 2023-11-23 11:40 ` [PATCH 1/2] docs: introduce dedicated page about code provenance / sign-off Daniel P. Berrangé @ 2023-11-23 11:40 ` Daniel P. Berrangé 2023-11-23 12:57 ` Alex Bennée ` (3 more replies) 1 sibling, 4 replies; 57+ messages in thread From: Daniel P. Berrangé @ 2023-11-23 11:40 UTC (permalink / raw) To: qemu-devel Cc: Richard Henderson, Alexander Graf, Alex Bennée, Paolo Bonzini, Michael S. Tsirkin, Markus Armbruster, Phil Mathieu-Daudé, Stefan Hajnoczi, Thomas Huth, Kevin Wolf, Gerd Hoffmann, Mark Cave-Ayland, Peter Maydell, Daniel P. Berrangé There has been an explosion of interest in so-called "AI" (LLM) code generators in the past year or so. Thus far though, this has not been matched by a broadly accepted legal interpretation of the licensing implications for code generator outputs. While the vendors may claim there is no problem and a free choice of license is possible, they have an inherent conflict of interest in promoting this interpretation. More broadly there is, as yet, no broad consensus on the licensing implications of code generators trained on inputs under a wide variety of licenses. The DCO requires contributors to assert they have the right to contribute under the designated project license. Given the lack of consensus on the licensing of "AI" (LLM) code generator output, it is not considered credible to assert compliance with the DCO clause (b) or (c) where a patch includes such generated code. This patch thus defines a policy that the QEMU project will not accept contributions where use of "AI" (LLM) code generators is either known or suspected. Signed-off-by: Daniel P.
Berrangé <berrange@redhat.com>
---
 docs/devel/code-provenance.rst | 40 ++++++++++++++++++++++++++++++++++
 1 file changed, 40 insertions(+)

diff --git a/docs/devel/code-provenance.rst b/docs/devel/code-provenance.rst
index b4591a2dec..a6e42c6b1b 100644
--- a/docs/devel/code-provenance.rst
+++ b/docs/devel/code-provenance.rst
@@ -195,3 +195,43 @@ example::
   Signed-off-by: Some Person <some.person@example.com>
   [Rebased and added support for 'foo']
   Signed-off-by: New Person <new.person@example.com>
+
+Use of "AI" (LLM) code generators
+~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
+
+TL;DR:
+
+  **Current QEMU project policy is to DECLINE any contributions
+  which are believed to include or derive from "AI" (LLM)
+  generated code.**
+
+The existence of "AI" (`Large Language Model <https://en.wikipedia.org/wiki/Large_language_model>`__
+/ LLM) code generators raises a number of difficult legal questions, a
+number of which impact on Open Source projects. As noted earlier, the
+QEMU community requires that contributors certify their patch submissions
+are made in accordance with the rules of the :ref:`dco` (DCO). When a
+patch contains "AI" generated code this raises difficulties with code
+provenance and thus DCO compliance.
+
+To satisfy the DCO, the patch contributor has to fully understand
+the origins and license of code they are contributing to QEMU. The
+license terms that should apply to the output of an "AI" code generator
+are ill-defined, given that both training data and operation of the
+"AI" are typically opaque to the user. Even where the training data
+is said to all be open source, it will likely be under a wide variety
+of license terms.
+
+While the vendors of "AI" code generators may promote the idea that
+code output can be taken under a free choice of license, this is not
+yet considered to be a generally accepted, nor tested, legal opinion.
+
+With this in mind, the QEMU maintainers do not consider it
+currently possible to comply with DCO terms (b) or (c) for most "AI"
+generated code.
+
+The QEMU maintainers thus require that contributors refrain from using
+"AI" code generators on patches intended to be submitted to the project,
+and will decline any contribution if use of "AI" is known or suspected.
+
+Examples of tools impacted by this policy include both GitHub Copilot
+and ChatGPT, amongst many others which are less well known.
-- 
2.41.0
* Re: [PATCH 2/2] docs: define policy forbidding use of "AI" / LLM code generators 2023-11-23 11:40 ` [PATCH 2/2] docs: define policy forbidding use of "AI" / LLM code generators Daniel P. Berrangé @ 2023-11-23 12:57 ` Alex Bennée 2023-11-23 17:37 ` Michal Suchánek 2023-11-23 17:46 ` Daniel P. Berrangé 2023-11-23 13:20 ` Kevin Wolf ` (2 subsequent siblings) 3 siblings, 2 replies; 57+ messages in thread From: Alex Bennée @ 2023-11-23 12:57 UTC (permalink / raw) To: Daniel P. Berrangé Cc: qemu-devel, Richard Henderson, Alexander Graf, Paolo Bonzini, Michael S. Tsirkin, Markus Armbruster, Phil Mathieu-Daudé, Stefan Hajnoczi, Thomas Huth, Kevin Wolf, Gerd Hoffmann, Mark Cave-Ayland, Peter Maydell Daniel P. Berrangé <berrange@redhat.com> writes: > There has been an explosion of interest in so called "AI" (LLM) > code generators in the past year or so. Thus far though, this is > has not been matched by a broadly accepted legal interpretation > of the licensing implications for code generator outputs. While > the vendors may claim there is no problem and a free choice of > license is possible, they have an inherent conflict of interest > in promoting this interpretation. More broadly there is, as yet, > no broad consensus on the licensing implications of code generators > trained on inputs under a wide variety of licenses. > > The DCO requires contributors to assert they have the right to > contribute under the designated project license. Given the lack > of consensus on the licensing of "AI" (LLM) code generator output, > it is not considered credible to assert compliance with the DCO > clause (b) or (c) where a patch includes such generated code. > > This patch thus defines a policy that the QEMU project will not > accept contributions where use of "AI" (LLM) code generators is > either known, or suspected. > > Signed-off-by: Daniel P. 
Berrangé <berrange@redhat.com> > --- > docs/devel/code-provenance.rst | 40 ++++++++++++++++++++++++++++++++++ > 1 file changed, 40 insertions(+) > > diff --git a/docs/devel/code-provenance.rst b/docs/devel/code-provenance.rst > index b4591a2dec..a6e42c6b1b 100644 > --- a/docs/devel/code-provenance.rst > +++ b/docs/devel/code-provenance.rst > @@ -195,3 +195,43 @@ example:: > Signed-off-by: Some Person <some.person@example.com> > [Rebased and added support for 'foo'] > Signed-off-by: New Person <new.person@example.com> > + > +Use of "AI" (LLM) code generators > +~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~ > + > +TL;DR: > + > + **Current QEMU project policy is to DECLINE any contributions > + which are believed to include or derive from "AI" (LLM) > + generated code.** > + > +The existence of "AI" (`Large Language Model <https://en.wikipedia.org/wiki/Large_language_model>`__ > +/ LLM) code generators raises a number of difficult legal questions, a > +number of which impact on Open Source projects. As noted earlier, the > +QEMU community requires that contributors certify their patch submissions > +are made in accordance with the rules of the :ref:`dco` (DCO). When a > +patch contains "AI" generated code this raises difficulties with code > +provenence and thus DCO compliance. I agree this is going to be a field that keeps lawyers well remunerated for the foreseeable future. However, I suspect this glosses over the main use case for LLM generators, which is non-novel transformation. One good example is generating test fixtures, where you write a piece of original code and then ask the code completion engine to fill out some unit tests to exercise the code. It's boring mechanical work, but one that an LLM is very well suited to (even if you might tweak the final result). > +To satisfy the DCO, the patch contributor has to fully understand > +the origins and license of code they are contributing to QEMU.
The > +license terms that should apply to the output of an "AI" code generator > +are ill-defined, given that both training data and operation of the > +"AI" are typically opaque to the user. Even where the training data > +is said to all be open source, it will likely be under a wide variety > +of license terms. > + > +While the vendor's of "AI" code generators may promote the idea that > +code output can be taken under a free choice of license, this is not > +yet considered to be a generally accepted, nor tested, legal opinion. > + > +With this in mind, the QEMU maintainers does not consider it is > +currently possible to comply with DCO terms (b) or (c) for most "AI" > +generated code. There is a lot of code out there that isn't eligible for copyright protection because it doesn't demonstrate much originality or creativity. In the experimentation I've done so far I've not seen much sign of genuine creativity. LLMs benefit from having access to a wide corpus of training data and tend to do a better job of inferring solutions from semi-related posts than, say, a human manually comparing posts after pasting an error message into Google. > + > +The QEMU maintainers thus require that contributors refrain from using > +"AI" code generators on patches intended to be submitted to the project, > +and will decline any contribution if use of "AI" is known or suspected. > + > +Examples of tools impacted by this policy includes both GitHub CoPilot, > +and ChatGPT, amongst many others which are less well known. What if you took an LLM and then fine-tuned it using project data so it could better help new users make contributions to the project? You would be biasing the model towards your own data for the purpose of helping developers write better QEMU code? -- Alex Bennée Virtualisation Tech Lead @ Linaro
* Re: [PATCH 2/2] docs: define policy forbidding use of "AI" / LLM code generators 2023-11-23 12:57 ` Alex Bennée @ 2023-11-23 17:37 ` Michal Suchánek 2023-11-23 23:27 ` Michael S. Tsirkin 2023-11-23 17:46 ` Daniel P. Berrangé 1 sibling, 1 reply; 57+ messages in thread From: Michal Suchánek @ 2023-11-23 17:37 UTC (permalink / raw) To: Alex Bennée Cc: Daniel P. Berrangé, qemu-devel, Richard Henderson, Alexander Graf, Paolo Bonzini, Michael S. Tsirkin, Markus Armbruster, Phil Mathieu-Daudé, Stefan Hajnoczi, Thomas Huth, Kevin Wolf, Gerd Hoffmann, Mark Cave-Ayland, Peter Maydell On Thu, Nov 23, 2023 at 12:57:42PM +0000, Alex Bennée wrote: > Daniel P. Berrangé <berrange@redhat.com> writes: > > > There has been an explosion of interest in so called "AI" (LLM) > > code generators in the past year or so. Thus far though, this is > > has not been matched by a broadly accepted legal interpretation > > of the licensing implications for code generator outputs. While > > the vendors may claim there is no problem and a free choice of > > license is possible, they have an inherent conflict of interest > > in promoting this interpretation. More broadly there is, as yet, > > no broad consensus on the licensing implications of code generators > > trained on inputs under a wide variety of licenses. > > > > The DCO requires contributors to assert they have the right to > > contribute under the designated project license. Given the lack > > of consensus on the licensing of "AI" (LLM) code generator output, > > it is not considered credible to assert compliance with the DCO > > clause (b) or (c) where a patch includes such generated code. > > > > This patch thus defines a policy that the QEMU project will not > > accept contributions where use of "AI" (LLM) code generators is > > either known, or suspected. > > > > Signed-off-by: Daniel P. 
Berrangé <berrange@redhat.com> > > --- > > docs/devel/code-provenance.rst | 40 ++++++++++++++++++++++++++++++++++ > > 1 file changed, 40 insertions(+) > > > > diff --git a/docs/devel/code-provenance.rst b/docs/devel/code-provenance.rst > > index b4591a2dec..a6e42c6b1b 100644 > > --- a/docs/devel/code-provenance.rst > > +++ b/docs/devel/code-provenance.rst > > @@ -195,3 +195,43 @@ example:: > > Signed-off-by: Some Person <some.person@example.com> > > [Rebased and added support for 'foo'] > > Signed-off-by: New Person <new.person@example.com> > > + > > +Use of "AI" (LLM) code generators > > +~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~ > > + > > +TL;DR: > > + > > + **Current QEMU project policy is to DECLINE any contributions > > + which are believed to include or derive from "AI" (LLM) > > + generated code.** > > + > > +The existence of "AI" (`Large Language Model <https://en.wikipedia.org/wiki/Large_language_model>`__ > > +/ LLM) code generators raises a number of difficult legal questions, a > > +number of which impact on Open Source projects. As noted earlier, the > > +QEMU community requires that contributors certify their patch submissions > > +are made in accordance with the rules of the :ref:`dco` (DCO). When a > > +patch contains "AI" generated code this raises difficulties with code > > +provenence and thus DCO compliance. > > I agree this is going to be a field that keeps lawyers well re-numerated > for the foreseeable future. However I suspect this elides over the main > use case for LLM generators which is non-novel transformation. One good > example is generating text fixtures where you write a piece of original > code and then ask the code completion engine to fill out some unit tests > to exercise the code. It's boring mechanical work but one an LLM is very > suited to (even if you might tweak the final result). It may be suited to produce such code (disputable) but the code is not suited for inclusion into the project, for legal reasons. 
> > +To satisfy the DCO, the patch contributor has to fully understand > > +the origins and license of code they are contributing to QEMU. The > > +license terms that should apply to the output of an "AI" code generator > > +are ill-defined, given that both training data and operation of the > > +"AI" are typically opaque to the user. Even where the training data > > +is said to all be open source, it will likely be under a wide variety > > +of license terms. > > + > > +While the vendor's of "AI" code generators may promote the idea that > > +code output can be taken under a free choice of license, this is not > > +yet considered to be a generally accepted, nor tested, legal opinion. > > + > > +With this in mind, the QEMU maintainers does not consider it is > > +currently possible to comply with DCO terms (b) or (c) for most "AI" > > +generated code. > > There is a load of code out that isn't eligible for copyright projection > because it doesn't demonstrate much originality or creativity. In the > experimentation I've done so far I've not seen much sign of genuine > creativity. LLM's benefit from having access to a wide corpus of > training data and tend to do a better job of inferencing solutions from > semi-related posts than say for example human manually comparing posts > having pasted an error message in google. And the license of that corpus of training data is not defined. If you could erase the copyright on anything by feeding it into a statistical model and pulling it back out, there would be some big content license holders objecting, so it's very unlikely to happen. Consequently, for all practical purposes the "AI"/LLM output is a derivative work of the input, with all the legal consequences. This is, of course, only a problem for *generative* use of AI/LLM, where the output can contain copies of substantial parts of the input. Thanks Michal
* Re: [PATCH 2/2] docs: define policy forbidding use of "AI" / LLM code generators 2023-11-23 17:37 ` Michal Suchánek @ 2023-11-23 23:27 ` Michael S. Tsirkin 0 siblings, 0 replies; 57+ messages in thread From: Michael S. Tsirkin @ 2023-11-23 23:27 UTC (permalink / raw) To: Michal Suchánek Cc: Alex Bennée, Daniel P. Berrangé, qemu-devel, Richard Henderson, Alexander Graf, Paolo Bonzini, Markus Armbruster, Phil Mathieu-Daudé, Stefan Hajnoczi, Thomas Huth, Kevin Wolf, Gerd Hoffmann, Mark Cave-Ayland, Peter Maydell On Thu, Nov 23, 2023 at 06:37:47PM +0100, Michal Suchánek wrote: > If you could erase the copyright on anything by feeding it into a > statistical model and pulling it back out there > Would be some big > content license holders objecting so it's very unlikely to happen. I won't venture a guess and I think neither should QEMU. For now, being on the safe side and rejecting auto-generated code sounds very reasonable to me, though, in particular because it's often quite low quality ;). Not a lawyer, and I don't speak for Red Hat. -- MST
* Re: [PATCH 2/2] docs: define policy forbidding use of "AI" / LLM code generators 2023-11-23 12:57 ` Alex Bennée 2023-11-23 17:37 ` Michal Suchánek @ 2023-11-23 17:46 ` Daniel P. Berrangé 2023-11-23 23:53 ` Michael S. Tsirkin 1 sibling, 1 reply; 57+ messages in thread From: Daniel P. Berrangé @ 2023-11-23 17:46 UTC (permalink / raw) To: Alex Bennée Cc: qemu-devel, Richard Henderson, Alexander Graf, Paolo Bonzini, Michael S. Tsirkin, Markus Armbruster, Phil Mathieu-Daudé, Stefan Hajnoczi, Thomas Huth, Kevin Wolf, Gerd Hoffmann, Mark Cave-Ayland, Peter Maydell On Thu, Nov 23, 2023 at 12:57:42PM +0000, Alex Bennée wrote: > Daniel P. Berrangé <berrange@redhat.com> writes: > > > There has been an explosion of interest in so called "AI" (LLM) > > code generators in the past year or so. Thus far though, this is > > has not been matched by a broadly accepted legal interpretation > > of the licensing implications for code generator outputs. While > > the vendors may claim there is no problem and a free choice of > > license is possible, they have an inherent conflict of interest > > in promoting this interpretation. More broadly there is, as yet, > > no broad consensus on the licensing implications of code generators > > trained on inputs under a wide variety of licenses. > > > > The DCO requires contributors to assert they have the right to > > contribute under the designated project license. Given the lack > > of consensus on the licensing of "AI" (LLM) code generator output, > > it is not considered credible to assert compliance with the DCO > > clause (b) or (c) where a patch includes such generated code. > > > > This patch thus defines a policy that the QEMU project will not > > accept contributions where use of "AI" (LLM) code generators is > > either known, or suspected. > > > > Signed-off-by: Daniel P. 
Berrangé <berrange@redhat.com> > > --- > > docs/devel/code-provenance.rst | 40 ++++++++++++++++++++++++++++++++++ > > 1 file changed, 40 insertions(+) > > > > diff --git a/docs/devel/code-provenance.rst b/docs/devel/code-provenance.rst > > index b4591a2dec..a6e42c6b1b 100644 > > --- a/docs/devel/code-provenance.rst > > +++ b/docs/devel/code-provenance.rst > > @@ -195,3 +195,43 @@ example:: > > Signed-off-by: Some Person <some.person@example.com> > > [Rebased and added support for 'foo'] > > Signed-off-by: New Person <new.person@example.com> > > + > > +Use of "AI" (LLM) code generators > > +~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~ > > + > > +TL;DR: > > + > > + **Current QEMU project policy is to DECLINE any contributions > > + which are believed to include or derive from "AI" (LLM) > > + generated code.** > > + > > +The existence of "AI" (`Large Language Model <https://en.wikipedia.org/wiki/Large_language_model>`__ > > +/ LLM) code generators raises a number of difficult legal questions, a > > +number of which impact on Open Source projects. As noted earlier, the > > +QEMU community requires that contributors certify their patch submissions > > +are made in accordance with the rules of the :ref:`dco` (DCO). When a > > +patch contains "AI" generated code this raises difficulties with code > > +provenence and thus DCO compliance. > > I agree this is going to be a field that keeps lawyers well re-numerated > for the foreseeable future. However I suspect this elides over the main > use case for LLM generators which is non-novel transformation. One good > example is generating text fixtures where you write a piece of original > code and then ask the code completion engine to fill out some unit tests > to exercise the code. It's boring mechanical work but one an LLM is very > suited to (even if you might tweak the final result). 
Yes, I can see how that is helpful, but I think in many cases the resulting code will be complex enough to be considered copyrightable, and so, even when the input code is your own original work, I feel the licensing of the output is still ill-defined. > > > +To satisfy the DCO, the patch contributor has to fully understand > > +the origins and license of code they are contributing to QEMU. The > > +license terms that should apply to the output of an "AI" code generator > > +are ill-defined, given that both training data and operation of the > > +"AI" are typically opaque to the user. Even where the training data > > +is said to all be open source, it will likely be under a wide variety > > +of license terms. > > + > > +While the vendor's of "AI" code generators may promote the idea that > > +code output can be taken under a free choice of license, this is not > > +yet considered to be a generally accepted, nor tested, legal opinion. > > + > > +With this in mind, the QEMU maintainers does not consider it is > > +currently possible to comply with DCO terms (b) or (c) for most "AI" > > +generated code. > > There is a load of code out that isn't eligible for copyright projection > because it doesn't demonstrate much originality or creativity. In the > experimentation I've done so far I've not seen much sign of genuine > creativity. LLM's benefit from having access to a wide corpus of > training data and tend to do a better job of inferencing solutions from > semi-related posts than say for example human manually comparing posts > having pasted an error message in google. The boundary between what is and is not considered copyrightable is itself quite ill-defined, and thus it is hard to express a clear rule that can be applied. I think more experienced long-term contributors end up getting somewhat of a "gut feeling" about what's ok and what's not, but I'm not sure if that is true for contributors in general.

IOW, while there are likely cases where it is possible to safely use
an AI generator, I'm not sure how best to express that in a way that
makes sense.

Perhaps a loosely worded addendum about a possible exception for
"trivial" output?

> > +The QEMU maintainers thus require that contributors refrain from using
> > +"AI" code generators on patches intended to be submitted to the project,
> > +and will decline any contribution if use of "AI" is known or suspected.
> > +
> > +Examples of tools impacted by this policy includes both GitHub CoPilot,
> > +and ChatGPT, amongst many others which are less well known.
>
> What about if you took an LLM and then fine tuned it by using project
> data so it could better help new users in making contributions to the
> project? You would be biasing the model to your own data for the
> purposes of helping developers write better QEMU code?

It is hard to provide an answer to that question, since I think it is
something that would need to be considered case by case. It hinges
on how much the new QEMU-specific training data influences
the model, vs other pre-existing training (if any).

Perhaps we can finish this policy with a general point to solicit
feedback on possible exceptions?

"If a contributor believes they can demonstrate that the output of
a particular tool has deterministic licensing, such that they can
satisfy the DCO, they should provide such info to the mailing list"

With regards,
Daniel
-- 
|: https://berrange.com -o- https://www.flickr.com/photos/dberrange :|
|: https://libvirt.org -o- https://fstop138.berrange.com :|
|: https://entangle-photo.org -o- https://www.instagram.com/dberrange :|
* Re: [PATCH 2/2] docs: define policy forbidding use of "AI" / LLM code generators
  2023-11-23 17:46 ` Daniel P. Berrangé
@ 2023-11-23 23:53 ` Michael S. Tsirkin
  2023-11-24 10:17 ` Kevin Wolf
  0 siblings, 1 reply; 57+ messages in thread
From: Michael S. Tsirkin @ 2023-11-23 23:53 UTC
  To: Daniel P. Berrangé
  Cc: Alex Bennée, qemu-devel, Richard Henderson, Alexander Graf, Paolo Bonzini, Markus Armbruster, Phil Mathieu-Daudé, Stefan Hajnoczi, Thomas Huth, Kevin Wolf, Gerd Hoffmann, Mark Cave-Ayland, Peter Maydell

On Thu, Nov 23, 2023 at 05:46:16PM +0000, Daniel P. Berrangé wrote:
> On Thu, Nov 23, 2023 at 12:57:42PM +0000, Alex Bennée wrote:
> > Daniel P. Berrangé <berrange@redhat.com> writes:
> >
> > > There has been an explosion of interest in so called "AI" (LLM)
> > > code generators in the past year or so. Thus far though, this is
> > > has not been matched by a broadly accepted legal interpretation
> > > of the licensing implications for code generator outputs. While
> > > the vendors may claim there is no problem and a free choice of
> > > license is possible, they have an inherent conflict of interest
> > > in promoting this interpretation. More broadly there is, as yet,
> > > no broad consensus on the licensing implications of code generators
> > > trained on inputs under a wide variety of licenses.
> > >
> > > The DCO requires contributors to assert they have the right to
> > > contribute under the designated project license. Given the lack
> > > of consensus on the licensing of "AI" (LLM) code generator output,
> > > it is not considered credible to assert compliance with the DCO
> > > clause (b) or (c) where a patch includes such generated code.
> > >
> > > This patch thus defines a policy that the QEMU project will not
> > > accept contributions where use of "AI" (LLM) code generators is
> > > either known, or suspected.
> > >
> > > Signed-off-by: Daniel P. Berrangé <berrange@redhat.com>
> > > ---
> > > docs/devel/code-provenance.rst | 40 ++++++++++++++++++++++++++++++++++
> > > 1 file changed, 40 insertions(+)
> > >
> > > diff --git a/docs/devel/code-provenance.rst b/docs/devel/code-provenance.rst
> > > index b4591a2dec..a6e42c6b1b 100644
> > > --- a/docs/devel/code-provenance.rst
> > > +++ b/docs/devel/code-provenance.rst
> > > @@ -195,3 +195,43 @@ example::
> > > Signed-off-by: Some Person <some.person@example.com>
> > > [Rebased and added support for 'foo']
> > > Signed-off-by: New Person <new.person@example.com>
> > > +
> > > +Use of "AI" (LLM) code generators
> > > +~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
> > > +
> > > +TL;DR:
> > > +
> > > + **Current QEMU project policy is to DECLINE any contributions
> > > + which are believed to include or derive from "AI" (LLM)
> > > + generated code.**
> > > +
> > > +The existence of "AI" (`Large Language Model <https://en.wikipedia.org/wiki/Large_language_model>`__
> > > +/ LLM) code generators raises a number of difficult legal questions, a
> > > +number of which impact on Open Source projects. As noted earlier, the
> > > +QEMU community requires that contributors certify their patch submissions
> > > +are made in accordance with the rules of the :ref:`dco` (DCO). When a
> > > +patch contains "AI" generated code this raises difficulties with code
> > > +provenence and thus DCO compliance.
> >
> > I agree this is going to be a field that keeps lawyers well re-numerated
> > for the foreseeable future. However I suspect this elides over the main
> > use case for LLM generators which is non-novel transformation. One good
> > example is generating text fixtures where you write a piece of original
> > code and then ask the code completion engine to fill out some unit tests
> > to exercise the code. It's boring mechanical work but one an LLM is very
> > suited to (even if you might tweak the final result).
>
> Yes, I can see how that is helpful, but I think in many cases the
> resulting code will be complex enough to be considered copyrightable,
> and so even with the original input code, I feel the licensing of the
> output is still ill-defined.
>
> >
> > > +To satisfy the DCO, the patch contributor has to fully understand
> > > +the origins and license of code they are contributing to QEMU. The
> > > +license terms that should apply to the output of an "AI" code generator
> > > +are ill-defined, given that both training data and operation of the
> > > +"AI" are typically opaque to the user. Even where the training data
> > > +is said to all be open source, it will likely be under a wide variety
> > > +of license terms.
> > > +
> > > +While the vendor's of "AI" code generators may promote the idea that
> > > +code output can be taken under a free choice of license, this is not
> > > +yet considered to be a generally accepted, nor tested, legal opinion.
> > > +
> > > +With this in mind, the QEMU maintainers does not consider it is
> > > +currently possible to comply with DCO terms (b) or (c) for most "AI"
> > > +generated code.
> >
> > There is a load of code out that isn't eligible for copyright projection
> > because it doesn't demonstrate much originality or creativity. In the
> > experimentation I've done so far I've not seen much sign of genuine
> > creativity. LLM's benefit from having access to a wide corpus of
> > training data and tend to do a better job of inferencing solutions from
> > semi-related posts than say for example human manually comparing posts
> > having pasted an error message in google.
>
> The boundary between what is considered copyrightable and not is
> itself quite ill-defined, and thus it is hard to express a clear
> rule that can be applied.
>
> I think more experienced long-term contributors end up getting somewhat
> of a "gut feeling" about what's ok and what's not, but I'm not sure if
> that is true for contributors in general.
>
> IOW, while there are likely cases where it is possible to safely use
> an AI generator, I'm not sure how best to express that in a way that
> makes sense.
>
> Perhaps a loosely worded addendum about a possible exception for
> "trivial" output?
>
> > > +The QEMU maintainers thus require that contributors refrain from using
> > > +"AI" code generators on patches intended to be submitted to the project,
> > > +and will decline any contribution if use of "AI" is known or suspected.
> > > +
> > > +Examples of tools impacted by this policy includes both GitHub CoPilot,
> > > +and ChatGPT, amongst many others which are less well known.
> >
> > What about if you took an LLM and then fine tuned it by using project
> > data so it could better help new users in making contributions to the
> > project? You would be biasing the model to your own data for the
> > purposes of helping developers write better QEMU code?
>
> It is hard to provide an answer to that question, since I think it is
> something that would need to be considered case by case. It hinges
> on how much the new QEMU-specific training data influences
> the model, vs other pre-existing training (if any).
>
> Perhaps we can finish this policy with a general point to solicit
> feedback on possible exceptions?
>
> "If a contributor believes they can demonstrate that the output of
> a particular tool has deterministic licensing, such that they can
> satisfy the DCO, they should provide such info to the mailing list"
>
> With regards,
> Daniel

But the question is not about what QEMU should accept. We can trust
maintainers to DTRT. The question is the meaning of DCO. If you want
DCO to mean "this code was not generated by AI" then you better define
"AI" in an unambiguous way otherwise what is it certifying?

Instead, I propose adding simply this:

Thus, generally, Signed-off-by from *each* person who has written
a substantial portion of the patch is required.

If a substantial portion of the patch was not written by any
human person but was instead generated automatically (e.g. by an AI such
as ChatGPT, or a decompiler) then you *must* clearly document
this in the patch commit message. As a matter of policy, and out of an
abundance of caution, such contributions will generally be rejected.

When in doubt whether a specific portion is substantial - assume
that Signed-off-by is required.

-- 
MST
* Re: [PATCH 2/2] docs: define policy forbidding use of "AI" / LLM code generators
  2023-11-23 23:53 ` Michael S. Tsirkin
@ 2023-11-24 10:17 ` Kevin Wolf
  2023-11-24 10:33 ` Alex Bennée
  0 siblings, 1 reply; 57+ messages in thread
From: Kevin Wolf @ 2023-11-24 10:17 UTC
  To: Michael S. Tsirkin
  Cc: Daniel P. Berrangé, Alex Bennée, qemu-devel, Richard Henderson, Alexander Graf, Paolo Bonzini, Markus Armbruster, Phil Mathieu-Daudé, Stefan Hajnoczi, Thomas Huth, Gerd Hoffmann, Mark Cave-Ayland, Peter Maydell

On 24.11.2023 at 00:53, Michael S. Tsirkin wrote:
> On Thu, Nov 23, 2023 at 05:46:16PM +0000, Daniel P. Berrangé wrote:
> > On Thu, Nov 23, 2023 at 12:57:42PM +0000, Alex Bennée wrote:
> > > Daniel P. Berrangé <berrange@redhat.com> writes:
> > >
> > > > There has been an explosion of interest in so called "AI" (LLM)
> > > > code generators in the past year or so. Thus far though, this is
> > > > has not been matched by a broadly accepted legal interpretation
> > > > of the licensing implications for code generator outputs. While
> > > > the vendors may claim there is no problem and a free choice of
> > > > license is possible, they have an inherent conflict of interest
> > > > in promoting this interpretation. More broadly there is, as yet,
> > > > no broad consensus on the licensing implications of code generators
> > > > trained on inputs under a wide variety of licenses.
> > > >
> > > > The DCO requires contributors to assert they have the right to
> > > > contribute under the designated project license. Given the lack
> > > > of consensus on the licensing of "AI" (LLM) code generator output,
> > > > it is not considered credible to assert compliance with the DCO
> > > > clause (b) or (c) where a patch includes such generated code.
> > > >
> > > > This patch thus defines a policy that the QEMU project will not
> > > > accept contributions where use of "AI" (LLM) code generators is
> > > > either known, or suspected.
> > > >
> > > > Signed-off-by: Daniel P. Berrangé <berrange@redhat.com>
> > > > ---
> > > > docs/devel/code-provenance.rst | 40 ++++++++++++++++++++++++++++++++++
> > > > 1 file changed, 40 insertions(+)
> > > >
> > > > diff --git a/docs/devel/code-provenance.rst b/docs/devel/code-provenance.rst
> > > > index b4591a2dec..a6e42c6b1b 100644
> > > > --- a/docs/devel/code-provenance.rst
> > > > +++ b/docs/devel/code-provenance.rst
> > > > @@ -195,3 +195,43 @@ example::
> > > > Signed-off-by: Some Person <some.person@example.com>
> > > > [Rebased and added support for 'foo']
> > > > Signed-off-by: New Person <new.person@example.com>
> > > > +
> > > > +Use of "AI" (LLM) code generators
> > > > +~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
> > > > +
> > > > +TL;DR:
> > > > +
> > > > + **Current QEMU project policy is to DECLINE any contributions
> > > > + which are believed to include or derive from "AI" (LLM)
> > > > + generated code.**
> > > > +
> > > > +The existence of "AI" (`Large Language Model <https://en.wikipedia.org/wiki/Large_language_model>`__
> > > > +/ LLM) code generators raises a number of difficult legal questions, a
> > > > +number of which impact on Open Source projects. As noted earlier, the
> > > > +QEMU community requires that contributors certify their patch submissions
> > > > +are made in accordance with the rules of the :ref:`dco` (DCO). When a
> > > > +patch contains "AI" generated code this raises difficulties with code
> > > > +provenence and thus DCO compliance.
> > >
> > > I agree this is going to be a field that keeps lawyers well re-numerated
> > > for the foreseeable future. However I suspect this elides over the main
> > > use case for LLM generators which is non-novel transformation. One good
> > > example is generating text fixtures where you write a piece of original
> > > code and then ask the code completion engine to fill out some unit tests
> > > to exercise the code. It's boring mechanical work but one an LLM is very
> > > suited to (even if you might tweak the final result).
> >
> > Yes, I can see how that is helpful, but I think in many cases the
> > resulting code will be complex enough to be considered copyrightable,
> > and so even with the original input code, I feel the licensing of the
> > output is still ill-defined.
> >
> > >
> > > > +To satisfy the DCO, the patch contributor has to fully understand
> > > > +the origins and license of code they are contributing to QEMU. The
> > > > +license terms that should apply to the output of an "AI" code generator
> > > > +are ill-defined, given that both training data and operation of the
> > > > +"AI" are typically opaque to the user. Even where the training data
> > > > +is said to all be open source, it will likely be under a wide variety
> > > > +of license terms.
> > > > +
> > > > +While the vendor's of "AI" code generators may promote the idea that
> > > > +code output can be taken under a free choice of license, this is not
> > > > +yet considered to be a generally accepted, nor tested, legal opinion.
> > > > +
> > > > +With this in mind, the QEMU maintainers does not consider it is
> > > > +currently possible to comply with DCO terms (b) or (c) for most "AI"
> > > > +generated code.
> > >
> > > There is a load of code out that isn't eligible for copyright projection
> > > because it doesn't demonstrate much originality or creativity. In the
> > > experimentation I've done so far I've not seen much sign of genuine
> > > creativity. LLM's benefit from having access to a wide corpus of
> > > training data and tend to do a better job of inferencing solutions from
> > > semi-related posts than say for example human manually comparing posts
> > > having pasted an error message in google.
> >
> > The boundary between what is considered copyrightable and not is
> > itself quite ill-defined, and thus it is hard to express a clear
> > rule that can be applied.
> >
> > I think more experienced long-term contributors end up getting somewhat
> > of a "gut feeling" about what's ok and what's not, but I'm not sure if
> > that is true for contributors in general.
> >
> > IOW, while there are likely cases where it is possible to safely use
> > an AI generator, I'm not sure how best to express that in a way that
> > makes sense.
> >
> > Perhaps a loosely worded addendum about a possible exception for
> > "trivial" output?
> >
> > > > +The QEMU maintainers thus require that contributors refrain from using
> > > > +"AI" code generators on patches intended to be submitted to the project,
> > > > +and will decline any contribution if use of "AI" is known or suspected.
> > > > +
> > > > +Examples of tools impacted by this policy includes both GitHub CoPilot,
> > > > +and ChatGPT, amongst many others which are less well known.
> > >
> > > What about if you took an LLM and then fine tuned it by using project
> > > data so it could better help new users in making contributions to the
> > > project? You would be biasing the model to your own data for the
> > > purposes of helping developers write better QEMU code?
> >
> > It is hard to provide an answer to that question, since I think it is
> > something that would need to be considered case by case. It hinges
> > on how much the new QEMU-specific training data influences
> > the model, vs other pre-existing training (if any).

I suspect fine tuning won't be enough because it doesn't make the
unlicensed original training data go away.

If you could make sure that all of the training data consists only of
code for which you have the right to contribute it to QEMU, that would
be a different case.

> > Perhaps we can finish this policy with a general point to solicit
> > feedback on possible exceptions?
> >
> > "If a contributor believes they can demonstrate that the output of
> > a particular tool has deterministic licensing, such that they can
> > satisfy the DCO, they should provide such info to the mailing list"
> >
> > With regards,
> > Daniel
>
>
> But the question is not about what QEMU should accept. We can trust
> maintainers to DTRT. The question is the meaning of DCO. If you want
> DCO to mean "this code was not generated by AI" then you better define
> "AI" in an unambiguous way otherwise what is it certifying?

That you can state confidently that you have the legal right to contribute this code. The problem is not AI per se, the problem is incompatibly licensed - or really, unlicensed (should I call it "pirated" for effect?) - training input for the AI.

So if you got the code from ChatGPT, I simply won't believe you even if you claim that you have the right.

> Instead, I propose adding simply this:
>
> Thus, generally, Signed-off-by from *each* person who has written
> a substantial portion of the patch is required.
>
> If a substantial portion of the patch was not written by any
> human person but was instead generated automatically (e.g. by an AI such
> as ChatGPT, or a decompiler) then you *must* clearly document
> this in the patch commit message. As a matter of policy, and out of an
> abundance of caution, such contributions will generally be rejected.
>
> When in doubt whether a specific portion is substantial - assume
> that Signed-off-by is required.

"generated automatically" is going way too far. There is no problem at all with code changes generated by Coccinelle if you wrote the rules yourself or received them under a license that allows their inclusion in QEMU.

The problem with ChatGPT etc. is that there is no licensing information attached to the generated code. You know it's based on someone else's work, but you don't know who it is, if they are willing to give you a license and under which conditions.

And it's not out of an "abundance of caution" that we reject such patches; it's that you obviously can't actually sign the DCO under such circumstances, and therefore the S-o-b is wrong.

Kevin
* Re: [PATCH 2/2] docs: define policy forbidding use of "AI" / LLM code generators
  2023-11-24 10:17 ` Kevin Wolf
@ 2023-11-24 10:33 ` Alex Bennée
  2023-11-24 10:42 ` Michael S. Tsirkin
  0 siblings, 1 reply; 57+ messages in thread
From: Alex Bennée @ 2023-11-24 10:33 UTC
  To: Kevin Wolf
  Cc: Michael S. Tsirkin, Daniel P. Berrangé, qemu-devel, Richard Henderson, Alexander Graf, Paolo Bonzini, Markus Armbruster, Phil Mathieu-Daudé, Stefan Hajnoczi, Thomas Huth, Gerd Hoffmann, Mark Cave-Ayland, Peter Maydell

Kevin Wolf <kwolf@redhat.com> writes:

> On 24.11.2023 at 00:53, Michael S. Tsirkin wrote:
>> On Thu, Nov 23, 2023 at 05:46:16PM +0000, Daniel P. Berrangé wrote:
>> > On Thu, Nov 23, 2023 at 12:57:42PM +0000, Alex Bennée wrote:
>> > > Daniel P. Berrangé <berrange@redhat.com> writes:
>> > > <snip>
>> > > > +The QEMU maintainers thus require that contributors refrain from using
>> > > > +"AI" code generators on patches intended to be submitted to the project,
>> > > > +and will decline any contribution if use of "AI" is known or suspected.
>> > > > +
>> > > > +Examples of tools impacted by this policy includes both GitHub CoPilot,
>> > > > +and ChatGPT, amongst many others which are less well known.
>> > >
>> > > What about if you took an LLM and then fine tuned it by using project
>> > > data so it could better help new users in making contributions to the
>> > > project? You would be biasing the model to your own data for the
>> > > purposes of helping developers write better QEMU code?
>> >
>> > It is hard to provide an answer to that question, since I think it is
>> > something that would need to be considered case by case. It hinges
>> > on how much the new QEMU-specific training data influences
>> > the model, vs other pre-existing training (if any).
>
> I suspect fine tuning won't be enough because it doesn't make the
> unlicensed original training data go away.
>
> If you could make sure that all of the training data consists only of
> code for which you have the right to contribute it to QEMU, that would
> be a different case.

That probably means we can never use even open source LLMs to generate
code for QEMU because while the source data is all open source it won't
necessarily be GPL compatible.

-- 
Alex Bennée
Virtualisation Tech Lead @ Linaro
* Re: [PATCH 2/2] docs: define policy forbidding use of "AI" / LLM code generators
  2023-11-24 10:33 ` Alex Bennée
@ 2023-11-24 10:42 ` Michael S. Tsirkin
  2023-11-24 10:43 ` Peter Maydell
  0 siblings, 1 reply; 57+ messages in thread
From: Michael S. Tsirkin @ 2023-11-24 10:42 UTC
  To: Alex Bennée
  Cc: Kevin Wolf, Daniel P. Berrangé, qemu-devel, Richard Henderson, Alexander Graf, Paolo Bonzini, Markus Armbruster, Phil Mathieu-Daudé, Stefan Hajnoczi, Thomas Huth, Gerd Hoffmann, Mark Cave-Ayland, Peter Maydell

On Fri, Nov 24, 2023 at 10:33:49AM +0000, Alex Bennée wrote:
> That probably means we can never use even open source LLMs to generate
> code for QEMU because while the source data is all open source it won't
> necessarily be GPL compatible.

I would probably wait until the dust settles before we start accepting
LLM generated code. If nothing else, generated code quality in our niche area is at this point still nowhere near being useful.

-- 
MST
* Re: [PATCH 2/2] docs: define policy forbidding use of "AI" / LLM code generators
  2023-11-24 10:42 ` Michael S. Tsirkin
@ 2023-11-24 10:43 ` Peter Maydell
  2023-11-24 11:02 ` Michael S. Tsirkin
  2023-11-24 11:37 ` Daniel P. Berrangé
  0 siblings, 2 replies; 57+ messages in thread
From: Peter Maydell @ 2023-11-24 10:43 UTC
  To: Michael S. Tsirkin
  Cc: Alex Bennée, Kevin Wolf, Daniel P. Berrangé, qemu-devel, Richard Henderson, Alexander Graf, Paolo Bonzini, Markus Armbruster, Phil Mathieu-Daudé, Stefan Hajnoczi, Thomas Huth, Gerd Hoffmann, Mark Cave-Ayland

On Fri, 24 Nov 2023 at 10:42, Michael S. Tsirkin <mst@redhat.com> wrote:
>
> On Fri, Nov 24, 2023 at 10:33:49AM +0000, Alex Bennée wrote:
> > That probably means we can never use even open source LLMs to generate
> > code for QEMU because while the source data is all open source it won't
> > necessarily be GPL compatible.
>
> I would probably wait until the dust settles before we start accepting
> LLM generated code.

I think that's pretty much my take on what this policy is:
"say no for now; we can always come back later when the legal
situation seems clearer".

-- 
PMM
* Re: [PATCH 2/2] docs: define policy forbidding use of "AI" / LLM code generators
  2023-11-24 10:43 ` Peter Maydell
@ 2023-11-24 11:02 ` Michael S. Tsirkin
  2023-11-24 11:37 ` Daniel P. Berrangé
  1 sibling, 0 replies; 57+ messages in thread
From: Michael S. Tsirkin @ 2023-11-24 11:02 UTC
  To: Peter Maydell
  Cc: Alex Bennée, Kevin Wolf, Daniel P. Berrangé, qemu-devel, Richard Henderson, Alexander Graf, Paolo Bonzini, Markus Armbruster, Phil Mathieu-Daudé, Stefan Hajnoczi, Thomas Huth, Gerd Hoffmann, Mark Cave-Ayland

On Fri, Nov 24, 2023 at 10:43:05AM +0000, Peter Maydell wrote:
> On Fri, 24 Nov 2023 at 10:42, Michael S. Tsirkin <mst@redhat.com> wrote:
> >
> > On Fri, Nov 24, 2023 at 10:33:49AM +0000, Alex Bennée wrote:
> > > That probably means we can never use even open source LLMs to generate
> > > code for QEMU because while the source data is all open source it won't
> > > necessarily be GPL compatible.
> >
> > I would probably wait until the dust settles before we start accepting
> > LLM generated code.
>
> I think that's pretty much my take on what this policy is:
> "say no for now; we can always come back later when the legal
> situation seems clearer".

Absolutely. So I think we should not try and venture into terminology such as what is AI or try and promote legal copyright theories. ATM there's no good reason for someone who did not write the code to put their DCO on the code. If it is not clear who wrote the code because it was generated and not written then we don't want it.

-- 
MST
* Re: [PATCH 2/2] docs: define policy forbidding use of "AI" / LLM code generators
  2023-11-24 10:43 ` Peter Maydell
  2023-11-24 11:02 ` Michael S. Tsirkin
@ 2023-11-24 11:37 ` Daniel P. Berrangé
  2023-11-24 11:39 ` Michael S. Tsirkin
  1 sibling, 1 reply; 57+ messages in thread
From: Daniel P. Berrangé @ 2023-11-24 11:37 UTC
  To: Peter Maydell
  Cc: Michael S. Tsirkin, Alex Bennée, Kevin Wolf, qemu-devel, Richard Henderson, Alexander Graf, Paolo Bonzini, Markus Armbruster, Phil Mathieu-Daudé, Stefan Hajnoczi, Thomas Huth, Gerd Hoffmann, Mark Cave-Ayland

On Fri, Nov 24, 2023 at 10:43:05AM +0000, Peter Maydell wrote:
> On Fri, 24 Nov 2023 at 10:42, Michael S. Tsirkin <mst@redhat.com> wrote:
> >
> > On Fri, Nov 24, 2023 at 10:33:49AM +0000, Alex Bennée wrote:
> > > That probably means we can never use even open source LLMs to generate
> > > code for QEMU because while the source data is all open source it won't
> > > necessarily be GPL compatible.
> >
> > I would probably wait until the dust settles before we start accepting
> > LLM generated code.
>
> I think that's pretty much my take on what this policy is:
> "say no for now; we can always come back later when the legal
> situation seems clearer".

Yes, those were my thoughts exactly.

And if anyone comes along with a specific LLM/AI code generator that
they believe can be used in a way compatible with the DCO, they can
ask for an exception to the general policy which we can discuss then.

With regards,
Daniel
-- 
|: https://berrange.com -o- https://www.flickr.com/photos/dberrange :|
|: https://libvirt.org -o- https://fstop138.berrange.com :|
|: https://entangle-photo.org -o- https://www.instagram.com/dberrange :|
* Re: [PATCH 2/2] docs: define policy forbidding use of "AI" / LLM code generators
  2023-11-24 11:37 ` Daniel P. Berrangé
@ 2023-11-24 11:39 ` Michael S. Tsirkin
  2023-11-24 11:40 ` Michael S. Tsirkin
  0 siblings, 1 reply; 57+ messages in thread
From: Michael S. Tsirkin @ 2023-11-24 11:39 UTC
  To: Daniel P. Berrangé
  Cc: Peter Maydell, Alex Bennée, Kevin Wolf, qemu-devel, Richard Henderson, Alexander Graf, Paolo Bonzini, Markus Armbruster, Phil Mathieu-Daudé, Stefan Hajnoczi, Thomas Huth, Gerd Hoffmann, Mark Cave-Ayland

On Fri, Nov 24, 2023 at 11:37:15AM +0000, Daniel P. Berrangé wrote:
> On Fri, Nov 24, 2023 at 10:43:05AM +0000, Peter Maydell wrote:
> > On Fri, 24 Nov 2023 at 10:42, Michael S. Tsirkin <mst@redhat.com> wrote:
> > >
> > > On Fri, Nov 24, 2023 at 10:33:49AM +0000, Alex Bennée wrote:
> > > > That probably means we can never use even open source LLMs to generate
> > > > code for QEMU because while the source data is all open source it won't
> > > > necessarily be GPL compatible.
> > >
> > > I would probably wait until the dust settles before we start accepting
> > > LLM generated code.
> >
> > I think that's pretty much my take on what this policy is:
> > "say no for now; we can always come back later when the legal
> > situation seems clearer".
>
> Yes, those were my thoughts exactly.
>
> And if anyone comes along with a specific LLM/AI code generator that
> they believe can be used in a way compatible with the DCO, they can
> ask for an exception to the general policy which we can discuss then.

Yea. But why do you keep worrying about LLM/AI mess? Are there code
generators whose output we do allow? What are these?

-- 
MST
* Re: [PATCH 2/2] docs: define policy forbidding use of "AI" / LLM code generators
  2023-11-24 11:39 ` Michael S. Tsirkin
@ 2023-11-24 11:40 ` Michael S. Tsirkin
  0 siblings, 0 replies; 57+ messages in thread
From: Michael S. Tsirkin @ 2023-11-24 11:40 UTC
  To: Daniel P. Berrangé
  Cc: Peter Maydell, Alex Bennée, Kevin Wolf, qemu-devel, Richard Henderson, Alexander Graf, Paolo Bonzini, Markus Armbruster, Phil Mathieu-Daudé, Stefan Hajnoczi, Thomas Huth, Gerd Hoffmann, Mark Cave-Ayland

On Fri, Nov 24, 2023 at 06:39:21AM -0500, Michael S. Tsirkin wrote:
> On Fri, Nov 24, 2023 at 11:37:15AM +0000, Daniel P. Berrangé wrote:
> > On Fri, Nov 24, 2023 at 10:43:05AM +0000, Peter Maydell wrote:
> > > On Fri, 24 Nov 2023 at 10:42, Michael S. Tsirkin <mst@redhat.com> wrote:
> > > >
> > > > On Fri, Nov 24, 2023 at 10:33:49AM +0000, Alex Bennée wrote:
> > > > > That probably means we can never use even open source LLMs to generate
> > > > > code for QEMU because while the source data is all open source it won't
> > > > > necessarily be GPL compatible.
> > > >
> > > > I would probably wait until the dust settles before we start accepting
> > > > LLM generated code.
> > >
> > > I think that's pretty much my take on what this policy is:
> > > "say no for now; we can always come back later when the legal
> > > situation seems clearer".
> >
> > Yes, those were my thoughts exactly.
> >
> > And if anyone comes along with a specific LLM/AI code generator that
> > they believe can be used in a way compatible with the DCO, they can
> > ask for an exception to the general policy which we can discuss then.
>
> Yea. But why do you keep worrying about LLM/AI mess? Are there code
> generators whose output we do allow? What are these?

And to clarify I mean source code in the GPL sense so please do not say "compiler".

-- 
MST
* Re: [PATCH 2/2] docs: define policy forbidding use of "AI" / LLM code generators
  2023-11-23 11:40 ` [PATCH 2/2] docs: define policy forbidding use of "AI" / LLM code generators Daniel P. Berrangé
  2023-11-23 12:57 ` Alex Bennée
@ 2023-11-23 13:20 ` Kevin Wolf
  2023-11-23 14:35 ` Michael S. Tsirkin
  2023-11-23 15:22 ` Stefan Hajnoczi
  3 siblings, 0 replies; 57+ messages in thread
From: Kevin Wolf @ 2023-11-23 13:20 UTC
  To: Daniel P. Berrangé
  Cc: qemu-devel, Richard Henderson, Alexander Graf, Alex Bennée, Paolo Bonzini, Michael S. Tsirkin, Markus Armbruster, Phil Mathieu-Daudé, Stefan Hajnoczi, Thomas Huth, Gerd Hoffmann, Mark Cave-Ayland, Peter Maydell

On 23.11.2023 at 12:40, Daniel P. Berrangé wrote:
> There has been an explosion of interest in so called "AI" (LLM)
> code generators in the past year or so. Thus far though, this is
> has not been matched by a broadly accepted legal interpretation
> of the licensing implications for code generator outputs. While
> the vendors may claim there is no problem and a free choice of
> license is possible, they have an inherent conflict of interest
> in promoting this interpretation. More broadly there is, as yet,
> no broad consensus on the licensing implications of code generators
> trained on inputs under a wide variety of licenses.
>
> The DCO requires contributors to assert they have the right to
> contribute under the designated project license. Given the lack
> of consensus on the licensing of "AI" (LLM) code generator output,
> it is not considered credible to assert compliance with the DCO
> clause (b) or (c) where a patch includes such generated code.
>
> This patch thus defines a policy that the QEMU project will not
> accept contributions where use of "AI" (LLM) code generators is
> either known, or suspected.
>
> Signed-off-by: Daniel P. Berrangé <berrange@redhat.com>
> ---
> docs/devel/code-provenance.rst | 40 ++++++++++++++++++++++++++++++++++
> 1 file changed, 40 insertions(+)
>
> diff --git a/docs/devel/code-provenance.rst b/docs/devel/code-provenance.rst
> index b4591a2dec..a6e42c6b1b 100644
> --- a/docs/devel/code-provenance.rst
> +++ b/docs/devel/code-provenance.rst
> @@ -195,3 +195,43 @@ example::
> Signed-off-by: Some Person <some.person@example.com>
> [Rebased and added support for 'foo']
> Signed-off-by: New Person <new.person@example.com>
> +
> +Use of "AI" (LLM) code generators
> +~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
> +
> +TL;DR:
> +
> + **Current QEMU project policy is to DECLINE any contributions
> + which are believed to include or derive from "AI" (LLM)
> + generated code.**
> +
> +The existence of "AI" (`Large Language Model <https://en.wikipedia.org/wiki/Large_language_model>`__
> +/ LLM) code generators raises a number of difficult legal questions, a
> +number of which impact on Open Source projects. As noted earlier, the
> +QEMU community requires that contributors certify their patch submissions
> +are made in accordance with the rules of the :ref:`dco` (DCO). When a
> +patch contains "AI" generated code this raises difficulties with code
> +provenence and thus DCO compliance.
> +
> +To satisfy the DCO, the patch contributor has to fully understand
> +the origins and license of code they are contributing to QEMU. The
> +license terms that should apply to the output of an "AI" code generator
> +are ill-defined, given that both training data and operation of the
> +"AI" are typically opaque to the user. Even where the training data
> +is said to all be open source, it will likely be under a wide variety
> +of license terms.
> +
> +While the vendor's of "AI" code generators may promote the idea that
> +code output can be taken under a free choice of license, this is not
> +yet considered to be a generally accepted, nor tested, legal opinion.
> + > +With this in mind, the QEMU maintainers does not consider it is s/does/do/ or maybe s/maintainers/project/ > +currently possible to comply with DCO terms (b) or (c) for most "AI" > +generated code. > + > +The QEMU maintainers thus require that contributors refrain from using > +"AI" code generators on patches intended to be submitted to the project, > +and will decline any contribution if use of "AI" is known or suspected. > + > +Examples of tools impacted by this policy includes both GitHub CoPilot, > +and ChatGPT, amongst many others which are less well known. Acked-by: Kevin Wolf <kwolf@redhat.com> ^ permalink raw reply [flat|nested] 57+ messages in thread
* Re: [PATCH 2/2] docs: define policy forbidding use of "AI" / LLM code generators 2023-11-23 11:40 ` [PATCH 2/2] docs: define policy forbidding use of "AI" / LLM code generators Daniel P. Berrangé 2023-11-23 12:57 ` Alex Bennée 2023-11-23 13:20 ` Kevin Wolf @ 2023-11-23 14:35 ` Michael S. Tsirkin 2023-11-23 14:56 ` Manos Pitsidianakis 2023-11-23 17:58 ` Daniel P. Berrangé 2023-11-23 15:22 ` Stefan Hajnoczi 3 siblings, 2 replies; 57+ messages in thread From: Michael S. Tsirkin @ 2023-11-23 14:35 UTC (permalink / raw) To: Daniel P. Berrangé Cc: qemu-devel, Richard Henderson, Alexander Graf, Alex Bennée, Paolo Bonzini, Markus Armbruster, Phil Mathieu-Daudé, Stefan Hajnoczi, Thomas Huth, Kevin Wolf, Gerd Hoffmann, Mark Cave-Ayland, Peter Maydell On Thu, Nov 23, 2023 at 11:40:26AM +0000, Daniel P. Berrangé wrote: > There has been an explosion of interest in so called "AI" (LLM) > code generators in the past year or so. Thus far though, this is > has not been matched by a broadly accepted legal interpretation > of the licensing implications for code generator outputs. While > the vendors may claim there is no problem and a free choice of > license is possible, they have an inherent conflict of interest > in promoting this interpretation. More broadly there is, as yet, > no broad consensus on the licensing implications of code generators > trained on inputs under a wide variety of licenses. > > The DCO requires contributors to assert they have the right to > contribute under the designated project license. Given the lack > of consensus on the licensing of "AI" (LLM) code generator output, > it is not considered credible to assert compliance with the DCO > clause (b) or (c) where a patch includes such generated code. > > This patch thus defines a policy that the QEMU project will not > accept contributions where use of "AI" (LLM) code generators is > either known, or suspected. > > Signed-off-by: Daniel P. 
Berrangé <berrange@redhat.com> > --- > docs/devel/code-provenance.rst | 40 ++++++++++++++++++++++++++++++++++ > 1 file changed, 40 insertions(+) > > diff --git a/docs/devel/code-provenance.rst b/docs/devel/code-provenance.rst > index b4591a2dec..a6e42c6b1b 100644 > --- a/docs/devel/code-provenance.rst > +++ b/docs/devel/code-provenance.rst > @@ -195,3 +195,43 @@ example:: > Signed-off-by: Some Person <some.person@example.com> > [Rebased and added support for 'foo'] > Signed-off-by: New Person <new.person@example.com> > + > +Use of "AI" (LLM) code generators > +~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~ > + > +TL;DR: > + > + **Current QEMU project policy is to DECLINE any contributions > + which are believed to include or derive from "AI" (LLM) > + generated code.** > + > +The existence of "AI" (`Large Language Model <https://en.wikipedia.org/wiki/Large_language_model>`__ > +/ LLM) code generators raises a number of difficult legal questions, a > +number of which impact on Open Source projects. As noted earlier, the > +QEMU community requires that contributors certify their patch submissions > +are made in accordance with the rules of the :ref:`dco` (DCO). When a > +patch contains "AI" generated code this raises difficulties with code > +provenence and thus DCO compliance. > + > +To satisfy the DCO, the patch contributor has to fully understand > +the origins and license of code they are contributing to QEMU. The > +license terms that should apply to the output of an "AI" code generator > +are ill-defined, given that both training data and operation of the > +"AI" are typically opaque to the user. Even where the training data > +is said to all be open source, it will likely be under a wide variety > +of license terms. > + > +While the vendor's of "AI" code generators may promote the idea that > +code output can be taken under a free choice of license, this is not > +yet considered to be a generally accepted, nor tested, legal opinion. 
> + > +With this in mind, the QEMU maintainers does not consider it is > +currently possible to comply with DCO terms (b) or (c) for most "AI" > +generated code. > + > +The QEMU maintainers thus require that contributors refrain from using > +"AI" code generators on patches intended to be submitted to the project, > +and will decline any contribution if use of "AI" is known or suspected. > + > +Examples of tools impacted by this policy includes both GitHub CoPilot, > +and ChatGPT, amongst many others which are less well known. So you called out these two by name, fine, but given "AI" is in scare quotes I don't really know what is or is not allowed and I don't know how will contributors know. Is the "AI" that one must not use necessarily an LLM? And how do you define LLM even? Wikipedia says "general-purpose language understanding and generation". All this seems vague to me. However, can't we define a simpler more specific policy? For example, isn't it true that *any* automatically generated code can only be included if the scripts producing said code are also included or otherwise available under GPLv2? > -- > 2.41.0 ^ permalink raw reply [flat|nested] 57+ messages in thread
* Re: [PATCH 2/2] docs: define policy forbidding use of "AI" / LLM code generators 2023-11-23 14:35 ` Michael S. Tsirkin @ 2023-11-23 14:56 ` Manos Pitsidianakis 2023-11-23 15:13 ` Michael S. Tsirkin ` (4 more replies) 2023-11-23 17:58 ` Daniel P. Berrangé 1 sibling, 5 replies; 57+ messages in thread From: Manos Pitsidianakis @ 2023-11-23 14:56 UTC (permalink / raw) To: qemu-devel, Michael S. Tsirkin, Daniel P. Berrangé Cc: qemu-devel, Richard Henderson, Alexander Graf, Alex Bennée, Paolo Bonzini, Markus Armbruster, Phil Mathieu-Daudé, Stefan Hajnoczi, Thomas Huth, Kevin Wolf, Gerd Hoffmann, Mark Cave-Ayland, Peter Maydell On Thu, 23 Nov 2023 16:35, "Michael S. Tsirkin" <mst@redhat.com> wrote: >On Thu, Nov 23, 2023 at 11:40:26AM +0000, Daniel P. Berrangé wrote: >> There has been an explosion of interest in so called "AI" (LLM) >> code generators in the past year or so. Thus far though, this is >> has not been matched by a broadly accepted legal interpretation >> of the licensing implications for code generator outputs. While >> the vendors may claim there is no problem and a free choice of >> license is possible, they have an inherent conflict of interest >> in promoting this interpretation. More broadly there is, as yet, >> no broad consensus on the licensing implications of code generators >> trained on inputs under a wide variety of licenses. >> >> The DCO requires contributors to assert they have the right to >> contribute under the designated project license. Given the lack >> of consensus on the licensing of "AI" (LLM) code generator output, >> it is not considered credible to assert compliance with the DCO >> clause (b) or (c) where a patch includes such generated code. >> >> This patch thus defines a policy that the QEMU project will not >> accept contributions where use of "AI" (LLM) code generators is >> either known, or suspected. >> >> Signed-off-by: Daniel P.
Berrangé <berrange@redhat.com> >> --- >> docs/devel/code-provenance.rst | 40 ++++++++++++++++++++++++++++++++++ >> 1 file changed, 40 insertions(+) >> >> diff --git a/docs/devel/code-provenance.rst b/docs/devel/code-provenance.rst >> index b4591a2dec..a6e42c6b1b 100644 >> --- a/docs/devel/code-provenance.rst >> +++ b/docs/devel/code-provenance.rst >> @@ -195,3 +195,43 @@ example:: >> Signed-off-by: Some Person <some.person@example.com> >> [Rebased and added support for 'foo'] >> Signed-off-by: New Person <new.person@example.com> >> + >> +Use of "AI" (LLM) code generators >> +~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~ >> + >> +TL;DR: >> + >> + **Current QEMU project policy is to DECLINE any contributions >> + which are believed to include or derive from "AI" (LLM) >> + generated code.** >> + >> +The existence of "AI" (`Large Language Model <https://en.wikipedia.org/wiki/Large_language_model>`__ >> +/ LLM) code generators raises a number of difficult legal questions, a >> +number of which impact on Open Source projects. As noted earlier, the >> +QEMU community requires that contributors certify their patch submissions >> +are made in accordance with the rules of the :ref:`dco` (DCO). When a >> +patch contains "AI" generated code this raises difficulties with code >> +provenence and thus DCO compliance. >> + >> +To satisfy the DCO, the patch contributor has to fully understand >> +the origins and license of code they are contributing to QEMU. The >> +license terms that should apply to the output of an "AI" code generator >> +are ill-defined, given that both training data and operation of the >> +"AI" are typically opaque to the user. Even where the training data >> +is said to all be open source, it will likely be under a wide variety >> +of license terms. >> + >> +While the vendor's of "AI" code generators may promote the idea that >> +code output can be taken under a free choice of license, this is not >> +yet considered to be a generally accepted, nor tested, legal opinion. 
>> + >> +With this in mind, the QEMU maintainers does not consider it is >> +currently possible to comply with DCO terms (b) or (c) for most "AI" >> +generated code. >> + >> +The QEMU maintainers thus require that contributors refrain from using >> +"AI" code generators on patches intended to be submitted to the project, >> +and will decline any contribution if use of "AI" is known or suspected. >> + >> +Examples of tools impacted by this policy includes both GitHub CoPilot, >> +and ChatGPT, amongst many others which are less well known. > > >So you called out these two by name, fine, but given "AI" is in scare >quotes I don't really know what is or is not allowed and I don't know >how will contributors know. Is the "AI" that one must not use >necessarily an LLM? And how do you define LLM even? Wikipedia says >"general-purpose language understanding and generation". > > >All this seems vague to me. > > >However, can't we define a simpler more specific policy? >For example, isn't it true that *any* automatically generated code >can only be included if the scripts producing said code >are also included or otherwise available under GPLv2? The following definition makes sense to me: - Automated codegen tool must be idempotent. - Automated codegen tool must not use statistical modelling. I'd remove all AI or LLM references. These are non-specific, colloquial and in the case of `AI`, non-technical. This policy should apply the same to a Markov chain code generator. ^ permalink raw reply [flat|nested] 57+ messages in thread
* Re: [PATCH 2/2] docs: define policy forbidding use of "AI" / LLM code generators 2023-11-23 14:56 ` Manos Pitsidianakis @ 2023-11-23 15:13 ` Michael S. Tsirkin 2023-11-23 15:29 ` Philippe Mathieu-Daudé ` (3 subsequent siblings) 4 siblings, 0 replies; 57+ messages in thread From: Michael S. Tsirkin @ 2023-11-23 15:13 UTC (permalink / raw) To: Manos Pitsidianakis Cc: qemu-devel, Daniel P. Berrangé, Richard Henderson, Alexander Graf, Alex Bennée, Paolo Bonzini, Markus Armbruster, Phil Mathieu-Daudé, Stefan Hajnoczi, Thomas Huth, Kevin Wolf, Gerd Hoffmann, Mark Cave-Ayland, Peter Maydell On Thu, Nov 23, 2023 at 04:56:28PM +0200, Manos Pitsidianakis wrote: > > However, can't we define a simpler more specific policy? > > For example, isn't it true that *any* automatically generated code > > can only be included if the scripts producing said code > > are also included or otherwise available under GPLv2? > > The following definition makes sense to me: > > - Automated codegen tool must be idempotent. > - Automated codegen tool must not use statistical modelling. Why does it matter so much? > I'd remove all AI or LLM references. These are non-specific, colloquial and > in the case of `AI`, non-technical. This policy should apply the same to a > Markov chain code generator. -- MST ^ permalink raw reply [flat|nested] 57+ messages in thread
* Re: [PATCH 2/2] docs: define policy forbidding use of "AI" / LLM code generators 2023-11-23 14:56 ` Manos Pitsidianakis 2023-11-23 15:13 ` Michael S. Tsirkin @ 2023-11-23 15:29 ` Philippe Mathieu-Daudé 2023-11-23 17:06 ` Michael S. Tsirkin 2023-11-23 15:32 ` Alex Bennée ` (2 subsequent siblings) 4 siblings, 1 reply; 57+ messages in thread From: Philippe Mathieu-Daudé @ 2023-11-23 15:29 UTC (permalink / raw) To: Manos Pitsidianakis, qemu-devel, Michael S. Tsirkin, Daniel P. Berrangé Cc: Richard Henderson, Alexander Graf, Alex Bennée, Paolo Bonzini, Markus Armbruster, Stefan Hajnoczi, Thomas Huth, Kevin Wolf, Gerd Hoffmann, Mark Cave-Ayland, Peter Maydell On 23/11/23 15:56, Manos Pitsidianakis wrote: > On Thu, 23 Nov 2023 16:35, "Michael S. Tsirkin" <mst@redhat.com> wrote: >> On Thu, Nov 23, 2023 at 11:40:26AM +0000, Daniel P. Berrangé wrote: >>> There has been an explosion of interest in so called "AI" (LLM) >>> code generators in the past year or so. Thus far though, this is >>> has not been matched by a broadly accepted legal interpretation >>> of the licensing implications for code generator outputs. While >>> the vendors may claim there is no problem and a free choice of >>> license is possible, they have an inherent conflict of interest >>> in promoting this interpretation. More broadly there is, as yet, >>> no broad consensus on the licensing implications of code generators >>> trained on inputs under a wide variety of licenses. >>> >>> The DCO requires contributors to assert they have the right to >>> contribute under the designated project license. Given the lack >>> of consensus on the licensing of "AI" (LLM) code generator output, >>> it is not considered credible to assert compliance with the DCO >>> clause (b) or (c) where a patch includes such generated code. >>> >>> This patch thus defines a policy that the QEMU project will not >>> accept contributions where use of "AI" (LLM) code generators is >>> either known, or suspected.
>>> >>> Signed-off-by: Daniel P. Berrangé <berrange@redhat.com> >>> --- >>> docs/devel/code-provenance.rst | 40 ++++++++++++++++++++++++++++++++++ >>> 1 file changed, 40 insertions(+) >>> +Use of "AI" (LLM) code generators >>> +~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~ >>> + >>> +TL;DR: >>> + >>> + **Current QEMU project policy is to DECLINE any contributions >>> + which are believed to include or derive from "AI" (LLM) >>> + generated code.** >>> + >>> +The existence of "AI" (`Large Language Model >>> <https://en.wikipedia.org/wiki/Large_language_model>`__ >>> +/ LLM) code generators raises a number of difficult legal questions, a >>> +number of which impact on Open Source projects. As noted earlier, the >>> +QEMU community requires that contributors certify their patch >>> submissions >>> +are made in accordance with the rules of the :ref:`dco` (DCO). When a >>> +patch contains "AI" generated code this raises difficulties with code >>> +provenence and thus DCO compliance. >>> + >>> +To satisfy the DCO, the patch contributor has to fully understand >>> +the origins and license of code they are contributing to QEMU. The >>> +license terms that should apply to the output of an "AI" code generator >>> +are ill-defined, given that both training data and operation of the >>> +"AI" are typically opaque to the user. Even where the training data >>> +is said to all be open source, it will likely be under a wide variety >>> +of license terms. >>> + >>> +While the vendor's of "AI" code generators may promote the idea that >>> +code output can be taken under a free choice of license, this is not >>> +yet considered to be a generally accepted, nor tested, legal opinion. >>> + >>> +With this in mind, the QEMU maintainers does not consider it is >>> +currently possible to comply with DCO terms (b) or (c) for most "AI" >>> +generated code. 
>>> + >>> +The QEMU maintainers thus require that contributors refrain from using >>> +"AI" code generators on patches intended to be submitted to the >>> project, >>> +and will decline any contribution if use of "AI" is known or suspected. >>> + >>> +Examples of tools impacted by this policy includes both GitHub CoPilot, >>> +and ChatGPT, amongst many others which are less well known. >> >> >> So you called out these two by name, fine, but given "AI" is in scare >> quotes I don't really know what is or is not allowed and I don't know >> how will contributors know. Is the "AI" that one must not use >> necessarily an LLM? And how do you define LLM even? Wikipedia says >> "general-purpose language understanding and generation". >> >> >> All this seems vague to me. >> >> >> However, can't we define a simpler more specific policy? >> For example, isn't it true that *any* automatically generated code >> can only be included if the scripts producing said code >> are also included or otherwise available under GPLv2? > > The following definition makes sense to me: > > - Automated codegen tool must be idempotent. > - Automated codegen tool must not use statistical modelling. > > I'd remove all AI or LLM references. These are non-specific, colloquial > and in the case of `AI`, non-technical. This policy should apply the > same to a Markov chain code generator. This document targets all contributors. Contributions can be typo fix, translations, ... and don't have to be technical. Similarly, contributors aren't expected to be technical experts. As a neophyte, "AI" makes sense. "Idempotent code generator" or "LLM" don't :) ^ permalink raw reply [flat|nested] 57+ messages in thread
* Re: [PATCH 2/2] docs: define policy forbidding use of "AI" / LLM code generators 2023-11-23 15:29 ` Philippe Mathieu-Daudé @ 2023-11-23 17:06 ` Michael S. Tsirkin 2023-11-23 17:29 ` Michal Suchánek 0 siblings, 1 reply; 57+ messages in thread From: Michael S. Tsirkin @ 2023-11-23 17:06 UTC (permalink / raw) To: Philippe Mathieu-Daudé Cc: Manos Pitsidianakis, qemu-devel, Daniel P. Berrangé, Richard Henderson, Alexander Graf, Alex Bennée, Paolo Bonzini, Markus Armbruster, Stefan Hajnoczi, Thomas Huth, Kevin Wolf, Gerd Hoffmann, Mark Cave-Ayland, Peter Maydell On Thu, Nov 23, 2023 at 04:29:52PM +0100, Philippe Mathieu-Daudé wrote: > This document targets all contributors. Contributions can be typo > fix, translations, ... and don't have to be technical. Similarly, > contributors aren't expected to be technical experts. As a neophyte, > "AI" makes sense. "Idempotent code generator" or "LLM" don't :) I don't think there's any big deal in using AI for typo fixes. -- MST ^ permalink raw reply [flat|nested] 57+ messages in thread
* Re: [PATCH 2/2] docs: define policy forbidding use of "AI" / LLM code generators 2023-11-23 17:06 ` Michael S. Tsirkin @ 2023-11-23 17:29 ` Michal Suchánek 2023-11-23 18:05 ` Michael S. Tsirkin 0 siblings, 1 reply; 57+ messages in thread From: Michal Suchánek @ 2023-11-23 17:29 UTC (permalink / raw) To: Michael S. Tsirkin Cc: Philippe Mathieu-Daudé, Manos Pitsidianakis, qemu-devel, Daniel P. Berrangé, Richard Henderson, Alexander Graf, Alex Bennée, Paolo Bonzini, Markus Armbruster, Stefan Hajnoczi, Thomas Huth, Kevin Wolf, Gerd Hoffmann, Mark Cave-Ayland, Peter Maydell On Thu, Nov 23, 2023 at 12:06:59PM -0500, Michael S. Tsirkin wrote: > On Thu, Nov 23, 2023 at 04:29:52PM +0100, Philippe Mathieu-Daudé wrote: > > This document targets all contributors. Contributions can be typo > > fix, translations, ... and don't have to be technical. Similarly, > > contributors aren't expected to be technical experts. As a neophyte, > > "AI" makes sense. "Idempotent code generator" or "LLM" don't :) > > I don't think there's any big deal in using AI for typo fixes. For how many typos it is still OK, and would not a deterministic spellchecker be preferred? There are some edge cases where using AI is OK, the problem is most of the time it is not clear it is OK to use. Thanks Michal ^ permalink raw reply [flat|nested] 57+ messages in thread
* Re: [PATCH 2/2] docs: define policy forbidding use of "AI" / LLM code generators 2023-11-23 17:29 ` Michal Suchánek @ 2023-11-23 18:05 ` Michael S. Tsirkin 0 siblings, 0 replies; 57+ messages in thread From: Michael S. Tsirkin @ 2023-11-23 18:05 UTC (permalink / raw) To: Michal Suchánek Cc: Philippe Mathieu-Daudé, Manos Pitsidianakis, qemu-devel, Daniel P. Berrangé, Richard Henderson, Alexander Graf, Alex Bennée, Paolo Bonzini, Markus Armbruster, Stefan Hajnoczi, Thomas Huth, Kevin Wolf, Gerd Hoffmann, Mark Cave-Ayland, Peter Maydell On Thu, Nov 23, 2023 at 06:29:38PM +0100, Michal Suchánek wrote: > On Thu, Nov 23, 2023 at 12:06:59PM -0500, Michael S. Tsirkin wrote: > > On Thu, Nov 23, 2023 at 04:29:52PM +0100, Philippe Mathieu-Daudé wrote: > > > This document targets all contributors. Contributions can be typo > > > fix, translations, ... and don't have to be technical. Similarly, > > > contributors aren't expected to be technical experts. As a neophyte, > > > "AI" makes sense. "Idempotent code generator" or "LLM" don't :) > > > > I don't think there's any big deal in using AI for typo fixes. > > For how many typos it is still OK, and would not a deterministic > spellchecker be preferred? > > There are some edge cases where using AI is OK, the problem is most of > the time it is not clear it is OK to use. > > Thanks > > Michal ¯\_(ツ)_/¯ I am not a lawyer, and I don't speak for Red Hat. My point, however, is that even if you are using, e.g., a grammar corrector, you had better make sure that it is not claiming that its output is a derivative work. -- MST ^ permalink raw reply [flat|nested] 57+ messages in thread
* Re: [PATCH 2/2] docs: define policy forbidding use of "AI" / LLM code generators 2023-11-23 14:56 ` Manos Pitsidianakis 2023-11-23 15:13 ` Michael S. Tsirkin 2023-11-23 15:29 ` Philippe Mathieu-Daudé @ 2023-11-23 15:32 ` Alex Bennée 2023-11-23 18:02 ` Daniel P. Berrangé 2023-11-24 10:25 ` Kevin Wolf 4 siblings, 0 replies; 57+ messages in thread From: Alex Bennée @ 2023-11-23 15:32 UTC (permalink / raw) To: Manos Pitsidianakis Cc: qemu-devel, Michael S. Tsirkin, Daniel P. Berrangé, Richard Henderson, Alexander Graf, Paolo Bonzini, Markus Armbruster, Phil Mathieu-Daudé, Stefan Hajnoczi, Thomas Huth, Kevin Wolf, Gerd Hoffmann, Mark Cave-Ayland, Peter Maydell Manos Pitsidianakis <manos.pitsidianakis@linaro.org> writes: > On Thu, 23 Nov 2023 16:35, "Michael S. Tsirkin" <mst@redhat.com> wrote: >>On Thu, Nov 23, 2023 at 11:40:26AM +0000, Daniel P. Berrangé wrote: >>> There has been an explosion of interest in so called "AI" (LLM) >>> code generators in the past year or so. Thus far though, this is >>> has not been matched by a broadly accepted legal interpretation >>> of the licensing implications for code generator outputs. While >>> the vendors may claim there is no problem and a free choice of >>> license is possible, they have an inherent conflict of interest >>> in promoting this interpretation. More broadly there is, as yet, >>> no broad consensus on the licensing implications of code generators >>> trained on inputs under a wide variety of licenses. >>> The DCO requires contributors to assert they have the right to >>> contribute under the designated project license. Given the lack >>> of consensus on the licensing of "AI" (LLM) code generator output, >>> it is not considered credible to assert compliance with the DCO >>> clause (b) or (c) where a patch includes such generated code. >>> This patch thus defines a policy that the QEMU project will not >>> accept contributions where use of "AI" (LLM) code generators is >>> either known, or suspected.
>>> Signed-off-by: Daniel P. Berrangé <berrange@redhat.com> >>> --- >>> docs/devel/code-provenance.rst | 40 ++++++++++++++++++++++++++++++++++ >>> 1 file changed, 40 insertions(+) >>> diff --git a/docs/devel/code-provenance.rst >>> b/docs/devel/code-provenance.rst >>> index b4591a2dec..a6e42c6b1b 100644 >>> --- a/docs/devel/code-provenance.rst >>> +++ b/docs/devel/code-provenance.rst >>> @@ -195,3 +195,43 @@ example:: >>> Signed-off-by: Some Person <some.person@example.com> >>> [Rebased and added support for 'foo'] >>> Signed-off-by: New Person <new.person@example.com> >>> + >>> +Use of "AI" (LLM) code generators >>> +~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~ >>> + >>> +TL;DR: >>> + >>> + **Current QEMU project policy is to DECLINE any contributions >>> + which are believed to include or derive from "AI" (LLM) >>> + generated code.** >>> + >>> +The existence of "AI" (`Large Language Model <https://en.wikipedia.org/wiki/Large_language_model>`__ >>> +/ LLM) code generators raises a number of difficult legal questions, a >>> +number of which impact on Open Source projects. As noted earlier, the >>> +QEMU community requires that contributors certify their patch submissions >>> +are made in accordance with the rules of the :ref:`dco` (DCO). When a >>> +patch contains "AI" generated code this raises difficulties with code >>> +provenence and thus DCO compliance. >>> + <snip> >>> + >>> +The QEMU maintainers thus require that contributors refrain from using >>> +"AI" code generators on patches intended to be submitted to the project, >>> +and will decline any contribution if use of "AI" is known or suspected. >>> + >>> +Examples of tools impacted by this policy includes both GitHub CoPilot, >>> +and ChatGPT, amongst many others which are less well known. >> >> >>So you called out these two by name, fine, but given "AI" is in scare >>quotes I don't really know what is or is not allowed and I don't know >>how will contributors know. 
Is the "AI" that one must not use >>necessarily an LLM? And how do you define LLM even? Wikipedia says >>"general-purpose language understanding and generation". >> >> >>All this seems vague to me. >> >> >>However, can't we define a simpler more specific policy? >>For example, isn't it true that *any* automatically generated code >>can only be included if the scripts producing said code >>are also included or otherwise available under GPLv2? > > The following definition makes sense to me: > > - Automated codegen tool must be idempotent. > - Automated codegen tool must not use statistical modelling. > > I'd remove all AI or LLM references. These are non-specific, > colloquial and in the case of `AI`, non-technical. This policy should > apply the same to a Markov chain code generator. I'm fairly sure my Emacs auto-complete would fail by that definition. -- Alex Bennée Virtualisation Tech Lead @ Linaro ^ permalink raw reply [flat|nested] 57+ messages in thread
* Re: [PATCH 2/2] docs: define policy forbidding use of "AI" / LLM code generators 2023-11-23 14:56 ` Manos Pitsidianakis ` (2 preceding siblings ...) 2023-11-23 15:32 ` Alex Bennée @ 2023-11-23 18:02 ` Daniel P. Berrangé 2023-11-23 18:10 ` Peter Maydell 2023-11-24 10:25 ` Kevin Wolf 4 siblings, 1 reply; 57+ messages in thread From: Daniel P. Berrangé @ 2023-11-23 18:02 UTC (permalink / raw) To: Manos Pitsidianakis Cc: qemu-devel, Michael S. Tsirkin, Richard Henderson, Alexander Graf, Alex Bennée, Paolo Bonzini, Markus Armbruster, Phil Mathieu-Daudé, Stefan Hajnoczi, Thomas Huth, Kevin Wolf, Gerd Hoffmann, Mark Cave-Ayland, Peter Maydell On Thu, Nov 23, 2023 at 04:56:28PM +0200, Manos Pitsidianakis wrote: > On Thu, 23 Nov 2023 16:35, "Michael S. Tsirkin" <mst@redhat.com> wrote: > > On Thu, Nov 23, 2023 at 11:40:26AM +0000, Daniel P. Berrangé wrote: > > > There has been an explosion of interest in so called "AI" (LLM) > > > code generators in the past year or so. Thus far though, this is > > > has not been matched by a broadly accepted legal interpretation > > > of the licensing implications for code generator outputs. While > > > the vendors may claim there is no problem and a free choice of > > > license is possible, they have an inherent conflict of interest > > > in promoting this interpretation. More broadly there is, as yet, > > > no broad consensus on the licensing implications of code generators > > > trained on inputs under a wide variety of licenses. > > > > > > The DCO requires contributors to assert they have the right to > > > contribute under the designated project license. Given the lack > > > of consensus on the licensing of "AI" (LLM) code generator output, > > > it is not considered credible to assert compliance with the DCO > > > clause (b) or (c) where a patch includes such generated code.
> > > > > > This patch thus defines a policy that the QEMU project will not > > > accept contributions where use of "AI" (LLM) code generators is > > > either known, or suspected. > > > > > > Signed-off-by: Daniel P. Berrangé <berrange@redhat.com> > > > --- > > > docs/devel/code-provenance.rst | 40 ++++++++++++++++++++++++++++++++++ > > > 1 file changed, 40 insertions(+) > > > > > > diff --git a/docs/devel/code-provenance.rst b/docs/devel/code-provenance.rst > > > index b4591a2dec..a6e42c6b1b 100644 > > > --- a/docs/devel/code-provenance.rst > > > +++ b/docs/devel/code-provenance.rst > > > @@ -195,3 +195,43 @@ example:: > > > Signed-off-by: Some Person <some.person@example.com> > > > [Rebased and added support for 'foo'] > > > Signed-off-by: New Person <new.person@example.com> > > > + > > > +Use of "AI" (LLM) code generators > > > +~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~ > > > + > > > +TL;DR: > > > + > > > + **Current QEMU project policy is to DECLINE any contributions > > > + which are believed to include or derive from "AI" (LLM) > > > + generated code.** > > > + > > > +The existence of "AI" (`Large Language Model <https://en.wikipedia.org/wiki/Large_language_model>`__ > > > +/ LLM) code generators raises a number of difficult legal questions, a > > > +number of which impact on Open Source projects. As noted earlier, the > > > +QEMU community requires that contributors certify their patch submissions > > > +are made in accordance with the rules of the :ref:`dco` (DCO). When a > > > +patch contains "AI" generated code this raises difficulties with code > > > +provenence and thus DCO compliance. > > > + > > > +To satisfy the DCO, the patch contributor has to fully understand > > > +the origins and license of code they are contributing to QEMU. The > > > +license terms that should apply to the output of an "AI" code generator > > > +are ill-defined, given that both training data and operation of the > > > +"AI" are typically opaque to the user. 
Even where the training data > > > +is said to all be open source, it will likely be under a wide variety > > > +of license terms. > > > + > > > +While the vendor's of "AI" code generators may promote the idea that > > > +code output can be taken under a free choice of license, this is not > > > +yet considered to be a generally accepted, nor tested, legal opinion. > > > + > > > +With this in mind, the QEMU maintainers does not consider it is > > > +currently possible to comply with DCO terms (b) or (c) for most "AI" > > > +generated code. > > > + > > > +The QEMU maintainers thus require that contributors refrain from using > > > +"AI" code generators on patches intended to be submitted to the project, > > > +and will decline any contribution if use of "AI" is known or suspected. > > > + > > > +Examples of tools impacted by this policy includes both GitHub CoPilot, > > > +and ChatGPT, amongst many others which are less well known. > > > > > > So you called out these two by name, fine, but given "AI" is in scare > > quotes I don't really know what is or is not allowed and I don't know > > how will contributors know. Is the "AI" that one must not use > > necessarily an LLM? And how do you define LLM even? Wikipedia says > > "general-purpose language understanding and generation". > > > > > > All this seems vague to me. > > > > > > However, can't we define a simpler more specific policy? > > For example, isn't it true that *any* automatically generated code > > can only be included if the scripts producing said code > > are also included or otherwise available under GPLv2? > > The following definition makes sense to me: > > - Automated codegen tool must be idempotent. > - Automated codegen tool must not use statistical modelling. As a casual reader, I would find this somewhat unclear to interpret and relate to. > I'd remove all AI or LLM references. These are non-specific, colloquial and > in the case of `AI`, non-technical.
This policy should apply the same to a > Markov chain code generator. The fact that they are colloquial is, IMHO, a good thing, as it makes the policy relatable to the casual reader who hears the terms "AI" and "LLM" in technical press articles/blogs/etc all over the place. I would have considered "Markov chain code generator" to fall under the "AI" reference, since "AI" has de facto become a general-purpose term that covers a weird variety of underlying technologies. With regards, Daniel -- |: https://berrange.com -o- https://www.flickr.com/photos/dberrange :| |: https://libvirt.org -o- https://fstop138.berrange.com :| |: https://entangle-photo.org -o- https://www.instagram.com/dberrange :|
* Re: [PATCH 2/2] docs: define policy forbidding use of "AI" / LLM code generators 2023-11-23 18:02 ` Daniel P. Berrangé @ 2023-11-23 18:10 ` Peter Maydell 0 siblings, 0 replies; 57+ messages in thread From: Peter Maydell @ 2023-11-23 18:10 UTC (permalink / raw) To: Daniel P. Berrangé Cc: Manos Pitsidianakis, qemu-devel, Michael S. Tsirkin, Richard Henderson, Alexander Graf, Alex Bennée, Paolo Bonzini, Markus Armbruster, Phil Mathieu-Daudé, Stefan Hajnoczi, Thomas Huth, Kevin Wolf, Gerd Hoffmann, Mark Cave-Ayland On Thu, 23 Nov 2023 at 18:02, Daniel P. Berrangé <berrange@redhat.com> wrote: > > On Thu, Nov 23, 2023 at 04:56:28PM +0200, Manos Pitsidianakis wrote: > > On Thu, 23 Nov 2023 16:35, "Michael S. Tsirkin" <mst@redhat.com> wrote: > > > On Thu, Nov 23, 2023 at 11:40:26AM +0000, Daniel P. Berrangé wrote: > > > > +Examples of tools impacted by this policy includes both GitHub CoPilot, > > > > +and ChatGPT, amongst many others which are less well known. > > > > > > > > > So you called out these two by name, fine, but given "AI" is in scare > > > quotes I don't really know what is or is not allowed and I don't know > > > how will contributors know. Is the "AI" that one must not use > > > necessarily an LLM? And how do you define LLM even? Wikipedia says > > > "general-purpose language understanding and generation". > > > > > > > > > All this seems vague to me. > > > > > > > > > However, can't we define a simpler more specific policy? > > > For example, isn't it true that *any* automatically generated code > > > can only be included if the scripts producing said code > > > are also included or otherwise available under GPLv2? > > > > The following definition makes sense to me: > > > > - Automated codegen tool must be idempotent. > > - Automated codegen tool must not use statistical modelling. > > As a casual reader, I would find this somewhat unclear to interpret > and relate to. It's also not really relevant to what we're trying to rule out.
A non-idempotent codegen tool is fine, if the code it generates is clearly under a license that's compatible with QEMU's. A codegen tool that uses statistical modelling is also fine, if (for example) it's only doing statistical modelling of the data in the single file it's adding code to and doesn't use any external data set. > > I'd remove all AI or LLM references. These are non-specific, colloquial and > > in the case of `AI`, non-technical. This policy should apply the same to a > > Markov chain code generator. > > The fact that they are colloquial is, IMHO, a good thing, as it makes > the policy relatable to the casual reader who hears the terms "AI" and > "LLM" in technical press articles/blogs/etc all over the place. Yes, I think that the most important thing about the wording of this policy (assuming we agree on it) is that it should be immediately very clear to anybody reading it that ChatGPT, Copilot, etc. type tools aren't permitted. Because in practice the most likely case is somebody who wants to use those, and we don't want to make them have to go through "read an abstract definition of what isn't permitted and apply that abstract definition to the concrete tool they're using". thanks -- PMM
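As a rough illustration of the kind of tool Peter describes, here is a minimal Python sketch of a completion helper whose only "training data" is the file being edited. All names are hypothetical; this is not any real QEMU or editor tooling. It counts which token follows which in that one file and suggests the most frequent successor, so every suggestion is drawn from the file's own contents and no external data set is involved.

```python
import re
from collections import Counter, defaultdict
from typing import Dict, List, Optional


def tokenize(source: str) -> List[str]:
    # Crude C-ish tokenizer: identifiers/numbers, or single punctuation chars.
    return re.findall(r"\w+|[^\w\s]", source)


def build_bigrams(source: str) -> Dict[str, Counter]:
    """Count, for each token in `source`, which tokens follow it."""
    tokens = tokenize(source)
    table: Dict[str, Counter] = defaultdict(Counter)
    for prev, nxt in zip(tokens, tokens[1:]):
        table[prev][nxt] += 1
    return table


def suggest_next(table: Dict[str, Counter], token: str) -> Optional[str]:
    """Most frequent successor of `token`; ties go to the first one seen."""
    counts = table.get(token)  # .get() does not create new defaultdict entries
    if not counts:
        return None
    return counts.most_common(1)[0][0]
```

For example, with the two-line snippet `qemu_mutex_lock(&s->lock);` / `qemu_mutex_unlock(&s->lock);` as the whole "model", `suggest_next(table, "(")` returns `"&"`. Whether such a tool's output is acceptable remains a licensing question; the sketch only shows what "no external data set" means mechanically.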
* Re: [PATCH 2/2] docs: define policy forbidding use of "AI" / LLM code generators 2023-11-23 14:56 ` Manos Pitsidianakis ` (3 preceding siblings ...) 2023-11-23 18:02 ` Daniel P. Berrangé @ 2023-11-24 10:25 ` Kevin Wolf 2023-11-24 10:37 ` Michael S. Tsirkin 2023-11-24 10:42 ` Manos Pitsidianakis 4 siblings, 2 replies; 57+ messages in thread From: Kevin Wolf @ 2023-11-24 10:25 UTC (permalink / raw) To: Manos Pitsidianakis Cc: qemu-devel, Michael S. Tsirkin, Daniel P. Berrangé, Richard Henderson, Alexander Graf, Alex Bennée, Paolo Bonzini, Markus Armbruster, Phil Mathieu-Daudé, Stefan Hajnoczi, Thomas Huth, Gerd Hoffmann, Mark Cave-Ayland, Peter Maydell On 23.11.2023 at 15:56, Manos Pitsidianakis wrote: > On Thu, 23 Nov 2023 16:35, "Michael S. Tsirkin" <mst@redhat.com> wrote: > > On Thu, Nov 23, 2023 at 11:40:26AM +0000, Daniel P. Berrangé wrote: > > > There has been an explosion of interest in so called "AI" (LLM) > > > code generators in the past year or so. Thus far though, this is > > > has not been matched by a broadly accepted legal interpretation > > > of the licensing implications for code generator outputs. While > > > the vendors may claim there is no problem and a free choice of > > > license is possible, they have an inherent conflict of interest > > > in promoting this interpretation. More broadly there is, as yet, > > > no broad consensus on the licensing implications of code generators > > > trained on inputs under a wide variety of licenses. > > > > > > The DCO requires contributors to assert they have the right to > > > contribute under the designated project license. Given the lack > > > of consensus on the licensing of "AI" (LLM) code generator output, > > > it is not considered credible to assert compliance with the DCO > > > clause (b) or (c) where a patch includes such generated code.
> > > > > > This patch thus defines a policy that the QEMU project will not > > > accept contributions where use of "AI" (LLM) code generators is > > > either known, or suspected. > > > > > > Signed-off-by: Daniel P. Berrangé <berrange@redhat.com> > > > --- > > > docs/devel/code-provenance.rst | 40 ++++++++++++++++++++++++++++++++++ > > > 1 file changed, 40 insertions(+) > > > > > > diff --git a/docs/devel/code-provenance.rst b/docs/devel/code-provenance.rst > > > index b4591a2dec..a6e42c6b1b 100644 > > > --- a/docs/devel/code-provenance.rst > > > +++ b/docs/devel/code-provenance.rst > > > @@ -195,3 +195,43 @@ example:: > > > Signed-off-by: Some Person <some.person@example.com> > > > [Rebased and added support for 'foo'] > > > Signed-off-by: New Person <new.person@example.com> > > > + > > > +Use of "AI" (LLM) code generators > > > +~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~ > > > + > > > +TL;DR: > > > + > > > + **Current QEMU project policy is to DECLINE any contributions > > > + which are believed to include or derive from "AI" (LLM) > > > + generated code.** > > > + > > > +The existence of "AI" (`Large Language Model <https://en.wikipedia.org/wiki/Large_language_model>`__ > > > +/ LLM) code generators raises a number of difficult legal questions, a > > > +number of which impact on Open Source projects. As noted earlier, the > > > +QEMU community requires that contributors certify their patch submissions > > > +are made in accordance with the rules of the :ref:`dco` (DCO). When a > > > +patch contains "AI" generated code this raises difficulties with code > > > +provenence and thus DCO compliance. > > > + > > > +To satisfy the DCO, the patch contributor has to fully understand > > > +the origins and license of code they are contributing to QEMU. The > > > +license terms that should apply to the output of an "AI" code generator > > > +are ill-defined, given that both training data and operation of the > > > +"AI" are typically opaque to the user. 
Even where the training data > > > +is said to all be open source, it will likely be under a wide variety > > > +of license terms. > > > + > > > +While the vendor's of "AI" code generators may promote the idea that > > > +code output can be taken under a free choice of license, this is not > > > +yet considered to be a generally accepted, nor tested, legal opinion. > > > + > > > +With this in mind, the QEMU maintainers does not consider it is > > > +currently possible to comply with DCO terms (b) or (c) for most "AI" > > > +generated code. > > > + > > > +The QEMU maintainers thus require that contributors refrain from using > > > +"AI" code generators on patches intended to be submitted to the project, > > > +and will decline any contribution if use of "AI" is known or suspected. > > > + > > > +Examples of tools impacted by this policy includes both GitHub CoPilot, > > > +and ChatGPT, amongst many others which are less well known. > > > > > > So you called out these two by name, fine, but given "AI" is in scare > > quotes I don't really know what is or is not allowed and I don't know > > how will contributors know. Is the "AI" that one must not use > > necessarily an LLM? And how do you define LLM even? Wikipedia says > > "general-purpose language understanding and generation". > > > > > > All this seems vague to me. > > > > > > However, can't we define a simpler more specific policy? > > For example, isn't it true that *any* automatically generated code > > can only be included if the scripts producing said code > > are also included or otherwise available under GPLv2? > > The following definition makes sense to me: > > - Automated codegen tool must be idempotent. > - Automated codegen tool must not use statistical modelling. How are these definitions related to your ability to sign the DCO? Kevin
* Re: [PATCH 2/2] docs: define policy forbidding use of "AI" / LLM code generators 2023-11-24 10:25 ` Kevin Wolf @ 2023-11-24 10:37 ` Michael S. Tsirkin 2023-11-24 10:42 ` Manos Pitsidianakis 1 sibling, 0 replies; 57+ messages in thread From: Michael S. Tsirkin @ 2023-11-24 10:37 UTC (permalink / raw) To: Kevin Wolf Cc: Manos Pitsidianakis, qemu-devel, Daniel P. Berrangé, Richard Henderson, Alexander Graf, Alex Bennée, Paolo Bonzini, Markus Armbruster, Phil Mathieu-Daudé, Stefan Hajnoczi, Thomas Huth, Gerd Hoffmann, Mark Cave-Ayland, Peter Maydell On Fri, Nov 24, 2023 at 11:25:55AM +0100, Kevin Wolf wrote: > > - Automated codegen tool must be idempotent. > > - Automated codegen tool must not use statistical modelling. > > How are these definitions related to your ability to sign the DCO? Not only that - while the question of whether code generated e.g. by Copilot would be source code by the GPL definition is unclear, at least to me, code generated by an idempotent automated tool seems highly likely not to satisfy the GPL definition. Though I am not a lawyer and do not speak for Red Hat. -- MST
* Re: [PATCH 2/2] docs: define policy forbidding use of "AI" / LLM code generators 2023-11-24 10:25 ` Kevin Wolf 2023-11-24 10:37 ` Michael S. Tsirkin @ 2023-11-24 10:42 ` Manos Pitsidianakis 1 sibling, 0 replies; 57+ messages in thread From: Manos Pitsidianakis @ 2023-11-24 10:42 UTC (permalink / raw) To: Kevin Wolf Cc: qemu-devel, Michael S. Tsirkin, Daniel P. Berrangé, Richard Henderson, Alexander Graf, Alex Bennée, Paolo Bonzini, Markus Armbruster, Phil Mathieu-Daudé, Stefan Hajnoczi, Thomas Huth, Gerd Hoffmann, Mark Cave-Ayland, Peter Maydell On Fri, 24 Nov 2023 12:25, Kevin Wolf <kwolf@redhat.com> wrote: >On 23.11.2023 at 15:56, Manos Pitsidianakis wrote: >> On Thu, 23 Nov 2023 16:35, "Michael S. Tsirkin" <mst@redhat.com> wrote: >> > On Thu, Nov 23, 2023 at 11:40:26AM +0000, Daniel P. Berrangé wrote: >> > > There has been an explosion of interest in so called "AI" (LLM) >> > > code generators in the past year or so. Thus far though, this is >> > > has not been matched by a broadly accepted legal interpretation >> > > of the licensing implications for code generator outputs. While >> > > the vendors may claim there is no problem and a free choice of >> > > license is possible, they have an inherent conflict of interest >> > > in promoting this interpretation. More broadly there is, as yet, >> > > no broad consensus on the licensing implications of code generators >> > > trained on inputs under a wide variety of licenses. >> > > >> > > The DCO requires contributors to assert they have the right to >> > > contribute under the designated project license. Given the lack >> > > of consensus on the licensing of "AI" (LLM) code generator output, >> > > it is not considered credible to assert compliance with the DCO >> > > clause (b) or (c) where a patch includes such generated code.
>> > > >> > > This patch thus defines a policy that the QEMU project will not >> > > accept contributions where use of "AI" (LLM) code generators is >> > > either known, or suspected. >> > > >> > > Signed-off-by: Daniel P. Berrangé <berrange@redhat.com> >> > > --- >> > > docs/devel/code-provenance.rst | 40 ++++++++++++++++++++++++++++++++++ >> > > 1 file changed, 40 insertions(+) >> > > >> > > diff --git a/docs/devel/code-provenance.rst b/docs/devel/code-provenance.rst >> > > index b4591a2dec..a6e42c6b1b 100644 >> > > --- a/docs/devel/code-provenance.rst >> > > +++ b/docs/devel/code-provenance.rst >> > > @@ -195,3 +195,43 @@ example:: >> > > Signed-off-by: Some Person <some.person@example.com> >> > > [Rebased and added support for 'foo'] >> > > Signed-off-by: New Person <new.person@example.com> >> > > + >> > > +Use of "AI" (LLM) code generators >> > > +~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~ >> > > + >> > > +TL;DR: >> > > + >> > > + **Current QEMU project policy is to DECLINE any contributions >> > > + which are believed to include or derive from "AI" (LLM) >> > > + generated code.** >> > > + >> > > +The existence of "AI" (`Large Language Model <https://en.wikipedia.org/wiki/Large_language_model>`__ >> > > +/ LLM) code generators raises a number of difficult legal questions, a >> > > +number of which impact on Open Source projects. As noted earlier, the >> > > +QEMU community requires that contributors certify their patch submissions >> > > +are made in accordance with the rules of the :ref:`dco` (DCO). When a >> > > +patch contains "AI" generated code this raises difficulties with code >> > > +provenence and thus DCO compliance. >> > > + >> > > +To satisfy the DCO, the patch contributor has to fully understand >> > > +the origins and license of code they are contributing to QEMU. 
The >> > > +license terms that should apply to the output of an "AI" code generator >> > > +are ill-defined, given that both training data and operation of the >> > > +"AI" are typically opaque to the user. Even where the training data >> > > +is said to all be open source, it will likely be under a wide variety >> > > +of license terms. >> > > + >> > > +While the vendor's of "AI" code generators may promote the idea that >> > > +code output can be taken under a free choice of license, this is not >> > > +yet considered to be a generally accepted, nor tested, legal opinion. >> > > + >> > > +With this in mind, the QEMU maintainers does not consider it is >> > > +currently possible to comply with DCO terms (b) or (c) for most "AI" >> > > +generated code. >> > > + >> > > +The QEMU maintainers thus require that contributors refrain from using >> > > +"AI" code generators on patches intended to be submitted to the project, >> > > +and will decline any contribution if use of "AI" is known or suspected. >> > > + >> > > +Examples of tools impacted by this policy includes both GitHub CoPilot, >> > > +and ChatGPT, amongst many others which are less well known. >> > >> > >> > So you called out these two by name, fine, but given "AI" is in scare >> > quotes I don't really know what is or is not allowed and I don't know >> > how will contributors know. Is the "AI" that one must not use >> > necessarily an LLM? And how do you define LLM even? Wikipedia says >> > "general-purpose language understanding and generation". >> > >> > >> > All this seems vague to me. >> > >> > >> > However, can't we define a simpler more specific policy? >> > For example, isn't it true that *any* automatically generated code >> > can only be included if the scripts producing said code >> > are also included or otherwise available under GPLv2? >> >> The following definition makes sense to me: >> >> - Automated codegen tool must be idempotent. 
>> - Automated codegen tool must not use statistical modelling. > >How are these definitions related to your ability to sign the DCO? > >Kevin This was a response to Michael's salient observation that AI and LLM are very vague and not clearly defined terms. I did not mention the DCO at all. Manos
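Manos's first criterion can be read in more than one way; one plausible reading is "the same input always produces the same output". Below is a minimal Python sketch of a check under that assumption; both generators are hypothetical stand-ins, not real tools. As Peter and Kevin point out, passing such a check says nothing about the license of the output, so at best it describes a property of the tool, not DCO compliance.

```python
import hashlib
import itertools
from typing import Callable


def is_reproducible(generate: Callable[[str], str], spec: str, runs: int = 3) -> bool:
    """True if `generate(spec)` yields byte-identical output on every run."""
    digests = {
        hashlib.sha256(generate(spec).encode("utf-8")).hexdigest()
        for _ in range(runs)
    }
    return len(digests) == 1


def template_gen(spec: str) -> str:
    # Hypothetical template-driven generator: output depends only on `spec`.
    return f"static void {spec}_init(void) {{ }}\n"


_call_counter = itertools.count()


def drifting_gen(spec: str) -> str:
    # Hypothetical stand-in for a sampling-based generator: the output
    # changes from one call to the next even for the same `spec`.
    return f"static void {spec}_v{next(_call_counter)}(void) {{ }}\n"
```

Here `is_reproducible(template_gen, "virtio_foo")` is true and `is_reproducible(drifting_gen, "virtio_foo")` is false. Hashing the output rather than comparing strings directly is just a convenience for large generated files.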
* Re: [PATCH 2/2] docs: define policy forbidding use of "AI" / LLM code generators 2023-11-23 14:35 ` Michael S. Tsirkin 2023-11-23 14:56 ` Manos Pitsidianakis @ 2023-11-23 17:58 ` Daniel P. Berrangé 2023-11-23 22:39 ` Michael S. Tsirkin 1 sibling, 1 reply; 57+ messages in thread From: Daniel P. Berrangé @ 2023-11-23 17:58 UTC (permalink / raw) To: Michael S. Tsirkin Cc: qemu-devel, Richard Henderson, Alexander Graf, Alex Bennée, Paolo Bonzini, Markus Armbruster, Phil Mathieu-Daudé, Stefan Hajnoczi, Thomas Huth, Kevin Wolf, Gerd Hoffmann, Mark Cave-Ayland, Peter Maydell On Thu, Nov 23, 2023 at 09:35:43AM -0500, Michael S. Tsirkin wrote: > On Thu, Nov 23, 2023 at 11:40:26AM +0000, Daniel P. Berrangé wrote: > > There has been an explosion of interest in so called "AI" (LLM) > > code generators in the past year or so. Thus far though, this is > > has not been matched by a broadly accepted legal interpretation > > of the licensing implications for code generator outputs. While > > the vendors may claim there is no problem and a free choice of > > license is possible, they have an inherent conflict of interest > > in promoting this interpretation. More broadly there is, as yet, > > no broad consensus on the licensing implications of code generators > > trained on inputs under a wide variety of licenses. > > > > The DCO requires contributors to assert they have the right to > > contribute under the designated project license. Given the lack > > of consensus on the licensing of "AI" (LLM) code generator output, > > it is not considered credible to assert compliance with the DCO > > clause (b) or (c) where a patch includes such generated code. > > > > This patch thus defines a policy that the QEMU project will not > > accept contributions where use of "AI" (LLM) code generators is > > either known, or suspected. > > > > Signed-off-by: Daniel P. 
Berrangé <berrange@redhat.com> > > --- > > docs/devel/code-provenance.rst | 40 ++++++++++++++++++++++++++++++++++ > > 1 file changed, 40 insertions(+) > > > > diff --git a/docs/devel/code-provenance.rst b/docs/devel/code-provenance.rst > > index b4591a2dec..a6e42c6b1b 100644 > > --- a/docs/devel/code-provenance.rst > > +++ b/docs/devel/code-provenance.rst > > @@ -195,3 +195,43 @@ example:: > > Signed-off-by: Some Person <some.person@example.com> > > [Rebased and added support for 'foo'] > > Signed-off-by: New Person <new.person@example.com> > > + > > +Use of "AI" (LLM) code generators > > +~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~ > > + > > +TL;DR: > > + > > + **Current QEMU project policy is to DECLINE any contributions > > + which are believed to include or derive from "AI" (LLM) > > + generated code.** > > + > > +The existence of "AI" (`Large Language Model <https://en.wikipedia.org/wiki/Large_language_model>`__ > > +/ LLM) code generators raises a number of difficult legal questions, a > > +number of which impact on Open Source projects. As noted earlier, the > > +QEMU community requires that contributors certify their patch submissions > > +are made in accordance with the rules of the :ref:`dco` (DCO). When a > > +patch contains "AI" generated code this raises difficulties with code > > +provenence and thus DCO compliance. > > + > > +To satisfy the DCO, the patch contributor has to fully understand > > +the origins and license of code they are contributing to QEMU. The > > +license terms that should apply to the output of an "AI" code generator > > +are ill-defined, given that both training data and operation of the > > +"AI" are typically opaque to the user. Even where the training data > > +is said to all be open source, it will likely be under a wide variety > > +of license terms. 
> > + > > +While the vendor's of "AI" code generators may promote the idea that > > +code output can be taken under a free choice of license, this is not > > +yet considered to be a generally accepted, nor tested, legal opinion. > > + > > +With this in mind, the QEMU maintainers does not consider it is > > +currently possible to comply with DCO terms (b) or (c) for most "AI" > > +generated code. > > + > > +The QEMU maintainers thus require that contributors refrain from using > > +"AI" code generators on patches intended to be submitted to the project, > > +and will decline any contribution if use of "AI" is known or suspected. > > + > > +Examples of tools impacted by this policy includes both GitHub CoPilot, > > +and ChatGPT, amongst many others which are less well known. > > > So you called out these two by name, fine, but given "AI" is in scare > quotes I don't really know what is or is not allowed and I don't know > how will contributors know. Is the "AI" that one must not use > necessarily an LLM? And how do you define LLM even? Wikipedia says > "general-purpose language understanding and generation". I used "AI" in quotes because I think it can mean different things to different people. In practical terms it has become a bit of a catch-all term for a wide variety of tools. Thus I think the quotes serve to express this as a loose generalization, rather than a precise definition. The same goes for "LLM": I don't want to try to define it, as it has also become somewhat of a general term. > All this seems vague to me. Deliberately so, as there is a wide variety of tools working in varying ways, but all with similar caveats around the licensing of the output "derivative" work. > However, can't we define a simpler more specific policy? > For example, isn't it true that *any* automatically generated code > can only be included if the scripts producing said code > are also included or otherwise available under GPLv2?
The license of a code generation tool itself is usually considered to be not a factor in the license of its output. In most cases the license of the input data will determine the license of the output data, since the latter is a derivative work of the former. The person running the tool will typically know exactly what the input data is, and so have confidence in the license of the output. If there are questions about whether the output is a derivative of the tool's code itself, then the tool author can provide a disclaimer for this. Such a disclaimer, though, would not erase the derivative link between input data and output data. One example is GCC, where the output .o/exe is a derivative of the input .c. The output, however, may also link the GCC runtime library, and so GCC has a license exception saying that this runtime linkage doesn't affect the license of the output program. This is OK, since the GCC authors who added this exception owned copyright over the runtime library they're adding an exception for. If we apply this to LLMs, the output of the LLM is a derivative of the training data. The output is not a derivative of the LLM code. The LLM copyright holders could make this latter point explicit since they own copyright of the LLM code, but they do not own copyright of the training data, and neither does the person using the LLM, hence the legal uncertainty. With regards, Daniel
* Re: [PATCH 2/2] docs: define policy forbidding use of "AI" / LLM code generators 2023-11-23 17:58 ` Daniel P. Berrangé @ 2023-11-23 22:39 ` Michael S. Tsirkin 2023-11-24 9:06 ` Daniel P. Berrangé 0 siblings, 1 reply; 57+ messages in thread From: Michael S. Tsirkin @ 2023-11-23 22:39 UTC (permalink / raw) To: Daniel P. Berrangé Cc: qemu-devel, Richard Henderson, Alexander Graf, Alex Bennée, Paolo Bonzini, Markus Armbruster, Phil Mathieu-Daudé, Stefan Hajnoczi, Thomas Huth, Kevin Wolf, Gerd Hoffmann, Mark Cave-Ayland, Peter Maydell On Thu, Nov 23, 2023 at 05:58:45PM +0000, Daniel P. Berrangé wrote: > The license of a code generation tool itself is usually considered > to be not a factor in the license of its output. Really? I would find it very surprising if a code generation tool that is not a language model and so is not understanding the code it's generating did not include some code snippets going into the output. It is also possible to unintentionally run afoul of GPL's definition of source code which is "the preferred form of the work for making modifications to it". So even if you have copyright to input, dumping just output and putting GPL on it might or might not be ok. -- MST
* Re: [PATCH 2/2] docs: define policy forbidding use of "AI" / LLM code generators 2023-11-23 22:39 ` Michael S. Tsirkin @ 2023-11-24 9:06 ` Daniel P. Berrangé 2023-11-24 9:27 ` Michael S. Tsirkin 2023-11-24 10:21 ` Alex Bennée 0 siblings, 2 replies; 57+ messages in thread From: Daniel P. Berrangé @ 2023-11-24 9:06 UTC (permalink / raw) To: Michael S. Tsirkin Cc: qemu-devel, Richard Henderson, Alexander Graf, Alex Bennée, Paolo Bonzini, Markus Armbruster, Phil Mathieu-Daudé, Stefan Hajnoczi, Thomas Huth, Kevin Wolf, Gerd Hoffmann, Mark Cave-Ayland, Peter Maydell On Thu, Nov 23, 2023 at 05:39:18PM -0500, Michael S. Tsirkin wrote: > On Thu, Nov 23, 2023 at 05:58:45PM +0000, Daniel P. Berrangé wrote: > > The license of a code generation tool itself is usually considered > > to be not a factor in the license of its output. > > Really? I would find it very surprising if a code generation tool that > is not a language model and so is not understanding the code it's > generating did not include some code snippets going into the output. > It is also possible to unintentionally run afoul of GPL's definition of source > code which is "the preferred form of the work for making modifications to it". > So even if you have copyright to input, dumping just output and putting > GPL on it might or might not be ok. Consider the C pre-processor. This takes an input .c file, and expands all the macros, to spit out a new .c file. The license of the output .c file is determined by the license of the input .c file. The license of the CPP impl (whether OSS or proprietary) doesn't have any influence on the license of the output file; it cannot magically force the output file to be proprietary any more than it can force the output file to be GPL.
With regards, Daniel
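To make the preprocessor analogy concrete, here is a deliberately tiny Python sketch. It is not how a real C preprocessor works: it handles only simple object-like `#define NAME value` lines and ignores function-like macros, `#include`, conditionals and string literals. The point it illustrates is narrow: every token in its output comes from the input file, and the tool contributes no text of its own. Whether real generators behave like this is exactly what MST questions.

```python
import re

# Matches simple object-like macro definitions such as "#define BUF_SIZE 4096".
DEFINE_RE = re.compile(r"^\s*#\s*define\s+([A-Za-z_]\w*)\s+(.*)$")


def expand_object_macros(source: str) -> str:
    """Toy expansion of simple object-like macros (no recursion, no #include)."""
    macros = {}
    out_lines = []
    for line in source.splitlines():
        m = DEFINE_RE.match(line)
        if m:
            # Record the macro; the directive itself is not copied to the output.
            macros[m.group(1)] = m.group(2).strip()
            continue
        if macros:
            names = "|".join(re.escape(name) for name in macros)
            # \b ensures only whole identifiers are replaced.
            line = re.sub(r"\b(" + names + r")\b",
                          lambda mm: macros[mm.group(1)], line)
        out_lines.append(line)
    return "\n".join(out_lines)
```

For instance, `expand_object_macros("#define BUF_SIZE 4096\nchar buf[BUF_SIZE];")` returns `"char buf[4096];"`: both `4096` and `char buf[...]` were already present in the input.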
* Re: [PATCH 2/2] docs: define policy forbidding use of "AI" / LLM code generators 2023-11-24 9:06 ` Daniel P. Berrangé @ 2023-11-24 9:27 ` Michael S. Tsirkin 2023-11-24 10:21 ` Alex Bennée 1 sibling, 0 replies; 57+ messages in thread From: Michael S. Tsirkin @ 2023-11-24 9:27 UTC (permalink / raw) To: Daniel P. Berrangé Cc: qemu-devel, Richard Henderson, Alexander Graf, Alex Bennée, Paolo Bonzini, Markus Armbruster, Phil Mathieu-Daudé, Stefan Hajnoczi, Thomas Huth, Kevin Wolf, Gerd Hoffmann, Mark Cave-Ayland, Peter Maydell On Fri, Nov 24, 2023 at 09:06:29AM +0000, Daniel P. Berrangé wrote: > On Thu, Nov 23, 2023 at 05:39:18PM -0500, Michael S. Tsirkin wrote: > > On Thu, Nov 23, 2023 at 05:58:45PM +0000, Daniel P. Berrangé wrote: > > > The license of a code generation tool itself is usually considered > > > to be not a factor in the license of its output. > > > > Really? I would find it very surprising if a code generation tool that > > is not a language model and so is not understanding the code it's > > generating did not include some code snippets going into the output. > > It is also possible to unintentionally run afoul of GPL's definition of source > > code which is "the preferred form of the work for making modifications to it". > > So even if you have copyright to input, dumping just output and putting > > GPL on it might or might not be ok. > > Consider the C pre-processor. This takes an input .c file, and expands > all the macros, to spit out a new .c file. > > The license of the output .c file is determined by the license of the > input .c file. The license of the CPP impl (whether OSS or proprietary) > doesn't have any influence on the license of the output file; it cannot > magically force the output file to be proprietary any more than it can > force the output file to be GPL. > > With regards, > Daniel Sorry, I don't see how the C preprocessor is relevant here. It does not generate source code in the GPL sense.
We won't accept C preprocessor output in a patch. Not being a lawyer, I personally am not really interested in discussing how copyright works, certainly not at this highly abstract and simplified level. -- MST
* Re: [PATCH 2/2] docs: define policy forbidding use of "AI" / LLM code generators 2023-11-24 9:06 ` Daniel P. Berrangé 2023-11-24 9:27 ` Michael S. Tsirkin @ 2023-11-24 10:21 ` Alex Bennée 2023-11-24 10:30 ` Michael S. Tsirkin 2023-11-24 11:41 ` Daniel P. Berrangé 1 sibling, 2 replies; 57+ messages in thread From: Alex Bennée @ 2023-11-24 10:21 UTC (permalink / raw) To: Daniel P. Berrangé Cc: Michael S. Tsirkin, qemu-devel, Richard Henderson, Alexander Graf, Paolo Bonzini, Markus Armbruster, Phil Mathieu-Daudé, Stefan Hajnoczi, Thomas Huth, Kevin Wolf, Gerd Hoffmann, Mark Cave-Ayland, Peter Maydell Daniel P. Berrangé <berrange@redhat.com> writes: > On Thu, Nov 23, 2023 at 05:39:18PM -0500, Michael S. Tsirkin wrote: >> On Thu, Nov 23, 2023 at 05:58:45PM +0000, Daniel P. Berrangé wrote: >> > The license of a code generation tool itself is usually considered >> > to be not a factor in the license of its output. >> >> Really? I would find it very surprising if a code generation tool that >> is not a language model and so is not understanding the code it's >> generating did not include some code snippets going into the output. >> It is also possible to unintentionally run afoul of GPL's definition of source >> code which is "the preferred form of the work for making modifications to it". >> So even if you have copyright to input, dumping just output and putting >> GPL on it might or might not be ok. > > Consider the C pre-processor. This takes an input .c file, and expands > all the macros, to spit out a new .c file. > > The license of the output .c file is determined by the license of the > input .c file. The license of the CPP impl (whether OSS or proprietary) > doesn't have any influence on the license of the output file; it cannot > magically force the output file to be proprietary any more than it can > force the output file to be GPL. LLMs are just a tool like a compiler (albeit with spookier different internals).
The prompt and the instructions are arguably the more
important part of how to get good results from the LLM transformation.
In fact, most of the way I've been using them has been by pasting some
existing code and asking for review or transformation of it.

However, I totally get that using the various online LLMs you have very
little transparency about what has gone into their training, and therefore
there is a danger of proprietary code being hallucinated out of their
matrices. Conversely, what if I use an LLM like OpenLLaMa:

https://github.com/openlm-research/open_llama

I have fairly exhaustive definitions of what went into the training data,
of which the most interesting is probably the StarCoder dataset (paper):

https://drive.google.com/file/d/1cN-b9GnWtHzQRoE7M7gAEyivY0kl4BYs/view

where there are tools to detect if generated code has been lifted
directly from the dataset or is indeed a transformation.

>
> With regards,
> Daniel

--
Alex Bennée
Virtualisation Tech Lead @ Linaro
* Re: [PATCH 2/2] docs: define policy forbidding use of "AI" / LLM code generators
From: Michael S. Tsirkin @ 2023-11-24 10:30 UTC
To: Alex Bennée
Cc: Daniel P. Berrangé, qemu-devel, Richard Henderson, Alexander Graf, Paolo Bonzini, Markus Armbruster, Phil Mathieu-Daudé, Stefan Hajnoczi, Thomas Huth, Kevin Wolf, Gerd Hoffmann, Mark Cave-Ayland, Peter Maydell

On Fri, Nov 24, 2023 at 10:21:17AM +0000, Alex Bennée wrote:
> LLMs are just a tool like a compiler (albeit with different, spookier
> internals).

We already generally don't accept compiler output in patches, since it
is not source code by the GPL's definition.

--
MST
* Re: [PATCH 2/2] docs: define policy forbidding use of "AI" / LLM code generators
From: Daniel P. Berrangé @ 2023-11-24 11:41 UTC
To: Alex Bennée
Cc: Michael S. Tsirkin, qemu-devel, Richard Henderson, Alexander Graf, Paolo Bonzini, Markus Armbruster, Phil Mathieu-Daudé, Stefan Hajnoczi, Thomas Huth, Kevin Wolf, Gerd Hoffmann, Mark Cave-Ayland, Peter Maydell

On Fri, Nov 24, 2023 at 10:21:17AM +0000, Alex Bennée wrote:
> Daniel P. Berrangé <berrange@redhat.com> writes:
>
> > On Thu, Nov 23, 2023 at 05:39:18PM -0500, Michael S. Tsirkin wrote:
> >> On Thu, Nov 23, 2023 at 05:58:45PM +0000, Daniel P. Berrangé wrote:
> >> > The license of a code generation tool itself is usually considered
> >> > to be not a factor in the license of its output.
> >>
> >> Really? I would find it very surprising if a code generation tool that
> >> is not a language model and so is not understanding the code it's
> >> generating did not include some code snippets going into the output.
> >> It is also possible to unintentionally run afoul of GPL's definition of source
> >> code which is "the preferred form of the work for making modifications to it".
> >> So even if you have copyright to input, dumping just output and putting
> >> GPL on it might or might not be ok.
> >
> > Consider the C pre-processor. This takes an input .c file, and expands
> > all the macros, to split out a new .c file.
> >
> > The license of the output .c file is determined by the license of the
> > input .c file. The license of the CPP impl (whether OSS or proprietary)
> > doesn't have any influence on the license of the output file, it cannot
> > magically force the output file to be proprietary any more than it can
> > force the output file to be GPL.
>
> LLMs are just a tool like a compiler (albeit with different, spookier
> internals).
> The prompt and the instructions are arguably the more
> important part of how to get good results from the LLM transformation.
> In fact, most of the way I've been using them has been by pasting some
> existing code and asking for review or transformation of it.
>
> However, I totally get that using the various online LLMs you have very
> little transparency about what has gone into their training, and therefore
> there is a danger of proprietary code being hallucinated out of their
> matrices. Conversely, what if I use an LLM like OpenLLaMa:
>
> https://github.com/openlm-research/open_llama
>
> I have fairly exhaustive definitions of what went into the training data,
> of which the most interesting is probably the StarCoder dataset (paper):
>
> https://drive.google.com/file/d/1cN-b9GnWtHzQRoE7M7gAEyivY0kl4BYs/view
>
> where there are tools to detect if generated code has been lifted
> directly from the dataset or is indeed a transformation.

I've not looked at the links above, but if someone can make a compelling
argument that *specific* tools have sufficient transparency to be
compatible with signing the DCO, then I think we could maintain a list
of exceptions in the policy.

With regards,
Daniel
--
|: https://berrange.com -o- https://www.flickr.com/photos/dberrange :|
|: https://libvirt.org -o- https://fstop138.berrange.com :|
|: https://entangle-photo.org -o- https://www.instagram.com/dberrange :|
* Re: [PATCH 2/2] docs: define policy forbidding use of "AI" / LLM code generators
From: Stefan Hajnoczi @ 2023-11-23 15:22 UTC
To: Daniel P. Berrangé
Cc: qemu-devel, Richard Henderson, Alexander Graf, Alex Bennée, Paolo Bonzini, Michael S. Tsirkin, Markus Armbruster, Phil Mathieu-Daudé, Thomas Huth, Kevin Wolf, Gerd Hoffmann, Mark Cave-Ayland, Peter Maydell

On Thu, Nov 23, 2023 at 11:40:26AM +0000, Daniel P. Berrangé wrote:
> There has been an explosion of interest in so-called "AI" (LLM)
> code generators in the past year or so. Thus far though, this
> has not been matched by a broadly accepted legal interpretation
> of the licensing implications for code generator outputs. While
> the vendors may claim there is no problem and a free choice of
> license is possible, they have an inherent conflict of interest
> in promoting this interpretation. More broadly there is, as yet,
> no broad consensus on the licensing implications of code generators
> trained on inputs under a wide variety of licenses.
>
> The DCO requires contributors to assert they have the right to
> contribute under the designated project license. Given the lack
> of consensus on the licensing of "AI" (LLM) code generator output,
> it is not considered credible to assert compliance with the DCO
> clause (b) or (c) where a patch includes such generated code.
>
> This patch thus defines a policy that the QEMU project will not
> accept contributions where use of "AI" (LLM) code generators is
> either known, or suspected.
>
> Signed-off-by: Daniel P. Berrangé <berrange@redhat.com>
> ---
>  docs/devel/code-provenance.rst | 40 ++++++++++++++++++++++++++++++++++
>  1 file changed, 40 insertions(+)

As open source LLMs mature, it may be possible to curate the training
data so that the output complies with software licenses and can be used
in QEMU. For the time being, the position in this patch seems reasonable
because it prevents license problems down the road.

Reviewed-by: Stefan Hajnoczi <stefanha@redhat.com>
end of thread

Thread overview: 57+ messages
2023-11-23 11:40 [PATCH 0/2] docs: define policy forbidding use of "AI" / LLM code generators Daniel P. Berrangé
2023-11-23 11:40 ` [PATCH 1/2] docs: introduce dedicated page about code provenance / sign-off Daniel P. Berrangé
2023-11-23 11:58 ` Philippe Mathieu-Daudé
2023-11-23 17:08 ` Daniel P. Berrangé
2023-11-23 23:56 ` Michael S. Tsirkin
2023-11-23 13:01 ` Peter Maydell
2023-11-23 17:12 ` Daniel P. Berrangé
2023-11-23 13:16 ` Kevin Wolf
2023-11-23 17:12 ` Daniel P. Berrangé
2023-11-23 14:25 ` Michael S. Tsirkin
2023-11-23 17:16 ` Daniel P. Berrangé
2023-11-23 17:33 ` Michael S. Tsirkin
2023-11-24 11:11 ` Philippe Mathieu-Daudé
2023-11-24 11:27 ` Michael S. Tsirkin
2023-11-24 9:49 ` Kevin Wolf
2023-11-23 15:13 ` Stefan Hajnoczi
2024-01-27 14:36 ` Zhao Liu
2024-01-29 9:31 ` Daniel P. Berrangé
2024-01-29 9:35 ` Samuel Tardieu
2024-01-29 10:41 ` Peter Maydell
2024-01-29 11:00 ` Daniel P. Berrangé
2023-11-23 11:40 ` [PATCH 2/2] docs: define policy forbidding use of "AI" / LLM code generators Daniel P. Berrangé
2023-11-23 12:57 ` Alex Bennée
2023-11-23 17:37 ` Michal Suchánek
2023-11-23 23:27 ` Michael S. Tsirkin
2023-11-23 17:46 ` Daniel P. Berrangé
2023-11-23 23:53 ` Michael S. Tsirkin
2023-11-24 10:17 ` Kevin Wolf
2023-11-24 10:33 ` Alex Bennée
2023-11-24 10:42 ` Michael S. Tsirkin
2023-11-24 10:43 ` Peter Maydell
2023-11-24 11:02 ` Michael S. Tsirkin
2023-11-24 11:37 ` Daniel P. Berrangé
2023-11-24 11:39 ` Michael S. Tsirkin
2023-11-24 11:40 ` Michael S. Tsirkin
2023-11-23 13:20 ` Kevin Wolf
2023-11-23 14:35 ` Michael S. Tsirkin
2023-11-23 14:56 ` Manos Pitsidianakis
2023-11-23 15:13 ` Michael S. Tsirkin
2023-11-23 15:29 ` Philippe Mathieu-Daudé
2023-11-23 17:06 ` Michael S. Tsirkin
2023-11-23 17:29 ` Michal Suchánek
2023-11-23 18:05 ` Michael S. Tsirkin
2023-11-23 15:32 ` Alex Bennée
2023-11-23 18:02 ` Daniel P. Berrangé
2023-11-23 18:10 ` Peter Maydell
2023-11-24 10:25 ` Kevin Wolf
2023-11-24 10:37 ` Michael S. Tsirkin
2023-11-24 10:42 ` Manos Pitsidianakis
2023-11-23 17:58 ` Daniel P. Berrangé
2023-11-23 22:39 ` Michael S. Tsirkin
2023-11-24 9:06 ` Daniel P. Berrangé
2023-11-24 9:27 ` Michael S. Tsirkin
2023-11-24 10:21 ` Alex Bennée
2023-11-24 10:30 ` Michael S. Tsirkin
2023-11-24 11:41 ` Daniel P. Berrangé
2023-11-23 15:22 ` Stefan Hajnoczi