Re: [PATCH v3 3/3] docs: define policy forbidding use of AI code generators


From: "Philippe Mathieu-Daudé" <philmd@linaro.org>
To: "Daniel P. Berrangé" <berrange@redhat.com>
Cc: "Markus Armbruster" <armbru@redhat.com>,
	"Stefan Hajnoczi" <stefanha@gmail.com>,
	qemu-devel@nongnu.org, "Thomas Huth" <thuth@redhat.com>,
	"Alex Bennée" <alex.bennee@linaro.org>,
	"Michael S . Tsirkin" <mst@redhat.com>,
	"Gerd Hoffmann" <kraxel@redhat.com>,
	"Mark Cave-Ayland" <mark.cave-ayland@ilande.co.uk>,
	"Kevin Wolf" <kwolf@redhat.com>,
	"Stefan Hajnoczi" <stefanha@redhat.com>,
	"Alexander Graf" <agraf@csgraf.de>,
	"Paolo Bonzini" <pbonzini@redhat.com>,
	"Richard Henderson" <richard.henderson@linaro.org>,
	"Peter Maydell" <peter.maydell@linaro.org>,
	"Pierrick Bouvier" <pierrick.bouvier@linaro.org>
Subject: Re: [PATCH v3 3/3] docs: define policy forbidding use of AI code generators
Date: Wed, 4 Jun 2025 11:19:10 +0200
Message-ID: <3f35fb33-97f9-433e-a5bd-86d2926cf3d5@linaro.org>
In-Reply-To: <aEAGadbMexZ9mm4a@redhat.com>

On 4/6/25 10:40, Daniel P. Berrangé wrote:
> On Wed, Jun 04, 2025 at 09:54:33AM +0200, Philippe Mathieu-Daudé wrote:
>> On 4/6/25 09:15, Daniel P. Berrangé wrote:
>>> On Wed, Jun 04, 2025 at 08:17:27AM +0200, Markus Armbruster wrote:
>>>> Stefan Hajnoczi <stefanha@gmail.com> writes:
>>>>
>>>>> On Tue, Jun 3, 2025 at 10:25 AM Markus Armbruster <armbru@redhat.com> wrote:
>>>>>>
>>>>>> From: Daniel P. Berrangé <berrange@redhat.com>
>>>>>> +
>>>>>> +The increasing prevalence of AI code generators, most notably but not limited
>>>>>
>>>>> More detail is needed on what an "AI code generator" is. Coding
>>>>> assistant tools range from autocompletion to linters to automatic code
>>>>> generators. In addition there are other AI-related tools like ChatGPT
>>>>> or Gemini as a chatbot that people can use like Stack Overflow or an
>>>>> API documentation summarizer.
>>>>>
>>>>> I think the intent is to say: do not put code that comes from _any_ AI
>>>>> tool into QEMU.
>>>>>
>>>>> It would be okay to use AI to research APIs, algorithms, brainstorm
>>>>> ideas, debug the code, analyze the code, etc but the actual code
>>>>> changes must not be generated by AI.
>>>
>>> The scope of the policy is around contributions we receive as
>>> patches with SoB. Researching / brainstorming / analysis etc
>>> are not contribution activities, so not covered by the policy
>>> IMHO.
>>>
>>>>
>>>> The existing text is about "AI code generators".  However, the "most
>>>> notably LLMs" that follows it could lead readers to believe it's about
>>>> more than just code generation, because LLMs are in fact used for more.
>>>> I figure this is your concern.
>>>>
>>>> We could instead start wide, then narrow the focus to code generation.
>>>> Here's my try:
>>>>
>>>>     The increasing prevalence of AI-assisted software development results
>>>>     in a number of difficult legal questions and risks for software
>>>>     projects, including QEMU.  Of particular concern is code generated by
>>>>     `Large Language Models
>>>>     <https://en.wikipedia.org/wiki/Large_language_model>`__ (LLMs).
>>>
>>> Documentation we maintain has the same concerns as code.
>>> So I'd suggest to substitute 'code' with 'code / content'.
>>
>> Why couldn't we accept documentation patches improved using LLM?
> 
> I would flip it around and ask why would documentation not be held
> to the same standard as code, when it comes to licensing and legal
> compliance ?
> 
> This is all copyright content that we merge & distribute under the
> same QEMU licensing terms, and we have the same legal obligations
> whether it is "source code" or "documentation" or other content
> that is not traditional "source code" (images for example).
> 
> 
>> As a non-native English speaker who is often stuck trying to describe
>> function APIs, I'm very tempted to use an LLM to review my sentences
>> and make them easier to understand.
> 
> I can understand that desire, and it is an admittedly tricky situation
> and tradeoff for which I don't have a great answer.
> 
> As a starting point we (as reviewers/maintainers) must be broadly
> very tolerant & accepting of content that is not perfect English,
> because we know many (probably even the majority of) contributors
> won't have English as their first language.
> 
> As a reviewer I don't mind imperfect language in submissions. Even
> if language is not perfect it is at least a direct expression of
> the author's understanding and thus we can have a level of trust
> in the docs based on our community experience with the contributor.
> 
> If docs have been altered in any significant manner by an LLM,
> even if they are linguistically improved, IMHO, knowing about that
> use of an LLM would reduce my personal trust in the technical
> accuracy of the contribution.
> 
> This is straying into the debate around the accuracy of LLMs though,
> which is interesting, but tangential to the purpose of this policy,
> which aims to focus on the code provenance / legal side.
> 
> 
> 
> So, back on track, an important point is that this policy (& the
> legal concerns/risks it attempts to address) is implicitly
> about contributions that can be considered copyrightable.
> 
> Some so called "trivial" work can be so simplistic as to not meet
> the threshold for copyright protection, and it is thus easy for the
> DCO requirements to be satisfied.
> 
> 
> As a person, when you write the API documentation from scratch,
> your output would generally be considered a copyrightable
> contribution by you as the author.
> 
> When a reviewer then suggests changes to your docs, most of the
> time those changes are so trivial, that the reviewer wouldn't be
> claiming copyright over the resulting work.
> 
> If the reviewer completely rewrites entire sentences in the
> docs though, they would be able to claim copyright over part
> of the resulting work.
> 
> 
> The tipping point between copyrightable/non-copyrightable is
> hard to define in a policy. It is inherently fuzzy, and somewhat
> of a "you'll know it when you see it" or "let's debate it in court"
> situation...
> 
> 
> So back to LLMs.
> 
> 
> If you ask the LLM (or an agent using an LLM) to entirely write
> the API docs from scratch, I think that should be expected to
> fall under this proposed contribution policy in general.
> 
> 
> If you write the API docs yourself and ask the LLM to review and
> suggest improvements, that MAY or MAY NOT fall under this policy.
> 
> If the LLM-suggested tweaks were minor enough that they do not
> meet the threshold to be copyrightable, it would be fine; this is
> little different from a human reviewer suggesting tweaks.

Good.

> If the LLM suggested large-scale rewriting, it would be harder
> to draw the line, but that would tend towards falling under this
> contribution policy.
> 
> So it depends on the scope of what the LLM suggested as a change
> to your docs.
> 
> IOW, LLM-as-sparkling-auto-correct is probably OK, but
> LLM-as-book-editor / LLM-as-ghost-writer is probably NOT OK

OK.

> This is a scenario where the QEMU contributor has to use their
> personal judgement as to whether their use of LLM in a docs context
> is compliant with this policy, or not. I don't think we should try
> to describe this in the policy given how fuzzy the situation is.

Thank you very much for this detailed explanation!

> 
> NB, this copyrightable/non-copyrightable situation applies to source
> code too, not just docs.
> 
> With regards,
> Daniel

