From mboxrd@z Thu Jan  1 00:00:00 1970
Return-Path: <qemu-devel-bounces+qemu-devel=archiver.kernel.org@nongnu.org>
X-Spam-Checker-Version: SpamAssassin 3.4.0 (2014-02-07) on
	aws-us-west-2-korg-lkml-1.web.codeaurora.org
Received: from lists.gnu.org (lists.gnu.org [209.51.188.17])
	(using TLSv1.2 with cipher ECDHE-RSA-AES256-GCM-SHA384 (256/256 bits))
	(No client certificate requested)
	by smtp.lore.kernel.org (Postfix) with ESMTPS id 924BDC5B543
	for <qemu-devel@archiver.kernel.org>; Wed,  4 Jun 2025 09:19:45 +0000 (UTC)
Received: from localhost ([::1] helo=lists1p.gnu.org)
	by lists.gnu.org with esmtp (Exim 4.90_1)
	(envelope-from <qemu-devel-bounces@nongnu.org>)
	id 1uMkHM-0000oW-7V; Wed, 04 Jun 2025 05:19:25 -0400
Received: from eggs.gnu.org ([2001:470:142:3::10])
 by lists.gnu.org with esmtps (TLS1.2:ECDHE_RSA_AES_256_GCM_SHA384:256)
 (Exim 4.90_1) (envelope-from <philmd@linaro.org>) id 1uMkHI-0000oA-4E
 for qemu-devel@nongnu.org; Wed, 04 Jun 2025 05:19:20 -0400
Received: from mail-wr1-x432.google.com ([2a00:1450:4864:20::432])
 by eggs.gnu.org with esmtps (TLS1.2:ECDHE_RSA_AES_128_GCM_SHA256:128)
 (Exim 4.90_1) (envelope-from <philmd@linaro.org>) id 1uMkHE-00064S-6W
 for qemu-devel@nongnu.org; Wed, 04 Jun 2025 05:19:18 -0400
Received: by mail-wr1-x432.google.com with SMTP id
 ffacd0b85a97d-3a522224582so148693f8f.3
 for <qemu-devel@nongnu.org>; Wed, 04 Jun 2025 02:19:15 -0700 (PDT)
DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed;
 d=linaro.org; s=google; t=1749028753; x=1749633553; darn=nongnu.org;
 h=content-transfer-encoding:in-reply-to:from:content-language
 :references:cc:to:subject:user-agent:mime-version:date:message-id
 :from:to:cc:subject:date:message-id:reply-to;
 bh=y0iaDF0wTPGluPJcZERPPEHrKL0Zzqdsi4XP1wHEdSM=;
 b=g17K5q/wYv97oUqnS8ETFP/VhDoULVsYq7QEdgIhUvzFH6Nrgij96/ATVAQj6uDkZy
 z2G4524B2cpRyPDpYWJCM70NGLcpe5ifIQlUxFTxKrDfP9ZEK1Pk5RPoquvyktHuYhxt
 +geeFQWARpHxrnRfmVGubs25anb9T8EsaPpFVztUT+mPRFHLQXhRtTZkPuXsE92WIvIu
 4CgDJKKbWh07J6B3l+G/9NQHcSHXel4IIXgXKHZgPHq9L8DAw+uykl+JSllgLQTnIF6+
 BYdJzCG7tRUMjllpflUv1YbKwizE4fIfe7ufO2jvU3iLBFiEPb1Qr9AlrOE8ih2CUbFB
 TWsw==
X-Google-DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed;
 d=1e100.net; s=20230601; t=1749028753; x=1749633553;
 h=content-transfer-encoding:in-reply-to:from:content-language
 :references:cc:to:subject:user-agent:mime-version:date:message-id
 :x-gm-message-state:from:to:cc:subject:date:message-id:reply-to;
 bh=y0iaDF0wTPGluPJcZERPPEHrKL0Zzqdsi4XP1wHEdSM=;
 b=vyo+TKwYpz8tkhpV1yAfFLV1WUaQ+Puns2Wg+J5Cg/Q61f+iGdddIWoVipBsGP9uCe
 xFYCtqKK3zIEtf1+Exdd1dMXM40OTORwCA9UOEfceQLUtgu5hfE4n109K0UmROE95FWi
 OVd+vZzL+E3GE8oeALssxzwA0IBgvNsZP2dI67LOqNrqtDwUFgpuizDHIShFWMxGPbxo
 10i89HsAIx9k1gDiSZIzuzFWJVAXGBtfm3IwdicYEIco/41AhrN5+CJhujJGHrZp7PoA
 OCO8Az8dt9WLpDkT5FVgQocC8oWsqDaD8yLJdhogs3DL4GA9SI6rwFzFp7mWvMShTECU
 esJA==
X-Forwarded-Encrypted: i=1;
 AJvYcCW7Uw6YU1DEZ4gsdY/geLALhFeZr09dlr96GWN6gNLE+HUQWorm/YX8dSljXVtURPcVD7mcc1I/OFM5@nongnu.org
X-Gm-Message-State: AOJu0YwUqqoR4hVYQKSnvcC0/G0dZv0iySRCEd7TL92spI+GlF/j0mMA
 1qfmtLtMvO798nkrqDn1SInjvUIcCyaWNeTNzbqA5CiVtfIlDDRGILw80r7x3EyW4B4=
X-Gm-Gg: ASbGnctMvNaElyoMsY4Y4fw99t5EpXBaHq+VKVGF6DjCbJdrlGbqcZnrbDH4KenBYmf
 uXjaPGvW1Ajz++w+uwQgW9htzRLpXraCbagmnjMud3rqZ7pYBdejsIRy95HSwQeNwYZqOH+Oobc
 OXuchfs/1nhQeAb5vEarFuNA6ahSkKt7POCHZO6nzGWuwHFy0vUM1wuyxR5EPbfpyudMQ1dV9ie
 04qKn8NK1RhmKIRoh68l9/SroaUXSRXzWR7gZH9Vw8fkWXW13LymCgTuJODEQ5evsAvTYIsWm8t
 eoRzflu0+uQHAlCybqh6ePQmIR4ZMOceDbM7xrsgQstkzl6Rpk8xUUvlbDYPDM0AK753+qFu+kX
 /JaTeBXvNJBNVHSDkn6A=
X-Google-Smtp-Source: AGHT+IEWi/LH/skZWQabOZ+TFGv/UA+uu8q1pQmn0ORzUJaWUEEE4FhAK7boRk/quFd/cRBr3jq/NQ==
X-Received: by 2002:a5d:5f89:0:b0:3a3:7bad:29cb with SMTP id
 ffacd0b85a97d-3a51dc4c4bfmr1535713f8f.52.1749028752692; 
 Wed, 04 Jun 2025 02:19:12 -0700 (PDT)
Received: from [192.168.69.138] (88-187-86-199.subs.proxad.net.
 [88.187.86.199]) by smtp.gmail.com with ESMTPSA id
 ffacd0b85a97d-3a4f00971e4sm21403235f8f.65.2025.06.04.02.19.11
 (version=TLS1_3 cipher=TLS_AES_128_GCM_SHA256 bits=128/128);
 Wed, 04 Jun 2025 02:19:12 -0700 (PDT)
Message-ID: <3f35fb33-97f9-433e-a5bd-86d2926cf3d5@linaro.org>
Date: Wed, 4 Jun 2025 11:19:10 +0200
MIME-Version: 1.0
User-Agent: Mozilla Thunderbird
Subject: Re: [PATCH v3 3/3] docs: define policy forbidding use of AI code
 generators
To: =?UTF-8?Q?Daniel_P=2E_Berrang=C3=A9?= <berrange@redhat.com>
Cc: Markus Armbruster <armbru@redhat.com>,
 Stefan Hajnoczi <stefanha@gmail.com>, qemu-devel@nongnu.org,
 Thomas Huth <thuth@redhat.com>, =?UTF-8?Q?Alex_Benn=C3=A9e?=
 <alex.bennee@linaro.org>, "Michael S . Tsirkin" <mst@redhat.com>,
 Gerd Hoffmann <kraxel@redhat.com>,
 Mark Cave-Ayland <mark.cave-ayland@ilande.co.uk>,
 Kevin Wolf <kwolf@redhat.com>, Stefan Hajnoczi <stefanha@redhat.com>,
 Alexander Graf <agraf@csgraf.de>, Paolo Bonzini <pbonzini@redhat.com>,
 Richard Henderson <richard.henderson@linaro.org>,
 Peter Maydell <peter.maydell@linaro.org>,
 Pierrick Bouvier <pierrick.bouvier@linaro.org>
References: <20250603142524.4043193-1-armbru@redhat.com>
 <20250603142524.4043193-4-armbru@redhat.com>
 <CAJSP0QUGaQEwhVh_w6Wbdm-Nqo_2kHcb+eS2Simq-x9J=-7qkg@mail.gmail.com>
 <87a56o1154.fsf@pond.sub.org> <aD_yhelX-w4Vdm8Z@redhat.com>
 <3df2ae5d-c1c6-45ee-8119-ca42e17a0d98@linaro.org>
 <aEAGadbMexZ9mm4a@redhat.com>
Content-Language: en-US
From: =?UTF-8?Q?Philippe_Mathieu-Daud=C3=A9?= <philmd@linaro.org>
In-Reply-To: <aEAGadbMexZ9mm4a@redhat.com>
Content-Type: text/plain; charset=UTF-8; format=flowed
Content-Transfer-Encoding: 8bit
Received-SPF: pass client-ip=2a00:1450:4864:20::432;
 envelope-from=philmd@linaro.org; helo=mail-wr1-x432.google.com
X-Spam_score_int: -20
X-Spam_score: -2.1
X-Spam_bar: --
X-Spam_report: (-2.1 / 5.0 requ) BAYES_00=-1.9, DKIM_SIGNED=0.1,
 DKIM_VALID=-0.1, DKIM_VALID_AU=-0.1, DKIM_VALID_EF=-0.1,
 RCVD_IN_DNSWL_NONE=-0.0001, SPF_HELO_NONE=0.001,
 SPF_PASS=-0.001 autolearn=ham autolearn_force=no
X-Spam_action: no action
X-BeenThere: qemu-devel@nongnu.org
X-Mailman-Version: 2.1.29
Precedence: list
List-Id: <qemu-devel.nongnu.org>
List-Unsubscribe: <https://lists.nongnu.org/mailman/options/qemu-devel>,
 <mailto:qemu-devel-request@nongnu.org?subject=unsubscribe>
List-Archive: <https://lists.nongnu.org/archive/html/qemu-devel>
List-Post: <mailto:qemu-devel@nongnu.org>
List-Help: <mailto:qemu-devel-request@nongnu.org?subject=help>
List-Subscribe: <https://lists.nongnu.org/mailman/listinfo/qemu-devel>,
 <mailto:qemu-devel-request@nongnu.org?subject=subscribe>
Errors-To: qemu-devel-bounces+qemu-devel=archiver.kernel.org@nongnu.org
Sender: qemu-devel-bounces+qemu-devel=archiver.kernel.org@nongnu.org

On 4/6/25 10:40, Daniel P. Berrangé wrote:
> On Wed, Jun 04, 2025 at 09:54:33AM +0200, Philippe Mathieu-Daudé wrote:
>> On 4/6/25 09:15, Daniel P. Berrangé wrote:
>>> On Wed, Jun 04, 2025 at 08:17:27AM +0200, Markus Armbruster wrote:
>>>> Stefan Hajnoczi <stefanha@gmail.com> writes:
>>>>
>>>>> On Tue, Jun 3, 2025 at 10:25 AM Markus Armbruster <armbru@redhat.com> wrote:
>>>>>>
>>>>>> From: Daniel P. Berrangé <berrange@redhat.com>
>>>>>> +
>>>>>> +The increasing prevalence of AI code generators, most notably but not limited
>>>>>
>>>>> More detail is needed on what an "AI code generator" is. Coding
>>>>> assistant tools range from autocompletion to linters to automatic code
>>>>> generators. In addition, there are other AI-related tools, like ChatGPT
>>>>> or Gemini used as a chatbot, that people can use like Stack Overflow or
>>>>> as an API documentation summarizer.
>>>>>
>>>>> I think the intent is to say: do not put code that comes from _any_ AI
>>>>> tool into QEMU.
>>>>>
>>>>> It would be okay to use AI to research APIs, algorithms, brainstorm
>>>>> ideas, debug the code, analyze the code, etc but the actual code
>>>>> changes must not be generated by AI.
>>>
>>> The scope of the policy is around contributions we receive as
>>> patches with SoB. Researching / brainstorming / analysis etc
>>> are not contribution activities, so not covered by the policy
>>> IMHO.
>>>
>>>>
>>>> The existing text is about "AI code generators".  However, the "most
>>>> notably LLMs" that follows it could lead readers to believe it's about
>>>> more than just code generation, because LLMs are in fact used for more.
>>>> I figure this is your concern.
>>>>
>>>> We could instead start wide, then narrow the focus to code generation.
>>>> Here's my try:
>>>>
>>>>     The increasing prevalence of AI-assisted software development results
>>>>     in a number of difficult legal questions and risks for software
>>>>     projects, including QEMU.  Of particular concern is code generated by
>>>>     `Large Language Models
>>>>     <https://en.wikipedia.org/wiki/Large_language_model>`__ (LLMs).
>>>
>>> Documentation we maintain has the same concerns as code.
>>> So I'd suggest to substitute 'code' with 'code / content'.
>>
>> Why couldn't we accept documentation patches improved using an LLM?
> 
> I would flip it around and ask why documentation would not be held
> to the same standard as code when it comes to licensing and legal
> compliance?
> 
> This is all copyright content that we merge & distribute under the
> same QEMU licensing terms, and we have the same legal obligations
> whether it is "source code" or "documentation" or other content
> that is not traditional "source code" (images for example).
> 
> 
>> As a non-native English speaker who often gets stuck trying to describe
>> function APIs, I'm very tempted to use an LLM to review my sentences
>> and make them easier to understand.
> 
> I can understand that desire, and it is an admittedly tricky situation
> and tradeoff for which I don't have a great answer.
> 
> As a starting point we (as reviewers/maintainers) must be broadly
> very tolerant & accepting of content that is not perfect English,
> because we know many (probably even the majority of) contributors
> won't have English as their first language.
> 
> As a reviewer I don't mind imperfect language in submissions. Even
> if language is not perfect it is at least a direct expression of
> the author's understanding and thus we can have a level of trust
> in the docs based on our community experience with the contributor.
> 
> If docs have been altered in any significant manner by an LLM,
> even if they are linguistically improved, IMHO, knowing that an
> LLM was used would reduce my personal trust in the technical
> accuracy of the contribution.
> 
> This is straying into the debate around the accuracy of LLMs though,
> which is interesting, but tangential to the purpose of this policy,
> which aims to focus on the code provenance / legal side.
> 
> 
> 
> So, back on track: an important point is that this policy (and the
> legal concerns/risks it attempts to address) is implicitly about
> contributions that can be considered copyrightable.
> 
> Some so-called "trivial" work can be so simplistic as to not meet
> the threshold for copyright protection, and thus it is easy for the
> DCO requirements to be satisfied.
> 
> 
> As a person, when you write the API documentation from scratch,
> your output would generally be considered a copyrightable
> contribution by the author.
> 
> When a reviewer then suggests changes to your docs, most of the
> time those changes are so trivial that the reviewer wouldn't be
> claiming copyright over the resulting work.
> 
> If the reviewer completely rewrites entire sentences in the
> docs, though, they would be able to claim copyright over part
> of the resulting work.
> 
> 
> The tipping point between copyrightable and non-copyrightable is
> hard to define in a policy. It is inherently fuzzy, and somewhat
> of a "you'll know it when you see it" or "let's debate it in court"
> situation...
> 
> 
> So back to LLMs.
> 
> 
> If you ask the LLM (or an agent using an LLM) to entirely write
> the API docs from scratch, I think that should be expected to
> fall under this proposed contribution policy in general.
> 
> 
> If you write the API docs yourself and ask the LLM to review and
> suggest improvements, that MAY or MAY NOT fall under this policy.
> 
> If the LLM-suggested tweaks were minor enough to be considered
> not to meet the threshold of copyrightability, it would be fine;
> this is little different from a human reviewer suggesting tweaks.

Good.

> If the LLM suggested large-scale rewriting, the line would be
> harder to draw, but it would tend towards falling under this
> contribution policy.
> 
> So it depends on the scope of what the LLM suggested as a change
> to your docs.
> 
> IOW, LLM-as-sparkling-auto-correct is probably OK, but
> LLM-as-book-editor / LLM-as-ghost-writer is probably NOT OK

OK.

> This is a scenario where the QEMU contributor has to use their
> personal judgement as to whether their use of LLM in a docs context
> is compliant with this policy, or not. I don't think we should try
> to describe this in the policy given how fuzzy the situation is.

Thank you very much for this detailed explanation!

> 
> NB, this copyrightable/non-copyrightable situation applies to source
> code too, not just docs.
> 
> With regards,
> Daniel