From mboxrd@z Thu Jan 1 00:00:00 1970 Return-Path: X-Spam-Checker-Version: SpamAssassin 3.4.0 (2014-02-07) on aws-us-west-2-korg-lkml-1.web.codeaurora.org Received: from lists.gnu.org (lists.gnu.org [209.51.188.17]) (using TLSv1.2 with cipher ECDHE-RSA-AES256-GCM-SHA384 (256/256 bits)) (No client certificate requested) by smtp.lore.kernel.org (Postfix) with ESMTPS id 22ED7CAC5B9 for ; Mon, 29 Sep 2025 18:30:05 +0000 (UTC) Received: from localhost ([::1] helo=lists1p.gnu.org) by lists.gnu.org with esmtp (Exim 4.90_1) (envelope-from ) id 1v3IcL-0003Q4-Vf; Mon, 29 Sep 2025 14:28:58 -0400 Received: from eggs.gnu.org ([2001:470:142:3::10]) by lists.gnu.org with esmtps (TLS1.2:ECDHE_RSA_AES_256_GCM_SHA384:256) (Exim 4.90_1) (envelope-from ) id 1v3IcH-0003Pn-IP for qemu-devel@nongnu.org; Mon, 29 Sep 2025 14:28:54 -0400 Received: from us-smtp-delivery-124.mimecast.com ([170.10.129.124]) by eggs.gnu.org with esmtps (TLS1.2:ECDHE_RSA_AES_256_GCM_SHA384:256) (Exim 4.90_1) (envelope-from ) id 1v3IcA-0001CB-Bm for qemu-devel@nongnu.org; Mon, 29 Sep 2025 14:28:51 -0400 DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=redhat.com; s=mimecast20190719; t=1759170521; h=from:from:reply-to:reply-to:subject:subject:date:date: message-id:message-id:to:to:cc:cc:mime-version:mime-version: content-type:content-type:in-reply-to:in-reply-to: references:references; bh=R7naFlgNm5V9aYQfbw7FhGYN9Ym4gtd5a42eYqzm/30=; b=K83vohc1XTcHF1Unbl6+lMI03iC/fJkw22kNldlkEcuiWeaUvvgwfob9sUaYc8O55YmaIF LH5oXx52o6qaMolL5nMTiehqVXuwD1QGBzllxNy8izP56HpJSicgFLeNP+h4PhpyHIWRPs ze4pDdR8gNgvxK+zTLxqw/GlBzoAJQM= Received: from mx-prod-mc-08.mail-002.prod.us-west-2.aws.redhat.com (ec2-35-165-154-97.us-west-2.compute.amazonaws.com [35.165.154.97]) by relay.mimecast.com with ESMTP with STARTTLS (version=TLSv1.3, cipher=TLS_AES_256_GCM_SHA384) id us-mta-683-52Vi_Bp2NbiUFfAf5Psr3Q-1; Mon, 29 Sep 2025 14:28:38 -0400 X-MC-Unique: 52Vi_Bp2NbiUFfAf5Psr3Q-1 X-Mimecast-MFC-AGG-ID: 52Vi_Bp2NbiUFfAf5Psr3Q_1759170516 Received: from mx-prod-int-05.mail-002.prod.us-west-2.aws.redhat.com (mx-prod-int-05.mail-002.prod.us-west-2.aws.redhat.com [10.30.177.17]) (using TLSv1.3 with cipher TLS_AES_256_GCM_SHA384 (256/256 bits) key-exchange X25519 server-signature RSA-PSS (2048 bits) server-digest SHA256) (No client certificate requested) by mx-prod-mc-08.mail-002.prod.us-west-2.aws.redhat.com (Postfix) with ESMTPS id 9FF081800451; Mon, 29 Sep 2025 18:28:36 +0000 (UTC) Received: from redhat.com (unknown [10.42.28.51]) by mx-prod-int-05.mail-002.prod.us-west-2.aws.redhat.com (Postfix) with ESMTPS id 227C1195608E; Mon, 29 Sep 2025 18:28:33 +0000 (UTC) Date: Mon, 29 Sep 2025 19:28:30 +0100 From: Daniel =?utf-8?B?UC4gQmVycmFuZ8Op?= To: Paolo Bonzini Cc: qemu-devel@nongnu.org, peter.maydell@linaro.org, stefanha@redhat.com, alex.bennee@linaro.org Subject: Re: [PATCH] docs/code-provenance: add an exception for non-creative AI changes Message-ID: References: <20250925075630.352720-1-pbonzini@redhat.com> MIME-Version: 1.0 Content-Type: text/plain; charset=utf-8 Content-Disposition: inline In-Reply-To: <20250925075630.352720-1-pbonzini@redhat.com> User-Agent: Mutt/2.2.14 (2025-02-20) X-Scanned-By: MIMEDefang 3.0 on 10.30.177.17 Received-SPF: pass client-ip=170.10.129.124; envelope-from=berrange@redhat.com; helo=us-smtp-delivery-124.mimecast.com X-Spam_score_int: -25 X-Spam_score: -2.6 X-Spam_bar: -- X-Spam_report: (-2.6 / 5.0 requ) BAYES_00=-1.9, DKIMWL_WL_HIGH=-0.513, DKIM_SIGNED=0.1, DKIM_VALID=-0.1, DKIM_VALID_AU=-0.1, DKIM_VALID_EF=-0.1, RCVD_IN_DNSWL_NONE=-0.0001, RCVD_IN_MSPIKE_H4=0.001, RCVD_IN_MSPIKE_WL=0.001, RCVD_IN_VALIDITY_RPBL_BLOCKED=0.001, RCVD_IN_VALIDITY_SAFE_BLOCKED=0.001, SPF_HELO_PASS=-0.001, SPF_PASS=-0.001 autolearn=ham autolearn_force=no X-Spam_action: no action X-BeenThere: qemu-devel@nongnu.org X-Mailman-Version: 2.1.29 Precedence: list List-Id: List-Unsubscribe: , List-Archive: List-Post: List-Help: List-Subscribe: , Reply-To: Daniel =?utf-8?B?UC4gQmVycmFuZ8Op?= Errors-To: qemu-devel-bounces+qemu-devel=archiver.kernel.org@nongnu.org Sender: qemu-devel-bounces+qemu-devel=archiver.kernel.org@nongnu.org On Thu, Sep 25, 2025 at 09:56:30AM +0200, Paolo Bonzini wrote: > AI tools can be used as a natural language refactoring engine for simple > tasks such as modifying all callers of a given function or all accesses > to a variable. These tasks are interesting for an exception because: > > * it is credible for a contributor to claim DCO compliance. If the > contributor can reasonably make the same change with different tools or > with just an editor, which tool is used (including an LLM) should have > no bearing on compliance. This also applies to less simple tasks such > as adding Python type annotations. When I read refactoring, I consider * No functional change * The change is a logical transformation of text The creative act is defining the rule for what the transformation will do. The resulting patch is a derivative work of the existing code, along with the rules for the transformation. In traditional cases, the rules are an awk prompt, or a semantic patch (coccinelle), but in this case, the rules are the AI agent prompt The AI training material will guide the tool on how to interpret your natural language rules, but overall it looks implausible to claim the resulting patch would be a derivative work of any training material. The only exception seems to be if some QEMU fork had made exactly the same refactoring already, which seems unlikely in general, when we disregard malicious contributors. Type annotations gets more complex to rationalize, because that is about writing net new functional code. As an analogy though, if there was a clean room dev team, whom were fed the commit message, their output would likely be identical. The defining characteristic is the lack of plausible (&correct) alternative approaches. All tools, whether AI or not, should converge on the same result. It again looks implausible to suggest the resulting patch could be a derivative work of any particular agent training material, unless some QEMU fork had made exactly the same functional change already. This feels unlikely. TL:DR: even in today's world where the laundering of training material licenses, vs AI output licenses is murky, there is likely sufficient confidence in these particular scenarios to claim the training material imposes negligible risk to QEMU. > * they are relatively easy to test and review, and can provide noticeable > time savings; > > * this kind of change is easily separated from more complex non-AI-generated > ones, which we encourage people to do anyway. It is therefore natural > to highlight them as AI-generated. > > Make an exception for patches that have "limited creative content" - that > is, mechanical transformations where the creativity lies in deciding what > to change rather than in how to implement the change. > > Signed-off-by: Paolo Bonzini > --- > docs/devel/code-provenance.rst | 27 +++++++++++++++++++++++---- > 1 file changed, 23 insertions(+), 4 deletions(-) > > diff --git a/docs/devel/code-provenance.rst b/docs/devel/code-provenance.rst > index 8cdc56f6649..d6e86636964 100644 > --- a/docs/devel/code-provenance.rst > +++ b/docs/devel/code-provenance.rst > @@ -290,9 +290,11 @@ Use of AI-generated content > > TL;DR: > > - **Current QEMU project policy is to DECLINE any contributions which are > + **The general QEMU project policy is to DECLINE any contributions which are > believed to include or derive from AI generated content. This includes > - ChatGPT, Claude, Copilot, Llama and similar tools.** > + ChatGPT, Claude, Copilot, Llama and similar tools.** The following exceptions > + are acceptable: > + * **Limited creative content** (e.g., mechanical transformations) Ought to link to the detailed description further down. > > **This policy does not apply to other uses of AI, such as researching APIs > or algorithms, static analysis, or debugging, provided their output is not > @@ -323,8 +325,9 @@ content generators commonly available today is unclear. The QEMU project is > not willing or able to accept the legal risks of non-compliance. > > The QEMU project thus requires that contributors refrain from using AI content > -generators on patches intended to be submitted to the project, and will > -decline any contribution if use of AI is either known or suspected. > +generators on patches intended to be submitted to the project, with exceptions > +outlined below. If use of AI is known or suspected to go beyond the exceptions, > +QEMU will decline a contribution. > > Examples of tools impacted by this policy includes GitHub's CoPilot, OpenAI's > ChatGPT, Anthropic's Claude, and Meta's Code Llama, and code/content > @@ -347,3 +350,19 @@ requirements for contribution. In particular, the "Signed-off-by" > label in a patch submission is a statement that the author takes > responsibility for the entire contents of the patch, including any parts > that were generated or assisted by AI tools or other tools. > + > +The following exceptions are currently in place: > + > +**Limited creative content** > + Mechanical transformations where there is reasonably only one way to > + implement the change. Any tool, as well as a manual change, would > + produce substantially the same modifications to the code. Examples > + include adjustments to data structures, mechanical API migrations, > + or applying non-functional changes uniformly across a codebase. I can rationalize in my mind what I would be willing to accept under this description, but at the same time this is (intentionally) fairly fuzzy and open to interpretation. > +It is highly encouraged to provide background information such as the > +prompts that were used, and to not mix AI- and human-written code in the > +same commit, as much as possible. It would be nice to require full separation, but if an AI gets you 95% of the way there, and the remaining 5% is just stupid typos/ mistakes, it is better for a human to just fix it immediately, than to continue re-prompting the agent to try again and hoping you win. > +Maintainers should ask for a second opinion and avoid applying the > +exception to their own patch submissions. Could it be simpler as: Sign off is mandatory from a maintainer whom is not the author. Or where you implying that we need SoB from two maintainers ? In theory non-author S-o-B is best practice already, but analyzing git shows we have about 10% of our commits with no independent S-o-B at all. Often such commits are precisely the NFC refactorings, or the simple mechanical additions. So perhaps going from zero-or-one independent SoBs, to at least one SoB is sufficiently strong ? With regards, Daniel -- |: https://berrange.com -o- https://www.flickr.com/photos/dberrange :| |: https://libvirt.org -o- https://fstop138.berrange.com :| |: https://entangle-photo.org -o- https://www.instagram.com/dberrange :|