Date: Wed, 3 Jun 2026 11:06:27 -0400
From: "Michael S.
Tsirkin"
To: Daniel P. Berrangé
Cc: Paolo Bonzini, qemu-devel@nongnu.org, Alex Bennée, Alistair Francis, BALATON Zoltan, Fabiano Rosas, Kevin Wolf, Peter Maydell, Warner Losh, Philippe Mathieu-Daudé, Paolo Bonzini
Subject: Re: [PATCH v2] docs/devel: relax policy on AI-generated contributions
Message-ID: <20260603110555-mutt-send-email-mst@kernel.org>
References: <20260529094619.1034458-1-pbonzini@redhat.com>

On Wed, Jun 03, 2026 at 03:59:35PM +0100, Daniel P. Berrangé wrote:
> On Fri, May 29, 2026 at 11:46:19AM +0200, Paolo Bonzini wrote:
> > The concern that motivated the policy is unchanged, and it is worth stating
> > precisely: the DCO is about whether the submitter has the legal right to
> > contribute the code, not about "creative expression".  While the status of
> > LLM output seems to be converging towards non-copyrightability, questions
> > around unintentional reproduction of copyrighted code are still open.
> >
> > What has shifted is the balance of risk:
> >
> > - projects accepting AI-assisted content have not run into serious
> >   legal trouble so far, which suggests the probability of the risk
> >   materializing is not high;
>
> "so far" is doing a lot of heavy lifting here & generally I think this
> rather over-estimates the speed at which legal issues might arise.
> Copyright infringement is a "slow burn" where the risk accumulates
> over time and issues, if discovered, may not be litigated immediately.
>
> That is NOT to say the risk is high. The risk may well still be
> low. I'm just saying that there's not been sufficient time to use
> "lack of lawsuits" as a rationalization IMHO.
>
> > - other organizations, such as Red Hat[1], have assessed the risk as
> >   acceptable -- though a community of individual developers does not
> >   have the legal backing of a company, and even an unfounded dispute
> >   would be a long-lasting distraction from work on QEMU.
> >
> > Nevertheless, even Red Hat mentions that "the possibility of occasional
> > replication cannot be ignored".  In QEMU's view, attentiveness and
> > oversight are not a practical way to address this; yet as a copyleft
> > project, copyright and code provenance are of utmost importance to us.
> >
> > Therefore, it remains prudent to only permit AI assistance where the
> > ramifications of copyright violations are at least easy to revert and
> > unlikely to spread: tests, documentation, mechanical changes, and small
> > bug fixes.  Core code that other things depend on, and that cannot
> > simply be thrown away once a problem is noticed long after the fact,
> > stays off-limits without prior agreement from a maintainer.
>
> The interaction of "small bug fixes" and "core code" doesn't
> fit well IMHO. A "bug fix" describes an action, but the code
> that is changed is usually a "feature" and will often be a
> "core" part of something in QEMU.
>
> IIUC, by "small bug fixes", what you're actually trying to
> express is an acceptance of code that is either
>
>   * unlikely to meet the threshold for copyrightability
>   * small enough that the consequences of throwing it
>     away are negligible.
>   * possibly other aspects ?

Tightly coupled to the specific state of the QEMU code, and so original.

>
>
> > diff --git a/docs/devel/code-provenance.rst b/docs/devel/code-provenance.rst
> > index 65b8f232a08..857588c43ba 100644
> > --- a/docs/devel/code-provenance.rst
> > +++ b/docs/devel/code-provenance.rst
> > @@ -1,7 +1,7 @@
> >  .. _code-provenance:
> >
> > -Code provenance
> > -===============
> > +Code provenance and AI usage
> > +============================
>
> In retrospect, I wonder if we shouldn't have had "ai-usage.rst" as
> a separate doc from the start. While we can hyperlink to sub-titles
> via anchors, it would be simpler if we could just point to a doc and
> not require scrolling past pages of non-AI text.
>
> > @@ -288,62 +288,108 @@ content generators below.
> >  Use of AI-generated content
> >  ~~~~~~~~~~~~~~~~~~~~~~~~~~~
>
> > +Risks to open source projects include maintainer burnout from an
> > +increased number of contributions, as well as the risk to the project
> > +from unintentional inclusion of copyrighted material in the LLM's output.
> > +In order to mitigate these risks, the QEMU project currently allows
> > +using AI/LLM tools to produce patches in a limited set of scenarios:
>
> If we're opening the door to AI assisted contribution, then IMHO we
> need to write about both the social and technical expectations.
> Admittedly that will expand the scope of your proposal here, but
> IMHO that's somewhat unavoidable. A significant part of the downsides
> of AI-assisted contributions comes from bad social practices, rather
> than merely bad technical practices.
>
> As a general theme, I would like us to emphasize at the start that the
> act of collaboration & contribution in QEMU is about the interaction,
> trust and relationships between humans, not bots.
>
>
> If someone wants to use tools (LLM based or not) that's a choice,
> but the accountability for actions needs to fall on a real human
> and there needs to be transparency whenever automation is used.
>
> This starts from the commit message. A good commit message (and even
> more so a good cover letter) describes the intent / thinking behind
> the changes. An LLM doesn't think or have intent in its actions,
> ergo a human should be driving the authorship of commit messages /
> cover letters, where a non-trivial explanation is needed.
>
> As reviewers, if we make use of LLM backed tools to respond, then
> we need to be transparent about any feedback that came from a bot
> rather than from a human.
>
> As contributors, if a reviewer gives feedback, the contributor's
> response should be their own rather than just feeding the email
> review into an LLM and cut+pasting the LLM's answer back to the
> list.
>
> The identity used to contribute to QEMU should reflect the human's
> identity. As previously clarified, this doesn't need to be a real
> name, but we don't want LLM agents being given a pseudonym to
> pretend to be a human.
>
> > +**Mechanical changes**
> > +  If you can use a deterministic tool, it is preferred that you use it
> > +  and not replace it with AI.  If you don't know how to do the change
> > +  deterministically, you can ask the AI for help.
>
> > +**Small bug fixes**
> > +  These should be limited to 20 lines of code or less, not including
> > +  tests.  You are still expected to :ref:`understand and explain your changes
> > +  ` and the rationale behind them.
>
> I think the "20 lines or less" is not doing a good job at expressing
> the intent behind this point.
> I'd like us to emphasize the
> "why" of this point, as that helps contributors & reviewers make a
> decision on whether a change is "within the spirit" of the rule or
> not.
>
>
> > +**Documentation and code comments**
> > +  While AI can help draft text, it still requires significant human
> > +  oversight.  Pay attention to the organization and flow of the generated
> > +  text, and strictly fact-check all technical details as LLMs are prone
> > +  to being confidently wrong.
>
> Docs is an area I'm more wary of from the social expectation side rather
> than the technical or legal side. I don't feel like "pay attention to
> the organization and flow" really mitigates the tendency to produce
> vast reams of convincing-sounding slop. There has always been a
> problem with docs of well intentioned contributors trying to write about
> stuff they don't really understand well enough. IOW they don't necessarily
> have the knowledge to fact check details either. As a maintainer, I've found
> that reviewing docs and asking for rewrites can be even more of a burden than
> code. IOW, encouraging use of AI for docs, in non-expert hands, has a strong
> potential for expanding the burden on maintainers.
>
> I'd be more comfortable with AI tools for inline API docs, rather than
> AI tools for prose under docs/.
>
> Not sure how to better word this point though ?
>
> > +**Tests**
> > +  Note that you must still confirm that each test actually exercises
> > +  the intended behavior including, for regression tests, that it
> > +  fails without the code under test and passes for the right reason.
>
>
> > +If you wish to send large amounts of AI-generated changes, or any other
> > +contribution not in the above categories, please get in touch with the
> > +maintainer beforehand.  These can be treated as experiments, at the
> > +discretion of the maintainer and the community, with no obligation
> > +to accept them.
>
> IMHO it should not be at the discretion of individual maintainers to
> accept large-scale AI authored changes outside these guidelines. To
> quote the commit message rationale
>
>   "Therefore, it remains prudent to only permit AI assistance where
>    the ramifications of copyright violations are at least easy to
>    revert and unlikely to spread"
>
> that does not suggest we should leave it to the discretion of maintainers
> to override the guidelines.
>
> > +**Use of AI does not remove the need for authors to comply with all
> > +other requirements for contribution.**  In particular, the
> > +``Signed-off-by`` label in a patch submission is a statement that
> > +the author takes responsibility for the entire contents of the patch,
> > +certifying that their patch submission is made in accordance with the
> > +rules of the `Developer's Certificate of Origin (DCO) `.
>
>
> This needs to be stronger language IMHO. The kernel has a more
> explicit statement forbidding agents from adding
> Signed-off-by on behalf of the human:
>
>   https://git.kernel.org/pub/scm/linux/kernel/git/torvalds/linux.git/tree/Documentation/process/coding-assistants.rst?id=4bf85afb9f3ecd7c3b5d15a85b0902f8e725cd06#n27
>
>   "Signed-off-by and Developer Certificate of Origin
>    =================================================
>
>    AI agents MUST NOT add Signed-off-by tags. Only humans can legally
>    certify the Developer Certificate of Origin (DCO). The human submitter
>    is responsible for:
>
>    * Reviewing all AI-generated code
>    * Ensuring compliance with licensing requirements
>    * Adding their own Signed-off-by tag to certify the DCO
>    * Taking full responsibility for the contribution"
>
>
> I think we should be similarly explicit that a human must take
> the action of adding S-o-b - it is not a rubber stamp to be
> automated by the AI.
>
> This should be emphasized in the earlier part of the doc before
> the AI usage section where we describe S-o-b usage.
>
>
> > +Commit messages for AI-assisted changes
> > +^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
> >
> > +When AI/LLM tools produce or substantively shape your patch, add an
>
>   "shape your patch" -> "shape the content of the submitted patch"
>
> as this better excludes the "background" usage mentioned below.
>
> > +``AI-used-for:`` line before ``Signed-off-by``, as a reminder of your
> > +DCO obligations and a guide to reviewers.  The text is one or more of
> > +``code``, ``tests``, ``docs``, ``research``, possibly followed by an
> > +explanation in parentheses:
> >
> > +.. code-block:: none
> > +
> > +  AI-used-for: tests, docs
> > +  AI-used-for: code
> > +  AI-used-for: code (refactoring)
> > +  AI-used-for: code (prototype)
> > +  AI-used-for: research
> > +
> > +``AI-used-for`` should not be included for "background" usage such as
> > +autocomplete or obtaining a pre-review of the patch.
>
> This is an interesting idea that I like much more than Assisted-by,
> because it gives more directly useful info to the reviewer, without
> turning into free advertising for commercial vendors.
>
> > +There is no requirement to include your prompts or summarize the
> > +conversation in the commit message or cover letter, but you may do so
> > +if you think it helps a reviewer judge the result.  For example:
>
> IMHO we should actively discourage the inclusion of prompts
> entirely as it is the wrong information to provide.
>
> > +
> > +**Helpful prompts**
> > +  These describe concrete constraints or instructions, making it easy for a
> > +  reviewer to see how the tool's output was guided:
> > +
> > +  * "move field ``foo`` from ``struct aa`` to ``struct bb``.
> > +    If a function already has a local variable or parameter of type
> > +    ``struct bb``, use it instead of accessing ``aa.bb``"
> > +
> > +  * "add an implementation of the trait for ``Mutex``; it
> > +    takes the lock around the calls and forwards to ``T``"
>
> These example prompts are just expressing an aspect that should
> already have been described in prose in the commit message. We
> don't need to classify them as "ai prompts" in a commit message,
> we just need the author to write a useful commit message.
>
> > +**Unhelpful prompts**
> > +  These are too generic to provide meaningful context.  You can of course
> > +  use them in the context of a complex interaction with the LLM, but they
> > +  should not be included in the commit message:
> > +
> > +  * "write user-facing documentation for the new tool"
> > +
> > +  * "write testcases for the new functions"
>
> Again this is just an illustration of an unhelpful commit message.
> Those would be equally useless in an entirely human authored patch.
> Just emphasize the writing of useful commit messages.
>
>
> > +QEMU does *not* use ``Assisted-by``, ``Co-authored-by`` or ``Generated-by``
> > +trailers to indicate AI usage.  In particular, it is not necessary to
> > +specify the exact AI model or tool used to create the commit.
>
> "does not use" doesn't imply "forbidden".
>
> IIUC, tools are liable to add these tags without the contributor
> even asking for them. If we don't want to be providing free
> advertising IMHO we should explicitly forbid use of these tags
> and validate this in checkpatch.pl
>
> Also any rules in this respect should be documented earlier in
> this file where we outline what tags we use in commit messages,
> either instead of, or in addition to, mentioning them under the
> AI usage guidelines.
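
To make the validation idea concrete, here is a rough sketch of the kind of
check being discussed. It is not checkpatch.pl code (that script is Perl);
it is a standalone Python illustration, and the function name
check_trailers, the regular expressions and the error wording are all
hypothetical. The only things taken from the quoted patch are the three
trailer names it says QEMU does not use and the four AI-used-for
categories with an optional explanation in parentheses.

```python
import re

# Trailers the quoted patch says QEMU does not use to indicate AI usage.
FORBIDDEN_TRAILERS = {"assisted-by", "co-authored-by", "generated-by"}

# Categories named in the quoted patch for the AI-used-for trailer.
ALLOWED_CATEGORIES = {"code", "tests", "docs", "research"}

# "Key: value" at the start of a line; keys are letters, digits and dashes.
TRAILER_RE = re.compile(r"^([A-Za-z][A-Za-z0-9-]*):\s*(.*)$")

# One category, optionally followed by an explanation in parentheses.
ITEM_RE = re.compile(r"^([a-z]+)(?:\s*\([^()]*\))?$")


def check_trailers(message):
    """Return a list of problems found in a commit message's trailers.

    Hypothetical helper: a sketch of the proposed rule, not existing
    QEMU tooling.
    """
    problems = []
    for lineno, line in enumerate(message.splitlines(), start=1):
        match = TRAILER_RE.match(line.strip())
        if match is None:
            continue
        key, value = match.group(1), match.group(2).strip()
        if key.lower() in FORBIDDEN_TRAILERS:
            problems.append(f"line {lineno}: '{key}' trailer is not allowed")
        elif key.lower() == "ai-used-for":
            # Split on commas that are not inside a parenthesised explanation.
            for item in re.split(r",\s*(?![^()]*\))", value):
                item = item.strip()
                item_match = ITEM_RE.match(item)
                if item_match is None:
                    problems.append(f"line {lineno}: malformed entry '{item}'")
                elif item_match.group(1) not in ALLOWED_CATEGORIES:
                    problems.append(
                        f"line {lineno}: unknown category '{item_match.group(1)}'"
                    )
    return problems
```

A message carrying "AI-used-for: tests, docs" would pass, while one carrying
"Assisted-by: ..." or an unlisted category would be reported. Whether such a
check should reject or merely warn is a policy decision the sketch does not
try to settle.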
>
> With regards,
> Daniel
> --
> |: https://berrange.com          ~~    https://hachyderm.io/@berrange :|
> |: https://libvirt.org           ~~    https://entangle-photo.org :|
> |: https://pixelfed.art/berrange ~~    https://fstop138.berrange.com :|