From: Alex Bennée
To: Paolo Bonzini
Cc: Kevin Wolf, Warner Losh, "Michael S.
 Tsirkin", qemu-devel@nongnu.org, stefanha@redhat.com
Subject: Re: on ai generated and code provenance
In-Reply-To: (Paolo Bonzini's message of "Wed, 27 May 2026 12:01:10 +0200")
References: <20260524083329-mutt-send-email-mst@kernel.org>
 <20260526140231-mutt-send-email-mst@kernel.org>
 <20260526152526-mutt-send-email-mst@kernel.org>
Date: Wed, 27 May 2026 11:43:35 +0100
Message-ID: <87se7dxhd4.fsf@draig.linaro.org>

Paolo Bonzini writes:

> On 5/27/26 10:41, Kevin Wolf wrote:
>> On 26.05.2026 at 21:52, Warner Losh wrote:
>>> The QEMU Project currently may accept limited uses of AI that produce
>>> high quality patches that are limited in the creative content added.
>>> While maintainers will ultimately decide, changes like the following
>>> fall within this policy
>>> 1. Fixing obvious warnings in the obvious ways suggested by the tool
>>> 2. Tree wide API changes, and other similar mechanical changes done
>>>    today with perl/python/sed/coccinelle
>> As I said in the paragraph you quoted below, I don't think we should
>> encourage using AI for tasks that a deterministic tool could do.
>
> In some cases such a tool does not exist. Much to my surprise, there
> is no tool to do static type inference on Python code, but AI is very
> good at doing it.
>
>> Letting AI perform the change directly instead may be an acceptable
>> shortcut for a one-man hobby project that nobody else will ever look at,
>> but in the context of a community project like QEMU in which your
>> changes have to be reviewed and understood by others, it matters a lot
>> that the output of the tool is reproducible. Otherwise, you're creating
>> unnecessary work for others, and that isn't acceptable.
>
> When applicable, going through coccinelle (with the aid of AI if
> needed!) is indeed a good middle ground as it helps reviewers for large
> changes. If you have many slightly different but easily separated
> changes (e.g. you can split the patch by struct field), it may make
> things worse.
>
> It's also worth noting that in other cases even sed or coccinelle,
> while deterministic, cannot produce 100% of the patch.
>
>> So maybe we should even explicitly mention a recommendation like the
>> following:
>>   If you can use a deterministic tool, don't use AI instead. If you
>>   don't know how to use the deterministic tool, use the AI to tell you
>>   how to use it instead of trying to replace it.
>
> I like it.
>
>>> 3. Limited, small changes to fix bugs or add a small new feature whose
>>>    scope is less than about 100 lines and the originator can explain
>>>    them all or the meta issues about the patch.
>> Not sure if mentioning a number of lines is wise. 100 lines can be
>> mostly boilerplate and simple sequential code or they can be a deeply
>> nested complex algorithm.
>
> I'd put the threshold at 20-50 at most.
>
>> I think I would see more use in a tag like (better name welcome):
>>   AI-used-for: [code|tests|docs|commit message]...
>
> I like this *a lot*. No need for free advertisement, but some
> traceability is useful.
>
> For tools such as sed or coccinelle, having the exact script in the
> patch or commit message is useful. Plus, the execution of the script
> more or less delimits the commit by itself (or 90%+ of it). For LLMs
> it's a bit less clear cut because separating docs makes little sense.
> And the exact model is pointless, it will be obsolete in 6 months and
> provide no useful information.
>
> So, something like:
>
> ------------------- 8< -------------------
> Use of AI-generated content
> ~~~~~~~~~~~~~~~~~~~~~~~~~~~
>
> The QEMU project currently allows using AI/LLM tools to produce
> patches in scenarios with limited creative content:
>
> Mechanical changes
>   If you can use a deterministic tool or a script, don't use AI instead.
>   If you don't know how to do the change deterministically, you may
>   ask the AI for help, rather than having it stand in for the tools.

I like the idea of pointing people towards tools but I wouldn't be quite
so prescriptive. The series MST referred to was easily eyeball-able and I
suspect the extra steps would generate friction for contributions. That
said, the wider the change to the code base, the more likely it is that a
random hallucination gets lost in the noise. Maybe:

  Mechanical changes
    Using AI tools to make simple mechanical changes is allowed. For
    larger tree-wide changes it is strongly recommended to use a
    deterministic tool like `sed` or `coccinelle`. You can use AI to
    help you craft the invocation.

?

> Small bug fixes
>   These should be limited to 20 lines of code or less, not including
>   tests. You are still expected to understand and explain your changes
>   and the rationale behind them.
>
> These boundaries do not apply to other uses of AI, such as researching
> APIs or algorithms, static analysis, or debugging, provided their output
> is not included in contributions. Larger uses of AI are allowed as an
> experiment, but they should be agreed upon with the maintainer prior
> to submission.
>
> Use of AI does not remove the need for authors to comply with all other
> requirements for contribution. In particular, the "Signed-off-by"
> label in a patch submission is a statement that the author takes
> responsibility for the entire contents of the patch, certifying that
> their patch submission is made in accordance with the rules of the
> `Developer's Certificate of Origin (DCO) `.
>
> Commit messages for AI-assisted changes
> ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
>
> When AI/LLM tools produce or substantively shape your patch, add an
> ``AI-used-for:`` trailer. The text of the trailer could be one or
> more of ``code``, ``tests``, ``docs``, ``research``, possibly followed
> by an explanation in parentheses::
>
>   AI-used-for: tests, docs
>   AI-used-for: code
>   AI-used-for: code (refactoring)
>   AI-used-for: code (prototype)
>   AI-used-for: research
>
> The trailer is intended as a clarification of your DCO obligations as
> well as to guide reviewers. It is not intended for minimal presence
> such as autocomplete or asking for a pre-review of the patch, and it
> does not remove your responsibility to understand the changes that you
> are submitting.
>
> Include the prompt in the commit message if it helps a reviewer judge
> the result:
>
> * yes: "move field ``foo`` from ``struct aa`` to ``struct bb``. If a
>   function already has a local variable or parameter of type ``struct
>   bb``, use it instead of accessing ``aa.bb``."
>
> * yes: "add an implementation of the trait for ``Mutex``,
>   forwarding the member functions to ``T`` while taking the lock
>   around the calls".
>
> * no: "write user-facing documentation for the new tool"
>
> * no: "write testcases for the new functions"
>
> Deterministic tooling (sed, coccinelle, formatters) is out of scope
> for the trailer, but should be mentioned in the commit message.

-- 
Alex Bennée
Virtualisation Tech Lead @ Linaro
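
For illustration only, the proposed ``AI-used-for:`` trailer could be checked mechanically along the lines of the sketch below. Nothing like this exists in the tree or in the draft: the function name `parse_ai_trailers`, the `ALLOWED` set and the choice to reject unknown categories are assumptions, taken from the four categories and the optional parenthesised explanation listed in the draft text.

```python
import re

# Assumption: the four categories named in the draft policy are the only
# valid ones. The draft says "could be", so the real rule may be looser.
ALLOWED = {"code", "tests", "docs", "research"}

# One trailer item: a category word, optionally followed by "(explanation)".
ITEM_RE = re.compile(r"^(?P<category>[a-z]+)(?:\s*\((?P<note>[^()]*)\))?$")

# Split on commas that are not inside a parenthesised explanation.
SPLIT_RE = re.compile(r",\s*(?![^()]*\))")


def parse_ai_trailers(commit_message):
    """Return (category, note) pairs from every AI-used-for: line.

    Raises ValueError on an unknown category or a malformed item.
    """
    found = []
    for line in commit_message.splitlines():
        key, sep, value = line.partition(":")
        if not sep or key.strip().lower() != "ai-used-for":
            continue  # not the trailer we are looking for
        for item in SPLIT_RE.split(value.strip()):
            match = ITEM_RE.match(item.strip())
            if match is None or match.group("category") not in ALLOWED:
                raise ValueError(f"bad AI-used-for item: {item.strip()!r}")
            found.append((match.group("category"), match.group("note")))
    return found


if __name__ == "__main__":
    # Hypothetical commit message, made up for this example.
    example = (
        "hw/foo: move field foo from struct aa to struct bb\n"
        "\n"
        "AI-used-for: tests, docs\n"
        "AI-used-for: code (refactoring)\n"
        "Signed-off-by: A Developer <dev@example.org>\n"
    )
    for category, note in parse_ai_trailers(example):
        print(category, note)
```

A check like this would only validate the shape of the trailer; it says nothing about whether the author understands the change, which the draft still leaves to the DCO and to review.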