Date: Tue, 26 May 2026 14:03:59 -0400
From: "Michael S. Tsirkin"
To: Kevin Wolf
Cc: qemu-devel@nongnu.org, stefanha@redhat.com
Subject: Re: on ai generated and code provenance
Message-ID: <20260526140231-mutt-send-email-mst@kernel.org>
References: <20260524083329-mutt-send-email-mst@kernel.org>
List-Id: qemu development

On Tue, May 26, 2026 at 07:43:35PM +0200, Kevin Wolf wrote:
> On 24.05.2026 at 14:42, Michael S. Tsirkin wrote:
> > So, I had to reject a perfectly reasonable patch:
> > https://lore.kernel.org/qemu-devel/20260320193746.242704-1-jinpu.wang@ionos.com/
> > just because of a tool used to make it.
> >
> >
> > How contributors could comply with DCO terms (b) or (c) for the output of AI
> > content generators commonly available today is unclear. The QEMU project is
> > not willing or able to accept the legal risks of non-compliance.
> >
> >
> > But, since this was written, Red Hat's Richard Fontana and Chris Wright
> > published this piece:
> > https://www.redhat.com/en/blog/ai-assisted-development-and-open-source-navigating-legal-issues
> >
> >
> > Saying, in particular "
> > We understand this concern, but the DCO has never
> > been interpreted to require that every line of a contribution must be
> > the personal creative expression of the contributor or another human
> > developer.
> > "
>
> I never found that blog post particularly convincing, especially because
> they acknowledge a concern:
>
>     There are two versions of this concern. The first is practical: that
>     an AI tool could covertly insert excerpts of proprietary (or
>     license-incompatible) code into an open source project, potentially
>     creating legal risk for maintainers and users. The second is broader
>     and more philosophical: that large language models, trained on vast
>     amounts of open source software, are essentially misappropriating
>     the community’s work, producing outputs stripped of the obligations
>     that open source licenses require.
>
>     We think these concerns deserve to be taken seriously.
>
> The second one is essentially what I understood the QEMU policy to be
> about. Unfortunately, the blog post then goes on to only ever deal with
> the first one and ignore the second one that seems more relevant for us.
>
> So yes, the DCO isn't about "personal creative expression" or whatever
> (and nobody suggested it is, this is a strawman), but it's about whether
> the submitter has the legal rights to submit the code. And that's
> exactly the question we decided we don't want to take a risk on.
>
>
> So if that part isn't helpful, what has changed since we introduced the
> AI policy? It's a few points:
>
> 1. While AI has been in use for a while now, we haven't seen projects
>    accepting AI generated code/content get into big trouble. While it
>    could still happen in the future, it might be an indication that the
>    probability of the risk hitting us is not that high.
>
> 2. The useful part of the blog post is that it tells us that Red Hat
>    considers the risk acceptable. This can inform our assessment of the
>    risks, though of course there might be a significant difference in
>    the impact of the risk for a company with a legal department and an
>    open source community consisting mainly of developers acting as
>    individuals.
>
>    I think it's obvious that if the QEMU project gets involved in a
>    legal case, we have a problem (at the very least long lasting
>    distraction from actual work on QEMU), even if we didn't do anything
>    wrong and a good lawyer would easily win the case.
>
> 3. It was easy to just outright ban AI while its results were usually
>    not really usable anyway. This has changed meanwhile, so it's much
>    harder to maintain an absolute ban.
>
>    It's not really the best use of my time to look at the idea in
>    AI-generated test cases and then rewrite them from scratch so I can
>    actually submit them. (On the other hand, I think my rewritten
>    submissions were always better and more maintainable than what AI
>    produced initially, so there's that.)
>
> So while my perspective is a lot more nuanced than yours, I do see a
> shift in the balance and was actually thinking of suggesting a change of
> the policy myself.
>
> What I was thinking of was allowing AI-generated content in places where
> it's at least easy to revert if there is ever a problem with it: Tests,
> documentation etc., but not core code that lots of other things depend
> on and that will have evolved a lot when we notice a problem and for
> which throwing away is simply not an option.

OK. What about trivial changes? Using AI as a better sed?

> > I propose adopting Linux's rules instead:
> > https://docs.kernel.org/process/coding-assistants.html
> >
> > which boils down to attribution.
>
> What would we actually do with the detailed information? Why do we care
> which model was used? Is this helpful commit metadata or is it just free
> advertising for a handful of companies?

I presume it is so that, if a specific model is somehow declared
"contaminated", we can locate its output?

> I think I would see more use in a tag like (better name welcome):
>
>     AI-used-for: [code|tests|docs|commit message]...
>
> Kevin

I surely don't mind.

--
MST