From mboxrd@z Thu Jan 1 00:00:00 1970 Return-Path: X-Spam-Checker-Version: SpamAssassin 3.4.0 (2014-02-07) on aws-us-west-2-korg-lkml-1.web.codeaurora.org Received: from lists.gnu.org (lists.gnu.org [209.51.188.17]) (using TLSv1.2 with cipher ECDHE-RSA-AES256-GCM-SHA384 (256/256 bits)) (No client certificate requested) by smtp.lore.kernel.org (Postfix) with ESMTPS id DD121CCA470 for ; Wed, 8 Oct 2025 07:19:15 +0000 (UTC) Received: from localhost ([::1] helo=lists1p.gnu.org) by lists.gnu.org with esmtp (Exim 4.90_1) (envelope-from ) id 1v6ORN-00025O-Sp; Wed, 08 Oct 2025 03:18:25 -0400 Received: from eggs.gnu.org ([2001:470:142:3::10]) by lists.gnu.org with esmtps (TLS1.2:ECDHE_RSA_AES_256_GCM_SHA384:256) (Exim 4.90_1) (envelope-from ) id 1v6ORF-00022D-2w for qemu-devel@nongnu.org; Wed, 08 Oct 2025 03:18:19 -0400 Received: from us-smtp-delivery-124.mimecast.com ([170.10.129.124]) by eggs.gnu.org with esmtps (TLS1.2:ECDHE_RSA_AES_256_GCM_SHA384:256) (Exim 4.90_1) (envelope-from ) id 1v6ORC-0000kq-LY for qemu-devel@nongnu.org; Wed, 08 Oct 2025 03:18:16 -0400 DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=redhat.com; s=mimecast20190719; t=1759907892; h=from:from:reply-to:subject:subject:date:date:message-id:message-id: to:to:cc:cc:mime-version:mime-version:content-type:content-type: in-reply-to:in-reply-to:references:references; bh=aqV2OB91OUf1agETbDGxMFK1qWXZBK5bFwEVPg/7p28=; b=Xi2QkcqtbR0kvemjV90VW2hiBXTxByhPjcTIzyU3VC3C8xYtdm2ObglMr9qO+AfH/6s1HU gATpJtyyd4GolRCjdbXuj+DQSmHkttJKaiyluKvz+nnDdXOrk2EoaHYMSSV5uGvZAGIGeV 356ZQEG1y42y8GYR0oreKDFp2fv6GKo= Received: from mx-prod-mc-08.mail-002.prod.us-west-2.aws.redhat.com (ec2-35-165-154-97.us-west-2.compute.amazonaws.com [35.165.154.97]) by relay.mimecast.com with ESMTP with STARTTLS (version=TLSv1.3, cipher=TLS_AES_256_GCM_SHA384) id us-mta-39-7OrKoIzXOr6ByFQqoY1NQg-1; Wed, 08 Oct 2025 03:18:08 -0400 X-MC-Unique: 7OrKoIzXOr6ByFQqoY1NQg-1 X-Mimecast-MFC-AGG-ID: 7OrKoIzXOr6ByFQqoY1NQg_1759907888 Received: from mx-prod-int-06.mail-002.prod.us-west-2.aws.redhat.com (mx-prod-int-06.mail-002.prod.us-west-2.aws.redhat.com [10.30.177.93]) (using TLSv1.3 with cipher TLS_AES_256_GCM_SHA384 (256/256 bits) key-exchange X25519 server-signature RSA-PSS (2048 bits) server-digest SHA256) (No client certificate requested) by mx-prod-mc-08.mail-002.prod.us-west-2.aws.redhat.com (Postfix) with ESMTPS id C656C180047F; Wed, 8 Oct 2025 07:18:07 +0000 (UTC) Received: from blackfin.pond.sub.org (unknown [10.45.242.6]) by mx-prod-int-06.mail-002.prod.us-west-2.aws.redhat.com (Postfix) with ESMTPS id 726711800446; Wed, 8 Oct 2025 07:18:07 +0000 (UTC) Received: by blackfin.pond.sub.org (Postfix, from userid 1000) id D6DDE21E6A27; Wed, 08 Oct 2025 09:18:04 +0200 (CEST) From: Markus Armbruster To: Paolo Bonzini Cc: qemu-devel@nongnu.org, Alex =?utf-8?Q?Benn=C3=A9e?= , Daniel P . =?utf-8?Q?Berrang=C3=A9?= , John Snow , Kevin Wolf , Peter Maydell , Stefan Hajnoczi , Mads Ynddal Subject: Re: [PATCH 0/6] tracetool: add mypy --strict checking [AI discussion ahead!] In-Reply-To: <20251008063546.376603-1-pbonzini@redhat.com> (Paolo Bonzini's message of "Wed, 8 Oct 2025 08:35:39 +0200") References: <20251008063546.376603-1-pbonzini@redhat.com> Date: Wed, 08 Oct 2025 09:18:04 +0200 Message-ID: <87ikgpn9oz.fsf@pond.sub.org> User-Agent: Gnus/5.13 (Gnus v5.13) MIME-Version: 1.0 Content-Type: text/plain X-Scanned-By: MIMEDefang 3.4.1 on 10.30.177.93 Received-SPF: pass client-ip=170.10.129.124; envelope-from=armbru@redhat.com; helo=us-smtp-delivery-124.mimecast.com X-Spam_score_int: -24 X-Spam_score: -2.5 X-Spam_bar: -- X-Spam_report: (-2.5 / 5.0 requ) BAYES_00=-1.9, DKIMWL_WL_HIGH=-0.422, DKIM_SIGNED=0.1, DKIM_VALID=-0.1, DKIM_VALID_AU=-0.1, DKIM_VALID_EF=-0.1, RCVD_IN_DNSWL_NONE=-0.0001, RCVD_IN_MSPIKE_H3=0.001, RCVD_IN_MSPIKE_WL=0.001, RCVD_IN_VALIDITY_CERTIFIED_BLOCKED=0.001, RCVD_IN_VALIDITY_RPBL_BLOCKED=0.001, SPF_HELO_PASS=-0.001, SPF_PASS=-0.001 autolearn=ham autolearn_force=no X-Spam_action: no action X-BeenThere: qemu-devel@nongnu.org X-Mailman-Version: 2.1.29 Precedence: list List-Id: List-Unsubscribe: , List-Archive: List-Post: List-Help: List-Subscribe: , Errors-To: qemu-devel-bounces+qemu-devel=archiver.kernel.org@nongnu.org Sender: qemu-devel-bounces+qemu-devel=archiver.kernel.org@nongnu.org Paolo Bonzini writes: > [People in Cc are a mix of Python people, tracing people, and people > who followed the recent AI discussions. - Paolo] > > This series adds type annotations to tracetool. While useful on its own, > it also served as an experiment in whether AI tools could be useful and > appropriate for mechanical code transformations that may not involve > copyrightable expression. > > In this version, the types were added mostly with the RightTyper tool > (https://github.com/RightTyper/RightTyper), which uses profiling to detect > the types of arguments and return types at run time. However, because > adding type annotations is such a narrow and verifiable task, I also developed > a parallel version using an LLM, to provide some data on topics such as: > > - how much choice/creativity is there in writing type annotations? > Is it closer to writing functional code or to refactoring? Based on my work with John Snow on typing of the QAPI generator: there is some choice. Consider typing a function's argument. Should we pick it based on what the function requires from its argument? Or should the type reflect how the function is used? Say the function iterates over the argument. So we make the argument Iterable[...], right? But what if all callers pass a list? Making it List[...] could be clearer then. It's a choice. I think the choice depends on context and taste. At some library's external interface, picking a more general type can make the function more generally useful. But for some internal helper, I'd pick the actual type. My point isn't that an LLM could not possibly do the right thing based on context, and maybe even "taste" distilled from its training data. My point is that this isn't entirely mechanical with basically one correct output. Once we have such judgement calls, there's the question how an LLM's choice depends on its training data (first order approximation today: nobody knows), and whether and when that makes the LLM's output a derived work of its training data (to be settled in court). [...] > Based on this experience, my answer to the copyrightability question is > that, for this kind of narrow request, the output of AI can be treated as > the output of an imperfect tool, rather than as creative content potentially > tainted by the training material. Maybe. > Of course this is one data point and > is intended as an experiment rather than a policy recommendation. Understood. We need to develop a better understanding of capabilities, potential benefits and risks, and such experiments can only help with that.