qemu-devel.nongnu.org archive mirror
 help / color / mirror / Atom feed
* [PATCH 0/6] tracetool: add mypy --strict checking [AI discussion ahead!]
@ 2025-10-08  6:35 Paolo Bonzini
  2025-10-08  6:35 ` [PATCH 1/6] tracetool: rename variable with conflicting types Paolo Bonzini
                   ` (8 more replies)
  0 siblings, 9 replies; 31+ messages in thread
From: Paolo Bonzini @ 2025-10-08  6:35 UTC (permalink / raw)
  To: qemu-devel
  Cc: Alex Bennée, Daniel P . Berrangé, John Snow, Kevin Wolf,
	Markus Armbruster, Peter Maydell, Stefan Hajnoczi, Mads Ynddal

[People in Cc are a mix of Python people, tracing people, and people
 who followed the recent AI discussions. - Paolo]

This series adds type annotations to tracetool. While useful on its own, 
it also served as an experiment in whether AI tools could be useful and
appropriate for mechanical code transformations that may not involve
copyrightable expression.

In this version, the types were added mostly with the RightTyper tool
(https://github.com/RightTyper/RightTyper), which uses profiling to detect
the types of arguments and return types at run time.  However, because
adding type annotations is such a narrow and verifiable task, I also developed
a parallel version using an LLM, to provide some data on topics such as:

- how much choice/creativity is there in writing type annotations?
  Is it closer to writing functional code or to refactoring?

- how does AI output for such mechanical transformations compare to other
  automated tools, whose output is known not to be copyrightable? 

- what is the kind of time saving that the tool can provide?

- how effective is an LLM for this task?  Is the required human help a
  pain to provide, or can the human keep the fun part for itself?

While the version submitted here does not use any LLM-generated content
in compliance with QEMU's policy, the first version was developed using
an LLM tool (Claude Code) with a very simple prompt:
    
      Add type annotations to tracetool.py and all .py files in
      tracetool/*.py.  Use new-style annotations, for example list[str]
      instead of typing.List[str].  When done, use mypy to find any
      issues.

The results were surprisingly good for such a simple-minded test, and
while there were some issues they turned out to be mostly due to dead
code that confused the LLM.  Thus, I removed the dead code (this is
already in qemu.git) and then did the actual experiment, running
both the RightTyper script (which is detailed in patch 4) and the LLM
in separate branches.  The remaining errors from the LLM were also
reasonably easy to fix, so I did that manually:

      tracetool/__init__.py:239: error: Need type annotation for "_args" (hint: "_args: list[<type>] = ...")  [var-annotated]
      tracetool/__init__.py:272: error: Argument 1 to "Arguments" has incompatible type "list[tuple[str, str]]"; expected "list[tuple[str, str] | Arguments]"  [arg-type]
      tracetool/__init__.py:579: error: Incompatible types in assignment (expression has type "str | None", variable has type "None")  [assignment]
      tracetool/__init__.py:580: error: Incompatible types in assignment (expression has type "str | None", variable has type "None")  [assignment]

After reviewing the changes, I followed up with another prompt for mechanical
changes:

      Change "backend: Any" to "backend: Wrapper" in all tracetool/format/ files

Honestly, I could have done this part in less time without any help; I
was just curious to see if the LLM would also remove the unused import.
Not only it didn't; this prompt didn't even touch most of the files so
I did the change, ran isort and called it a day.

Comparing the results from RightTyper and AI, both tools benefit from human
help: the dead code removal which I mentioned, the small cleanups in patch 1,
the final manual fixes for the remaining (mostly trivial) errors.  But at
least in this case, AI is a winner:

- it left basically no unannotated code: fixing the above errors was enough
  to pass "mypy --strict", unlike RightTyper which needed a little more work
  due to its profiling-based nature and a few other limitations (see patch 5).

- most importantly, trying various other tools that didn't work, as well as
  figuring out how to use RightTyper, took a couple hours more.  Surprising
  as it was, I could not find any static type inferencing tool for Python;
  neither pytype nor pyre worked for me.  This is also why I think this
  is not apples to oranges, but a fair comparison between AI-based and
  regular tooling.

I want to highlight "in this case".  I had other experiments where the
LLM's output was all but awful, and I wasted time debugging code that
I should have thrown away immediately!

After the diffstat, you can find a diff from this series to the version
based on Claude Code.  It's impossible to be 100% objective but,
besides being functionally equivalent, I don't think either would be
identifiable as written by an LLM, by a person, by a tool+human combo,
or even by a good type inferencing tool (if it existed).

Based on this experience, my answer to the copyrightability question is
that, for this kind of narrow request, the output of AI can be treated as
the output of an imperfect tool, rather than as creative content potentially
tainted by the training material.  Of course this is one data point and
is intended as an experiment rather than a policy recommendation.

Paolo



Paolo Bonzini (6):
  tracetool: rename variable with conflicting types
  tracetool: apply isort and add check
  tracetool: "import annotations"
  tracetool: add type annotations
  tracetool: complete typing annotations
  tracetool: add typing checks to "make -C python check"

 python/tests/tracetool-isort.sh              |  4 +
 python/tests/tracetool-mypy.sh               |  5 ++
 scripts/tracetool.py                         | 12 +--
 scripts/tracetool/__init__.py                | 84 ++++++++++----------
 scripts/tracetool/backend/__init__.py        | 21 ++---
 scripts/tracetool/backend/dtrace.py          | 19 ++---
 scripts/tracetool/backend/ftrace.py          | 13 +--
 scripts/tracetool/backend/log.py             | 13 +--
 scripts/tracetool/backend/simple.py          | 19 ++---
 scripts/tracetool/backend/syslog.py          | 13 +--
 scripts/tracetool/backend/ust.py             | 11 +--
 scripts/tracetool/format/__init__.py         |  9 ++-
 scripts/tracetool/format/c.py                |  7 +-
 scripts/tracetool/format/d.py                |  7 +-
 scripts/tracetool/format/h.py                |  7 +-
 scripts/tracetool/format/log_stap.py         | 12 +--
 scripts/tracetool/format/rs.py               |  7 +-
 scripts/tracetool/format/simpletrace_stap.py |  7 +-
 scripts/tracetool/format/stap.py             | 10 ++-
 scripts/tracetool/format/ust_events_c.py     |  7 +-
 scripts/tracetool/format/ust_events_h.py     |  7 +-
 21 files changed, 173 insertions(+), 121 deletions(-)
 create mode 100755 python/tests/tracetool-isort.sh
 create mode 100755 python/tests/tracetool-mypy.sh
-- 
2.51.0


diff --git a/scripts/tracetool.py b/scripts/tracetool.py
index 13c5583fa21..06c40aa71e7 100755
--- a/scripts/tracetool.py
+++ b/scripts/tracetool.py
@@ -16,7 +16,6 @@
 
 import getopt
 import sys
-from typing import NoReturn
 
 import tracetool.backend
 import tracetool.format
@@ -24,7 +24,7 @@
 
 _SCRIPT = ""
 
-def error_opt(msg: str | None = None) -> NoReturn:
+def error_opt(msg: str | None = None) -> None:
     if msg is not None:
         error_write("Error: " + msg + "\n")
 
diff --git a/scripts/tracetool/__init__.py b/scripts/tracetool/__init__.py
index 58b7a289425..c9509b327f4 100644
--- a/scripts/tracetool/__init__.py
+++ b/scripts/tracetool/__init__.py
@@ -17,9 +17,8 @@
 import os
 import re
 import sys
-from io import TextIOWrapper
 from pathlib import PurePath
-from typing import Any, Iterator, Sequence
+from typing import Any, Iterator, Sequence, TextIO
 
 import tracetool.backend
 import tracetool.format
@@ -50,7 +50,7 @@ def error(*lines: str) -> None:
     'MAX': 'j',
 }
 
-def expand_format_string(c_fmt: str, prefix: str="") -> str:
+def expand_format_string(c_fmt: str, prefix: str = "") -> str:
     def pri_macro_to_fmt(pri_macro: str) -> str:
         assert pri_macro.startswith("PRI")
         fmt_type = pri_macro[3]  # 'd', 'i', 'u', or 'x'
@@ -469,13 +469,13 @@ def formats(self) -> list[str]:
     QEMU_BACKEND_DSTATE      = "TRACE_%(NAME)s_BACKEND_DSTATE"
     QEMU_EVENT               = "_TRACE_%(NAME)s_EVENT"
 
-    def api(self, fmt: str|None=None) -> str:
+    def api(self, fmt: str | None = None) -> str:
         if fmt is None:
             fmt = Event.QEMU_TRACE
         return fmt % {"name": self.name, "NAME": self.name.upper()}
 
 
-def read_events(fobj: TextIOWrapper, fname: str) -> list[Event]:
+def read_events(fobj: TextIO, fname: str) -> list[Event]:
     """Generate the output for the given (format, backends) pair.
 
     Parameters
@@ -514,7 +514,7 @@ class TracetoolError (Exception):
     pass
 
 
-def try_import(mod_name: str, attr_name: str | None=None, attr_default: Any=None) -> tuple[bool, Any]:
+def try_import(mod_name: str, attr_name: str | None = None, attr_default: Any = None) -> tuple[bool, Any]:
     """Try to import a module and get an attribute from it.
 
     Parameters
@@ -541,7 +541,7 @@ def try_import(mod_name: str, attr_name: str | None=None, attr_default: Any=None
 
 
 def generate(events: list[Event], group: str, format: str, backends: list[str],
-             binary: str|None=None, probe_prefix: str|None=None) -> None:
+             binary: str | None = None, probe_prefix: str | None = None) -> None:
     """Generate the output for the given (format, backends) pair.
 
     Parameters
@@ -581,7 +581,7 @@ def generate(events: list[Event], group: str, format: str, backends: list[str],
 
     tracetool.format.generate(events, format, backend, group)
 
-def posix_relpath(path: str, start: str|None=None) -> str:
+def posix_relpath(path: str, start: str | None = None) -> str:
     try:
         path = os.path.relpath(path, start)
     except ValueError:
diff --git a/scripts/tracetool/backend/__init__.py b/scripts/tracetool/backend/__init__.py
index 84bab3f42a0..37654a133ce 100644
--- a/scripts/tracetool/backend/__init__.py
+++ b/scripts/tracetool/backend/__init__.py
@@ -122,7 +122,7 @@ def backend_modules(self) -> Iterator[Any]:
              if module is not None:
                  yield module
 
-    def _run_function(self, name: str, *args: Any, check_trace_event_get_state: bool | None=None, **kwargs: Any) -> None:
+    def _run_function(self, name: str, *args: Any, check_trace_event_get_state: bool | None = None, **kwargs: Any) -> None:
         for backend in self.backend_modules():
             func = getattr(backend, name % self._format, None)
             if func is not None and \
@@ -133,7 +133,7 @@ def _run_function(self, name: str, *args: Any, check_trace_event_get_state: bool
     def generate_begin(self, events: list[tracetool.Event], group: str) -> None:
         self._run_function("generate_%s_begin", events, group)
 
-    def generate(self, event: tracetool.Event, group: str, check_trace_event_get_state: bool | None=None) -> None:
+    def generate(self, event: tracetool.Event, group: str, check_trace_event_get_state: bool | None = None) -> None:
         self._run_function("generate_%s", event, group, check_trace_event_get_state=check_trace_event_get_state)
 
     def generate_backend_dstate(self, event: tracetool.Event, group: str) -> None:
diff --git a/scripts/tracetool/backend/dtrace.py b/scripts/tracetool/backend/dtrace.py
index d9f4b8f09c8..e54847e96b6 100644
--- a/scripts/tracetool/backend/dtrace.py
+++ b/scripts/tracetool/backend/dtrace.py
@@ -19,7 +19,7 @@
 PUBLIC = True
 
 
-PROBEPREFIX: str|None = None
+PROBEPREFIX: str | None = None
 
 def probeprefix() -> str:
     if PROBEPREFIX is None:
@@ -27,7 +29,7 @@ def probeprefix() -> str:
     return PROBEPREFIX
 
 
-BINARY: str|None = None
+BINARY: str | None = None
 
 def binary() -> str:
     if BINARY is None:
diff --git a/scripts/tracetool/format/__init__.py b/scripts/tracetool/format/__init__.py
index c287e93ed30..d4fc452b4ee 100644
--- a/scripts/tracetool/format/__init__.py
+++ b/scripts/tracetool/format/__init__.py
@@ -40,7 +40,7 @@
 import os
 
 import tracetool
-from tracetool.backend import Wrapper
+import tracetool.backend
 
 
 def get_list() -> list[tuple[str, str]]:
@@ -76,7 +76,7 @@ def exists(name: str) -> bool:
     return tracetool.try_import("tracetool.format." + name)[0]
 
 
-def generate(events: list[tracetool.Event], format: str, backend: Wrapper, group: str) -> None:
+def generate(events: list[tracetool.Event], format: str, backend: tracetool.backend.Wrapper, group: str) -> None:
     if not exists(format):
         raise ValueError("unknown format: %s" % format)
     format = format.replace("-", "_")
diff --git a/scripts/tracetool/format/c.py b/scripts/tracetool/format/c.py
index e8848c51051..2e02365444d 100644
--- a/scripts/tracetool/format/c.py
+++ b/scripts/tracetool/format/c.py
@@ -14,11 +14,11 @@
 __email__      = "stefanha@redhat.com"
 
 
+import tracetool.backend
 from tracetool import Event, out
-from tracetool.backend import Wrapper
 
 
-def generate(events: list[Event], backend: Wrapper, group: str) -> None:
+def generate(events: list[Event], backend: tracetool.backend.Wrapper, group: str) -> None:
     active_events = [e for e in events
                      if "disable" not in e.properties]
 
diff --git a/scripts/tracetool/format/d.py b/scripts/tracetool/format/d.py
index 2abf8bc24ba..bc9bc57e0b3 100644
--- a/scripts/tracetool/format/d.py
+++ b/scripts/tracetool/format/d.py
@@ -16,8 +16,8 @@
 
 from sys import platform
 
+import tracetool.backend
 from tracetool import Event, out
-from tracetool.backend import Wrapper
 
 # Reserved keywords from
 # https://wikis.oracle.com/display/DTrace/Types,+Operators+and+Expressions
@@ -32,7 +32,7 @@
 )
 
 
-def generate(events: list[Event], backend: Wrapper, group: str) -> None:
+def generate(events: list[Event], backend: tracetool.backend.Wrapper, group: str) -> None:
     events = [e for e in events
               if "disable" not in e.properties]
 
diff --git a/scripts/tracetool/format/h.py b/scripts/tracetool/format/h.py
index 2324234dd24..6d2d5cb4a9a 100644
--- a/scripts/tracetool/format/h.py
+++ b/scripts/tracetool/format/h.py
@@ -14,11 +14,11 @@
 __email__      = "stefanha@redhat.com"
 
 
+import tracetool.backend
 from tracetool import Event, out
-from tracetool.backend import Wrapper
 
 
-def generate(events: list[Event], backend: Wrapper, group: str) -> None:
+def generate(events: list[Event], backend: tracetool.backend.Wrapper, group: str) -> None:
     header = "trace/control.h"
 
     out('/* This file is autogenerated by tracetool, do not edit. */',
diff --git a/scripts/tracetool/format/log_stap.py b/scripts/tracetool/format/log_stap.py
index 914e4674ffe..4223e2d9e2d 100644
--- a/scripts/tracetool/format/log_stap.py
+++ b/scripts/tracetool/format/log_stap.py
@@ -15,8 +15,8 @@
 
 import re
 
+import tracetool.backend
 from tracetool import Event, out
-from tracetool.backend import Wrapper
 from tracetool.backend.dtrace import binary, probeprefix
 from tracetool.backend.simple import is_string
 from tracetool.format.stap import stap_escape
@@ -86,7 +86,7 @@ def c_fmt_to_stap(fmt: str) -> str:
     fmt = re.sub(r"%(\d*)(l+|z)(x|u|d)", r"%\1\3", "".join(bits))
     return fmt
 
-def generate(events: list[Event], backend: Wrapper, group: str) -> None:
+def generate(events: list[Event], backend: tracetool.backend.Wrapper, group: str) -> None:
     out('/* This file is autogenerated by tracetool, do not edit. */',
         '/* SPDX-License-Identifier: GPL-2.0-or-later */',
         '')
diff --git a/scripts/tracetool/format/rs.py b/scripts/tracetool/format/rs.py
index 27a4025e3d6..08129d2d426 100644
--- a/scripts/tracetool/format/rs.py
+++ b/scripts/tracetool/format/rs.py
@@ -14,11 +14,11 @@
 __email__      = "stefanha@redhat.com"
 
 
+import tracetool.backend
 from tracetool import Event, out
-from tracetool.backend import Wrapper
 
 
-def generate(events: list[Event], backend: Wrapper, group: str) -> None:
+def generate(events: list[Event], backend: tracetool.backend.Wrapper, group: str) -> None:
     out('// SPDX-License-Identifier: GPL-2.0-or-later',
         '// This file is @generated by tracetool, do not edit.',
         '',
diff --git a/scripts/tracetool/format/simpletrace_stap.py b/scripts/tracetool/format/simpletrace_stap.py
index c76cf426853..b42bd3d6f02 100644
--- a/scripts/tracetool/format/simpletrace_stap.py
+++ b/scripts/tracetool/format/simpletrace_stap.py
@@ -14,14 +14,14 @@
 __email__      = "stefanha@redhat.com"
 
 
+import tracetool.backend
 from tracetool import Event, out
-from tracetool.backend import Wrapper
 from tracetool.backend.dtrace import probeprefix
 from tracetool.backend.simple import is_string
 from tracetool.format.stap import stap_escape
 
 
-def generate(events: list[Event], backend: Wrapper, group: str) -> None:
+def generate(events: list[Event], backend: tracetool.backend.Wrapper, group: str) -> None:
     out('/* This file is autogenerated by tracetool, do not edit. */',
         '/* SPDX-License-Identifier: GPL-2.0-or-later */',
         '')
diff --git a/scripts/tracetool/format/stap.py b/scripts/tracetool/format/stap.py
index af98f3c44a3..2c15c75f7c2 100644
--- a/scripts/tracetool/format/stap.py
+++ b/scripts/tracetool/format/stap.py
@@ -14,8 +14,8 @@
 __email__      = "stefanha@redhat.com"
 
 
+import tracetool.backend
 from tracetool import Event, out
-from tracetool.backend import Wrapper
 from tracetool.backend.dtrace import binary, probeprefix
 
 # Technically 'self' is not used by systemtap yet, but
@@ -35,7 +35,7 @@ def stap_escape(identifier: str) -> str:
     return identifier
 
 
-def generate(events: list[Event], backend: Wrapper, group: str) -> None:
+def generate(events: list[Event], backend: tracetool.backend.Wrapper, group: str) -> None:
     events = [e for e in events
               if "disable" not in e.properties]
 
diff --git a/scripts/tracetool/format/ust_events_c.py b/scripts/tracetool/format/ust_events_c.py
index 2c8256ed7a8..634dd9a7115 100644
--- a/scripts/tracetool/format/ust_events_c.py
+++ b/scripts/tracetool/format/ust_events_c.py
@@ -14,11 +14,11 @@
 __email__      = "stefanha@redhat.com"
 
 
+import tracetool.backend
 from tracetool import Event, out
-from tracetool.backend import Wrapper
 
 
-def generate(events: list[Event], backend: Wrapper, group: str) -> None:
+def generate(events: list[Event], backend: tracetool.backend.Wrapper, group: str) -> None:
     events = [e for e in events
               if "disabled" not in e.properties]
 
diff --git a/scripts/tracetool/format/ust_events_h.py b/scripts/tracetool/format/ust_events_h.py
index f0ffcae6941..098c28c7e7a 100644
--- a/scripts/tracetool/format/ust_events_h.py
+++ b/scripts/tracetool/format/ust_events_h.py
@@ -14,11 +14,11 @@
 __email__      = "stefanha@redhat.com"
 
 
+import tracetool.backend
 from tracetool import Event, out
-from tracetool.backend import Wrapper
 
 
-def generate(events: list[Event], backend: Wrapper, group: str) -> None:
+def generate(events: list[Event], backend: tracetool.backend.Wrapper, group: str) -> None:
     events = [e for e in events
               if "disabled" not in e.properties]
 



^ permalink raw reply related	[flat|nested] 31+ messages in thread

end of thread, other threads:[~2025-10-10 17:45 UTC | newest]

Thread overview: 31+ messages (download: mbox.gz follow: Atom feed
-- links below jump to the message on this page --
2025-10-08  6:35 [PATCH 0/6] tracetool: add mypy --strict checking [AI discussion ahead!] Paolo Bonzini
2025-10-08  6:35 ` [PATCH 1/6] tracetool: rename variable with conflicting types Paolo Bonzini
2025-10-08  9:27   ` Daniel P. Berrangé
2025-10-08 18:06   ` Stefan Hajnoczi
2025-10-08  6:35 ` [PATCH 2/6] tracetool: apply isort and add check Paolo Bonzini
2025-10-08  9:29   ` Daniel P. Berrangé
2025-10-08 17:58   ` Stefan Hajnoczi
2025-10-09  7:52     ` Daniel P. Berrangé
2025-10-09  8:22       ` Paolo Bonzini
2025-10-09  8:58         ` Daniel P. Berrangé
2025-10-09 11:26           ` Paolo Bonzini
2025-10-09 12:05             ` Daniel P. Berrangé
2025-10-09  7:58     ` Paolo Bonzini
2025-10-08  6:35 ` [PATCH 3/6] tracetool: "import annotations" Paolo Bonzini
2025-10-08  9:31   ` Daniel P. Berrangé
2025-10-08 18:06   ` Stefan Hajnoczi
2025-10-08  6:35 ` [PATCH 4/6] tracetool: add type annotations Paolo Bonzini
2025-10-08 18:09   ` Stefan Hajnoczi
2025-10-08  6:35 ` [PATCH 5/6] tracetool: complete typing annotations Paolo Bonzini
2025-10-08 18:10   ` Stefan Hajnoczi
2025-10-08  6:35 ` [PATCH 6/6] tracetool: add typing checks to "make -C python check" Paolo Bonzini
2025-10-08  9:32   ` Daniel P. Berrangé
2025-10-08 18:12   ` Stefan Hajnoczi
2025-10-08  7:18 ` [PATCH 0/6] tracetool: add mypy --strict checking [AI discussion ahead!] Markus Armbruster
2025-10-08 10:34   ` Daniel P. Berrangé
2025-10-10 12:38     ` Markus Armbruster
2025-10-10 13:49       ` Paolo Bonzini
2025-10-10 17:41         ` Markus Armbruster
2025-10-08 10:40 ` Daniel P. Berrangé
2025-10-08 17:23 ` Stefan Hajnoczi
2025-10-08 17:41   ` Paolo Bonzini

This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox;
as well as URLs for NNTP newsgroup(s).