Adding upstream because I think there's little reason to keep this discussion off-list.

The context of this mailing thread is that we are discussing how we might want to store and record historical QMP API interface information such that we can programmatically answer questions like "when was X command introduced?", but possibly also to generate changelog QMP reports for auditing QMP changes for each version during the release candidate window.

On Thu, Jun 6, 2024 at 6:25 AM Victor Toso <victortoso@redhat.com> wrote:

Hi John!

On Wed, Jun 05, 2024 at 11:47:53AM GMT, John Snow wrote:
> Hi,
>
> as part of my project to modernize QMP documentation I recently
> forked the QAPI generator and modded it to be able to read
> historical QAPI schema going all the way back to v1.0.
>
> We want to be able to use this ability to generate accurate
> "since:" data for the docs.

:)

> My question for you is: do you have any input or preferences
> for the format of "output" or "compiled" schemata?

Not sure if I understand 'output' & 'compiled' reference here.

The idea is that even though I can modify the schema parser to cope with historical versions of the schema, we should re-save the schema into a unified format so that the gross hacks and kludges made to support old versions in my forked QAPI parser don't need to be kept in-tree. In theory, we only need to parse each historical schema precisely once.

(And then we'll only need to parse each new schema going forward with the version of the QAPI parser that's already in-tree.)

Importantly, old versions of the schema aren't contained *entirely* within the schema. Here's a timeline:

v0.12.0: QMP first introduced. Events are hardcoded, commands are defined in qemu-monitor.hx. query commands are hard-coded in monitor.c.

v0.14.0: qemu-monitor.hx is forked into qmp-commands.hx and hmp-commands.hx

v1.0: First version which features qapi-schema.json; all query commands are qapified but most other commands are not.

v1.1.0: A very large chunk of commands are QAPIfied.

v1.3.0: Most commands are now QAPIfied, but there are 2-3 remaining.

v2.1.0: events are now fully qapified; most are now defined in qapi/events.json

v2.8.0: The remaining commands are fully qapified; qmp-commands.hx is removed.

So what I mean by "compiled" schema is parsing historical revisions of the schema, including descriptive schema definitions for items not-yet-qapified (but nonetheless remain available in that version), and writing that curated information back out to disk (and checking into qemu.git) for later reference.

There are multiple choices for this output format:

(A) Just output in native qapi-schema format, just choose the "latest version".

(B) Choose some other arbitrary format.

I don't like the native qapi schema idea, for a few reasons:

- It changes over time

- It does not support nested structures except by reference, but

- Type names are not meant to API visible

- Detecting data changes in nested, shared types is difficult

In my prototype thus far, I have used a JSON-Schema based format with a type definition library that catalogues all of the command arg / command return / event data types with *all fields* fully dereferenced and inlined, except where impossible due to cyclical/recursive references (PCI and BlockStats come to mind.)

A benefit of this "compilation" is that all commands have their arguments and return values described solely by type (and for enums, values) and not by type name - fully removing non-API information from the "compiled" version.

A downside is that this is yet-another-format that differs from the existing format that requires someone knowledgeable to manipulate in case there are errors found in it, or worse - we decide to change or upgrade the format in some way to support a new feature in the future and we must yet-again revise our catalogue of historical schemata.

My preference is for this metadata to be there when the generator
parses the schema. It could another external metadata file, but
available to the generator.

Yes, ideally...

I'm thinking that I'd like to include a qapi/history/ directory which contains a record of all compiled historical schemata, and the generator can parse the directory and load up them all up - and then compile a quick lookup table that is able to answer basic questions about the data.

i.e. "when was X [command/event] first introduced?"

"when was Command.arguments.key first introduced / incompatibly modified?"

The syntax/API for how to ask the QAPISchema object these questions remains unpondered, as does the question "how do we report 'since' information in the rendered QMP documentation HTML for nested objects that we begrudgingly refer to only by type?"

Even if QAPI type names are not API, and even if my new QMP documentation project eliminates as many type names as it can by inlining inherited structures, boxed arguments, and branches - there are still referenced types for argument types. That might be a messy/unsolvable problem right now.

We can then identify all the cases where something might have
changed, when it changed, and what proper outputs would we have
due to that. (e.g: field added/removed)

For a whole discussion of this between Markus, Daniel and Andrea
with Go output (possibilities) in mind, see the thread from

Date: Tue, 10 May 2022 13:51:05 +0100
From: "Daniel P. Berrangé" <berrange@redhat.com>
Subject: Re: [RFC PATCH v1 0/8] qapi: add generator for Golang interface
Message-ID: <YnpfuYvBu56CCi7b@redhat.com>

> I figured you might have some opinions because of your dbus and
> other API work. Possibly this historical record is useful to
> you in some way.

I'm thinking back on when I worked in the libvirt-go-module code
generator.

For the Go bindings itself, I'm inclined to a have its versions
follow closely QEMU versions which brings each qapi-go tag to
support to what QAPI itself supports for that QEMU version we
generated that code. I think this is more or less close to what
Go applications would expect. So, the Since version is not that
impactful as it was for libvirt-go-module. Ah, based on what
Marc-André wrote, I think this is the same for qapi-rs.

Cheers,
Victor

As bonus context: here's output from a small utility I wrote called schema-diff that compares the compiled v1.0 and v1.1.0 schema:

```
jsnow@scv ~/s/qemu ((v3.1.0))> ./schema-diff.py qapi-compiled-v1.0.json qapi-compiled-v1.1.0.json
Added commands
==============
block-job-cancel
block-job-set-speed
block-stream
block_set_io_throttle
change-vnc-password
qom-get
qom-list
qom-list-types
qom-set
query-block-jobs
system_wakeup
transaction
xen-save-devices-state

Removed commands
================

Modified commands
=================
add_client (x-qemu-arguments)
++ arguments.tls: Optional<boolean>
blockdev-snapshot-sync (x-qemu-arguments)
-- arguments.snapshot-file: Optional<string>
++ arguments.mode: Optional<enum>
++ arguments.snapshot-file: string
memsave (x-qemu-arguments)
++ arguments.cpu-index: Optional<integer>
query-block (x-qemu-returns)
++ returns[].inserted.bps: integer
++ returns[].inserted.bps_rd: integer
++ returns[].inserted.bps_wr: integer
++ returns[].inserted.iops: integer
++ returns[].inserted.iops_rd: integer
++ returns[].inserted.iops_wr: integer
query-spice (x-qemu-returns)
++ returns.mouse-mode: enum
query-status (x-qemu-returns)
returns.status: enum
++ suspended
```

It's possibly rough around the edges (branches, ifcond and features are not yet supported; alternates are kinda-supported but do not yet diff well/prettily, ignore the "x-qemu-" stuff), and the diff output uses an arbitrary/invented "key path" and "field type" notation in order to simplify the changelog summary, but you can hopefully see the utility of such a report.

Helpfully, it shows *all levels* of changes, no matter how deeply nested, so *all* potential changes to the API can be observed / audited / documented. The only types it ever reports are fundamental JSON types and "any" and "enum", so any refactoring we do in the QAPI schema with regards to type names, inheritance, boxing, etc are fully transparent in this analysis.

It's a useful tool for reviewing QAPI changes during the RC period to ensure we did not accidentally create any breaking changes since the last released version.

One more example for the road:

```
jsnow@scv ~/s/qemu ((v3.1.0))> ./schema-diff.py qapi-compiled-v1.1.0.json qapi-compiled-v1.2.0.json
Added commands
==============
add-fd
device-list-properties
dump-guest-memory
migrate-set-cache-size
migrate-set-capabilities
query-cpu-definitions
query-events
query-fdsets
query-machines
query-migrate-cache-size
query-migrate-capabilities
query-target
remove-fd

Removed commands
================

Modified commands
=================
query-block (x-qemu-returns)
++ returns[].inserted.backing_file_depth: integer
++ returns[].inserted.encryption_key_missing: boolean
query-migrate (x-qemu-returns)
++ returns.disk.duplicate: integer
++ returns.disk.normal: integer
++ returns.disk.normal-bytes: integer
++ returns.ram.duplicate: integer
++ returns.ram.normal: integer
++ returns.ram.normal-bytes: integer
++ returns.total-time: Optional<integer>
++ returns.xbzrle-cache: Optional<object>
++ returns.xbzrle-cache.bytes: integer
++ returns.xbzrle-cache.cache-miss: integer
++ returns.xbzrle-cache.cache-size: integer
++ returns.xbzrle-cache.overflow: integer
++ returns.xbzrle-cache.pages: integer
```

Here, you can see that although the "disk" and "ram" fields for query-migrate's return type are very likely the same actual type, the compiler has evaluated everything down to its constituent parts and reported each difference independently for the report; preserving the possibility to obtain wire-protocol accurate added/modified data for each and every individual key.

I'll also point out that this diff view does not confuse "first version qapified" with "first version introduced"; v1.2.0 actually qapified several commands for the first time, but they remained available in earlier versions. The diff output accurately reflects this.

--js