Historical QAPI schema parser, "compiled schema", and qapi-schema-diff

* Historical QAPI schema parser, "compiled schema", and qapi-schema-diff
@ 2024-06-13  6:13 John Snow
  2024-06-13 16:12 ` Daniel P. Berrangé
  0 siblings, 1 reply; 3+ messages in thread
From: John Snow @ 2024-06-13  6:13 UTC (permalink / raw)
  To: Markus Armbruster; +Cc: qemu-devel, Victor Toso de Carvalho

Hi, recently I've been working on overhauling our QMP documentation; see
https://jsnow.gitlab.io/qemu/qapi/index.html for a recent work-in-progress
page showcasing this.

As part of this project, Markus and I decided it'd be nice to be able to
auto-generate "Since" information. The short reason why: "Since" info
hard-coded into doc comments may not accurately reflect when a given field
became available on the wire protocol if the QAPI definition is shared or
inherited by multiple sources. If we can generate it, it should always be
accurate.

So, I've prototyped three things:

(1) An out-of-tree fork of the QAPI generator that is capable of parsing
qemu-commands.hx, qmp-commands.hx, and all versions of our qapi-schema.json
files going all the way back to v0.12.0.

It accomplishes this with some fairly brutish hacks that I never expect to
need to check in to qemu.git.

(2) A schema "compiler", a QAPI generator module that takes a parsed Schema
and produces a single-file JSON Schema document that describes every
command and event as it exists on the wire without using type names or any
information not considered to be "API".

This part *would* need to be checked in to qemu.git (if we go in this
direction.)
The compiled historical schema would also get checked in, for the QAPI
parser to reference against to generate the since information.

(Or, some kind of meta-compiled document with just the since information.
Either way; the idea is that we'll catalog the output without needing to
commit the parser compatibility hacks.)

(3) A script that can diff two compiled schema, showing a change report
between two versions. (I sent an email earlier today/yesterday showing
example output of this script.) This one was more for "fun", but it helped
prove all the other parts were working correctly, and it might be useful in
the future when auditing changes during the RC phase. We may well decide to
commit this script upstream, or one like it.
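
As a rough illustration of the "compiler" idea in (2), here is a minimal
sketch that inlines every named type so the output mentions only what
appears on the wire. It assumes a made-up dict-based mini-schema (`TYPES`,
`COMMANDS`) instead of the QAPI generator's real parsed-schema objects, and
`compile_command()` plus the output shape are hypothetical; the prototype
itself emits complete JSON Schema documents.

```python
import json

# Made-up mini-schema for illustration. A leading '*' marks an optional
# member, as in QAPI's own schema syntax.
TYPES = {
    "BackupArgs": {"job-id": "str", "*discard-source": "bool"},
}
COMMANDS = {
    "blockdev-backup": {"data": "BackupArgs", "returns": None},
}

# Built-in QAPI type names mapped to JSON Schema primitive types.
BUILTINS = {"str": "string", "bool": "boolean", "int": "integer", "uint32": "integer"}


def inline(type_name):
    """Expand a type name into an anonymous JSON-Schema-style fragment."""
    if type_name in BUILTINS:
        return {"type": BUILTINS[type_name]}
    properties, required = {}, []
    for key, member_type in TYPES[type_name].items():
        name = key.lstrip("*")
        properties[name] = inline(member_type)  # nested types are inlined too
        if not key.startswith("*"):
            required.append(name)
    return {"type": "object", "properties": properties, "required": required}


def compile_command(name):
    """Describe one command in wire terms only; no type names survive."""
    command = COMMANDS[name]
    compiled = {"arguments": inline(command["data"])}
    if command["returns"] is not None:
        compiled["returns"] = inline(command["returns"])
    return compiled


print(json.dumps(compile_command("blockdev-backup"), indent=2))
```

Inlining is what makes each compiled document self-contained, and also
why such documents grow large: a type shared by many commands is
repeated under each of them.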

All of those things are here: https://gitlab.com/jsnow/externalized-qapi

I'm sharing this in its out-of-tree form mostly for Markus's sake as we
debate the pros/cons of various choices I've made in this prototype, but
you're welcome to peep the early discussions if you'd like, too.

Notes:

1. If you want to try "compiling" schema yourself, clone the git repo and
install it with "pip install .". Navigate to your qemu.git root and check
out a release tag (such as v0.12.0 or v1.0 or v9.0.0) and then run
"qapi-compile".

(If your git tags are "weird", this might break. Sorry about that, it's a
prototype... the hacky code that uses "git describe" is in qapi/compat.py
if you run into troubles and wanna mess around with it.)

2. The "qapi compiler" makes use of schema addendum files for some old
versions to produce correct output. You can browse them on gitlab here:
https://gitlab.com/jsnow/externalized-qapi/-/tree/main/qapi/schemata?ref_type=heads

There are addendum files for v0.12.0 through v2.0.0. Other "errata" are
handled in code; no errata of any kind are needed in v2.8.0 or later.

3. If you don't wanna run the compiler yourself (or it broke because it's a
real hackjob), I compiled all of the historical QAPI schema myself and
checked them into the repo here:
https://gitlab.com/jsnow/externalized-qapi/-/tree/main/compiled?ref_type=heads

4. You can diff any two compiled schema with "qapi-schema-diff A.json
B.json". Put the earlier version first.

5. qapi-compile and qapi-schema-diff don't yet support "if" and "features"
everywhere they should, but everything else should work correctly.

6. The commit history for this repo is actually pretty well factored; each
compatibility hack for the QAPI parser has its own commit, so it's easy to
suss out what was needed to make the old schemas parse.

I'm about to head off on a long weekend; I'll be back Tuesday.

Have fun,
--js

* Re: Historical QAPI schema parser, "compiled schema", and qapi-schema-diff
  2024-06-13  6:13 Historical QAPI schema parser, "compiled schema", and qapi-schema-diff John Snow
@ 2024-06-13 16:12 ` Daniel P. Berrangé
  2024-06-19  0:55   ` John Snow
  0 siblings, 1 reply; 3+ messages in thread
From: Daniel P. Berrangé @ 2024-06-13 16:12 UTC (permalink / raw)
  To: John Snow; +Cc: Markus Armbruster, qemu-devel, Victor Toso de Carvalho

On Thu, Jun 13, 2024 at 02:13:15AM -0400, John Snow wrote:
> Hi, recently I've been working on overhauling our QMP documentation; see
> https://jsnow.gitlab.io/qemu/qapi/index.html for a recent work-in-progress
> page showcasing this.
> 
> As part of this project, Markus and I decided it'd be nice to be able to
> auto-generate "Since" information. The short reason why: "Since" info
> hard-coded into doc comments may not accurately reflect when a given field
> became available on the wire protocol if the QAPI definition is shared or
> inherited by multiple sources. If we can generate it, it should always be
> accurate.
> 
> So, I've prototyped three things:
> 
> (1) An out-of-tree fork of the QAPI generator that is capable of parsing
> qemu-commands.hx, qmp-commands.hx, and all versions of our qapi-schema.json
> files going all the way back to v0.12.0.
> 
> It accomplishes this with some fairly brutish hacks that I never expect to
> need to check in to qemu.git.
> 
> (2) A schema "compiler", a QAPI generator module that takes a parsed Schema
> and produces a single-file JSON Schema document that describes every
> command and event as it exists on the wire without using type names or any
> information not considered to be "API".
> 
> This part *would* need to be checked in to qemu.git (if we go in this
> direction.)
> The compiled historical schema would also get checked in, for the QAPI
> parser to reference against to generate the since information.

The upside with checking in every historical schema is that we
have a set of self-contained schemas where you can see everything
at a glance for each version.

The downside with checking in every historical schema is that between
any adjacent pair of schemas 99% of the content is identical. IOW we
are very wasteful of storage.

Looking at your other mail about schema diffs, I wonder if the diff
format you showed there could kill two birds with one stone.

  https://lists.nongnu.org/archive/html/qemu-devel/2024-06/msg02398.html

In my reply I had illustrated a variant of your format:

 - x-query-rdma
 -     returns.human-readable-text: str
 . blockdev-backup
 +     arguments.discard-source: Optional<boolean>
 . migrate
 -    arguments.blk: Optional<boolean>
 -    arguments.inc: Optional<boolean>
 . object-add
 .    arguments.qom-type: enum
 +        'sev-snp-guest'
 +    arguments[sev-guest].legacy-vm-type: Optional<boolean>
 +    arguments[sev-snp-guest].author-key-enabled: Optional<boolean>
 +    arguments[sev-snp-guest].cbitpos: Optional<integer>

Where '.' is just pre-existing context, and +/- have the obvious
meaning for the 2 given versions.
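
A two-version report in this '.'/'+'/'-' style could be produced roughly
as below. This sketch assumes each compiled schema has already been
flattened into a hypothetical {command: {member-path: type}} mapping; the
real qapi-schema-diff works from the JSON Schema documents, and
`schema_diff()` is an illustrative name only.

```python
# Hypothetical flattened forms of two compiled schemas:
# {command: {member-path: type}}.
OLD = {
    "x-query-rdma": {"returns.human-readable-text": "str"},
    "blockdev-backup": {"arguments.job-id": "str"},
    "migrate": {"arguments.uri": "str", "arguments.blk": "Optional<boolean>"},
}
NEW = {
    "blockdev-backup": {
        "arguments.job-id": "str",
        "arguments.discard-source": "Optional<boolean>",
    },
    "migrate": {"arguments.uri": "str"},
}


def schema_diff(old, new):
    """Return report lines: '.' for context, '+' added, '-' removed."""
    lines = []
    for name in sorted(old.keys() | new.keys()):
        before, after = old.get(name), new.get(name)
        if after is None:  # whole command removed
            lines.append(f"- {name}")
            lines += [f"-     {m}: {t}" for m, t in sorted(before.items())]
        elif before is None:  # whole command added
            lines.append(f"+ {name}")
            lines += [f"+     {m}: {t}" for m, t in sorted(after.items())]
        else:
            changes = []
            for member in sorted(before.keys() | after.keys()):
                if before.get(member) == after.get(member):
                    continue  # unchanged member: not reported
                if member in before:
                    changes.append(f"-     {member}: {before[member]}")
                if member in after:
                    changes.append(f"+     {member}: {after[member]}")
            if changes:
                lines.append(f". {name}")  # context line for the command
                lines += changes
    return lines


print("\n".join(schema_diff(OLD, NEW)))
```

A member whose type changes shows up as a '-' line for the old type
followed by a '+' line for the new one.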

What if we add a version number to *every* line, and exclusively
use +/-?

Taking just one small command:

 + 6.2.0: x-query-rdma
 + 6.2.0:    returns.human-readable-text: str
 - 9.1.0: x-query-rdma

This tells us 'x-query-rdma' was added in 6.2.0, the
'human-readable-text' return field arrived at the same
time, and the whole command was then deleted in 9.1.0.
That has implicit property deletion, but for completeness
we could be explicit about each property when deleting
a command:

 + 6.2.0: x-query-rdma
 + 6.2.0:    returns.human-readable-text: str
 - 9.1.0:    returns.human-readable-text: str
 - 9.1.0: x-query-rdma
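
Reading this version-annotated format back in takes little more than one
regular expression. The sketch below is an assumption about how such a
file might be parsed (`Change`, `parse_line()` and the `indent` field are
hypothetical); it also accepts right-aligned versions such as
" +  2.0.0: object-add".

```python
import re
from typing import NamedTuple, Tuple


class Change(NamedTuple):
    sign: str                 # '+' added, '-' removed
    version: Tuple[int, ...]  # "2.11.0" -> (2, 11, 0)
    indent: int               # 0 for a command/event, >0 for its members
    text: str                 # e.g. "returns.human-readable-text: str"


# Optional leading space, sign, padded version, colon, then the entry.
LINE_RE = re.compile(r"^\s*([+-])\s*(\d+(?:\.\d+)*):( +)(\S.*?)\s*$")


def parse_line(line):
    match = LINE_RE.match(line)
    if match is None:
        raise ValueError(f"not a history line: {line!r}")
    sign, version, spaces, text = match.groups()
    version_key = tuple(int(part) for part in version.split("."))
    # One space separates the colon from the entry; any extra is nesting.
    return Change(sign, version_key, len(spaces) - 1, text)


def parse_history(text):
    return [parse_line(line) for line in text.splitlines() if line.strip()]


for change in parse_history(""" + 6.2.0: x-query-rdma
 + 6.2.0:    returns.human-readable-text: str
 - 9.1.0: x-query-rdma"""):
    print(change)
```

Versions become integer tuples because plain string comparison breaks as
soon as a component has two digits: "2.11.0" < "2.9.0" is true for
strings, while (2, 11, 0) > (2, 9, 0).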

Taking the more complex 'object-add' command

 +  2.0.0: object-add
 +  2.0.0:   arguments.qom-type: enum
 +  2.0.0:     '....'
 + 2.11.0:     'sev-guest'
 +  9.1.0:     'sev-snp-guest'
 + 2.11.0:   arguments[sev-guest].policy: uint32
 + 2.11.0:   arguments[sev-guest].session-file: str
 + 2.11.0:   arguments[sev-guest].dh-cert: str
 +  9.1.0:   arguments[sev-guest].legacy-vm-type: Optional<boolean>
 +  9.1.0:   arguments[sev-snp-guest].author-key-enabled: Optional<boolean>
 +  9.1.0:   arguments[sev-snp-guest].cbitpos: Optional<integer>

IOW, object-add was introduced in 2.0.0. The 'sev-guest' enum
variant was added in 2.11.0 along with several fields. The
'sev-guest' variant got an extra field in 9.1.0, and the
'sev-snp-guest' enum variant was added in 9.1.0 with some
fields of its own.

A field that changes between Optional and Required could be
modelled simply as a deletion + addition in the same version.
E.g. if, hypothetically, the 'sev-guest' field 'policy' had
become optional in 6.2.0, we would see:

 +  2.0.0: object-add
 +  2.0.0:   arguments.qom-type: enum
 +  2.0.0:     '....'
 + 2.11.0:     'sev-guest'
 +  9.1.0:     'sev-snp-guest'
 + 2.11.0:   arguments[sev-guest].policy: uint32
 -  6.2.0:   arguments[sev-guest].policy: uint32
 +  6.2.0:   arguments[sev-guest].policy: Optional<uint32>
 + 2.11.0:   arguments[sev-guest].session-file: str
 + 2.11.0:   arguments[sev-guest].dh-cert: str
 +  9.1.0:   arguments[sev-guest].legacy-vm-type: Optional<boolean>
 +  9.1.0:   arguments[sev-snp-guest].author-key-enabled: Optional<boolean>
 +  9.1.0:   arguments[sev-snp-guest].cbitpos: Optional<integer>
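
Since every line carries its version, the members present in any release
can be recovered by replaying the '+'/'-' lines in version order. A
minimal sketch using the hypothetical 'policy' change above
(`members_at()` is an illustrative name; for simplicity it keys on the
stripped entry text, so enum values are not tied to their parent member):

```python
def version_key(version):
    return tuple(int(part) for part in version.split("."))


def members_at(history, version):
    """Replay '+'/'-' history lines up to and including `version`."""
    changes = []
    for line in history.splitlines():
        line = line.strip()
        if not line:
            continue
        sign, rest = line[0], line[1:]
        ver, entry = rest.split(":", 1)  # first colon ends the version field
        changes.append((version_key(ver.strip()), sign, entry.strip()))
    # Sort by version only: files may group lines by member rather than
    # chronologically, as the 'policy' example does.
    changes.sort(key=lambda change: change[0])
    target = version_key(version)
    present = set()
    for ver, sign, entry in changes:
        if ver > target:
            break
        if sign == "+":
            present.add(entry)
        else:
            present.discard(entry)
    return present


# The hypothetical Required -> Optional change of 'policy' from the thread.
HISTORY = """
 + 2.11.0:   arguments[sev-guest].policy: uint32
 -  6.2.0:   arguments[sev-guest].policy: uint32
 +  6.2.0:   arguments[sev-guest].policy: Optional<uint32>
 + 2.11.0:   arguments[sev-guest].session-file: str
"""

print(sorted(members_at(HISTORY, "5.0.0")))
print(sorted(members_at(HISTORY, "9.1.0")))
```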

Incidentally, if going down this route, I think I would NOT
have 1 file with the whole schema history, but have 1 file
per command / event. eg qapi/history/object-add.txt,
qapi/history/x-query-rdma.txt, qapi/history/VFIO_MIGRATION.txt,
etc. This will make it trivial for a person to focus in on
changes in the command they care about, likely without even
needing a schema diff tool much of the time, as the per-command
files will often be concise enough you can consider the full
history without filtering.
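
Splitting one combined history into per-command files could be as simple
as grouping each indented line under the most recent top-level line. This
sketch assumes top-level (command/event) entries have exactly one space
after the version's colon and members have more, as in the examples
above; `split_by_command()`, `write_history_files()` and the directory
layout are hypothetical.

```python
from pathlib import Path


def split_by_command(history):
    """Group history lines under the most recent top-level entry."""
    groups = {}
    current = None
    for line in history.splitlines():
        if not line.strip():
            continue
        entry = line.split(":", 1)[1]  # text after the version's colon
        if len(entry) - len(entry.lstrip()) <= 1:  # top level: "+ 6.2.0: name"
            current = entry.strip()
        if current is None:
            raise ValueError(f"member line before any command: {line!r}")
        groups.setdefault(current, []).append(line.rstrip())
    return groups


def write_history_files(history, directory):
    """Write one <name>.txt per command/event into `directory`."""
    out = Path(directory)
    out.mkdir(parents=True, exist_ok=True)
    for name, lines in split_by_command(history).items():
        (out / f"{name}.txt").write_text("\n".join(lines) + "\n")


COMBINED = """\
 + 6.2.0: x-query-rdma
 + 6.2.0:    returns.human-readable-text: str
 - 9.1.0: x-query-rdma
 +  2.0.0: object-add
 +  2.0.0:   arguments.qom-type: enum
"""

for name, lines in split_by_command(COMBINED).items():
    print(name, len(lines))
```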

> (3) A script that can diff two compiled schema, showing a change report
> between two versions. (I sent an email earlier today/yesterday showing
> example output of this script.) This one was more for "fun", but it helped
> prove all the other parts were working correctly, and it might be useful in
> the future when auditing changes during the RC phase. We may well decide to
> commit this script upstream, or one like it.

With a single file containing all deltas, where each line is
version annotated, the "diff" tool becomes little more than
something which can 'grep' for lines in the file which have
a version number within the desired range. In fact it can
optionally offer something better than a diff: instead of
showing you only the original state and result state, it
can trivially show you any intermediate changes and the
version in which each happened.

E.g. if you asked for a diff between 2.0.0 and 9.1.0, and there
was a command or property that was added in 4.0.0 and deleted
in 6.0.0, a traditional diff would not tell you about it. You'd
never notice it had existed.

A "history grep" showing the set of changes between 2 versions
will highlight things that come + go, which can be quite
useful for understanding API evolution I think.
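
Such a "history grep" might look like this sketch, which keeps every line
whose version v satisfies since < v <= until. The function name and the
range convention are assumptions, and `x-example` is a made-up command
standing in for the "added in 4.0.0, deleted in 6.0.0" case.

```python
def version_key(version):
    return tuple(int(part) for part in version.split("."))


def history_grep(lines, since, until):
    """Yield history lines whose version v satisfies since < v <= until."""
    low, high = version_key(since), version_key(until)
    for line in lines:
        stripped = line.strip()
        if not stripped:
            continue
        version = stripped[1:].split(":", 1)[0].strip()  # skip the +/- sign
        if low < version_key(version) <= high:
            yield line


HISTORY = [
    " +  2.0.0: object-add",
    " + 2.11.0:     'sev-guest'",
    " +  9.1.0:     'sev-snp-guest'",
    " +  4.0.0: x-example",  # hypothetical command, added...
    " -  6.0.0: x-example",  # ...and removed again before 9.1.0
]

for line in history_grep(HISTORY, "2.0.0", "9.1.0"):
    print(line)
```

A snapshot diff of 2.0.0 against 9.1.0 would show no `x-example` lines at
all, since the command exists in neither endpoint.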

With regards,
Daniel
-- 
|: https://berrange.com      -o-    https://www.flickr.com/photos/dberrange :|
|: https://libvirt.org         -o-            https://fstop138.berrange.com :|
|: https://entangle-photo.org    -o-    https://www.instagram.com/dberrange :|

* Re: Historical QAPI schema parser, "compiled schema", and qapi-schema-diff
  2024-06-13 16:12 ` Daniel P. Berrangé
@ 2024-06-19  0:55   ` John Snow
  0 siblings, 0 replies; 3+ messages in thread
From: John Snow @ 2024-06-19  0:55 UTC (permalink / raw)
  To: Daniel P. Berrangé
  Cc: Markus Armbruster, qemu-devel, Victor Toso de Carvalho

On Thu, Jun 13, 2024 at 12:12 PM Daniel P. Berrangé <berrange@redhat.com>
wrote:

> On Thu, Jun 13, 2024 at 02:13:15AM -0400, John Snow wrote:
> > Hi, recently I've been working on overhauling our QMP documentation; see
> > https://jsnow.gitlab.io/qemu/qapi/index.html for a recent work-in-progress
> > page showcasing this.
> >
> > As part of this project, Markus and I decided it'd be nice to be able to
> > auto-generate "Since" information. The short reason why: "Since" info
> > hard-coded into doc comments may not accurately reflect when a given field
> > became available on the wire protocol if the QAPI definition is shared or
> > inherited by multiple sources. If we can generate it, it should always be
> > accurate.
> >
> > So, I've prototyped three things:
> >
> > (1) An out-of-tree fork of the QAPI generator that is capable of parsing
> > qemu-commands.hx, qmp-commands.hx, and all versions of our qapi-schema.json
> > files going all the way back to v0.12.0.
> >
> > It accomplishes this with some fairly brutish hacks that I never expect to
> > need to check in to qemu.git.
> >
> > (2) A schema "compiler", a QAPI generator module that takes a parsed Schema
> > and produces a single-file JSON Schema document that describes every
> > command and event as it exists on the wire without using type names or any
> > information not considered to be "API".
> >
> > This part *would* need to be checked in to qemu.git (if we go in this
> > direction.)
> > The compiled historical schema would also get checked in, for the QAPI
> > parser to reference against to generate the since information.
>
> The upside with checking in every historical schema is that we
> have a set of self-contained schemas where you can see everything
> at a glance for each version.
>

Yep. It's "dumb" but very easy to access and work with.


>
> The downside with checking in every historical schema is that between
> any adjacent pair of schemas 99% of the content is identical. IOW we
> are very wasteful of storage.
>

... Also agree. Because these files avoid shared types as an explicit
design goal, and JSON Schema is *very* verbose, these files get extremely
large while saying little.

I chose them for the proof of concept because they're an existing
standard/format I didn't have to engineer or reason about heavily, and
really no other reason.


>
> Looking at your other mail about schema diffs, I wonder if the diff
> format you showed there could kill two birds with one stone.
>
>   https://lists.nongnu.org/archive/html/qemu-devel/2024-06/msg02398.html
>
> In my reply I had illustrated a variant of your format:
>
>  - x-query-rdma
>  -     returns.human-readable-text: str
>  . blockdev-backup
>  +     arguments.discard-source: Optional<boolean>
>  . migrate
>  -    arguments.blk: Optional<boolean>
>  -    arguments.inc: Optional<boolean>
>  . object-add
>  .    arguments.qom-type: enum
>  +        'sev-snp-guest'
>  +    arguments[sev-guest].legacy-vm-type: Optional<boolean>
>  +    arguments[sev-snp-guest].author-key-enabled: Optional<boolean>
>  +    arguments[sev-snp-guest].cbitpos: Optional<integer>
>
>
> Where '.' is just pre-existing context, and +/- have the obvious
> meaning for the 2 given versions.
>
> What if we add a version number to *every* line, and exclusively
> use +/-?
>
> Taking just one small command:
>
>  + 6.2.0: x-query-rdma
>  + 6.2.0:    returns.human-readable-text: str
>  - 9.1.0: x-query-rdma
>
> This tells us 'x-query-rdma' was added in 6.2.0, the
> 'human-readable-text' return field arrived at the same
> time, and the whole command was then deleted in 9.1.0.
> That has implicit property deletion, but for completeness
> we could be explicit about each property when deleting
> a command:
>
>  + 6.2.0: x-query-rdma
>  + 6.2.0:    returns.human-readable-text: str
>  - 9.1.0:    returns.human-readable-text: str
>  - 9.1.0: x-query-rdma
>
> Taking the more complex 'object-add' command
>
>  +  2.0.0: object-add
>  +  2.0.0:   arguments.qom-type: enum
>  +  2.0.0:     '....'
>  + 2.11.0:     'sev-guest'
>  +  9.1.0:     'sev-snp-guest'
>  + 2.11.0:   arguments[sev-guest].policy: uint32
>  + 2.11.0:   arguments[sev-guest].session-file: str
>  + 2.11.0:   arguments[sev-guest].dh-cert: str
>  +  9.1.0:   arguments[sev-guest].legacy-vm-type: Optional<boolean>
>  +  9.1.0:   arguments[sev-snp-guest].author-key-enabled: Optional<boolean>
>  +  9.1.0:   arguments[sev-snp-guest].cbitpos: Optional<integer>
>
>
> IOW, object-add was introduced in 2.0.0. The 'sev-guest' enum
> variant was added in 2.11.0 along with several fields. The
> 'sev-guest' variant got an extra field in 9.1.0, and the
> 'sev-snp-guest' enum variant was added in 9.1.0 with some
> fields of its own.
>
>
> A field that changes between Optional and Required could be
> modelled simply as a deletion + addition in the same version.
> E.g. if, hypothetically, the 'sev-guest' field 'policy' had
> become optional in 6.2.0, we would see:
>
>  +  2.0.0: object-add
>  +  2.0.0:   arguments.qom-type: enum
>  +  2.0.0:     '....'
>  + 2.11.0:     'sev-guest'
>  +  9.1.0:     'sev-snp-guest'
>  + 2.11.0:   arguments[sev-guest].policy: uint32
>  -  6.2.0:   arguments[sev-guest].policy: uint32
>  +  6.2.0:   arguments[sev-guest].policy: Optional<uint32>
>  + 2.11.0:   arguments[sev-guest].session-file: str
>  + 2.11.0:   arguments[sev-guest].dh-cert: str
>  +  9.1.0:   arguments[sev-guest].legacy-vm-type: Optional<boolean>
>  +  9.1.0:   arguments[sev-snp-guest].author-key-enabled: Optional<boolean>
>  +  9.1.0:   arguments[sev-snp-guest].cbitpos: Optional<integer>
>
>
Very interesting idea, I think this could be a reasonable compromise. I'll
have to spend some time prototyping it (and digesting your other mail,
too), but tentatively, I like the idea. Thanks a lot for really digging
into both of these mails to give your feedback on this subproject.

(IOW: I think I like it, but haven't sat with it enough to really know if
there's anything it doesn't do that I need it to do. Prototyping it will
tell me. One concern I might have is that I'll need some custom code to
compare a QAPISchema object against the stored history file in order to
amend that history file. I'm not sure how complex that will be at present,
but I admit my current solution is very egregious with regard to SLOC in the
git repo. And it's not as if the JSON Schema writing/reading code I
prototyped is particularly short, either.)
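
A sketch of that amend step, assuming the new schema has already been
flattened into a set of entries written exactly as they appear after the
version field (members keep their leading indentation). Producing that
set from a real QAPISchema object is not shown, and `amend()` is a
hypothetical name: the history is replayed to its latest state, and
whatever differs from the snapshot becomes new '-'/'+' lines.

```python
def parse(line):
    """' + 6.2.0:    returns.x: str' -> ('+', (6, 2, 0), '   returns.x: str')."""
    stripped = line.strip()
    version, body = stripped[1:].split(":", 1)
    # Drop the single separator space; keep any further indentation.
    return stripped[0], tuple(int(p) for p in version.strip().split(".")), body[1:]


def latest_state(history_lines):
    """Replay every '+'/'-' line in version order; return surviving entries."""
    changes = sorted((parse(line) for line in history_lines if line.strip()),
                     key=lambda change: change[1])
    state = set()
    for sign, _version, entry in changes:
        if sign == "+":
            state.add(entry)
        else:
            state.discard(entry)
    return state


def amend(history_lines, snapshot, version):
    """Return the lines to append so the history replays to `snapshot`."""
    state = latest_state(history_lines)
    # Removals list members before their command, additions the reverse.
    removed = sorted(state - snapshot, key=lambda e: (not e.startswith(" "), e))
    added = sorted(snapshot - state, key=lambda e: (e.startswith(" "), e))
    return ([f"- {version}: {entry}" for entry in removed]
            + [f"+ {version}: {entry}" for entry in added])


HISTORY = [
    " + 6.2.0: x-query-rdma",
    " + 6.2.0:    returns.human-readable-text: str",
]

# Hypothetical: the command is gone from the new release's schema.
for line in amend(HISTORY, set(), "9.1.0"):
    print(line)
```

The ordering mirrors the explicit-deletion example earlier in the thread;
unlike the hand-written examples, versions are not right-aligned.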


>
> Incidentally, if going down this route, I think I would NOT
> have 1 file with the whole schema history, but have 1 file
> per command / event. eg qapi/history/object-add.txt,
> qapi/history/x-query-rdma.txt, qapi/history/VFIO_MIGRATION.txt,
> etc. This will make it trivial for a person to focus in on
> changes in the command they care about, likely without even
> needing a schema diff tool much of the time, as the per-command
> files will often be concise enough you can consider the full
> history without filtering.
>

Interesting idea... might be a lot of files, but I suppose those don't
really *cost* anything, now do they? :)

I guess you lose out on a good summary, but a tool can just parse
qapi/history/*.txt or whatnot and concatenate the results to stdout for
you; I suppose it'd be little more than `cat qapi/history/*.txt | grep
"9\.0\.0"` or similar.

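The same query can be done in a few lines of Python that compare the
version field exactly (a "9\.0\.0" regex would also match a 19.0.0 line)
and record which file each hit came from, since member lines in a
per-command file don't repeat the command name. `release_notes()` and the
one-directory *.txt layout are assumptions based on the qapi/history/*.txt
naming above.

```python
from pathlib import Path


def release_notes(history_dir, version):
    """Collect every change recorded for exactly `version`, across all files."""
    wanted = tuple(int(part) for part in version.split("."))
    report = []
    for path in sorted(Path(history_dir).glob("*.txt")):
        for line in path.read_text().splitlines():
            stripped = line.strip()
            if not stripped:
                continue
            field = stripped[1:].split(":", 1)[0].strip()  # version after +/-
            if tuple(int(part) for part in field.split(".")) == wanted:
                # Prefix the file's stem: member lines don't name their command.
                report.append(f"{path.stem}: {stripped}")
    return report
```

E.g. release_notes("qapi/history", "9.1.0") would list every 9.1.0 change,
one line per entry, prefixed with its command or event name.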

>
> > (3) A script that can diff two compiled schema, showing a change report
> > between two versions. (I sent an email earlier today/yesterday showing
> > example output of this script.) This one was more for "fun", but it helped
> > prove all the other parts were working correctly, and it might be useful in
> > the future when auditing changes during the RC phase. We may well decide to
> > commit this script upstream, or one like it.
>
> With a single file containing all deltas, where each line is
> version annotated, the "diff" tool becomes little more than
> something which can 'grep' for lines in the file which have
> a version number within the desired range. In fact it can
> optionally offer something better than a diff: instead of
> showing you only the original state and result state, it
> can trivially show you any intermediate changes and the
> version in which each happened.
>
> E.g. if you asked for a diff between 2.0.0 and 9.1.0, and there
> was a command or property that was added in 4.0.0 and deleted
> in 6.0.0, a traditional diff would not tell you about it. You'd
> never notice it had existed.
>
> A "history grep" showing the set of changes between 2 versions
> will highlight things that come + go, which can be quite
> useful for understanding API evolution I think.
>

Good point. The existing diff tool I wrote was just a prototype to prove
"this sort of thing was possible", but I didn't put much thought into its
design beyond "It was quick to write as a proof of concept".

Maximizing this information's utility for use with existing utilities
without needing to maintain lots of our own script code is a great design
goal to keep in mind.


Thanks again! I'm going to re-focus on some of the more immediate changes
for the documentation project for now, but I'll no doubt be returning to
the historical parsing / since information subproject before too long --
just didn't want to sit on your email for too long so as to appear
ungrateful ;)

I'll loop you into future discussions on this subproject when I pick it
back up (Hopefully, not too far in the future.) -- and I'll make sure to
keep it on-list. Markus and I haven't gone too in-depth on this part yet,
so I figure I'll pick the prototyping back up when he's chewed through more
of my other patches and all of the Maintainers that need to care about this
are paying active attention. (Sorry for all the patches, Markus... You
asked for it!)

--js
