* Historical QAPI schema parser, "compiled schema", and qapi-schema-diff
@ 2024-06-13 6:13 John Snow
  2024-06-13 16:12 ` Daniel P. Berrangé
  0 siblings, 1 reply; 3+ messages in thread
From: John Snow @ 2024-06-13 6:13 UTC
To: Markus Armbruster; +Cc: qemu-devel, Victor Toso de Carvalho

Hi, recently I've been working on overhauling our QMP documentation; see
https://jsnow.gitlab.io/qemu/qapi/index.html for a recent work-in-progress
page showcasing this.

As part of this project, Markus and I decided it'd be nice to be able to
auto-generate "Since" information. The short reason for 'why' is because
since info hard-coded into doc comments may not be accurate with regards to
the wire protocol availability for a given field when a QAPI definition is
shared or inherited by multiple sources. If we can generate it, it should
always be accurate.

So, I've prototyped three things:

(1) An out-of-tree fork of the QAPI generator that is capable of parsing
qemu-commands.hx, qmp-commands.hx, and all versions of our qapi-schema.json
files going all the way back to v0.12.0.

It accomplishes this with some fairly brutish hacks that I never expect to
need to check in to qemu.git.

(2) A schema "compiler", a QAPI generator module that takes a parsed Schema
and produces a single-file JSON Schema document that describes every
command and event as it exists on the wire without using type names or any
information not considered to be "API".

This part *would* need to be checked in to qemu.git (if we go in this
direction.)
The compiled historical schema would also get checked in, for the QAPI
parser to reference against to generate the since information. (Or, some
kind of meta-compiled document with just the since information. Either way;
the idea is that we'll catalog the output without needing to commit the
parser compatibility hacks.)

(3) A script that can diff two compiled schema, showing a change report
between two versions. (I sent an email earlier today/yesterday showing
example output of this script.) This one was more for "fun", but it helped
prove all the other parts were working correctly, and it might be useful in
the future when auditing changes during the RC phase. We may well decide to
commit this script upstream, or one like it.

All of those things are here: https://gitlab.com/jsnow/externalized-qapi

I'm sharing this in its out-of-tree form mostly for Markus's sake as we
debate the pros/cons of various choices I've made in this prototype, but
you're welcome to peep the early discussions if you'd like, too.

Notes:

1. If you want to try "compiling" schema yourself, clone the git repo and
install it with "pip install .". Navigate to your qemu.git root and check
out a release tag (such as v0.12.0 or v1.0 or v9.0.0) and then run
"qapi-compile". (If your git tags are "weird", this might break. Sorry
about that, it's a prototype... the hacky code that uses "git describe" is
in qapi/compat.py if you run into troubles and wanna mess around with it.)

2. The "qapi compiler" makes use of schema addendum files for some old
versions to produce correct output. You can browse them on gitlab here:
https://gitlab.com/jsnow/externalized-qapi/-/tree/main/qapi/schemata?ref_type=heads

There are addendum files for v0.12.0 through v2.0.0. Other "errata" are
handled in code; no errata of any kind are needed in v2.8.0 or later.

3. If you don't wanna run the compiler yourself (or it broke because it's a
real hackjob), I compiled all of the historical QAPI schema myself and
checked them into the repo here:
https://gitlab.com/jsnow/externalized-qapi/-/tree/main/compiled?ref_type=heads

4. You can diff any two compiled schema with "qapi-schema-diff A.json
B.json". Put the earlier version first.

5. qapi-compile and qapi-schema-diff don't yet support "if" and "features"
everywhere they should, but everything else should work correctly.

6. The commit history for this repo is actually pretty well factored; each
compatibility hack for the QAPI parser has its own commit, so it's easy to
suss out what work was required to make this work.

I'm about to head off on a long weekend, I'll be back Tuesday.

Have fun,
--js

* Re: Historical QAPI schema parser, "compiled schema", and qapi-schema-diff
  2024-06-13 6:13 Historical QAPI schema parser, "compiled schema", and qapi-schema-diff John Snow
@ 2024-06-13 16:12 ` Daniel P. Berrangé
  2024-06-19 0:55   ` John Snow
  0 siblings, 1 reply; 3+ messages in thread
From: Daniel P. Berrangé @ 2024-06-13 16:12 UTC
To: John Snow; +Cc: Markus Armbruster, qemu-devel, Victor Toso de Carvalho

On Thu, Jun 13, 2024 at 02:13:15AM -0400, John Snow wrote:
> Hi, recently I've been working on overhauling our QMP documentation; see
> https://jsnow.gitlab.io/qemu/qapi/index.html for a recent work-in-progress
> page showcasing this.
>
> As part of this project, Markus and I decided it'd be nice to be able to
> auto-generate "Since" information. The short reason for 'why' is because
> since info hard-coded into doc comments may not be accurate with regards to
> the wire protocol availability for a given field when a QAPI definition is
> shared or inherited by multiple sources. If we can generate it, it should
> always be accurate.
>
> So, I've prototyped three things:
>
> (1) An out-of-tree fork of the QAPI generator that is capable of parsing
> qemu-commands.hx, qmp-commands.hx, and all versions of our qapi-schema.json
> files going all the way back to v0.12.0.
>
> It accomplishes this with some fairly brutish hacks that I never expect to
> need to check in to qemu.git.
>
> (2) A schema "compiler", a QAPI generator module that takes a parsed Schema
> and produces a single-file JSON Schema document that describes every
> command and event as it exists on the wire without using type names or any
> information not considered to be "API".
>
> This part *would* need to be checked in to qemu.git (if we go in this
> direction.)
> The compiled historical schema would also get checked in, for the QAPI
> parser to reference against to generate the since information.

The upside with checking in every historical schema is that we
have a set of self-contained schemas where you can see everything
at a glance for each version.

The downside with checking in every historical schema is that between
any adjacent pair of schemas 99% of the content is identical. IOW we
are very wasteful of storage.

Looking at your other mail about schema diffs, I wonder if the
diff format you show there can kill two birds with one stone.

  https://lists.nongnu.org/archive/html/qemu-devel/2024-06/msg02398.html

In my reply I had illustrated a variant of your format:

 - x-query-rdma
 -   returns.human-readable-text: str
 . blockdev-backup
 +   arguments.discard-source: Optional<boolean>
 . migrate
 -   arguments.blk: Optional<boolean>
 -   arguments.inc: Optional<boolean>
 . object-add
 .   arguments.qom-type: enum
 +     'sev-snp-guest'
 +   arguments[sev-guest].legacy-vm-type: Optional<boolean>
 +   arguments[sev-snp-guest].author-key-enabled: Optional<boolean>
 +   arguments[sev-snp-guest].cbitpos: Optional<integer>

Where '.' is just pre-existing context, and +/- have the obvious
meaning for the 2 given versions.

What if we append a version number to *every* line, and exclusively
use +/-.

Taking just one small command:

 + 6.2.0: x-query-rdma
 + 6.2.0:   returns.human-readable-text: str
 - 9.1.0: x-query-rdma

This tells us 'x-query-rdma' was added in 6.2.0, the
'human-readable-text' parameter arrived at the same
time, and the whole command was then deleted in 9.1.0.

That has implicit property deletion, but for completeness
we could be explicit about each property when deleting
a command:

 + 6.2.0: x-query-rdma
 + 6.2.0:   returns.human-readable-text: str
 - 9.1.0:   returns.human-readable-text: str
 - 9.1.0: x-query-rdma

Taking the more complex 'object-add' command:

 + 2.0.0: object-add
 + 2.0.0:   arguments.qom-type: enum
 + 2.0.0:     '....'
 + 2.11.0:    'sev-guest'
 + 9.1.0:     'sev-snp-guest'
 + 2.11.0:  arguments[sev-guest].policy: uint32
 + 2.11.0:  arguments[sev-guest].session-file: str
 + 2.11.0:  arguments[sev-guest].dh-cert: str
 + 9.1.0:   arguments[sev-guest].legacy-vm-type: Optional<boolean>
 + 9.1.0:   arguments[sev-snp-guest].author-key-enabled: Optional<boolean>
 + 9.1.0:   arguments[sev-snp-guest].cbitpos: Optional<integer>

IOW, object-add was introduced in 2.0.0. The 'sev-guest' enum
variant was added in 2.11.0 with various fields at the same
time. The 'sev-guest' enum variant got an extra field in 9.1.0.
The 'sev-snp-guest' enum variant was added in 9.1.0 with some
fields.

For fields which change from Optional <-> Required, that could
be modelled simply as parameter deletion + addition in the
same version, eg hypothetically let's say the 'sev-guest' field
'policy' had changed, we would see:

 + 2.0.0: object-add
 + 2.0.0:   arguments.qom-type: enum
 + 2.0.0:     '....'
 + 2.11.0:    'sev-guest'
 + 9.1.0:     'sev-snp-guest'
 + 2.11.0:  arguments[sev-guest].policy: uint32
 - 6.2.0:   arguments[sev-guest].policy: uint32
 + 6.2.0:   arguments[sev-guest].policy: Optional<uint32>
 + 2.11.0:  arguments[sev-guest].session-file: str
 + 2.11.0:  arguments[sev-guest].dh-cert: str
 + 9.1.0:   arguments[sev-guest].legacy-vm-type: Optional<boolean>
 + 9.1.0:   arguments[sev-snp-guest].author-key-enabled: Optional<boolean>
 + 9.1.0:   arguments[sev-snp-guest].cbitpos: Optional<integer>

Incidentally, if going down this route, I think I would NOT
have 1 file with the whole schema history, but have 1 file
per command / event, eg qapi/history/object-add.txt,
qapi/history/x-query-rdma.txt, qapi/history/VFIO_MIGRATION.txt,
etc. This will make it trivial for a person to focus in on
changes in the command they care about, likely without even
needing a schema diff tool much of the time, as the per-command
files will often be concise enough you can consider the full
history without filtering.
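
To make the proposed line format concrete, here is a minimal Python sketch of
how such version-annotated lines might be read back and turned into "since"
information. It is only an illustration under assumptions: the line shape
("+ 6.2.0: ...") is taken from the examples above, while the names LINE_RE,
Delta, parse_history and lifetime are hypothetical and not part of any
existing QEMU tooling.

```python
import re
from typing import NamedTuple, Optional, Tuple

# Assumed line shape, copied from the examples in this mail:
#   "+ 6.2.0:   returns.human-readable-text: str"
LINE_RE = re.compile(r"^\s*([+-])\s+(\d+(?:\.\d+)*):\s*(.+?)\s*$")

Version = Tuple[int, ...]


class Delta(NamedTuple):
    added: bool       # True for '+', False for '-'
    version: Version  # (2, 11, 0) sorts after (2, 9, 0), unlike the strings
    text: str         # command name or member description


def parse_history(lines):
    """Parse history lines into Delta records, skipping anything else."""
    deltas = []
    for line in lines:
        match = LINE_RE.match(line)
        if match is None:
            continue  # blank line or unrecognized content
        sign, version, text = match.groups()
        numeric = tuple(int(part) for part in version.split("."))
        deltas.append(Delta(sign == "+", numeric, text))
    return deltas


def lifetime(deltas, text) -> Tuple[Optional[Version], Optional[Version]]:
    """Return (first version added, last version removed) for one entry."""
    added = [d.version for d in deltas if d.added and d.text == text]
    removed = [d.version for d in deltas if not d.added and d.text == text]
    return (min(added) if added else None, max(removed) if removed else None)


SAMPLE = """\
 + 6.2.0: x-query-rdma
 + 6.2.0:   returns.human-readable-text: str
 - 9.1.0: x-query-rdma
"""

DELTAS = parse_history(SAMPLE.splitlines())
```

With the sample above, lifetime(DELTAS, "x-query-rdma") reports the command
as added in 6.2.0 and removed in 9.1.0. Versions are compared as integer
tuples rather than strings so that 2.11.0 orders after 2.9.0.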

> (3) A script that can diff two compiled schema, showing a change report
> between two versions. (I sent an email earlier today/yesterday showing
> example output of this script.) This one was more for "fun", but it helped
> prove all the other parts were working correctly, and it might be useful in
> the future when auditing changes during the RC phase. We may well decide to
> commit this script upstream, or one like it.

With a single file containing all deltas, where each line is
version annotated, the "diff" tool becomes little more than
something which can 'grep' for lines in the file which have
a version number within the desired range. In fact it can also
optionally offer something better than a diff, as instead of
showing you only the original state and result state, it
can trivially show you any intermediate changes and what
version they happened with.

eg if you asked for a diff between 2.0.0 and 9.1.0, and there
was a command or property that was added in 4.0.0 and deleted
in 6.0.0, a traditional diff will not tell you about this. You'll
never notice it ever existed.

A "history grep" showing the set of changes between 2 versions
will highlight things that come + go, which can be quite
useful for understanding API evolution I think.

With regards,
Daniel
-- 
|: https://berrange.com      -o-    https://www.flickr.com/photos/dberrange :|
|: https://libvirt.org         -o-            https://fstop138.berrange.com :|
|: https://entangle-photo.org    -o-    https://www.instagram.com/dberrange :|
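
The "history grep" described above could be sketched roughly as follows, in
Python. This assumes the "+ VERSION: text" line shape from the earlier
examples; history_grep, parse_version and the command name x-example-cmd are
hypothetical, chosen only to show a command that comes and goes inside the
requested range.

```python
import re

# Assumed line shape: "+ 4.0.0: some-command" (sign, version, colon, text).
LINE_RE = re.compile(r"^\s*[+-]\s+(\d+(?:\.\d+)*):")


def parse_version(version):
    """Turn '2.11.0' into (2, 11, 0) so versions compare numerically."""
    return tuple(int(part) for part in version.split("."))


def history_grep(lines, after, upto):
    """Return every history line whose version v satisfies after < v <= upto."""
    low, high = parse_version(after), parse_version(upto)
    selected = []
    for line in lines:
        match = LINE_RE.match(line)
        if match and low < parse_version(match.group(1)) <= high:
            selected.append(line)
    return selected


# Hypothetical history: x-example-cmd exists only between 4.0.0 and 6.0.0.
HISTORY = [
    "+ 2.0.0: object-add",
    "+ 4.0.0: x-example-cmd",
    "- 6.0.0: x-example-cmd",
    "+ 9.1.0:     'sev-snp-guest'",
]

CHANGES = history_grep(HISTORY, "2.0.0", "9.1.0")
```

Here CHANGES keeps both the 4.0.0 addition and the 6.0.0 removal of the
hypothetical command, which a diff of only the 2.0.0 and 9.1.0 end states
would not show. The lower bound is exclusive because anything already present
in the starting version is not a change relative to it.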

* Re: Historical QAPI schema parser, "compiled schema", and qapi-schema-diff
  2024-06-13 16:12 ` Daniel P. Berrangé
@ 2024-06-19 0:55   ` John Snow
  0 siblings, 0 replies; 3+ messages in thread
From: John Snow @ 2024-06-19 0:55 UTC
To: Daniel P. Berrangé
Cc: Markus Armbruster, qemu-devel, Victor Toso de Carvalho

On Thu, Jun 13, 2024 at 12:12 PM Daniel P. Berrangé <berrange@redhat.com>
wrote:

> On Thu, Jun 13, 2024 at 02:13:15AM -0400, John Snow wrote:
> > Hi, recently I've been working on overhauling our QMP documentation; see
> > https://jsnow.gitlab.io/qemu/qapi/index.html for a recent work-in-progress
> > page showcasing this.
> >
> > As part of this project, Markus and I decided it'd be nice to be able to
> > auto-generate "Since" information. The short reason for 'why' is because
> > since info hard-coded into doc comments may not be accurate with regards to
> > the wire protocol availability for a given field when a QAPI definition is
> > shared or inherited by multiple sources. If we can generate it, it should
> > always be accurate.
> >
> > So, I've prototyped three things:
> >
> > (1) An out-of-tree fork of the QAPI generator that is capable of parsing
> > qemu-commands.hx, qmp-commands.hx, and all versions of our qapi-schema.json
> > files going all the way back to v0.12.0.
> >
> > It accomplishes this with some fairly brutish hacks that I never expect to
> > need to check in to qemu.git.
> >
> > (2) A schema "compiler", a QAPI generator module that takes a parsed Schema
> > and produces a single-file JSON Schema document that describes every
> > command and event as it exists on the wire without using type names or any
> > information not considered to be "API".
> >
> > This part *would* need to be checked in to qemu.git (if we go in this
> > direction.)
> > The compiled historical schema would also get checked in, for the QAPI
> > parser to reference against to generate the since information.
>
> The upside with checking in every historical schema is that we
> have a set of self-contained schemas where you can see everything
> at a glance for each version.
>

Yep. It's "dumb" but very easy to access and work with.

>
> The downside with checking in every historical schema is that between
> any adjacent pair of schemas 99% of the content is identical. IOW we
> are very wasteful of storage.
>

... Also agree. Because these files avoid shared types as an explicit
design goal, and JSON Schema is *very* verbose, these files get extremely
large while saying little. I chose them for the proof of concept because
they're an existing standard/format I didn't have to engineer or reason
about heavily, and really no other reason.

>
> Looking at your other mail about schema diffs, I wonder if the
> diff format you show there can kill two birds with one stone.
>
>   https://lists.nongnu.org/archive/html/qemu-devel/2024-06/msg02398.html
>
> In my reply I had illustrated a variant of your format:
>
>  - x-query-rdma
>  -   returns.human-readable-text: str
>  . blockdev-backup
>  +   arguments.discard-source: Optional<boolean>
>  . migrate
>  -   arguments.blk: Optional<boolean>
>  -   arguments.inc: Optional<boolean>
>  . object-add
>  .   arguments.qom-type: enum
>  +     'sev-snp-guest'
>  +   arguments[sev-guest].legacy-vm-type: Optional<boolean>
>  +   arguments[sev-snp-guest].author-key-enabled: Optional<boolean>
>  +   arguments[sev-snp-guest].cbitpos: Optional<integer>
>
> Where '.' is just pre-existing context, and +/- have the obvious
> meaning for the 2 given versions.
>
> What if we append a version number to *every* line, and exclusively
> use +/-.
>
> Taking just one small command:
>
>  + 6.2.0: x-query-rdma
>  + 6.2.0:   returns.human-readable-text: str
>  - 9.1.0: x-query-rdma
>
> This tells us 'x-query-rdma' was added in 6.2.0, the
> 'human-readable-text' parameter arrived at the same
> time, and the whole command was then deleted in 9.1.0.
>
> That has implicit property deletion, but for completeness
> we could be explicit about each property when deleting
> a command:
>
>  + 6.2.0: x-query-rdma
>  + 6.2.0:   returns.human-readable-text: str
>  - 9.1.0:   returns.human-readable-text: str
>  - 9.1.0: x-query-rdma
>
> Taking the more complex 'object-add' command:
>
>  + 2.0.0: object-add
>  + 2.0.0:   arguments.qom-type: enum
>  + 2.0.0:     '....'
>  + 2.11.0:    'sev-guest'
>  + 9.1.0:     'sev-snp-guest'
>  + 2.11.0:  arguments[sev-guest].policy: uint32
>  + 2.11.0:  arguments[sev-guest].session-file: str
>  + 2.11.0:  arguments[sev-guest].dh-cert: str
>  + 9.1.0:   arguments[sev-guest].legacy-vm-type: Optional<boolean>
>  + 9.1.0:   arguments[sev-snp-guest].author-key-enabled: Optional<boolean>
>  + 9.1.0:   arguments[sev-snp-guest].cbitpos: Optional<integer>
>
> IOW, object-add was introduced in 2.0.0. The 'sev-guest' enum
> variant was added in 2.11.0 with various fields at the same
> time. The 'sev-guest' enum variant got an extra field in 9.1.0.
> The 'sev-snp-guest' enum variant was added in 9.1.0 with some
> fields.
>
> For fields which change from Optional <-> Required, that could
> be modelled simply as parameter deletion + addition in the
> same version, eg hypothetically let's say the 'sev-guest' field
> 'policy' had changed, we would see:
>
>  + 2.0.0: object-add
>  + 2.0.0:   arguments.qom-type: enum
>  + 2.0.0:     '....'
>  + 2.11.0:    'sev-guest'
>  + 9.1.0:     'sev-snp-guest'
>  + 2.11.0:  arguments[sev-guest].policy: uint32
>  - 6.2.0:   arguments[sev-guest].policy: uint32
>  + 6.2.0:   arguments[sev-guest].policy: Optional<uint32>
>  + 2.11.0:  arguments[sev-guest].session-file: str
>  + 2.11.0:  arguments[sev-guest].dh-cert: str
>  + 9.1.0:   arguments[sev-guest].legacy-vm-type: Optional<boolean>
>  + 9.1.0:   arguments[sev-snp-guest].author-key-enabled: Optional<boolean>
>  + 9.1.0:   arguments[sev-snp-guest].cbitpos: Optional<integer>
>

Very interesting idea, I think this could be a reasonable compromise. I'll
have to spend some time prototyping it (and digesting your other mail,
too), but tentatively, I like the idea. Thanks a lot for really digging
into both of these mails to give your feedback on this subproject.

(IOW: I think I like it, but haven't sat with it enough to really know if
there's anything it doesn't do that I need it to do. Prototyping it will
tell me. One concern I might have is that I'll need some custom code to
compare a QAPISchema object against the stored history file in order to
amend that history file. I'm not sure how complex that will be at present,
but admit my current solution is very egregious with regards to SLOC in the
git repo. And it's not as if the JSON Schema writing/reading code I
prototyped is particularly short, either.)

>
> Incidentally, if going down this route, I think I would NOT
> have 1 file with the whole schema history, but have 1 file
> per command / event, eg qapi/history/object-add.txt,
> qapi/history/x-query-rdma.txt, qapi/history/VFIO_MIGRATION.txt,
> etc. This will make it trivial for a person to focus in on
> changes in the command they care about, likely without even
> needing a schema diff tool much of the time, as the per-command
> files will often be concise enough you can consider the full
> history without filtering.
>

Interesting idea... might be a lot of files, but I suppose those don't
really *cost* anything, now do they? :)

I guess you lose out on a good summary, but a tool can just parse
qapi/history/*.txt or whatnot and concatenate the results to stdout for
you; I suppose it'd be little more than `cat qapi/history/*.txt | grep
"9\.0\.0"` or similar.

>
> > (3) A script that can diff two compiled schema, showing a change report
> > between two versions. (I sent an email earlier today/yesterday showing
> > example output of this script.) This one was more for "fun", but it helped
> > prove all the other parts were working correctly, and it might be useful in
> > the future when auditing changes during the RC phase. We may well decide to
> > commit this script upstream, or one like it.
>
> With a single file containing all deltas, where each line is
> version annotated, the "diff" tool becomes little more than
> something which can 'grep' for lines in the file which have
> a version number within the desired range. In fact it can also
> optionally offer something better than a diff, as instead of
> showing you only the original state and result state, it
> can trivially show you any intermediate changes and what
> version they happened with.
>
> eg if you asked for a diff between 2.0.0 and 9.1.0, and there
> was a command or property that was added in 4.0.0 and deleted
> in 6.0.0, a traditional diff will not tell you about this. You'll
> never notice it ever existed.
>
> A "history grep" showing the set of changes between 2 versions
> will highlight things that come + go, which can be quite
> useful for understanding API evolution I think.
>

Good point. The existing diff tool I wrote was just a prototype to prove
"this sort of thing was possible", but I didn't put much thought into its
design beyond "It was quick to write as a proof of concept". Maximizing
this information's utility for use with existing utilities without needing
to maintain lots of our own script code is a great design goal to keep in
mind.
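
For comparison with the cat/grep one-liner above, a small Python sketch of
the same summary over per-command files might look like this. The
qapi/history directory layout is only the one proposed in this thread and
does not exist in qemu.git; changes_in_version is a hypothetical helper name.

```python
import re
from pathlib import Path

# Assumed line shape: "+ 9.1.0: text"; group 1 captures the version field.
LINE_RE = re.compile(r"^\s*[+-]\s+(\d+(?:\.\d+)*):")


def changes_in_version(history_dir, version):
    """Collect (command, line) pairs for lines annotated with `version`.

    Each *.txt file under history_dir is assumed to hold the history of one
    command or event, named after it (e.g. object-add.txt).
    """
    hits = []
    for path in sorted(Path(history_dir).glob("*.txt")):
        for line in path.read_text(encoding="utf-8").splitlines():
            match = LINE_RE.match(line)
            # Compare the whole version field, not a substring of the line.
            if match and match.group(1) == version:
                hits.append((path.stem, line.strip()))
    return hits
```

Matching on the parsed version field is slightly stricter than a plain grep:
a search for 9.0.0 would not also pick up a line annotated 19.0.0, or a line
whose description happens to contain that string.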

>
> With regards,
> Daniel
> --
> |: https://berrange.com      -o-    https://www.flickr.com/photos/dberrange :|
> |: https://libvirt.org         -o-            https://fstop138.berrange.com :|
> |: https://entangle-photo.org    -o-    https://www.instagram.com/dberrange :|
>

Thanks again! I'm going to re-focus on some of the more immediate changes
for the documentation project for now, but I'll no doubt be returning to
the historical parsing / since information subproject before too long --
just didn't want to sit on your email for too long so as to appear
ungrateful ;)

I'll loop you into future discussions on this subproject when I pick it
back up (Hopefully, not too far in the future.) -- and I'll make sure to
keep it on-list. Markus and I haven't gone too in-depth on this part yet,
so I figure I'll pick the prototyping back up when he's chewed through more
of my other patches and all of the Maintainers that need to care about this
are paying active attention.

(Sorry for all the patches, Markus... You asked for it!)

--js