Re: [PATCH] docs/contrib: add insert_crossrefs script

From: John Snow <jsnow@redhat.com>
To: qemu-devel@nongnu.org, armbru@redhat.com, eblake@redhat.com
Subject: Re: [PATCH] docs/contrib: add insert_crossrefs script
Date: Mon, 16 Jun 2025 17:32:08 -0400
Message-ID: <CAFn=p-YxYriosFwOu5Nk0cYnCb0ffazai_JSa2KDSAANiGPw=Q@mail.gmail.com>
In-Reply-To: <20250616211604.1399219-1-jsnow@redhat.com>

Markus, Eric: Some commentary and additional information below.

I did not polish this script as I believe it's hacky enough that covering
all of the edge cases, testing and documentation is more effort than it's
worth, but I still signed off on it in case someone wanted to "adopt it".
My intent here is really just to advertise "Here's how I wrote that series"
and give you opportunities to spot problems with the programmatic
conversion before I send out my v2 so I can keep the email bombs to a
minimum.

My as-of-now-unsent v2 includes any additional instances located by this
version of the script, as well as one or two manual instances of the
ignored tokens that looked appropriate to convert.

Eric: Thank you for diving into the series, I appreciate it.

On Mon, Jun 16, 2025 at 5:16 PM John Snow <jsnow@redhat.com> wrote:

> This isn't really meant for inclusion as it's a bit of a hackjob, but I
> figured it would be best to share it in some form or another to serve as
> a basis for a kind of meta-review of the crossreferenceification series.
>
> This script is designed to convert 'name', "name", name, and @name
> instances in qapi/*.json files to `name` for the purposes of
> cross-referencing commands, events, and data types in the generated HTML
> documentation. It is specifically tuned for our QAPI files and is not
> suitable for running on generic rST source files. It can likely be made
> to operate on QEMU guest agent or other qapi JSON files with some edits
> to which files it's opening.
>
> Navigate to your qemu/qapi/ directory and run this script with "python
> insert_crossrefs.py" and it'll handle the rest. Definitely don't run it
> in a non-git-controlled folder: it edits your source files in place.
>

Specifically, "python3 ../contrib/autoxref/insert_crossrefs.py"


>
> (Yes, in polishing this script, I found a few instances of
> cross-references I missed in my v1 series. I figure I'll let us discuss
> the conversion a bit before I send out a v2 patchbomb.)
>
> Signed-off-by: John Snow <jsnow@redhat.com>
>
> ---
>  contrib/autoxref/insert_crossrefs.py | 69 ++++++++++++++++++++++++++++
>  1 file changed, 69 insertions(+)
>  create mode 100644 contrib/autoxref/insert_crossrefs.py
>
> diff --git a/contrib/autoxref/insert_crossrefs.py b/contrib/autoxref/insert_crossrefs.py
> new file mode 100644
> index 00000000000..399dd7524c2
> --- /dev/null
> +++ b/contrib/autoxref/insert_crossrefs.py
> @@ -0,0 +1,69 @@
> +# SPDX-License-Identifier: GPL-2.0-or-later
> +
> +import os
> +import re
> +import sys
> +
> +if not os.path.exists("qapi-schema.json"):
> +    raise Exception(
> +        "This script was meant to be run from the qemu.git/qapi directory."
> +    )
> +sys.path.append("../scripts/")
> +
> +from qapi.schema import QAPISchema, QAPISchemaDefinition
> +
> +# Adjust this global to exclude certain tokens from being xreffed.
> +SKIP_TOKENS = ('String', 'stop', 'transaction', 'eject', 'migrate', 'quit')
>

At least *some* of these are still valid conversions, but the majority are
not. You can always comment out this line and review the diff in your
working tree to see what I mean.
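
To illustrate why the skip list matters, here is a minimal sketch. It assumes a made-up token list and a made-up comment line (neither comes from the real schema) and reuses only the shape of the naked-name pattern from the patch. Names that double as ordinary English words get converted wherever they appear in prose unless they are filtered out first.

```python
import re

# Hypothetical token list standing in for the names collected from the schema.
ALL_TOKENS = ["stop", "quit", "BlockdevOptions"]
SKIP_TOKENS = ("stop", "quit")


def naked_pattern(tokens):
    # Same shape as patt2 in the patch: bare names that are not adjacent to
    # quotes, backticks, dashes, word characters, '@', or angle brackets.
    names = "(" + "|".join(tokens) + ")"
    return re.compile(r"(?<![-@`'\"\w<])" + names + r"(?![-`'\"\w>])")


def convert(line, tokens):
    return naked_pattern(tokens).sub(r"`\1`", line)


# Made-up doc comment line, for illustration only.
line = "# Guests should stop I/O before changing BlockdevOptions."

unfiltered = convert(line, ALL_TOKENS)
filtered = convert(line, [t for t in ALL_TOKENS if t not in SKIP_TOKENS])

print(unfiltered)  # the English verb "stop" becomes a cross-reference
print(filtered)    # only the type name is converted
```

Whether a given skipped token should still be converted in a particular spot is a judgment call, which is why those instances need manual review.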


> +
> +print("Compiling schema to build list of reference-able entities ...", end='')
> +tokens = []
> +schema = QAPISchema("qapi-schema.json")
> +for ent in schema._entity_list:
> +    if isinstance(ent, QAPISchemaDefinition) and not ent.is_implicit():
> +        if ent.name not in SKIP_TOKENS:
> +            tokens.append(ent.name)
> +print("OK")
> +
> +patt_names = r'(' + '|'.join(tokens) + r')'
> +
> +# catch 'token', "token", and ``token`` specifically
> +patt = re.compile(r'([\'"]|``)' + patt_names + r'\1')
> +# catch naked instances of token, excluding those where prefixed or
> +# suffixed by a quote, dash, or word character. Exclude "@" references
> +# specifically to handle them elsewhere. Exclude <name> matches, as
> +# these are explicit cross-reference targets.
> +patt2 = r"(?<![-@`'\"\w<])" + patt_names + r"(?![-`'\"\w>])"
>

I'm quite aware this pattern doesn't match <token> specifically, because
the suffixes and prefixes are not contextually linked. Hacky. Got the job
done. Probably doesn't miss anything...
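
A small sketch of what "not contextually linked" means in practice. The single name `query-status` is hard-coded here as an assumption, where the script would use the whole list built from the schema. Because the lookbehind and lookahead are checked independently, a lone `<` before the name or a lone `>` after it is enough to block the match, not just a full `<name>` pair.

```python
import re

# Assumed single entity name; the real script joins every schema name here.
names = r"(query-status)"
patt2 = re.compile(r"(?<![-@`'\"\w<])" + names + r"(?![-`'\"\w>])")

samples = [
    "see query-status for details",    # bare name
    "see <query-status> for details",  # explicit target
    "see <query-status for details",   # only the prefix present
    "see query-status> for details",   # only the suffix present
    "see @query-status for details",   # left for the '@' pattern
]

results = [patt2.sub(r"`\1`", s) for s in samples]
for before, after in zip(samples, results):
    print(f"{before!r} -> {after!r}")
```

Only the first sample is rewritten; the other four come back unchanged.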


> +# catch @references. prohibit when followed by ":" to exclude members
> +# whose names happen to match xreffable entities.
> +patt3 = r"@" + patt_names + r"(?![-\w:])"
>

Excluding "@foo:" is also kludgy, but in manual review it didn't miss
anything.
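
For reference, a minimal sketch of the "@" pattern's behaviour, assuming a single hypothetical entity name `StatusInfo` and made-up comment lines. It shows the kludge: any "@name:" is skipped, whether it is a member heading or just prose that happens to put a colon after the name.

```python
import re

# Assumed single entity name; the real script joins every schema name here.
names = r"(StatusInfo)"
patt3 = re.compile(r"@" + names + r"(?![-\w:])")

samples = [
    "# Returns @StatusInfo on success.",            # converted
    "# See @StatusInfo.",                           # converted, '.' is allowed
    "# @StatusInfo: a member with the same name",   # skipped, as intended
    "# Use @StatusInfo: it reports the run state.", # also skipped (the kludge)
    "# See @StatusInfo-like structures.",           # skipped, '-' follows
]

results = [patt3.sub(r"`\1`", s) for s in samples]
for before, after in zip(samples, results):
    print(f"{before!r} -> {after!r}")
```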

I'm sure there's some big-brained way to not need three separate patterns,
but I refuse to learn regex any better than I already have so I have some
brain space left to admire flowers and birds.


> +
> +
> +
> +
> +for file in os.scandir():
> +    outlines = []
> +    if not file.name.endswith(".json"):
> +        continue
> +    print(f"Scanning {file.name} ...")
> +    with open(file.name) as searchfile:
> +        block_start = False
> +        for line in searchfile:
> +            # Don't mess with the start of doc blocks.
> +            # We don't want to convert "# @name:" to a reference!
> +            if block_start and line.startswith('# @'):
> +                outlines.append(line)
> +                continue
> +            block_start = bool(line.startswith('##'))
>

Similarly, I'm sure I could bake these ad-hoc conditions into the regexes
themselves, but it's harder and makes the expressions uglier. For a script
that only needs to be run once, whatever.
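
The per-line conditions can be read as a tiny state machine. The sketch below restates them as a standalone generator, together with the five-space example-block check that follows in the patch. The function name `classify` and the sample doc block are illustrative assumptions, not part of the script.

```python
def classify(lines):
    """Yield (line, eligible) pairs following the patch's ad-hoc rules."""
    block_start = False
    for line in lines:
        if block_start and line.startswith("# @"):
            # "# @name:" directly after "##" opens a doc block; leave it alone.
            yield line, False
            continue
        block_start = line.startswith("##")
        # Only ordinary comment text is eligible. '#' followed by five
        # spaces is treated as an example block and skipped.
        eligible = line.startswith("# ") and not line.startswith("#     ")
        yield line, eligible


# Made-up doc block in the general style of a qapi/*.json file.
sample = [
    "##\n",
    "# @query-status:\n",
    "#\n",
    "# Query the run status of the VM.\n",
    "#\n",
    '#     -> { "execute": "query-status" }\n',
    "##\n",
    "{ 'command': 'query-status' }\n",
]

flags = [eligible for _, eligible in classify(sample)]
print(flags)
```

Only the prose line "# Query the run status of the VM." is marked eligible for substitution; the block heading, the example line, and the JSON outside the comment block are left untouched.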


> +
> +            # Don't mess with anything outside of comment blocks,
> +            # and don't mess with example blocks. We use five spaces
> +            # as a heuristic for detecting example blocks. It's not perfect,
> +            # but it seemingly does the job well.
> +            if line.startswith('# ') and not line.startswith('#     '):
> +                line = re.sub(patt, r'`\2`', line)
> +                line = re.sub(patt2, r'`\1`', line)
> +                line = re.sub(patt3, r'`\1`', line)
> +            outlines.append(line)
> +    with open(file.name, "w") as outfile:
> +        for line in outlines:
> +            outfile.write(line)
> --
> 2.48.1


Thanks!
