From: John Snow <jsnow@redhat.com>
Date: Mon, 16 Jun 2025 17:32:08 -0400
Subject: Re: [PATCH] docs/contrib: add insert_crossrefs script
To: qemu-devel@nongnu.org, armbru@redhat.com, eblake@redhat.com
References: <20250616211604.1399219-1-jsnow@redhat.com>
In-Reply-To: <20250616211604.1399219-1-jsnow@redhat.com>

Markus, Eric: Some commentary and additional information below.

I did not polish this script as I believe it's hacky enough that covering all of the edge cases, testing and documentation is more effort than it's worth, but I still signed off on it in case someone wanted to "adopt it". My intent here is really just to advertise "Here's how I wrote that series" and give you opportunities to spot problems with the programmatic conversion before I send out my v2 so I can keep the email bombs to a minimum.

My as-of-now-unsent v2 includes any additional instances located by this version of the script, as well as one or two manual instances of the ignored tokens that looked appropriate to convert.

Eric: Thank you for diving into the series, I appreciate it.

On Mon, Jun 16, 2025 at 5:16 PM John Snow <jsnow@redhat.com> wrote:
> This isn't really meant for inclusion as it's a bit of a hackjob, but I
> figured it would be best to share it in some form or another to serve as
> a basis for a kind of meta-review of the crossreferenceification series.
>
> This script is designed to convert 'name', "name", name, and @name
> instances in qapi/*.json files to `name` for the purposes of
> cross-referencing commands, events, and data types in the generated HTML
> documentation. It is specifically tuned for our QAPI files and is not
> suitable for running on generic rST source files. It can likely be made
> to operate on QEMU guest agent or other qapi JSON files with some edits
> to which files it's opening.
>
> Navigate to your qemu/qapi/ directory and run this script with "python
> insert_crossrefs.py" and it'll handle the rest. Definitely don't run it
> in a non-git-controlled folder, it edits your source files.

Specifically, "python3 ../contrib/autoxref/insert_crossrefs.py"

> (Yes, in polishing this script, I found a few instances of
> cross-references I missed in my v1 series. I figure I'll let us discuss
> the conversion a bit before I send out a v2 patchbomb.)
>
> Signed-off-by: John Snow <jsnow@redhat.com>
> ---
>  contrib/autoxref/insert_crossrefs.py | 69 ++++++++++++++++++++++++++++
>  1 file changed, 69 insertions(+)
>  create mode 100644 contrib/autoxref/insert_crossrefs.py
>
> diff --git a/contrib/autoxref/insert_crossrefs.py b/contrib/autoxref/insert_crossrefs.py
> new file mode 100644
> index 00000000000..399dd7524c2
> --- /dev/null
> +++ b/contrib/autoxref/insert_crossrefs.py
> @@ -0,0 +1,69 @@
> +# SPDX-License-Identifier: GPL-2.0-or-later
> +
> +import os
> +import re
> +import sys
> +
> +if not os.path.exists("qapi-schema.json"):
> +    raise Exception(
> +        "This script was meant to be run from the qemu.git/qapi directory."
> +    )
> +sys.path.append("../scripts/")
> +
> +from qapi.schema import QAPISchema, QAPISchemaDefinition
> +
> +# Adjust this global to exclude certain tokens from being xreffed.
> +SKIP_TOKENS = ('String', 'stop', 'transaction', 'eject', 'migrate', 'quit')

At least *some* of these are still valid conversions, but the majority are not. You can always comment out this line and review the diff in your working tree to see what I mean.

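To illustrate why everyday words end up in the skip list, here is a minimal sketch that is not part of the patch. The entity names are a hand-picked, hypothetical stand-in for what the schema would produce, and `naked_pattern` is a helper invented for this example; unlike the script, it escapes each name with `re.escape`.

```python
import re

# Hypothetical entity names; the real list comes from QAPISchema.
ALL_NAMES = ["stop", "quit", "query-status", "BlockdevOptions"]
SKIP_TOKENS = ("stop", "quit")


def naked_pattern(names):
    """Build a simplified version of the patch's 'naked token' pattern."""
    alternatives = "|".join(re.escape(name) for name in names)
    return re.compile(
        r"(?<![-@`'\"\w<])(" + alternatives + r")(?![-`'\"\w>])"
    )


line = "# The guest will stop; use query-status to check.\n"

# Without the skip list, the ordinary English verb "stop" is converted too.
unfiltered = naked_pattern(ALL_NAMES).sub(r"`\1`", line)

# With the skip list, only the unambiguous command name is converted.
kept = [name for name in ALL_NAMES if name not in SKIP_TOKENS]
filtered = naked_pattern(kept).sub(r"`\1`", line)

print(unfiltered, end="")
print(filtered, end="")
```

The trade-off is that a genuine reference to the `stop` command is also left alone, which is why a few skipped tokens still have to be converted by hand.
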
> +
> +print("Compiling schema to build list of reference-able entities ...", end='')
> +tokens = []
> +schema = QAPISchema("qapi-schema.json")
> +for ent in schema._entity_list:
> +    if isinstance(ent, QAPISchemaDefinition) and not ent.is_implicit():
> +        if ent.name not in SKIP_TOKENS:
> +            tokens.append(ent.name)
> +print("OK")
> +
> +patt_names = r'(' + '|'.join(tokens) + r')'
> +
> +# catch 'token' and "token" specifically
> +patt = re.compile(r'([\'"]|``)' + patt_names + r'\1')
> +# catch naked instances of token, excluding those where prefixed or
> +# suffixed by a quote, dash, or word character. Exclude "@" references
> +# specifically to handle them elsewhere. Exclude <name> matches, as
> +# these are explicit cross-reference targets.
> +patt2 = r"(?<![-@`'\"\w<])" + patt_names + r"(?![-`'\"\w>])"

I'm quite aware this pattern doesn't match <token> specifically, because the suffixes and prefixes are not contextually linked. Hacky. Got the job done. Probably doesn't miss anything...

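A small sketch of what "not contextually linked" means in practice, assuming a hypothetical one-entry token list. The lookbehind and lookahead are checked independently, so a lone `<` before the name or a lone `>` after it suppresses the conversion just as the full `<token>` form does.

```python
import re

# Hypothetical one-entry token list; the script builds this from the schema.
patt_names = r"(query-status)"
patt2 = r"(?<![-@`'\"\w<])" + patt_names + r"(?![-`'\"\w>])"

samples = [
    "see <query-status> here",  # explicit target: left alone, as intended
    "see <query-status here",   # only the prefix present: also left alone
    "see query-status> here",   # only the suffix present: also left alone
    "see query-status here",    # naked token: converted
]

results = [re.sub(patt2, r"`\1`", sample) for sample in samples]

for before, after in zip(samples, results):
    print(f"{before!r} -> {after!r}")
```
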
> +# catch @references. prohibit when followed by ":" to exclude members
> +# whose names happen to match xreffable entities.
> +patt3 = r"@" + patt_names + r"(?![-\w:])"

Excluding "@foo:" is also kludgy, but in manual review it didn't miss anything.

I'm sure there's some big-brained way to not need three separate patterns, but I refuse to learn regex any better than I already have so I have some brain space left to admire flowers and birds.

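As a rough sketch of the quoted and "@" patterns in isolation, assuming a hypothetical two-entry token list: `convert_quoted` and `convert_at` are wrapper names invented for this example, while the patterns themselves are copied from the patch.

```python
import re

# Hypothetical token list; the script derives the real one from the schema.
patt_names = r"(query-status|BlockInfo)"

# Matching quotes on both sides: 'name', "name", or ``name``.
patt = re.compile(r'([\'"]|``)' + patt_names + r'\1')
# @name, unless followed by a dash, word character, or colon.
patt3 = r"@" + patt_names + r"(?![-\w:])"


def convert_quoted(line):
    # Group 1 is the opening quote, group 2 is the entity name.
    return patt.sub(r'`\2`', line)


def convert_at(line):
    return re.sub(patt3, r'`\1`', line)


print(convert_quoted("# See 'query-status' and ``BlockInfo``."))
print(convert_quoted("# Mismatched 'query-status\" is skipped."))
print(convert_at("# Returns @BlockInfo on success."))
print(convert_at("# @BlockInfo: looks like a member description"))
```

The `\1` backreference is what keeps mismatched quote pairs from being rewritten, and the `:` in the lookahead is what leaves the `@BlockInfo:` line untouched.
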
> +
> +
> +
> +
> +for file in os.scandir():
> +    outlines = []
> +    if not file.name.endswith(".json"):
> +        continue
> +    print(f"Scanning {file.name} ...")
> +    with open(file.name) as searchfile:
> +        block_start = False
> +        for line in searchfile:
> +            # Don't mess with the start of doc blocks.
> +            # We don't want to convert "# @name:" to a reference!
> +            if block_start and line.startswith('# @'):
> +                outlines.append(line)
> +                continue
> +            block_start = bool(line.startswith('##'))


Similarly, I'm sure I could bake these ad-hoc conditions into the regexes themselves, but it's harder and makes the expressions uglier. For a script that only needs to be run once, whatever.

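The ad-hoc conditions can be read as a small per-line filter. The sketch below pulls them into a hypothetical generator, `convertible_lines`, purely to show which lines would be handed to the regexes; the sample doc block is made up, and the state handling mirrors the loop in the patch, including that `block_start` is not updated on the skipped "# @name:" line.

```python
def convertible_lines(lines):
    """Yield (line, convert) pairs using the same heuristics as the patch."""
    block_start = False
    for line in lines:
        # The "# @name:" line right after "##" defines the entity itself.
        if block_start and line.startswith('# @'):
            yield line, False
            continue
        block_start = line.startswith('##')

        in_comment = line.startswith('# ')
        # Five spaces after "#" is the heuristic for example blocks.
        looks_like_example = line.startswith('#     ')
        yield line, in_comment and not looks_like_example


# Made-up doc block for illustration only.
sample = [
    "##\n",
    "# @query-status:\n",
    "#\n",
    "# Query the status; see @BlockInfo.\n",
    '#     -> { "execute": "query-status" }\n',
    "{ 'command': 'query-status' }\n",
]

flags = [convert for _, convert in convertible_lines(sample)]
print(flags)
```

Only the prose line in the middle of the block is eligible; the block header, the example line, and the JSON definition outside the comment are all passed through unchanged.
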
> +
> +            # Don't mess with anything outside of comment blocks,
> +            # and don't mess with example blocks. We use five spaces
> +            # as a heuristic for detecting example blocks. It's not perfect,
> +            # but it seemingly does the job well.
> +            if line.startswith('# ') and not line.startswith('#     '):
> +                line = re.sub(patt, r'`\2`', line)
> +                line = re.sub(patt2, r'`\1`', line)
> +                line = re.sub(patt3, r'`\1`', line)
> +            outlines.append(line)
> +    with open(file.name, "w") as outfile:
> +        for line in outlines:
> +            outfile.write(line)
> --
> 2.48.1

Thanks!