From: Mauro Carvalho Chehab <mchehab+huawei@kernel.org>
To: Jonathan Corbet <corbet@lwn.net>,
Mauro Carvalho Chehab <mchehab@kernel.org>
Cc: Kees Cook <kees@kernel.org>,
linux-doc@vger.kernel.org, linux-hardening@vger.kernel.org,
linux-kernel@vger.kernel.org,
"Gustavo A. R. Silva" <gustavoars@kernel.org>,
Aleksandr Loktionov <aleksandr.loktionov@intel.com>,
Randy Dunlap <rdunlap@infradead.org>,
Shuah Khan <skhan@linuxfoundation.org>,
Vincent Mailhol <mailhol@kernel.org>
Subject: Re: [PATCH v2 00/28] kernel-doc: use a C lexical tokenizer for transforms
Date: Fri, 13 Mar 2026 10:17:58 +0100
Message-ID: <20260313101758.1dc691bc@foz.lan>
In-Reply-To: <cover.1773326442.git.mchehab+huawei@kernel.org>

Hi Jon,

On Thu, 12 Mar 2026 15:54:20 +0100
Mauro Carvalho Chehab <mchehab+huawei@kernel.org> wrote:

> Also, I didn't notice any relevant change in the documentation build
> time.

After more testing, I noticed an issue with this changeset:

https://lore.kernel.org/linux-doc/2b957decdb6cedab4268f71a166c25b7abdb9a61.1773326442.git.mchehab+huawei@kernel.org/

Basically, a broken kernel-doc comment like this:

/**
 * enum dmub_abm_ace_curve_type - ACE curve type.
 */
enum dmub_abm_ace_curve_type {
	/**
	 * ACE curve as defined by the SW layer.
	 */
	ABM_ACE_CURVE_TYPE__SW = 0,
	/**
	 * ACE curve as defined by the SW to HW translation interface layer.
	 */
	ABM_ACE_CURVE_TYPE__SW_IF = 1,
};

where the inline markups lack "@symbol", doesn't parse well. If you
run the current kernel-doc, it produces:

.. c:enum:: dmub_abm_ace_curve_type

  ACE curve type.

.. container:: kernelindent

  **Constants**

  ``*/ ABM_ACE_CURVE_TYPE__SW = 0``
    *undescribed*

  `` */ ABM_ACE_CURVE_TYPE__SW_IF = 1``
    *undescribed*

This happens because kernel-doc currently drops the "/**" line. The fix
patch above addresses that, but the inline comments then confuse the
enum/struct detection. To avoid that, comments need to be stripped
earlier, in dump_struct() and dump_enum():
https://lore.kernel.org/linux-doc/d112804ace83e0ad8496f687977596bb7f091560.1773390831.git.mchehab+huawei@kernel.org/T/#u
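
To illustrate why stripping comments first matters, here is a minimal
sketch, assuming a simple regex-based comment stripper. The names
strip_comments() and member_names() are hypothetical helpers, not the
code from the patch, and the regex does not handle comment markers
inside string literals.

```python
import re

# Hypothetical comment stripper: removes /* ... */ blocks (including the
# /** kernel-doc style ones) and // line comments. It does NOT handle
# comment markers inside string literals.
COMMENT_RE = re.compile(r"/\*.*?\*/|//[^\n]*", re.S)

# Body of the enum from the example above.
BODY = """
    /**
     * ACE curve as defined by the SW layer.
     */
    ABM_ACE_CURVE_TYPE__SW = 0,
    /**
     * ACE curve as defined by the SW to HW translation interface layer.
     */
    ABM_ACE_CURVE_TYPE__SW_IF = 1,
"""


def strip_comments(body):
    """Replace each comment with a single space."""
    return COMMENT_RE.sub(" ", body)


def member_names(body):
    """Split an enum body on commas and keep the part before any '='."""
    names = []
    for item in body.split(","):
        name = item.split("=")[0].strip()
        if name:
            names.append(name)
    return names


# Without stripping, the comment text leaks into the first member name.
print(member_names(BODY)[0])
# Stripping comments first leaves only the constant names.
print(member_names(strip_comments(BODY)))
```

The point is only the ordering: once comments are gone, splitting the
body on commas can no longer pick up comment fragments as part of a
member name.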

After this fix, the output becomes:

.. c:enum:: dmub_abm_ace_curve_type

  ACE curve type.

.. container:: kernelindent

  **Constants**

  ``ABM_ACE_CURVE_TYPE__SW``
    *undescribed*

  ``ABM_ACE_CURVE_TYPE__SW_IF``
    *undescribed*

which is the expected result when there are no proper inline kernel-doc
markups.

Because of this issue, I ended up adding a 29/28 patch to this series.

> In that regard, right now, every time a CMatch replacement
> rule takes place, it does:
>
> for each transform:
> - tokenize the source code;
> - handle CMatch;
> - convert tokens back to a string.
>
> A possible optimization would be to do, instead:
>
> - tokenize the source code;
> - for each transform, handle CMatch;
> - convert tokens back to a string.
>
> For now, I opted not to do it, because:
>
> - it would be too many changes in a single series;
> - docs build time is ~3:30 minutes, which is
> about the same time it took before the changes;
> - there is a very dirty hack inside function_xforms:
> (KernRe(r"_noprof"), ""). This is meant to change
> function prototypes instead of function arguments.
>
> So, if ok for you, I would prefer to merge this one first. We can later
> optimize kdoc_parser to avoid multiple token <-> string conversions.

I implemented that optimization and it works fine, so I ended up adding
a 30/28 patch at the end. With it, running kernel-doc before and after
the entire series shows no major performance difference.
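
A rough sketch of the difference between the two approaches quoted
above, assuming a toy regex tokenizer and transforms that work on token
lists. tokenize(), untokenize(), drop_ident() and XFORMS are
hypothetical and far simpler than the real CTokenizer/CMatch code.

```python
import re

# Toy tokenizer (hypothetical): whitespace runs, identifiers, numbers, or
# any other single character. Every character is matched, so joining the
# tokens gives back the original string.
TOKEN_RE = re.compile(r"\s+|[A-Za-z_]\w*|\d+|.", re.S)

# Counts how many times the source was tokenized.
STATS = {"tokenize": 0}


def tokenize(source):
    STATS["tokenize"] += 1
    return TOKEN_RE.findall(source)


def untokenize(tokens):
    return "".join(tokens)


def drop_ident(name):
    """Return a transform that removes every token equal to `name`."""
    def xform(tokens):
        return [tok for tok in tokens if tok != name]
    return xform


# Hypothetical transform list, standing in for the CMatch rules.
XFORMS = [drop_ident("__packed"), drop_ident("__aligned")]


def apply_per_transform(source, xforms):
    """Tokenize and convert back to a string once per transform."""
    for xform in xforms:
        source = untokenize(xform(tokenize(source)))
    return source


def apply_once(source, xforms):
    """Tokenize once, run all transforms on the tokens, join once."""
    tokens = tokenize(source)
    for xform in xforms:
        tokens = xform(tokens)
    return untokenize(tokens)


src = "struct foo { int a; } __packed __aligned;"
print(repr(apply_per_transform(src, XFORMS)), STATS["tokenize"])
STATS["tokenize"] = 0
print(repr(apply_once(src, XFORMS)), STATS["tokenize"])
```

Both functions return the same string; the per-transform version pays
the tokenizer cost once per rule, while the other pays it once per
prototype. A string-based regex rule such as the _noprof hack does not
fit the second model directly, since it would need a token round trip
of its own.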

# Current approach
$ time ./scripts/kernel-doc . -man >original 2>&1

real	0m37.344s
user	0m36.447s
sys	0m0.712s

# Tokenizer running multiple times (patch 29)
$ time ./scripts/kernel-doc . -man >before 2>&1

real	1m32.427s
user	1m25.377s
sys	0m1.293s

# After optimization (patch 30)
$ time ./scripts/kernel-doc . -man >after 2>&1

real	0m47.094s
user	0m46.106s
sys	0m0.751s

That is about 10 seconds slower than before when parsing the whole tree,
which affects "make mandocs". However, the total time spent in the
kernel-doc parser during "make htmldocs" is small: about 4 seconds(*):

$ run_kdoc.py -none 2>/dev/null
Checking what files are currently used on documentation...
Running kernel-doc
Elapsed time: 0:00:04.348008

(*) The slowest part of building the docs with Sphinx is its own RST
parser code.
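
To put the three "real" times measured with kernel-doc -man in relative
terms, a quick calculation in plain Python (the values are copied from
those runs; parse_real() is just a local helper for bash's time format):

```python
import re


def parse_real(value):
    """Convert a bash `time` value such as '1m32.427s' into seconds."""
    match = re.fullmatch(r"(\d+)m([\d.]+)s", value)
    if not match:
        raise ValueError(f"unexpected time format: {value!r}")
    return int(match.group(1)) * 60 + float(match.group(2))


# "real" times from the three kernel-doc -man runs.
runs = {
    "original": parse_real("0m37.344s"),
    "patch 29": parse_real("1m32.427s"),
    "patch 30": parse_real("0m47.094s"),
}

base = runs["original"]
for name, secs in runs.items():
    # absolute time, extra seconds over the original, and the ratio
    print(f"{name:9} {secs:7.3f}s  +{secs - base:6.3f}s  x{secs / base:.2f}")
```

With these numbers, patch 30 adds about 9.75 s (roughly 26%) over the
original run, while patch 29 alone was close to 2.5 times slower.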

See the enclosed script for how I measured the parsing time of the
existing ".. kernel-doc::" markups inside Documentation.

Thanks,
Mauro
---
This is the run_kdoc.py script I'm using here to pick the same files
as "make htmldocs" does:

#!/usr/bin/env python3
"""Time kernel-doc on the files referenced by ".. kernel-doc::" markups."""

import os
import re
import subprocess
import sys
from datetime import datetime
from glob import glob

print("Checking what files are currently used on documentation...")

kdoc_files = set()
re_kernel_doc = re.compile(r"^\.\.\s+kernel-doc::\s*(\S+)")

# Scan every .rst file below the current directory (the kernel tree root)
for fname in glob("**/*.rst", recursive=True):
    if not os.path.isfile(fname):
        continue

    with open(fname, "r", encoding="utf-8") as in_fp:
        for line in in_fp:
            match = re_kernel_doc.match(line)
            # Only keep references to source files that actually exist
            if match and os.path.isfile(match.group(1)):
                kdoc_files.add(match.group(1))

if not kdoc_files:
    sys.exit("Directory doesn't contain kernel-doc tags")

# Extra arguments (e.g. -none) are passed through to kernel-doc
cmd = ["./tools/docs/kernel-doc"]
cmd += sys.argv[1:]
cmd += sorted(kdoc_files)

print("Running kernel-doc")
start_time = datetime.now()

try:
    subprocess.run(cmd, check=True)
except subprocess.CalledProcessError as e:
    print(f"kernel-doc failed: {repr(e)}")

elapsed = datetime.now() - start_time
print(f"\nElapsed time: {elapsed}")