From: Mauro Carvalho Chehab <mchehab+huawei@kernel.org>
To: Jonathan Corbet <corbet@lwn.net>, Kees Cook <kees@kernel.org>,
Mauro Carvalho Chehab <mchehab@kernel.org>
Cc: Mauro Carvalho Chehab <mchehab+huawei@kernel.org>,
linux-doc@vger.kernel.org, linux-hardening@vger.kernel.org,
linux-kernel@vger.kernel.org,
"Gustavo A. R. Silva" <gustavoars@kernel.org>,
Aleksandr Loktionov <aleksandr.loktionov@intel.com>,
Randy Dunlap <rdunlap@infradead.org>,
Shuah Khan <skhan@linuxfoundation.org>
Subject: [PATCH v2 00/20] kernel-doc: use a C lexical tokenizer for transforms
Date: Thu, 12 Mar 2026 08:12:08 +0100
Message-ID: <cover.1773297828.git.mchehab+huawei@kernel.org>
Hi Jon,
This patch series changes how the kdoc parser handles macro replacements.
Instead of heavily relying on regular expressions that can sometimes
be very complex, it uses a C lexical tokenizer. This ensures that
BEGIN/END blocks on functions and structs are properly handled,
even when nested.
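As a rough illustration of why a token stream copes with nesting where a regular expression struggles, here is a minimal sketch. It is not the code in c_lex.py: TOKEN_RE, tokenize() and strip_macro() are hypothetical names, and the token pattern is far simpler than a real C tokenizer (no comments, strings or preprocessor handling).

```python
import re

# Hypothetical, simplified token pattern; a real C tokenizer also has to
# deal with comments, string literals, preprocessor lines and so on.
TOKEN_RE = re.compile(r"\s+|[A-Za-z_]\w*|\d+|.", re.S)


def tokenize(source):
    """Split C-like source into a flat list of tokens, keeping whitespace."""
    return TOKEN_RE.findall(source)


def strip_macro(source, name):
    """Remove every `name(...)` call, honouring nested parentheses."""
    tokens = tokenize(source)
    out = []
    i = 0
    while i < len(tokens):
        if tokens[i] == name and i + 1 < len(tokens) and tokens[i + 1] == "(":
            depth = 0
            i += 1  # now pointing at the opening parenthesis
            # Skip tokens until the matching close parenthesis is consumed.
            while i < len(tokens):
                if tokens[i] == "(":
                    depth += 1
                elif tokens[i] == ")":
                    depth -= 1
                    if depth == 0:
                        i += 1
                        break
                i += 1
            continue
        out.append(tokens[i])
        i += 1
    return "".join(out)
```

Because the comparison is done on whole tokens, an identifier that merely starts with the macro name is left alone, and the depth counter finds the right closing parenthesis no matter how deeply the arguments nest. This sketch does not handle whitespace between the macro name and its opening parenthesis.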
Comparing the output before and after the patch series, for both man
pages and rst, the only differences were:
- whitespace differences;
- struct_group macros are now shown as inner anonymous structs,
as they should be.
Also, I didn't notice any relevant change in the documentation build
time. In that regard, right now, every time a CMatch replacement
rule is applied, it does:
for each transform:
- tokenize the source code;
- handle CMatch;
- convert the tokens back to a string.
A possible optimization would be to do, instead:
- tokenize the source code;
- for each transform, handle CMatch;
- convert the tokens back to a string.
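The two flows above can be sketched as follows. This is only an illustration under simplifying assumptions: tokenize(), untokenize() and drop_identifier() are hypothetical stand-ins, and each transform is modelled as a plain function over a token list rather than a CMatch rule.

```python
import re

# Hypothetical, simplified tokenizer used only to contrast the two flows.
TOKEN_RE = re.compile(r"\s+|[A-Za-z_]\w*|\d+|.", re.S)


def tokenize(source):
    """Turn source text into a list of tokens (whitespace preserved)."""
    return TOKEN_RE.findall(source)


def untokenize(tokens):
    """Convert a token list back into a string."""
    return "".join(tokens)


def drop_identifier(name):
    """Build a transform that removes one identifier from a token list."""
    def transform(tokens):
        return [tok for tok in tokens if tok != name]
    return transform


def apply_per_transform(source, transforms):
    """Current flow: one tokenize/untokenize round trip per transform."""
    for transform in transforms:
        tokens = tokenize(source)
        source = untokenize(transform(tokens))
    return source


def apply_tokenize_once(source, transforms):
    """Possible optimization: tokenize once, run all transforms, join once."""
    tokens = tokenize(source)
    for transform in transforms:
        tokens = transform(tokens)
    return untokenize(tokens)
```

With transforms that only add or remove whole tokens, as in this sketch, both functions return the same string. That equivalence is not guaranteed for a rule that rewrites text in a way that changes token boundaries.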
For now, I opted not to do it, because:
- it would be too many changes at once;
- the docs build is taking ~3:30 minutes, which is
about the same time it was taking before the changes;
- there is a very dirty hack inside function_xforms:
(KernRe(r"_noprof"), ""). This is meant to change
function prototypes instead of function arguments.
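One possible reading of the last item (an interpretation, not something stated above) is that the rule rewrites part of an identifier, so it does not map directly onto a whole-token match. The sketch below uses the standard `re` module as a stand-in for KernRe, assuming the same sub() semantics; strip_noprof() and strip_noprof_tokens() are hypothetical helpers.

```python
import re

# Plain `re` stands in for KernRe here (assumption: same sub() semantics).
NOPROF = re.compile(r"_noprof")


def strip_noprof(prototype):
    """Drop every "_noprof" substring, as a (pattern, "") rule would."""
    return NOPROF.sub("", prototype)


def strip_noprof_tokens(tokens):
    """Whole-token comparison: never matches inside a longer identifier."""
    return [tok for tok in tokens if tok != "_noprof"]
```

The string-based version turns "kmalloc_noprof" into "kmalloc", while the token-based version leaves the identifier untouched, because "kmalloc_noprof" is a single token.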
So, if ok for you, I would prefer to merge this one first. We can later
optimize kdoc_parser to avoid multiple token <-> string conversions.
-
One important aspect of this series is that it introduces unit tests
for kernel-doc. I used them a lot during the development of this series,
to ensure that the changes I was making were producing the expected
results. The tests are in two separate files that can be executed directly.
Alternatively, there is a run.py script that runs all of them (and
any other Python script matching tools/unittests/test_*.py):
$ ./tools/unittests/run.py
test_cmatch:
TestSearch:
test_search_acquires_multiple: OK
test_search_acquires_nested_paren: OK
test_search_acquires_simple: OK
test_search_must_hold: OK
test_search_must_hold_shared: OK
test_search_no_false_positive: OK
test_search_no_function: OK
test_search_no_macro_remains: OK
TestSubMultipleMacros:
test_acquires_multiple: OK
test_acquires_nested_paren: OK
test_acquires_simple: OK
test_mixed_macros: OK
test_must_hold: OK
test_must_hold_shared: OK
test_no_false_positive: OK
test_no_function: OK
test_no_macro_remains: OK
TestSubSimple:
test_strip_multiple_acquires: OK
test_sub_count_parameter: OK
test_sub_mixed_placeholders: OK
test_sub_multiple_placeholders: OK
test_sub_no_placeholder: OK
test_sub_single_placeholder: OK
test_sub_with_capture: OK
test_sub_zero_placeholder: OK
TestSubWithLocalXforms:
test_functions_with_acquires_and_releases: OK
test_raw_struct_group: OK
test_raw_struct_group_tagged: OK
test_struct_group: OK
test_struct_group_attr: OK
test_struct_group_tagged_with_private: OK
test_struct_kcov: OK
test_vars_stackdepot: OK
test_tokenizer:
TestPublicPrivate:
test_balanced_inner_private: OK
test_balanced_non_greddy_private: OK
test_balanced_private: OK
test_no private: OK
test_unbalanced_inner_private: OK
test_unbalanced_private: OK
test_unbalanced_struct_group_tagged_with_private: OK
test_unbalanced_two_struct_group_tagged_first_with_private: OK
test_unbalanced_without_end_of_line: OK
TestTokenizer:
test_basic_tokens: OK
test_depth_counters: OK
test_mismatch_error: OK
Ran 45 tests
PS.: This series contains the contents of the previous /8 series:
https://lore.kernel.org/linux-doc/cover.1773074166.git.mchehab+huawei@kernel.org/
Mauro Carvalho Chehab (20):
docs: python: add helpers to run unit tests
unittests: add a testbench to check public/private kdoc comments
docs: kdoc: don't add broken comments inside prototypes
docs: kdoc: properly handle empty enum arguments
docs: kdoc_re: add a C tokenizer
docs: kdoc: use tokenizer to handle comments on structs
docs: kdoc: move C Tokenizer to c_lex module
unittests: test_private: modify it to use CTokenizer directly
unittests: test_tokenizer: check if the tokenizer works
unittests: add a runner to execute all unittests
docs: kdoc: create a CMatch to match nested C blocks
tools: unittests: add tests for CMatch
docs: c_lex: properly implement a sub() method for CMatch
unittests: test_cmatch: add tests for sub()
docs: kdoc: replace NestedMatch with CMatch
docs: kdoc_re: get rid of NestedMatch class
docs: xforms_lists: handle struct_group directly
docs: xforms_lists: better evaluate struct_group macros
docs: c_lex: add support to work with pure name ids
docs: xforms_lists: use CMatch for all identifiers
Documentation/tools/python.rst | 2 +
Documentation/tools/unittest.rst | 24 +
tools/lib/python/kdoc/c_lex.py | 593 +++++++++++++++++++
tools/lib/python/kdoc/kdoc_parser.py | 26 +-
tools/lib/python/kdoc/kdoc_re.py | 201 -------
tools/lib/python/kdoc/xforms_lists.py | 209 +++----
tools/lib/python/unittest_helper.py | 353 +++++++++++
tools/unittests/run.py | 17 +
tools/unittests/test_cmatch.py | 812 ++++++++++++++++++++++++++
tools/unittests/test_tokenizer.py | 461 +++++++++++++++
10 files changed, 2366 insertions(+), 332 deletions(-)
create mode 100644 Documentation/tools/unittest.rst
create mode 100644 tools/lib/python/kdoc/c_lex.py
create mode 100755 tools/lib/python/unittest_helper.py
create mode 100755 tools/unittests/run.py
create mode 100755 tools/unittests/test_cmatch.py
create mode 100755 tools/unittests/test_tokenizer.py
--
2.53.0