From: Mauro Carvalho Chehab <mchehab+huawei@kernel.org>
To: Jonathan Corbet <corbet@lwn.net>, Kees Cook <kees@kernel.org>,
Mauro Carvalho Chehab <mchehab@kernel.org>
Cc: Mauro Carvalho Chehab <mchehab+huawei@kernel.org>,
linux-doc@vger.kernel.org, linux-hardening@vger.kernel.org,
linux-kernel@vger.kernel.org,
"Gustavo A. R. Silva" <gustavoars@kernel.org>,
Aleksandr Loktionov <aleksandr.loktionov@intel.com>,
Randy Dunlap <rdunlap@infradead.org>,
Shuah Khan <skhan@linuxfoundation.org>
Subject: [PATCH v2 00/20] kernel-doc: use a C lexical tokenizer for transforms
Date: Thu, 12 Mar 2026 08:12:08 +0100
Message-ID: <cover.1773297828.git.mchehab+huawei@kernel.org>
Hi Jon,
This patch series changes how the kdoc parser handles macro replacements.
Instead of heavily relying on regular expressions that can sometimes
be very complex, it uses a C lexical tokenizer. This ensures that
BEGIN/END blocks on functions and structs are properly handled,
even when nested.
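To illustrate why nesting is hard for plain regular expressions, here is a
minimal sketch. It is not the CMatch/CTokenizer code from this series: the
strip_macro() helper and the sample prototype are hypothetical, and it only
counts parentheses on raw characters, while a real lexer would also skip
strings and comments.

```python
import re

# Hypothetical prototype with a nested macro argument.
SOURCE = "void f(void) __acquires(lock(a, b)) __releases(c);"

# A non-greedy regex stops at the first ")" and leaves a stray one behind.
naive = re.sub(r"__acquires\(.*?\)", "", SOURCE)


def strip_macro(text, name):
    """Remove every `name(...)` call, honouring nested parentheses."""
    pattern = re.compile(r"\b" + re.escape(name) + r"\s*\(")
    out = []
    pos = 0
    while True:
        match = pattern.search(text, pos)
        if match is None:
            out.append(text[pos:])
            break
        out.append(text[pos:match.start()])
        # Walk forward until the parenthesis opened by the macro is closed.
        depth = 1
        i = match.end()
        while i < len(text) and depth:
            if text[i] == "(":
                depth += 1
            elif text[i] == ")":
                depth -= 1
            i += 1
        pos = i
    return "".join(out)
```

Here, naive ends up with an unbalanced ")" while
strip_macro(SOURCE, "__acquires") removes the whole macro call, including
its nested argument list.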
Comparing the output before and after the patch series, for both man
pages and rst, the only differences were:
- whitespace differences;
- struct_group macros are now shown as inner anonymous structs,
as they should be.
Also, I didn't notice any relevant change in the documentation build
time. In that regard, right now, every time a CMatch replacement
rule is applied, it does:
for each transform:
- tokenize the source code;
- handle CMatch;
- convert tokens back to a string.
A possible optimization would be to do, instead:
- tokenize the source code;
- for each transform, handle CMatch;
- convert tokens back to a string.
For now, I opted not to do it, because:
- it would be too many changes in one go;
- the docs build takes ~3:30 minutes, which is
about the same time it took before the changes;
- there is a very dirty hack inside function_xforms:
(KernRe(r"_noprof"), ""). This is meant to change
function prototypes instead of function arguments.
So, if ok for you, I would prefer to merge this one first. We can later
optimize kdoc_parser to avoid multiple token <-> string conversions.
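The two flows discussed above (tokenizing for every transform versus
tokenizing once) can be sketched as follows. This is only an illustration
under simplifying assumptions: tokenize(), untokenize() and
drop_identifier() are hypothetical stand-ins, not the CTokenizer or CMatch
APIs from this series, and the toy tokenizer only splits on whitespace,
words and single characters.

```python
import re

# Toy tokenizer: whitespace runs, word runs, or any single character.
TOKEN_RE = re.compile(r"\s+|\w+|.", re.S)


def tokenize(source):
    """Split the source into tokens; joining them restores the input."""
    return TOKEN_RE.findall(source)


def untokenize(tokens):
    """Convert a token list back to a string."""
    return "".join(tokens)


def drop_identifier(name):
    """Return a hypothetical transform that removes one identifier token."""
    def transform(tokens):
        return [tok for tok in tokens if tok != name]
    return transform


def apply_per_transform(source, transforms):
    """Current flow: tokenize and rebuild the string for every transform."""
    for transform in transforms:
        source = untokenize(transform(tokenize(source)))
    return source


def apply_tokenize_once(source, transforms):
    """Possible optimization: tokenize once, rebuild the string once."""
    tokens = tokenize(source)
    for transform in transforms:
        tokens = transform(tokens)
    return untokenize(tokens)
```

Both functions give the same result as long as every transform works on
whole tokens. A rule that rewrites part of an identifier, as the _noprof
regular expression does, operates on text rather than on tokens, so it
would presumably need separate handling before the single-pass flow could
be used.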
-
One important aspect of this series is that it introduces unit tests
for kernel-doc. I used them a lot during the development of this series,
to ensure that the changes I was making were producing the expected
results. The tests are in two separate files that can be executed directly.
Alternatively, there is a run.py script that runs all of them (and
any other Python script matching tools/unittests/test_*.py):
  $ ./tools/unittests/run.py
  test_cmatch:
    TestSearch:
      test_search_acquires_multiple: OK
      test_search_acquires_nested_paren: OK
      test_search_acquires_simple: OK
      test_search_must_hold: OK
      test_search_must_hold_shared: OK
      test_search_no_false_positive: OK
      test_search_no_function: OK
      test_search_no_macro_remains: OK
    TestSubMultipleMacros:
      test_acquires_multiple: OK
      test_acquires_nested_paren: OK
      test_acquires_simple: OK
      test_mixed_macros: OK
      test_must_hold: OK
      test_must_hold_shared: OK
      test_no_false_positive: OK
      test_no_function: OK
      test_no_macro_remains: OK
    TestSubSimple:
      test_strip_multiple_acquires: OK
      test_sub_count_parameter: OK
      test_sub_mixed_placeholders: OK
      test_sub_multiple_placeholders: OK
      test_sub_no_placeholder: OK
      test_sub_single_placeholder: OK
      test_sub_with_capture: OK
      test_sub_zero_placeholder: OK
    TestSubWithLocalXforms:
      test_functions_with_acquires_and_releases: OK
      test_raw_struct_group: OK
      test_raw_struct_group_tagged: OK
      test_struct_group: OK
      test_struct_group_attr: OK
      test_struct_group_tagged_with_private: OK
      test_struct_kcov: OK
      test_vars_stackdepot: OK
  test_tokenizer:
    TestPublicPrivate:
      test_balanced_inner_private: OK
      test_balanced_non_greddy_private: OK
      test_balanced_private: OK
      test_no private: OK
      test_unbalanced_inner_private: OK
      test_unbalanced_private: OK
      test_unbalanced_struct_group_tagged_with_private: OK
      test_unbalanced_two_struct_group_tagged_first_with_private: OK
      test_unbalanced_without_end_of_line: OK
    TestTokenizer:
      test_basic_tokens: OK
      test_depth_counters: OK
      test_mismatch_error: OK
  Ran 45 tests
PS.: This series contains the contents of the previous /8 series:
https://lore.kernel.org/linux-doc/cover.1773074166.git.mchehab+huawei@kernel.org/
Mauro Carvalho Chehab (20):
docs: python: add helpers to run unit tests
unittests: add a testbench to check public/private kdoc comments
docs: kdoc: don't add broken comments inside prototypes
docs: kdoc: properly handle empty enum arguments
docs: kdoc_re: add a C tokenizer
docs: kdoc: use tokenizer to handle comments on structs
docs: kdoc: move C Tokenizer to c_lex module
unittests: test_private: modify it to use CTokenizer directly
unittests: test_tokenizer: check if the tokenizer works
unittests: add a runner to execute all unittests
docs: kdoc: create a CMatch to match nested C blocks
tools: unittests: add tests for CMatch
docs: c_lex: properly implement a sub() method for CMatch
unittests: test_cmatch: add tests for sub()
docs: kdoc: replace NestedMatch with CMatch
docs: kdoc_re: get rid of NestedMatch class
docs: xforms_lists: handle struct_group directly
docs: xforms_lists: better evaluate struct_group macros
docs: c_lex: add support to work with pure name ids
docs: xforms_lists: use CMatch for all identifiers
Documentation/tools/python.rst | 2 +
Documentation/tools/unittest.rst | 24 +
tools/lib/python/kdoc/c_lex.py | 593 +++++++++++++++++++
tools/lib/python/kdoc/kdoc_parser.py | 26 +-
tools/lib/python/kdoc/kdoc_re.py | 201 -------
tools/lib/python/kdoc/xforms_lists.py | 209 +++----
tools/lib/python/unittest_helper.py | 353 +++++++++++
tools/unittests/run.py | 17 +
tools/unittests/test_cmatch.py | 812 ++++++++++++++++++++++++++
tools/unittests/test_tokenizer.py | 461 +++++++++++++++
10 files changed, 2366 insertions(+), 332 deletions(-)
create mode 100644 Documentation/tools/unittest.rst
create mode 100644 tools/lib/python/kdoc/c_lex.py
create mode 100755 tools/lib/python/unittest_helper.py
create mode 100755 tools/unittests/run.py
create mode 100755 tools/unittests/test_cmatch.py
create mode 100755 tools/unittests/test_tokenizer.py
--
2.53.0