* [PATCH v2 00/28] kernel-doc: use a C lexical tokenizer for transforms
@ 2026-03-12 14:54 Mauro Carvalho Chehab
From: Mauro Carvalho Chehab @ 2026-03-12 14:54 UTC
  To: Jonathan Corbet, Kees Cook, Mauro Carvalho Chehab
  Cc: Mauro Carvalho Chehab, linux-doc, linux-hardening, linux-kernel,
	Gustavo A. R. Silva, Aleksandr Loktionov, Randy Dunlap,
	Shuah Khan, Vincent Mailhol

Hi Jon,

Sorry for resending this one so quickly. It turns out that v1 had some
bugs that caused it to fail in several cases. I opted to add extra
patches at the end. This way, it integrates better with kdoc_re.
As part of that, c_lex now outputs the file name when reporting
errors. In that regard, only the more serious errors raise
an exception: they are meant to indicate problems in kernel-doc
itself. Parsing errors now use the same warning approach
as kdoc_parser.
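
To illustrate the split between warnings and exceptions, here is a minimal
sketch. It is not the c_lex code: the logger name, the check_balanced()
helper and its fname parameter are all made up for this example, and it
ignores strings and comments. Problems found in the parsed source are
reported as warnings that carry the file name, while a misuse by the
caller raises an exception.

```python
import logging

log = logging.getLogger("c_lex_sketch")  # hypothetical logger name


def check_balanced(source, fname="<unknown>"):
    """Return True if (), [] and {} balance; warn instead of raising."""
    if not isinstance(source, str):
        # A bug in the caller, not in the parsed file: raise.
        raise TypeError("source must be a string")

    pairs = {")": "(", "]": "[", "}": "{"}
    stack = []  # (opening delimiter, line number) still waiting for a close

    for ln, line in enumerate(source.splitlines(), 1):
        for ch in line:
            if ch in "([{":
                stack.append((ch, ln))
            elif ch in pairs:
                if not stack or stack[-1][0] != pairs[ch]:
                    # Problem in the parsed file: warn with file and line.
                    log.warning("%s:%d: unexpected '%s'", fname, ln, ch)
                    return False
                stack.pop()

    if stack:
        ch, ln = stack[-1]
        log.warning("%s:%d: unclosed '%s'", fname, ln, ch)
        return False

    return True
```

With this shape, a broken header only produces a warning line and the
build continues, while passing something that is not a string fails
loudly.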

I also added a filter to the CTokenizer __str__() logic so that the
string conversion drops some odd whitespace and unneeded
";" characters from the output.

Finally, v2 addresses the undefined behavior regarding private: comment
propagation.

This patch series changes how the kdoc parser handles macro replacements.

Instead of relying heavily on regular expressions, which can sometimes
be very complex, it uses a C lexical tokenizer. This ensures that
BEGIN/END blocks in functions and structs are properly handled,
even when nested.
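
As a rough illustration of why tokens help with nesting, here is a
simplified sketch. It is not the c_lex implementation: TOKEN_RE,
tokenize() and strip_macro() are names invented for this example, and the
token set is far smaller than what real C needs (for instance, character
literals are not handled). The idea is to count BEGIN/END tokens to find
the end of a macro call, so nested parentheses and a ")" inside a string
do not confuse the match.

```python
import re

# Hypothetical, reduced token set; real C needs many more cases.
TOKEN_RE = re.compile(r"""
    (?P<COMMENT>/\*.*?\*/)
  | (?P<STRING>"(?:\\.|[^"\\])*")
  | (?P<NAME>[A-Za-z_]\w*)
  | (?P<NUMBER>\d+)
  | (?P<BEGIN>[\(\[\{])
  | (?P<END>[\)\]\}])
  | (?P<SPACE>\s+)
  | (?P<OTHER>.)
""", re.VERBOSE | re.DOTALL)


def tokenize(source):
    """Yield (kind, text) pairs covering the whole source string."""
    for m in TOKEN_RE.finditer(source):
        yield m.lastgroup, m.group()


def strip_macro(source, macro):
    """Remove every `macro(...)` call, honouring nested delimiters."""
    tokens = list(tokenize(source))
    out, i = [], 0
    while i < len(tokens):
        kind, text = tokens[i]
        if kind == "NAME" and text == macro:
            # Skip whitespace between the macro name and its "(".
            j = i + 1
            while j < len(tokens) and tokens[j][0] == "SPACE":
                j += 1
            if j < len(tokens) and tokens[j][1] == "(":
                depth = 0
                # Walk until the delimiter that closes the macro call.
                # If it is never closed, the rest of the input is dropped.
                while j < len(tokens):
                    if tokens[j][0] == "BEGIN":
                        depth += 1
                    elif tokens[j][0] == "END":
                        depth -= 1
                        if depth == 0:
                            break
                    j += 1
                i = j + 1
                continue
        out.append(text)
        i += 1
    return "".join(out)


proto = "void f(int a) __acquires(lock(x, (y))) __releases(z);"
print(strip_macro(proto, "__acquires"))
# prints: void f(int a)  __releases(z);
```

A single regular expression would need either a fixed nesting limit or
recursion support to do the same, which is where the complexity
mentioned above comes from.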

Comparing the output before and after the patch series, for both man
pages and rst, the only differences were:
    - whitespace differences;
    - struct_group macros are now shown as inner anonymous structs,
      as they should be.

Also, I didn't notice any relevant change in the documentation build
time. In that regard, right now, every time a CMatch replacement
rule is applied, it does:

    for each transform:
    - tokenize the source code;
    - handle CMatch;
    - convert tokens back to a string.

A possible optimization would be to do, instead:

    - tokenize the source code;
    - for each transform handle CMatch;
    - convert tokens back to a string.
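
The difference between the two flows can be sketched as below. This is
only an illustration under simplifying assumptions: the toy tokenize(),
to_string() and drop_name() helpers are invented here and do not reflect
the CMatch API, and each transform is assumed to work directly on a
token list.

```python
import re

TOKEN_RE = re.compile(r"\w+|\s+|.", re.DOTALL)  # toy tokenizer, not c_lex
calls = {"tokenize": 0}                         # counts tokenizer runs


def tokenize(source):
    calls["tokenize"] += 1
    return TOKEN_RE.findall(source)


def to_string(tokens):
    return "".join(tokens)


def drop_name(name):
    """Build a hypothetical transform that removes one identifier."""
    def transform(tokens):
        return [tok for tok in tokens if tok != name]
    return transform


def apply_per_transform(source, transforms):
    """Current flow: tokenize and re-stringify once per transform."""
    for transform in transforms:
        source = to_string(transform(tokenize(source)))
    return source


def apply_single_pass(source, transforms):
    """Possible optimization: tokenize once, run every transform."""
    tokens = tokenize(source)
    for transform in transforms:
        tokens = transform(tokens)
    return to_string(tokens)


xforms = [drop_name("__init"), drop_name("__cold")]
src = "static __init __cold int f(void);"

assert apply_per_transform(src, xforms) == apply_single_pass(src, xforms)
print(calls["tokenize"])   # prints: 3  (2 for the first flow, 1 for the second)
```

The two flows are only interchangeable when every transform can operate
on tokens. A rule that is still a plain string substitution would force a
conversion back to a string in the middle of the chain.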

For now, I opted not to do it, because:

    - it would be too many changes at once;
    - the docs build takes ~3:30 minutes, which is
      about the same time it took before the changes;
    - there is a very dirty hack inside function_xforms:
         (KernRe(r"_noprof"), ""). This is meant to change
      function prototypes instead of function arguments.

So, if it is OK with you, I would prefer to merge this one first. We can
later optimize kdoc_parser to avoid multiple token <-> string conversions.

-

One important aspect of this series is that it introduces unit tests
for kernel-doc. I used them a lot during the development of this series
to ensure that my changes were producing the expected
results. The tests are in two separate files that can be executed directly.

Alternatively, there is a run.py script that runs all of them (and
any other Python script matching "tools/unittests/test_*.py"):

  $ tools/unittests/run.py
  test_cmatch:
      TestSearch:
          test_search_acquires_multiple:                               OK
          test_search_acquires_nested_paren:                           OK
          test_search_acquires_simple:                                 OK
          test_search_must_hold:                                       OK
          test_search_must_hold_shared:                                OK
          test_search_no_false_positive:                               OK
          test_search_no_function:                                     OK
          test_search_no_macro_remains:                                OK
      TestSubMultipleMacros:
          test_acquires_multiple:                                      OK
          test_acquires_nested_paren:                                  OK
          test_acquires_simple:                                        OK
          test_mixed_macros:                                           OK
          test_must_hold:                                              OK
          test_must_hold_shared:                                       OK
          test_no_false_positive:                                      OK
          test_no_function:                                            OK
          test_no_macro_remains:                                       OK
      TestSubSimple:
          test_rise_early_greedy:                                      OK
          test_rise_multiple_greedy:                                   OK
          test_strip_multiple_acquires:                                OK
          test_sub_count_parameter:                                    OK
          test_sub_mixed_placeholders:                                 OK
          test_sub_multiple_placeholders:                              OK
          test_sub_no_placeholder:                                     OK
          test_sub_single_placeholder:                                 OK
          test_sub_with_capture:                                       OK
          test_sub_zero_placeholder:                                   OK
      TestSubWithLocalXforms:
          test_functions_with_acquires_and_releases:                   OK
          test_raw_struct_group:                                       OK
          test_raw_struct_group_tagged:                                OK
          test_struct_group:                                           OK
          test_struct_group_attr:                                      OK
          test_struct_group_tagged_with_private:                       OK
          test_struct_kcov:                                            OK
          test_vars_stackdepot:                                        OK
  
  test_tokenizer:
      TestPublicPrivate:
          test_balanced_inner_private:                                 OK
          test_balanced_non_greddy_private:                            OK
          test_balanced_private:                                       OK
          test_no private:                                             OK
          test_unbalanced_inner_private:                               OK
          test_unbalanced_private:                                     OK
          test_unbalanced_struct_group_tagged_with_private:            OK
          test_unbalanced_two_struct_group_tagged_first_with_private:  OK
          test_unbalanced_without_end_of_line:                         OK
      TestTokenizer:
          test_basic_tokens:                                           OK
          test_depth_counters:                                         OK
          test_mismatch_error:                                         OK
  
  
  Ran 47 tests

PS.: This series contains the contents of the previous /8 series:
    https://lore.kernel.org/linux-doc/cover.1773074166.git.mchehab+huawei@kernel.org/

---

v2:
  - Added 8 more patches fixing several bugs and modifying unittests
    accordingly:
    - don't raise exceptions when not needed;
    - don't report a missing END if there's no BEGIN
      in the last replacement string;
    - document private scope propagation;
    - some changes to the unittests to reflect the current status;
    - addition of two unittests to check the error-raising logic in c_lex.

Mauro Carvalho Chehab (28):
  docs: python: add helpers to run unit tests
  unittests: add a testbench to check public/private kdoc comments
  docs: kdoc: don't add broken comments inside prototypes
  docs: kdoc: properly handle empty enum arguments
  docs: kdoc_re: add a C tokenizer
  docs: kdoc: use tokenizer to handle comments on structs
  docs: kdoc: move C Tokenizer to c_lex module
  unittests: test_private: modify it to use CTokenizer directly
  unittests: test_tokenizer: check if the tokenizer works
  unittests: add a runner to execute all unittests
  docs: kdoc: create a CMatch to match nested C blocks
  tools: unittests: add tests for CMatch
  docs: c_lex: properly implement a sub() method for CMatch
  unittests: test_cmatch: add tests for sub()
  docs: kdoc: replace NestedMatch with CMatch
  docs: kdoc_re: get rid of NestedMatch class
  docs: xforms_lists: handle struct_group directly
  docs: xforms_lists: better evaluate struct_group macros
  docs: c_lex: add support to work with pure name ids
  docs: xforms_lists: use CMatch for all identifiers
  docs: c_lex: add "@" operator
  docs: c_lex: don't exclude an extra token
  docs: c_lex: setup a logger to report tokenizer issues
  docs: unittests: add and adjust tests to check for errors
  docs: c_lex: better handle BEGIN/END at search
  docs: kernel-doc.rst: document private: scope propagation
  docs: c_lex: produce a cleaner str() representation
  unittests: test_cmatch: remove weird stuff from expected results

 Documentation/doc-guide/kernel-doc.rst |   6 +
 Documentation/tools/python.rst         |   2 +
 Documentation/tools/unittest.rst       |  24 +
 tools/lib/python/kdoc/c_lex.py         | 645 +++++++++++++++++++
 tools/lib/python/kdoc/kdoc_parser.py   |  29 +-
 tools/lib/python/kdoc/kdoc_re.py       | 201 ------
 tools/lib/python/kdoc/xforms_lists.py  | 209 +++----
 tools/lib/python/unittest_helper.py    | 353 +++++++++++
 tools/unittests/run.py                 |  17 +
 tools/unittests/test_cmatch.py         | 821 +++++++++++++++++++++++++
 tools/unittests/test_tokenizer.py      | 462 ++++++++++++++
 11 files changed, 2434 insertions(+), 335 deletions(-)
 create mode 100644 Documentation/tools/unittest.rst
 create mode 100644 tools/lib/python/kdoc/c_lex.py
 create mode 100755 tools/lib/python/unittest_helper.py
 create mode 100755 tools/unittests/run.py
 create mode 100755 tools/unittests/test_cmatch.py
 create mode 100755 tools/unittests/test_tokenizer.py

-- 
2.52.0


Thread overview: 47+ messages
2026-03-12 14:54 [PATCH v2 00/28] kernel-doc: use a C lexical tokenizer for transforms Mauro Carvalho Chehab
2026-03-12 14:54 ` [PATCH v2 01/28] docs: python: add helpers to run unit tests Mauro Carvalho Chehab
2026-03-12 14:54 ` [PATCH v2 02/28] unittests: add a testbench to check public/private kdoc comments Mauro Carvalho Chehab
2026-03-12 14:54 ` [PATCH v2 03/28] docs: kdoc: don't add broken comments inside prototypes Mauro Carvalho Chehab
2026-03-12 14:54 ` [PATCH v2 04/28] docs: kdoc: properly handle empty enum arguments Mauro Carvalho Chehab
2026-03-12 14:54 ` [PATCH v2 05/28] docs: kdoc_re: add a C tokenizer Mauro Carvalho Chehab
2026-03-16 23:01   ` Jonathan Corbet
2026-03-17  7:59     ` Mauro Carvalho Chehab
2026-03-16 23:03   ` Jonathan Corbet
2026-03-16 23:29     ` Randy Dunlap
2026-03-16 23:40       ` Jonathan Corbet
2026-03-17  8:21         ` Mauro Carvalho Chehab
2026-03-17 17:04           ` Jonathan Corbet
2026-03-17  7:03       ` Mauro Carvalho Chehab
2026-03-12 14:54 ` [PATCH v2 06/28] docs: kdoc: use tokenizer to handle comments on structs Mauro Carvalho Chehab
2026-03-12 14:54 ` [PATCH v2 07/28] docs: kdoc: move C Tokenizer to c_lex module Mauro Carvalho Chehab
2026-03-16 23:30   ` Jonathan Corbet
2026-03-17  8:02     ` Mauro Carvalho Chehab
2026-03-12 14:54 ` [PATCH v2 08/28] unittests: test_private: modify it to use CTokenizer directly Mauro Carvalho Chehab
2026-03-12 14:54 ` [PATCH v2 09/28] unittests: test_tokenizer: check if the tokenizer works Mauro Carvalho Chehab
2026-03-12 14:54 ` [PATCH v2 10/28] unittests: add a runner to execute all unittests Mauro Carvalho Chehab
2026-03-12 14:54 ` [PATCH v2 11/28] docs: kdoc: create a CMatch to match nested C blocks Mauro Carvalho Chehab
2026-03-12 14:54 ` [PATCH v2 12/28] tools: unittests: add tests for CMatch Mauro Carvalho Chehab
2026-03-12 14:54 ` [PATCH v2 13/28] docs: c_lex: properly implement a sub() method " Mauro Carvalho Chehab
2026-03-12 14:54 ` [PATCH v2 14/28] unittests: test_cmatch: add tests for sub() Mauro Carvalho Chehab
2026-03-12 14:54 ` [PATCH v2 15/28] docs: kdoc: replace NestedMatch with CMatch Mauro Carvalho Chehab
2026-03-12 14:54 ` [PATCH v2 16/28] docs: kdoc_re: get rid of NestedMatch class Mauro Carvalho Chehab
2026-03-12 14:54 ` [PATCH v2 17/28] docs: xforms_lists: handle struct_group directly Mauro Carvalho Chehab
2026-03-12 14:54 ` [PATCH v2 18/28] docs: xforms_lists: better evaluate struct_group macros Mauro Carvalho Chehab
2026-03-12 14:54 ` [PATCH v2 19/28] docs: c_lex: add support to work with pure name ids Mauro Carvalho Chehab
2026-03-12 14:54 ` [PATCH v2 20/28] docs: xforms_lists: use CMatch for all identifiers Mauro Carvalho Chehab
2026-03-12 14:54 ` [PATCH v2 21/28] docs: c_lex: add "@" operator Mauro Carvalho Chehab
2026-03-12 14:54 ` [PATCH v2 22/28] docs: c_lex: don't exclude an extra token Mauro Carvalho Chehab
2026-03-12 14:54 ` [PATCH v2 23/28] docs: c_lex: setup a logger to report tokenizer issues Mauro Carvalho Chehab
2026-03-12 14:54 ` [PATCH v2 24/28] docs: unittests: add and adjust tests to check for errors Mauro Carvalho Chehab
2026-03-12 14:54 ` [PATCH v2 25/28] docs: c_lex: better handle BEGIN/END at search Mauro Carvalho Chehab
2026-03-12 14:54 ` [PATCH v2 26/28] docs: kernel-doc.rst: document private: scope propagation Mauro Carvalho Chehab
2026-03-12 14:54 ` [PATCH v2 27/28] docs: c_lex: produce a cleaner str() representation Mauro Carvalho Chehab
2026-03-12 14:54 ` [PATCH v2 28/28] unittests: test_cmatch: remove weird stuff from expected results Mauro Carvalho Chehab
2026-03-13  8:34 ` [PATCH v2 29/28] docs: kdoc: ensure that comments are dropped before calling split_struct_proto() Mauro Carvalho Chehab
2026-03-13  8:34   ` [PATCH v2 30/28] docs: kdoc_parser: avoid tokenizing structs everytime Mauro Carvalho Chehab
2026-03-13 11:05     ` Loktionov, Aleksandr
2026-03-13 11:05   ` [PATCH v2 29/28] docs: kdoc: ensure that comments are dropped before calling split_struct_proto() Loktionov, Aleksandr
2026-03-13  9:17 ` [PATCH v2 00/28] kernel-doc: use a C lexical tokenizer for transforms Mauro Carvalho Chehab
2026-03-17 17:12 ` Jonathan Corbet
2026-03-17 18:00   ` Mauro Carvalho Chehab
2026-03-17 18:57   ` Mauro Carvalho Chehab
