[PATCH 00/38] docs: several improvements to kernel-doc


* [PATCH 00/38] docs: several improvements to kernel-doc
@ 2026-02-18 10:12 Mauro Carvalho Chehab
  2026-02-18 10:12 ` [PATCH 01/38] docs: kdoc_re: add support for groups() Mauro Carvalho Chehab
                   ` (40 more replies)
  0 siblings, 41 replies; 55+ messages in thread
From: Mauro Carvalho Chehab @ 2026-02-18 10:12 UTC (permalink / raw)
  To: Alexander Lobakin, Jonathan Corbet, Kees Cook,
	Mauro Carvalho Chehab
  Cc: Mauro Carvalho Chehab, intel-wired-lan, linux-doc,
	linux-hardening, linux-kernel, netdev, Gustavo A. R. Silva,
	Aleksandr Loktionov, Randy Dunlap, Shuah Khan

Hi Jon,

This series contains several improvements for kernel-doc.

Most of the patches came from v4 of this series:
	https://lore.kernel.org/linux-doc/cover.1769867953.git.mchehab+huawei@kernel.org/

However, I dropped the unit tests part from this series. I'll
be submitting it in a separate series.

The rationale is that, when I converted kernel-doc from Perl,
the goal was to produce a bug-compatible version.

As anyone who has worked with kernel-doc is aware, using regex to
handle C input is not great. Instead, we need something closer to how
C statements and declarations are handled.

Yet, to avoid breaking docs, I avoided touching the regex-based algorithms
inside it with one exception: struct_group logic was using very complex
regexes that are incompatible with Python's internal "re" module.

So, I came up with a different approach: NestedMatch. The logic inside
it is meant to properly handle brackets, square brackets and parentheses,
which is closer to what a C lexical parser does. At that time, I added
a TODO about the need to extend that.

The first part of this series does exactly that: it extends it to parse
comma-separated arguments, respecting brackets and parentheses.
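
As an illustration only, the idea of splitting comma-separated arguments
while respecting nested delimiters can be sketched as below. This is not
the NestedMatch code from the series; split_args() is a hypothetical
helper written for this example, and it ignores strings and comments.

```python
def split_args(text):
    """Split text on top-level commas, ignoring commas inside (), [] or {}."""
    closers = {"(": ")", "[": "]", "{": "}"}
    stack = []          # closing delimiters we are still waiting for
    args = []
    start = 0

    for pos, char in enumerate(text):
        if char in closers:
            stack.append(closers[char])
        elif stack and char == stack[-1]:
            stack.pop()
        elif char == "," and not stack:
            # Only a comma seen outside every delimiter separates arguments
            args.append(text[start:pos].strip())
            start = pos + 1

    args.append(text[start:].strip())
    return args


print(split_args("a, foo(b, c), d[1, 2]"))
# ['a', 'foo(b, c)', 'd[1, 2]']
```

A plain text.split(",") would cut "foo(b, c)" in two, which is the kind
of problem a delimiter-aware parser avoids.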

It then adds an "alias" to it at class CFunction. With that, specifying
functions/macros to be handled becomes much easier.

With such infra in place, it moves the transform functions to a separate
file, making it hopefully easier to maintain. As a side effect, it also
makes it easier for other projects to use kernel-doc (I tested it on QEMU).

Then, it adds support for newer kref annotations.

The remaining patches in this series improve the man page output, making
it more compatible with other man pages.

-

I wrote several unit tests to check kernel-doc behavior. I intend to
submit them on top of this series later on.

Regards,
Mauro

Mauro Carvalho Chehab (36):
  docs: kdoc_re: add support for groups()
  docs: kdoc_re: don't go past the end of a line
  docs: kdoc_parser: move var transformers to the beginning
  docs: kdoc_parser: don't mangle with function defines
  docs: kdoc_parser: add functions support for NestedMatch
  docs: kdoc_parser: use NestedMatch to handle __attribute__ on
    functions
  docs: kdoc_parser: fix variable regexes to work with size_t
  docs: kdoc_parser: fix the default_value logic for variables
  docs: kdoc_parser: add some debug for variable parsing
  docs: kdoc_parser: don't exclude defaults from prototype
  docs: kdoc_parser: fix parser to support multi-word types
  docs: kdoc_parser: add support for LIST_HEAD
  docs: kdoc_re: properly handle strings and escape chars on it
  docs: kdoc_re: better show KernRe() at documentation
  docs: kdoc_re: don't recompile NestedMatch regex every time
  docs: kdoc_re: Change NestedMath args replacement to \0
  docs: kdoc_re: make NestedMatch use KernRe
  docs: kdoc_re: add support on NestedMatch for argument replacement
  docs: kdoc_parser: better handle struct_group macros
  docs: kdoc_re: fix a parse bug on struct page_pool_params
  docs: kdoc_re: add a helper class to declare C function matches
  docs: kdoc_parser: use the new CFunction class
  docs: kdoc_parser: minimize differences with struct_group_tagged
  docs: kdoc_parser: move transform lists to a separate file
  docs: kdoc_re: don't remove the trailing ";" with NestedMatch
  docs: kdoc_re: prevent adding whitespaces on sub replacements
  docs: xforms_lists.py: use CFuntion to handle all function macros
  docs: kdoc_files: allows the caller to use a different xforms class
  docs: kdoc_re: Fix NestedMatch.sub() which causes PDF builds to break
  docs: kdoc_files: document KernelFiles() ABI
  docs: kdoc_output: add optional args to ManOutput class
  docs: sphinx-build-wrapper: better handle troff .TH markups
  docs: kdoc_output: use a more standard order for .TH on man pages
  docs: sphinx-build-wrapper: don't allow "/" on file names
  docs: kdoc_output: describe the class init parameters
  docs: kdoc_output: pick a better default for modulename

Randy Dunlap (2):
  docs: kdoc_parser: ignore context analysis and lock attributes
  docs: kdoc_parser: handle struct member macro
    VIRTIO_DECLARE_FEATURES(name)

 Documentation/tools/kdoc_parser.rst   |   8 +
 tools/docs/kernel-doc                 |   1 -
 tools/docs/sphinx-build-wrapper       |   9 +-
 tools/lib/python/kdoc/kdoc_files.py   |  54 +++++-
 tools/lib/python/kdoc/kdoc_output.py  |  73 ++++++--
 tools/lib/python/kdoc/kdoc_parser.py  | 183 ++++---------------
 tools/lib/python/kdoc/kdoc_re.py      | 242 ++++++++++++++++++++------
 tools/lib/python/kdoc/xforms_lists.py | 109 ++++++++++++
 8 files changed, 451 insertions(+), 228 deletions(-)
 create mode 100644 tools/lib/python/kdoc/xforms_lists.py

-- 
2.52.0


* [PATCH 01/38] docs: kdoc_re: add support for groups()
  2026-02-18 10:12 [PATCH 00/38] docs: several improvements to kernel-doc Mauro Carvalho Chehab
@ 2026-02-18 10:12 ` Mauro Carvalho Chehab
  2026-02-18 10:12 ` [PATCH 02/38] docs: kdoc_re: don't go past the end of a line Mauro Carvalho Chehab
                   ` (39 subsequent siblings)
  40 siblings, 0 replies; 55+ messages in thread
From: Mauro Carvalho Chehab @ 2026-02-18 10:12 UTC (permalink / raw)
  To: Jonathan Corbet, Linux Doc Mailing List
  Cc: Mauro Carvalho Chehab, intel-wired-lan, linux-hardening,
	linux-kernel, netdev, Aleksandr Loktionov, Randy Dunlap

Add an equivalent to re groups() method.
This is useful on debug messages.
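
A minimal sketch of the pattern, assuming a wrapper that remembers its
last match the way KernRe does. MiniRe is a hypothetical stand-in written
for this example, not the real class:

```python
import re


class MiniRe:
    """Hypothetical stand-in for KernRe: keeps the last match around."""

    def __init__(self, pattern):
        self.regex = re.compile(pattern)
        self.last_match = None

    def match(self, string):
        self.last_match = self.regex.match(string)
        return self.last_match

    def groups(self):
        """Return all captured groups of the last match as a tuple."""
        return self.last_match.groups()


r = MiniRe(r"(\w+)\s+(\w+)")
if r.match("size_t count"):
    print("parsed:", r.groups())
# parsed: ('size_t', 'count')
```

As in the patch, groups() assumes a previous successful match; calling it
after a failed match would raise AttributeError in this sketch.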

Signed-off-by: Mauro Carvalho Chehab <mchehab+huawei@kernel.org>
Acked-by: Randy Dunlap <rdunlap@infradead.org>
Tested-by: Randy Dunlap <rdunlap@infradead.org>
Reviewed-by: Aleksandr Loktionov <aleksandr.loktionov@intel.com>
---
 tools/lib/python/kdoc/kdoc_re.py | 7 +++++++
 1 file changed, 7 insertions(+)

diff --git a/tools/lib/python/kdoc/kdoc_re.py b/tools/lib/python/kdoc/kdoc_re.py
index 0bf9e01cdc57..774dd747ecb0 100644
--- a/tools/lib/python/kdoc/kdoc_re.py
+++ b/tools/lib/python/kdoc/kdoc_re.py
@@ -106,6 +106,13 @@ class KernRe:
 
         return self.last_match.group(num)
 
+    def groups(self):
+        """
+        Returns the group results of the last match
+        """
+
+        return self.last_match.groups()
+
 
 class NestedMatch:
     """
-- 
2.52.0



* [PATCH 02/38] docs: kdoc_re: don't go past the end of a line
  2026-02-18 10:12 [PATCH 00/38] docs: several improvements to kernel-doc Mauro Carvalho Chehab
  2026-02-18 10:12 ` [PATCH 01/38] docs: kdoc_re: add support for groups() Mauro Carvalho Chehab
@ 2026-02-18 10:12 ` Mauro Carvalho Chehab
  2026-02-18 10:12 ` [PATCH 03/38] docs: kdoc_parser: move var transformers to the beginning Mauro Carvalho Chehab
                   ` (38 subsequent siblings)
  40 siblings, 0 replies; 55+ messages in thread
From: Mauro Carvalho Chehab @ 2026-02-18 10:12 UTC (permalink / raw)
  To: Jonathan Corbet, Linux Doc Mailing List
  Cc: Mauro Carvalho Chehab, intel-wired-lan, linux-hardening,
	linux-kernel, netdev, Aleksandr Loktionov, Randy Dunlap

The logic which checks if the line ends with ";" is currently
broken: it may try to read past the buffer.

Fix it by checking before trying to access line[pos].
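
The guard can be shown in isolation. skip_trailing_semicolon() below is a
hypothetical helper for this example; in Python, the unguarded line[pos]
would raise IndexError when pos equals len(line).

```python
def skip_trailing_semicolon(line, pos):
    """Return pos advanced past one ';' if there is one at that position."""
    # The length check comes first: 'and' short-circuits, so line[pos]
    # is only evaluated when pos is a valid index.
    if pos < len(line) and line[pos] == ";":
        pos += 1
    return pos


print(skip_trailing_semicolon("foo(a);", 6))   # 7: the ';' is skipped
print(skip_trailing_semicolon("foo(a)", 6))    # 6: nothing left to check
```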

Signed-off-by: Mauro Carvalho Chehab <mchehab+huawei@kernel.org>
Acked-by: Randy Dunlap <rdunlap@infradead.org>
Tested-by: Randy Dunlap <rdunlap@infradead.org>
Reviewed-by: Aleksandr Loktionov <aleksandr.loktionov@intel.com>
---
 tools/lib/python/kdoc/kdoc_re.py | 2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)

diff --git a/tools/lib/python/kdoc/kdoc_re.py b/tools/lib/python/kdoc/kdoc_re.py
index 774dd747ecb0..6c44fcce0415 100644
--- a/tools/lib/python/kdoc/kdoc_re.py
+++ b/tools/lib/python/kdoc/kdoc_re.py
@@ -269,7 +269,7 @@ class NestedMatch:
             out += new_sub
 
             # Drop end ';' if any
-            if line[pos] == ';':
+            if pos < len(line) and line[pos] == ';':
                 pos += 1
 
             cur_pos = pos
-- 
2.52.0



* [PATCH 03/38] docs: kdoc_parser: move var transformers to the beginning
  2026-02-18 10:12 [PATCH 00/38] docs: several improvements to kernel-doc Mauro Carvalho Chehab
  2026-02-18 10:12 ` [PATCH 01/38] docs: kdoc_re: add support for groups() Mauro Carvalho Chehab
  2026-02-18 10:12 ` [PATCH 02/38] docs: kdoc_re: don't go past the end of a line Mauro Carvalho Chehab
@ 2026-02-18 10:12 ` Mauro Carvalho Chehab
  2026-02-18 10:12 ` [PATCH 04/38] docs: kdoc_parser: don't mangle with function defines Mauro Carvalho Chehab
                   ` (37 subsequent siblings)
  40 siblings, 0 replies; 55+ messages in thread
From: Mauro Carvalho Chehab @ 2026-02-18 10:12 UTC (permalink / raw)
  To: Jonathan Corbet, Linux Doc Mailing List
  Cc: Mauro Carvalho Chehab, intel-wired-lan, linux-hardening,
	linux-kernel, netdev, Aleksandr Loktionov, Randy Dunlap

Just like functions and structs had their transform variables
placed at the beginning, move variable transforms there
as well.

No functional changes.

Signed-off-by: Mauro Carvalho Chehab <mchehab+huawei@kernel.org>
Acked-by: Randy Dunlap <rdunlap@infradead.org>
Tested-by: Randy Dunlap <rdunlap@infradead.org>
Reviewed-by: Aleksandr Loktionov <aleksandr.loktionov@intel.com>
---
 tools/lib/python/kdoc/kdoc_parser.py | 23 +++++++++++++----------
 1 file changed, 13 insertions(+), 10 deletions(-)

diff --git a/tools/lib/python/kdoc/kdoc_parser.py b/tools/lib/python/kdoc/kdoc_parser.py
index ca00695b47b3..68a5aea9175d 100644
--- a/tools/lib/python/kdoc/kdoc_parser.py
+++ b/tools/lib/python/kdoc/kdoc_parser.py
@@ -192,6 +192,18 @@ function_xforms  = [
     (KernRe(r"__attribute__\s*\(\((?:[\w\s]+(?:\([^)]*\))?\s*,?)+\)\)\s+"), ""),
 ]
 
+#
+# Transforms for variable prototypes
+#
+var_xforms = [
+    (KernRe(r"__read_mostly"), ""),
+    (KernRe(r"__ro_after_init"), ""),
+    (KernRe(r"(?://.*)$"), ""),
+    (KernRe(r"(?:/\*.*\*/)"), ""),
+    (KernRe(r";$"), ""),
+    (KernRe(r"=.*"), ""),
+]
+
 #
 # Ancillary functions
 #
@@ -972,15 +984,6 @@ class KernelDoc:
         ]
         OPTIONAL_VAR_ATTR = "^(?:" + "|".join(VAR_ATTRIBS) + ")?"
 
-        sub_prefixes = [
-            (KernRe(r"__read_mostly"), ""),
-            (KernRe(r"__ro_after_init"), ""),
-            (KernRe(r"(?://.*)$"), ""),
-            (KernRe(r"(?:/\*.*\*/)"), ""),
-            (KernRe(r";$"), ""),
-            (KernRe(r"=.*"), ""),
-        ]
-
         #
         # Store the full prototype before modifying it
         #
@@ -1004,7 +1007,7 @@ class KernelDoc:
         # Drop comments and macros to have a pure C prototype
         #
         if not declaration_name:
-            for r, sub in sub_prefixes:
+            for r, sub in var_xforms:
                 proto = r.sub(sub, proto)
 
         proto = proto.rstrip()
-- 
2.52.0



* [PATCH 04/38] docs: kdoc_parser: don't mangle with function defines
  2026-02-18 10:12 [PATCH 00/38] docs: several improvements to kernel-doc Mauro Carvalho Chehab
                   ` (2 preceding siblings ...)
  2026-02-18 10:12 ` [PATCH 03/38] docs: kdoc_parser: move var transformers to the beginning Mauro Carvalho Chehab
@ 2026-02-18 10:12 ` Mauro Carvalho Chehab
  2026-02-18 10:12 ` [PATCH 05/38] docs: kdoc_parser: add functions support for NestedMatch Mauro Carvalho Chehab
                   ` (36 subsequent siblings)
  40 siblings, 0 replies; 55+ messages in thread
From: Mauro Carvalho Chehab @ 2026-02-18 10:12 UTC (permalink / raw)
  To: Jonathan Corbet, Linux Doc Mailing List
  Cc: Mauro Carvalho Chehab, intel-wired-lan, linux-hardening,
	linux-kernel, netdev, Aleksandr Loktionov, Randy Dunlap

Mangling with #defines is not nice, as we may end up removing
the macro names, preventing several macros from being properly
documented.

Also, on defines, we have something like:

	#define foo(a1, a2, a3, ...)			 \
		/* some real implementation */

The prototype part (first line on this example) won't contain
any macros, so no need to apply any regexes on it.

With that, move the apply_transforms() logic to ensure that
it will be called only on functions.

Signed-off-by: Mauro Carvalho Chehab <mchehab+huawei@kernel.org>
Acked-by: Randy Dunlap <rdunlap@infradead.org>
Tested-by: Randy Dunlap <rdunlap@infradead.org>
Reviewed-by: Aleksandr Loktionov <aleksandr.loktionov@intel.com>
---
 tools/lib/python/kdoc/kdoc_parser.py | 12 +++++++-----
 1 file changed, 7 insertions(+), 5 deletions(-)

diff --git a/tools/lib/python/kdoc/kdoc_parser.py b/tools/lib/python/kdoc/kdoc_parser.py
index 68a5aea9175d..9643ffb7584a 100644
--- a/tools/lib/python/kdoc/kdoc_parser.py
+++ b/tools/lib/python/kdoc/kdoc_parser.py
@@ -163,7 +163,7 @@ struct_nested_prefixes = [
 #
 # Transforms for function prototypes
 #
-function_xforms  = [
+function_xforms = [
     (KernRe(r"^static +"), ""),
     (KernRe(r"^extern +"), ""),
     (KernRe(r"^asmlinkage +"), ""),
@@ -1066,10 +1066,7 @@ class KernelDoc:
         found = func_macro = False
         return_type = ''
         decl_type = 'function'
-        #
-        # Apply the initial transformations.
-        #
-        prototype = apply_transforms(function_xforms, prototype)
+
         #
         # If we have a macro, remove the "#define" at the front.
         #
@@ -1088,6 +1085,11 @@ class KernelDoc:
                 declaration_name = r.group(1)
                 func_macro = True
                 found = True
+        else:
+            #
+            # Apply the initial transformations.
+            #
+            prototype = apply_transforms(function_xforms, prototype)
 
         # Yes, this truly is vile.  We are looking for:
         # 1. Return type (may be nothing if we're looking at a macro)
-- 
2.52.0



* [PATCH 05/38] docs: kdoc_parser: add functions support for NestedMatch
  2026-02-18 10:12 [PATCH 00/38] docs: several improvements to kernel-doc Mauro Carvalho Chehab
                   ` (3 preceding siblings ...)
  2026-02-18 10:12 ` [PATCH 04/38] docs: kdoc_parser: don't mangle with function defines Mauro Carvalho Chehab
@ 2026-02-18 10:12 ` Mauro Carvalho Chehab
  2026-02-18 10:12 ` [PATCH 06/38] docs: kdoc_parser: use NestedMatch to handle __attribute__ on functions Mauro Carvalho Chehab
                   ` (35 subsequent siblings)
  40 siblings, 0 replies; 55+ messages in thread
From: Mauro Carvalho Chehab @ 2026-02-18 10:12 UTC (permalink / raw)
  To: Jonathan Corbet, Linux Doc Mailing List
  Cc: Mauro Carvalho Chehab, intel-wired-lan, linux-hardening,
	linux-kernel, netdev, Aleksandr Loktionov, Randy Dunlap

Some annotation macros may have nested parentheses, causing normal
regex parsing to fail.

Extend apply_transforms to also use NestedMatch and add support
for nested functions.
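
A rough sketch of the two-stage flow: plain regex substitutions first,
then a pass that removes macro calls by counting parentheses. drop_call()
is a simplified, hypothetical stand-in for NestedMatch.sub(), and
__my_annot is a made-up macro name.

```python
import re


def drop_call(prefix, text):
    """Remove each 'prefix ... )' call, finding the close by depth counting."""
    out = ""
    cur = 0
    for m in prefix.finditer(text):
        if m.start() < cur:
            continue            # already swallowed by a previous call
        depth = 1               # the prefix regex ends with the opening '('
        pos = m.end()
        while pos < len(text) and depth:
            if text[pos] == "(":
                depth += 1
            elif text[pos] == ")":
                depth -= 1
            pos += 1
        out += text[cur:m.start()]
        cur = pos
    return out + text[cur:]


def apply_transforms(regex_xforms, nested_xforms, text):
    """Apply regex substitutions, then the nested-call removals."""
    for search, subst in regex_xforms:
        text = search.sub(subst, text)
    for prefix in nested_xforms:
        text = drop_call(prefix, text)
    return text.strip()


regex_xforms = [(re.compile(r"^static +"), "")]
nested_xforms = [re.compile(r"\b__my_annot\s*\(")]   # hypothetical macro

proto = "static int foo(int a) __my_annot(bar(1), baz(2))"
print(apply_transforms(regex_xforms, nested_xforms, proto))
# int foo(int a)
```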

Signed-off-by: Mauro Carvalho Chehab <mchehab+huawei@kernel.org>
Acked-by: Randy Dunlap <rdunlap@infradead.org>
Tested-by: Randy Dunlap <rdunlap@infradead.org>
Reviewed-by: Aleksandr Loktionov <aleksandr.loktionov@intel.com>
---
 tools/lib/python/kdoc/kdoc_parser.py | 38 ++++++++++++++++++----------
 1 file changed, 25 insertions(+), 13 deletions(-)

diff --git a/tools/lib/python/kdoc/kdoc_parser.py b/tools/lib/python/kdoc/kdoc_parser.py
index 9643ffb7584a..af0ab732048b 100644
--- a/tools/lib/python/kdoc/kdoc_parser.py
+++ b/tools/lib/python/kdoc/kdoc_parser.py
@@ -152,7 +152,7 @@ struct_xforms = [
     (KernRe(r'DEFINE_DMA_UNMAP_LEN\s*\(' + struct_args_pattern + r'\)', re.S), r'__u32 \1'),
 ]
 #
-# Regexes here are guaranteed to have the end delimiter matching
+# Struct regexes here are guaranteed to have the end delimiter matching
 # the start delimiter. Yet, right now, only one replace group
 # is allowed.
 #
@@ -160,6 +160,13 @@ struct_nested_prefixes = [
     (re.compile(r'\bSTRUCT_GROUP\('), r'\1'),
 ]
 
+#
+# Function Regexes here are guaranteed to have the end delimiter matching
+# the start delimiter.
+#
+function_nested_prefixes = [
+]
+
 #
 # Transforms for function prototypes
 #
@@ -208,13 +215,6 @@ var_xforms = [
 # Ancillary functions
 #
 
-def apply_transforms(xforms, text):
-    """
-    Apply a set of transforms to a block of text.
-    """
-    for search, subst in xforms:
-        text = search.sub(subst, text)
-    return text
 
 multi_space = KernRe(r'\s\s+')
 def trim_whitespace(s):
@@ -409,6 +409,8 @@ class KernelDoc:
         # Place all potential outputs into an array
         self.entries = []
 
+        self.nested = NestedMatch()
+
         #
         # We need Python 3.7 for its "dicts remember the insertion
         # order" guarantee
@@ -506,6 +508,16 @@ class KernelDoc:
         # State flags
         self.state = state.NORMAL
 
+    def apply_transforms(self, regex_xforms, nested_xforms, text):
+        """Apply a set of transforms to a block of text."""
+        for search, subst in regex_xforms:
+            text = search.sub(subst, text)
+
+        for search, sub in nested_xforms:
+            text = self.nested.sub(search, sub, text)
+
+        return text.strip()
+
     def push_parameter(self, ln, decl_type, param, dtype,
                        org_arg, declaration_name):
         """
@@ -882,11 +894,9 @@ class KernelDoc:
         # Go through the list of members applying all of our transformations.
         #
         members = trim_private_members(members)
-        members = apply_transforms(struct_xforms, members)
+        members = self.apply_transforms(struct_xforms, struct_nested_prefixes,
+                                        members)
 
-        nested = NestedMatch()
-        for search, sub in struct_nested_prefixes:
-            members = nested.sub(search, sub, members)
         #
         # Deal with embedded struct and union members, and drop enums entirely.
         #
@@ -1089,7 +1099,9 @@ class KernelDoc:
             #
             # Apply the initial transformations.
             #
-            prototype = apply_transforms(function_xforms, prototype)
+            prototype = self.apply_transforms(function_xforms,
+                                              function_nested_prefixes,
+                                              prototype)
 
         # Yes, this truly is vile.  We are looking for:
         # 1. Return type (may be nothing if we're looking at a macro)
-- 
2.52.0



* [PATCH 06/38] docs: kdoc_parser: use NestedMatch to handle __attribute__ on functions
  2026-02-18 10:12 [PATCH 00/38] docs: several improvements to kernel-doc Mauro Carvalho Chehab
                   ` (4 preceding siblings ...)
  2026-02-18 10:12 ` [PATCH 05/38] docs: kdoc_parser: add functions support for NestedMatch Mauro Carvalho Chehab
@ 2026-02-18 10:12 ` Mauro Carvalho Chehab
  2026-02-18 10:12 ` [PATCH 07/38] docs: kdoc_parser: fix variable regexes to work with size_t Mauro Carvalho Chehab
                   ` (34 subsequent siblings)
  40 siblings, 0 replies; 55+ messages in thread
From: Mauro Carvalho Chehab @ 2026-02-18 10:12 UTC (permalink / raw)
  To: Jonathan Corbet, Linux Doc Mailing List
  Cc: Mauro Carvalho Chehab, intel-wired-lan, linux-hardening,
	linux-kernel, netdev, Aleksandr Loktionov, Randy Dunlap

Some annotation macros may have nested parentheses, causing normal
regex parsing to fail. The __attribute__ regex is currently very
complex to try to avoid that, but it doesn't catch all cases.

Ensure that the parentheses will be properly handled by using
the NestedMatch() logic.
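
One case the old regex misses can be checked directly. OLD_ATTR is the
expression removed by this patch and ATTR_PREFIX is the one it adds;
strip_balanced() is a hypothetical, simplified replacement for
NestedMatch.sub() that handles only the first occurrence.

```python
import re

# Regex removed by this patch: handles one level of parentheses per attribute
OLD_ATTR = re.compile(
    r"__attribute__\s*\(\((?:[\w\s]+(?:\([^)]*\))?\s*,?)+\)\)\s+")

# Prefix added by this patch; the closing ')' is found by counting depth
ATTR_PREFIX = re.compile(r"__attribute__\s*\(")


def strip_balanced(prefix, text):
    """Drop the first 'prefix ... )' span, matching parentheses by depth."""
    m = prefix.search(text)
    if not m:
        return text
    depth = 1
    pos = m.end()
    while pos < len(text) and depth:
        if text[pos] == "(":
            depth += 1
        elif text[pos] == ")":
            depth -= 1
        pos += 1
    return (text[:m.start()] + text[pos:]).strip()


proto = "__attribute__((aligned(sizeof(long)))) int foo(void)"

print(OLD_ATTR.sub("", proto) == proto)    # True: old regex does not match
print(strip_balanced(ATTR_PREFIX, proto))  # int foo(void)
```

The old expression fails here because "[^)]*" stops at the first ")",
leaving one closing parenthesis more than the regex expects.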

Signed-off-by: Mauro Carvalho Chehab <mchehab+huawei@kernel.org>
Acked-by: Randy Dunlap <rdunlap@infradead.org>
Tested-by: Randy Dunlap <rdunlap@infradead.org>
Reviewed-by: Aleksandr Loktionov <aleksandr.loktionov@intel.com>
---
 tools/lib/python/kdoc/kdoc_parser.py | 2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)

diff --git a/tools/lib/python/kdoc/kdoc_parser.py b/tools/lib/python/kdoc/kdoc_parser.py
index af0ab732048b..b704755d2f0a 100644
--- a/tools/lib/python/kdoc/kdoc_parser.py
+++ b/tools/lib/python/kdoc/kdoc_parser.py
@@ -165,6 +165,7 @@ struct_nested_prefixes = [
 # the start delimiter.
 #
 function_nested_prefixes = [
+    (re.compile(r"__attribute__\s*\("), ""),
 ]
 
 #
@@ -196,7 +197,6 @@ function_xforms = [
     (KernRe(r"__diagnose_as\s*\(\s*\S+\s*(?:,\s*\d+\s*)*\) +"), ""),
     (KernRe(r"DECL_BUCKET_PARAMS\s*\(\s*(\S+)\s*,\s*(\S+)\s*\)"), r"\1, \2"),
     (KernRe(r"__attribute_const__ +"), ""),
-    (KernRe(r"__attribute__\s*\(\((?:[\w\s]+(?:\([^)]*\))?\s*,?)+\)\)\s+"), ""),
 ]
 
 #
-- 
2.52.0



* [PATCH 07/38] docs: kdoc_parser: fix variable regexes to work with size_t
  2026-02-18 10:12 [PATCH 00/38] docs: several improvements to kernel-doc Mauro Carvalho Chehab
                   ` (5 preceding siblings ...)
  2026-02-18 10:12 ` [PATCH 06/38] docs: kdoc_parser: use NestedMatch to handle __attribute__ on functions Mauro Carvalho Chehab
@ 2026-02-18 10:12 ` Mauro Carvalho Chehab
  2026-02-18 10:12 ` [PATCH 08/38] docs: kdoc_parser: fix the default_value logic for variables Mauro Carvalho Chehab
                   ` (33 subsequent siblings)
  40 siblings, 0 replies; 55+ messages in thread
From: Mauro Carvalho Chehab @ 2026-02-18 10:12 UTC (permalink / raw)
  To: Jonathan Corbet, Linux Doc Mailing List
  Cc: Mauro Carvalho Chehab, intel-wired-lan, linux-hardening,
	linux-kernel, netdev, Aleksandr Loktionov, Randy Dunlap

The regular expressions meant to pick variable types are too
naive: they forgot that the type word may contain underscores.
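
The two patterns from the diff can be compared on a sample declaration.
This sketch leaves out the OPTIONAL_VAR_ATTR prefix and uses the plain
"re" module instead of KernRe, both simplifications made for the example.
It only shows how the patterns differ on one input and does not claim to
reproduce the original report.

```python
import re

TAIL = r"\s+(?:\*+)?([\w_]+)\s*[\d\]\[]*\s*(=.*)?"

old = re.compile(r"\w.*" + TAIL)      # type part before this patch
new = re.compile(r"[\w_]*" + TAIL)    # type part after this patch

proto = "size_t foo = 5"

# The greedy ".*" runs past the name and the "=", so the last word wins
print(old.match(proto).groups())   # ('5', None)

# A single type word keeps the name and the default in their groups
print(new.match(proto).groups())   # ('foo', '= 5')
```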

Co-developed-by: Randy Dunlap <rdunlap@infradead.org>
Signed-off-by: Mauro Carvalho Chehab <mchehab+huawei@kernel.org>
Acked-by: Randy Dunlap <rdunlap@infradead.org>
Tested-by: Randy Dunlap <rdunlap@infradead.org>
Reviewed-by: Aleksandr Loktionov <aleksandr.loktionov@intel.com>
---
 tools/lib/python/kdoc/kdoc_parser.py | 4 ++--
 1 file changed, 2 insertions(+), 2 deletions(-)

diff --git a/tools/lib/python/kdoc/kdoc_parser.py b/tools/lib/python/kdoc/kdoc_parser.py
index b704755d2f0a..b63d91b7f79e 100644
--- a/tools/lib/python/kdoc/kdoc_parser.py
+++ b/tools/lib/python/kdoc/kdoc_parser.py
@@ -1028,14 +1028,14 @@ class KernelDoc:
 
         default_val = None
 
-        r= KernRe(OPTIONAL_VAR_ATTR + r"\w.*\s+(?:\*+)?([\w_]+)\s*[\d\]\[]*\s*(=.*)?")
+        r= KernRe(OPTIONAL_VAR_ATTR + r"[\w_]*\s+(?:\*+)?([\w_]+)\s*[\d\]\[]*\s*(=.*)?")
         if r.match(proto):
             if not declaration_name:
                 declaration_name = r.group(1)
 
             default_val = r.group(2)
         else:
-            r= KernRe(OPTIONAL_VAR_ATTR + r"(?:\w.*)?\s+(?:\*+)?(?:[\w_]+)\s*[\d\]\[]*\s*(=.*)?")
+            r= KernRe(OPTIONAL_VAR_ATTR + r"(?:[\w_]*)?\s+(?:\*+)?(?:[\w_]+)\s*[\d\]\[]*\s*(=.*)?")
         if r.match(proto):
             default_val = r.group(1)
 
-- 
2.52.0



* [PATCH 08/38] docs: kdoc_parser: fix the default_value logic for variables
  2026-02-18 10:12 [PATCH 00/38] docs: several improvements to kernel-doc Mauro Carvalho Chehab
                   ` (6 preceding siblings ...)
  2026-02-18 10:12 ` [PATCH 07/38] docs: kdoc_parser: fix variable regexes to work with size_t Mauro Carvalho Chehab
@ 2026-02-18 10:12 ` Mauro Carvalho Chehab
  2026-02-18 10:12 ` [PATCH 09/38] docs: kdoc_parser: add some debug for variable parsing Mauro Carvalho Chehab
                   ` (32 subsequent siblings)
  40 siblings, 0 replies; 55+ messages in thread
From: Mauro Carvalho Chehab @ 2026-02-18 10:12 UTC (permalink / raw)
  To: Jonathan Corbet, Linux Doc Mailing List
  Cc: Mauro Carvalho Chehab, intel-wired-lan, linux-hardening,
	linux-kernel, netdev, Aleksandr Loktionov, Randy Dunlap

The indentation is wrong for the second regex, which causes
problems on variables with defaults.

Signed-off-by: Mauro Carvalho Chehab <mchehab+huawei@kernel.org>
Acked-by: Randy Dunlap <rdunlap@infradead.org>
Tested-by: Randy Dunlap <rdunlap@infradead.org>
Reviewed-by: Aleksandr Loktionov <aleksandr.loktionov@intel.com>
---
 tools/lib/python/kdoc/kdoc_parser.py | 4 ++--
 1 file changed, 2 insertions(+), 2 deletions(-)

diff --git a/tools/lib/python/kdoc/kdoc_parser.py b/tools/lib/python/kdoc/kdoc_parser.py
index b63d91b7f79e..abfa693051cb 100644
--- a/tools/lib/python/kdoc/kdoc_parser.py
+++ b/tools/lib/python/kdoc/kdoc_parser.py
@@ -1036,9 +1036,9 @@ class KernelDoc:
             default_val = r.group(2)
         else:
             r= KernRe(OPTIONAL_VAR_ATTR + r"(?:[\w_]*)?\s+(?:\*+)?(?:[\w_]+)\s*[\d\]\[]*\s*(=.*)?")
-        if r.match(proto):
-            default_val = r.group(1)
 
+            if r.match(proto):
+                default_val = r.group(1)
         if not declaration_name:
            self.emit_msg(ln,f"{proto}: can't parse variable")
            return
-- 
2.52.0



* [PATCH 09/38] docs: kdoc_parser: add some debug for variable parsing
  2026-02-18 10:12 [PATCH 00/38] docs: several improvements to kernel-doc Mauro Carvalho Chehab
                   ` (7 preceding siblings ...)
  2026-02-18 10:12 ` [PATCH 08/38] docs: kdoc_parser: fix the default_value logic for variables Mauro Carvalho Chehab
@ 2026-02-18 10:12 ` Mauro Carvalho Chehab
  2026-02-18 10:12 ` [PATCH 10/38] docs: kdoc_parser: don't exclude defaults from prototype Mauro Carvalho Chehab
                   ` (31 subsequent siblings)
  40 siblings, 0 replies; 55+ messages in thread
From: Mauro Carvalho Chehab @ 2026-02-18 10:12 UTC (permalink / raw)
  To: Jonathan Corbet, Linux Doc Mailing List
  Cc: Mauro Carvalho Chehab, intel-wired-lan, linux-hardening,
	linux-kernel, netdev, Aleksandr Loktionov, Randy Dunlap

This is a new parser that we're still fine-tuning. Add some
extra debug messages to help address issues there.

Signed-off-by: Mauro Carvalho Chehab <mchehab+huawei@kernel.org>
Acked-by: Randy Dunlap <rdunlap@infradead.org>
Tested-by: Randy Dunlap <rdunlap@infradead.org>
Reviewed-by: Aleksandr Loktionov <aleksandr.loktionov@intel.com>
---
 tools/lib/python/kdoc/kdoc_parser.py | 11 +++++++++++
 1 file changed, 11 insertions(+)

diff --git a/tools/lib/python/kdoc/kdoc_parser.py b/tools/lib/python/kdoc/kdoc_parser.py
index abfa693051cb..9559cbfd5e4c 100644
--- a/tools/lib/python/kdoc/kdoc_parser.py
+++ b/tools/lib/python/kdoc/kdoc_parser.py
@@ -1034,11 +1034,19 @@ class KernelDoc:
                 declaration_name = r.group(1)
 
             default_val = r.group(2)
+
+            self.config.log.debug("Variable proto parser: %s from '%s'",
+                                  r.groups(), proto)
+
         else:
             r= KernRe(OPTIONAL_VAR_ATTR + r"(?:[\w_]*)?\s+(?:\*+)?(?:[\w_]+)\s*[\d\]\[]*\s*(=.*)?")
 
             if r.match(proto):
                 default_val = r.group(1)
+
+        if default_val:
+            self.config.log.debug("default: '%s'", default_val)
+
         if not declaration_name:
            self.emit_msg(ln,f"{proto}: can't parse variable")
            return
@@ -1046,6 +1054,9 @@ class KernelDoc:
         if default_val:
             default_val = default_val.lstrip("=").strip()
 
+        self.config.log.debug("'%s' variable prototype: '%s', default: %s",
+                              declaration_name, proto, default_val)
+
         self.output_declaration("var", declaration_name,
                                 full_proto=full_proto,
                                 default_val=default_val,
-- 
2.52.0



* [PATCH 10/38] docs: kdoc_parser: don't exclude defaults from prototype
  2026-02-18 10:12 [PATCH 00/38] docs: several improvements to kernel-doc Mauro Carvalho Chehab
                   ` (8 preceding siblings ...)
  2026-02-18 10:12 ` [PATCH 09/38] docs: kdoc_parser: add some debug for variable parsing Mauro Carvalho Chehab
@ 2026-02-18 10:12 ` Mauro Carvalho Chehab
  2026-02-18 10:12 ` [PATCH 11/38] docs: kdoc_parser: fix parser to support multi-word types Mauro Carvalho Chehab
                   ` (30 subsequent siblings)
  40 siblings, 0 replies; 55+ messages in thread
From: Mauro Carvalho Chehab @ 2026-02-18 10:12 UTC (permalink / raw)
  To: Jonathan Corbet, Linux Doc Mailing List
  Cc: Mauro Carvalho Chehab, intel-wired-lan, linux-hardening,
	linux-kernel, netdev, Aleksandr Loktionov, Randy Dunlap

If we do that, the defaults won't be parsed.
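
The effect can be seen by running the variable regex with and without the
"=.*" transform applied first. The sketch uses the plain "re" module and
omits the OPTIONAL_VAR_ATTR prefix, which are simplifications for this
example.

```python
import re

strip_default = re.compile(r"=.*")    # the transform dropped by this patch
var_re = re.compile(r"[\w_]*\s+(?:\*+)?([\w_]+)\s*[\d\]\[]*\s*(=.*)?")

proto = "int debug = 1"

# With the transform applied first, nothing is left for group 2 to capture
stripped = strip_default.sub("", proto)
print(var_re.match(stripped).groups())   # ('debug', None)

# Without it, the default reaches the parser and can be cleaned up later
name, default_val = var_re.match(proto).groups()
print(name, default_val.lstrip("=").strip())   # debug 1
```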

Signed-off-by: Mauro Carvalho Chehab <mchehab+huawei@kernel.org>
Acked-by: Randy Dunlap <rdunlap@infradead.org>
Tested-by: Randy Dunlap <rdunlap@infradead.org>
Reviewed-by: Aleksandr Loktionov <aleksandr.loktionov@intel.com>
---
 tools/lib/python/kdoc/kdoc_parser.py | 1 -
 1 file changed, 1 deletion(-)

diff --git a/tools/lib/python/kdoc/kdoc_parser.py b/tools/lib/python/kdoc/kdoc_parser.py
index 9559cbfd5e4c..d8e96c6c4ebc 100644
--- a/tools/lib/python/kdoc/kdoc_parser.py
+++ b/tools/lib/python/kdoc/kdoc_parser.py
@@ -208,7 +208,6 @@ var_xforms = [
     (KernRe(r"(?://.*)$"), ""),
     (KernRe(r"(?:/\*.*\*/)"), ""),
     (KernRe(r";$"), ""),
-    (KernRe(r"=.*"), ""),
 ]
 
 #
-- 
2.52.0



* [PATCH 11/38] docs: kdoc_parser: fix parser to support multi-word types
  2026-02-18 10:12 [PATCH 00/38] docs: several improvements to kernel-doc Mauro Carvalho Chehab
                   ` (9 preceding siblings ...)
  2026-02-18 10:12 ` [PATCH 10/38] docs: kdoc_parser: don't exclude defaults from prototype Mauro Carvalho Chehab
@ 2026-02-18 10:12 ` Mauro Carvalho Chehab
  2026-02-18 10:12 ` [PATCH 12/38] docs: kdoc_parser: ignore context analysis and lock attributes Mauro Carvalho Chehab
                   ` (29 subsequent siblings)
  40 siblings, 0 replies; 55+ messages in thread
From: Mauro Carvalho Chehab @ 2026-02-18 10:12 UTC (permalink / raw)
  To: Jonathan Corbet, Linux Doc Mailing List
  Cc: Mauro Carvalho Chehab, intel-wired-lan, linux-hardening,
	linux-kernel, netdev, Aleksandr Loktionov, Randy Dunlap

The regular expression currently expects a single word for the
type, but it may be something like "struct foo".

Add support for it.
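
A quick comparison of the first regex before and after this change, with
the OPTIONAL_VAR_ATTR prefix omitted and the plain "re" module used in
place of KernRe (both are simplifications for this sketch):

```python
import re

TAIL = r"\s+(?:\*+)?([\w_]+)\s*[\d\]\[]*\s*(=.*)?"

single_word = re.compile(r"[\w_]*" + TAIL)         # before this patch
multi_word = re.compile(r"\s*[\w_\s]*" + TAIL)     # after this patch

proto = "struct foo bar = 1"

# A single-word type stops after "struct", so "foo" is taken as the name
print(single_word.match(proto).groups())   # ('foo', None)

# Allowing whitespace inside the type lets the last word become the name
print(multi_word.match(proto).groups())    # ('bar', '= 1')

# Pointers still work: the '*' sits between the type and the name
print(multi_word.match("struct foo *bar").groups())   # ('bar', None)
```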

Signed-off-by: Mauro Carvalho Chehab <mchehab+huawei@kernel.org>
Acked-by: Randy Dunlap <rdunlap@infradead.org>
Tested-by: Randy Dunlap <rdunlap@infradead.org>
Reviewed-by: Aleksandr Loktionov <aleksandr.loktionov@intel.com>
---
 tools/lib/python/kdoc/kdoc_parser.py | 4 ++--
 1 file changed, 2 insertions(+), 2 deletions(-)

diff --git a/tools/lib/python/kdoc/kdoc_parser.py b/tools/lib/python/kdoc/kdoc_parser.py
index d8e96c6c4ebc..f524385543a6 100644
--- a/tools/lib/python/kdoc/kdoc_parser.py
+++ b/tools/lib/python/kdoc/kdoc_parser.py
@@ -1027,7 +1027,7 @@ class KernelDoc:
 
         default_val = None
 
-        r= KernRe(OPTIONAL_VAR_ATTR + r"[\w_]*\s+(?:\*+)?([\w_]+)\s*[\d\]\[]*\s*(=.*)?")
+        r= KernRe(OPTIONAL_VAR_ATTR + r"\s*[\w_\s]*\s+(?:\*+)?([\w_]+)\s*[\d\]\[]*\s*(=.*)?")
         if r.match(proto):
             if not declaration_name:
                 declaration_name = r.group(1)
@@ -1038,7 +1038,7 @@ class KernelDoc:
                                   r.groups(), proto)
 
         else:
-            r= KernRe(OPTIONAL_VAR_ATTR + r"(?:[\w_]*)?\s+(?:\*+)?(?:[\w_]+)\s*[\d\]\[]*\s*(=.*)?")
+            r= KernRe(OPTIONAL_VAR_ATTR + r"(?:[\w_\s]*)?\s+(?:\*+)?(?:[\w_]+)\s*[\d\]\[]*\s*(=.*)?")
 
             if r.match(proto):
                 default_val = r.group(1)
-- 
2.52.0



* [PATCH 12/38] docs: kdoc_parser: ignore context analysis and lock attributes
  2026-02-18 10:12 [PATCH 00/38] docs: several improvements to kernel-doc Mauro Carvalho Chehab
                   ` (10 preceding siblings ...)
  2026-02-18 10:12 ` [PATCH 11/38] docs: kdoc_parser: fix parser to support multi-word types Mauro Carvalho Chehab
@ 2026-02-18 10:12 ` Mauro Carvalho Chehab
  2026-02-18 10:12 ` [PATCH 13/38] docs: kdoc_parser: add support for LIST_HEAD Mauro Carvalho Chehab
                   ` (28 subsequent siblings)
  40 siblings, 0 replies; 55+ messages in thread
From: Mauro Carvalho Chehab @ 2026-02-18 10:12 UTC (permalink / raw)
  To: Jonathan Corbet, Linux Doc Mailing List
  Cc: Mauro Carvalho Chehab, intel-wired-lan, linux-hardening,
	linux-kernel, netdev, Aleksandr Loktionov, Randy Dunlap,
	Stephen Rothwell

From: Randy Dunlap <rdunlap@infradead.org>

Drop all context analysis and lock (tracking) attributes to avoid
kernel-doc warnings.

Documentation/core-api/kref:328: ../include/linux/kref.h:72: WARNING: Invalid C declaration: Expected end of definition. [error at 96]
  int kref_put_mutex (struct kref *kref, void (*release)(struct kref *kref), struct mutex *mutex) __cond_acquires(true# mutex)
  ------------------------------------------------------------------------------------------------^
Documentation/core-api/kref:328: ../include/linux/kref.h:94: WARNING: Invalid C declaration: Expected end of definition. [error at 92]
  int kref_put_lock (struct kref *kref, void (*release)(struct kref *kref), spinlock_t *lock) __cond_acquires(true# lock)
  --------------------------------------------------------------------------------------------^

The regex is suggested by Mauro; mine was too greedy. Thanks.
Updated context analysis and lock macros list provided by PeterZ. Thanks.

Reported-by: Stephen Rothwell <sfr@canb.auug.org.au>
Closes: https://lore.kernel.org/all/20260107161548.45530e1c@canb.auug.org.au/
Signed-off-by: Randy Dunlap <rdunlap@infradead.org>
Reviewed-by: Mauro Carvalho Chehab <mchehab+huawei@kernel.org>
Reviewed-by: Aleksandr Loktionov <aleksandr.loktionov@intel.com>
Signed-off-by: Mauro Carvalho Chehab <mchehab+huawei@kernel.org>
---
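Illustration only, not part of the patch: a small sketch of the two
var_xforms entries added here, applied with plain re instead of
KernRe. The declarations and the strip_attrs() helper are made up for
the example.

```python
import re

# Same patterns as the var_xforms entries added by this patch.
VAR_XFORMS = [
    (re.compile(r'\s*__guarded_by\s*\([^\)]*\)', re.S), ""),
    (re.compile(r'\s*__pt_guarded_by\s*\([^\)]*\)', re.S), ""),
]

def strip_attrs(decl):
    """Hypothetical helper: apply each transform in order."""
    for regex, subst in VAR_XFORMS:
        decl = regex.sub(subst, decl)
    return decl

plain = strip_attrs("int counter __guarded_by(&lock)")
pointer = strip_attrs("int *ptr __pt_guarded_by(&lock)")
```

The [^\)]* part stops at the first closing parenthesis, so these
patterns assume the attribute argument has no nested parentheses.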
 tools/lib/python/kdoc/kdoc_parser.py | 15 +++++++++++++++
 1 file changed, 15 insertions(+)

diff --git a/tools/lib/python/kdoc/kdoc_parser.py b/tools/lib/python/kdoc/kdoc_parser.py
index f524385543a6..25d8a89f32b2 100644
--- a/tools/lib/python/kdoc/kdoc_parser.py
+++ b/tools/lib/python/kdoc/kdoc_parser.py
@@ -81,6 +81,8 @@ struct_xforms = [
     (KernRe(r'\s*__aligned\s*\([^;]*\)', re.S), ' '),
     (KernRe(r'\s*__counted_by\s*\([^;]*\)', re.S), ' '),
     (KernRe(r'\s*__counted_by_(le|be)\s*\([^;]*\)', re.S), ' '),
+    (KernRe(r'\s*__guarded_by\s*\([^\)]*\)', re.S), ' '),
+    (KernRe(r'\s*__pt_guarded_by\s*\([^\)]*\)', re.S), ' '),
     (KernRe(r'\s*__packed\s*', re.S), ' '),
     (KernRe(r'\s*CRYPTO_MINALIGN_ATTR', re.S), ' '),
     (KernRe(r'\s*__private', re.S), ' '),
@@ -165,6 +167,16 @@ struct_nested_prefixes = [
 # the start delimiter.
 #
 function_nested_prefixes = [
+    (re.compile(r"__cond_acquires\s*\("), ""),
+    (re.compile(r"__cond_releases\s*\("), ""),
+    (re.compile(r"__acquires\s*\("), ""),
+    (re.compile(r"__releases\s*\("), ""),
+    (re.compile(r"__must_hold\s*\("), ""),
+    (re.compile(r"__must_not_hold\s*\("), ""),
+    (re.compile(r"__must_hold_shared\s*\("), ""),
+    (re.compile(r"__cond_acquires_shared\s*\("), ""),
+    (re.compile(r"__acquires_shared\s*\("), ""),
+    (re.compile(r"__releases_shared\s*\("), ""),
     (re.compile(r"__attribute__\s*\("), ""),
 ]
 
@@ -196,6 +208,7 @@ function_xforms = [
     (KernRe(r"__(?:re)?alloc_size\s*\(\s*\d+\s*(?:,\s*\d+\s*)?\) +"), ""),
     (KernRe(r"__diagnose_as\s*\(\s*\S+\s*(?:,\s*\d+\s*)*\) +"), ""),
     (KernRe(r"DECL_BUCKET_PARAMS\s*\(\s*(\S+)\s*,\s*(\S+)\s*\)"), r"\1, \2"),
+    (KernRe(r"__no_context_analysis\s*"), ""),
     (KernRe(r"__attribute_const__ +"), ""),
 ]
 
@@ -205,6 +218,8 @@ function_xforms = [
 var_xforms = [
     (KernRe(r"__read_mostly"), ""),
     (KernRe(r"__ro_after_init"), ""),
+    (KernRe(r'\s*__guarded_by\s*\([^\)]*\)', re.S), ""),
+    (KernRe(r'\s*__pt_guarded_by\s*\([^\)]*\)', re.S), ""),
     (KernRe(r"(?://.*)$"), ""),
     (KernRe(r"(?:/\*.*\*/)"), ""),
     (KernRe(r";$"), ""),
-- 
2.52.0



* [PATCH 13/38] docs: kdoc_parser: add support for LIST_HEAD
  2026-02-18 10:12 [PATCH 00/38] docs: several improvements to kernel-doc Mauro Carvalho Chehab
                   ` (11 preceding siblings ...)
  2026-02-18 10:12 ` [PATCH 12/38] docs: kdoc_parser: ignore context analysis and lock attributes Mauro Carvalho Chehab
@ 2026-02-18 10:12 ` Mauro Carvalho Chehab
  2026-02-18 10:12 ` [PATCH 14/38] docs: kdoc_parser: handle struct member macro VIRTIO_DECLARE_FEATURES(name) Mauro Carvalho Chehab
                   ` (27 subsequent siblings)
  40 siblings, 0 replies; 55+ messages in thread
From: Mauro Carvalho Chehab @ 2026-02-18 10:12 UTC (permalink / raw)
  To: Jonathan Corbet, Linux Doc Mailing List
  Cc: Mauro Carvalho Chehab, intel-wired-lan, linux-hardening,
	linux-kernel, netdev, Aleksandr Loktionov, Randy Dunlap

Convert LIST_HEAD into struct list_head when handling its
prototype.

Signed-off-by: Mauro Carvalho Chehab <mchehab+huawei@kernel.org>
Acked-by: Randy Dunlap <rdunlap@infradead.org>
Tested-by: Randy Dunlap <rdunlap@infradead.org>
Reviewed-by: Aleksandr Loktionov <aleksandr.loktionov@intel.com>
---
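Illustration only, not part of the patch: the new var_xforms entry
applied with plain re. The input declaration and the
expand_list_head() helper are made up for the example.

```python
import re

# Same pattern and replacement as the entry added by this patch.
LIST_HEAD_RE = re.compile(r"LIST_HEAD\(([\w_]+)\)")

def expand_list_head(proto):
    """Hypothetical helper: rewrite LIST_HEAD(x) as struct list_head x."""
    return LIST_HEAD_RE.sub(r"struct list_head \1", proto)

converted = expand_list_head("static LIST_HEAD(my_list)")
```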
 tools/lib/python/kdoc/kdoc_parser.py | 1 +
 1 file changed, 1 insertion(+)

diff --git a/tools/lib/python/kdoc/kdoc_parser.py b/tools/lib/python/kdoc/kdoc_parser.py
index 25d8a89f32b2..6fe2fa032900 100644
--- a/tools/lib/python/kdoc/kdoc_parser.py
+++ b/tools/lib/python/kdoc/kdoc_parser.py
@@ -220,6 +220,7 @@ var_xforms = [
     (KernRe(r"__ro_after_init"), ""),
     (KernRe(r'\s*__guarded_by\s*\([^\)]*\)', re.S), ""),
     (KernRe(r'\s*__pt_guarded_by\s*\([^\)]*\)', re.S), ""),
+    (KernRe(r"LIST_HEAD\(([\w_]+)\)"), r"struct list_head \1"),
     (KernRe(r"(?://.*)$"), ""),
     (KernRe(r"(?:/\*.*\*/)"), ""),
     (KernRe(r";$"), ""),
-- 
2.52.0



* [PATCH 14/38] docs: kdoc_parser: handle struct member macro VIRTIO_DECLARE_FEATURES(name)
  2026-02-18 10:12 [PATCH 00/38] docs: several improvements to kernel-doc Mauro Carvalho Chehab
                   ` (12 preceding siblings ...)
  2026-02-18 10:12 ` [PATCH 13/38] docs: kdoc_parser: add support for LIST_HEAD Mauro Carvalho Chehab
@ 2026-02-18 10:12 ` Mauro Carvalho Chehab
  2026-02-18 10:12 ` [PATCH 15/38] docs: kdoc_re: properly handle strings and escape chars on it Mauro Carvalho Chehab
                   ` (26 subsequent siblings)
  40 siblings, 0 replies; 55+ messages in thread
From: Mauro Carvalho Chehab @ 2026-02-18 10:12 UTC (permalink / raw)
  To: Jonathan Corbet, Linux Doc Mailing List
  Cc: Mauro Carvalho Chehab, intel-wired-lan, linux-hardening,
	linux-kernel, netdev, Aleksandr Loktionov, Randy Dunlap

From: Randy Dunlap <rdunlap@infradead.org>

Parse the macro VIRTIO_DECLARE_FEATURES(name) and expand it to its
definition. This prevents one build warning:

WARNING: include/linux/virtio.h:188 struct member 'VIRTIO_DECLARE_FEATURES(features' not described in 'virtio_device'

Signed-off-by: Randy Dunlap <rdunlap@infradead.org>
Signed-off-by: Mauro Carvalho Chehab <mchehab+huawei@kernel.org>
Reviewed-by: Aleksandr Loktionov <aleksandr.loktionov@intel.com>
---
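Illustration only, not part of the patch: the new struct_xforms entry
applied with plain re to a made-up member line.

```python
import re

# Same pattern and replacement as the entry added by this patch.
FEATURES_RE = re.compile(r'VIRTIO_DECLARE_FEATURES\(([\w_]+)\)')
FEATURES_SUB = r'union { u64 \1; u64 \1_array[VIRTIO_FEATURES_U64S]; }'

member = "VIRTIO_DECLARE_FEATURES(features);"
expanded = FEATURES_RE.sub(FEATURES_SUB, member)
```

After the expansion the parser sees two union members, features and
features_array, instead of one malformed member name.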
 tools/lib/python/kdoc/kdoc_parser.py | 1 +
 1 file changed, 1 insertion(+)

diff --git a/tools/lib/python/kdoc/kdoc_parser.py b/tools/lib/python/kdoc/kdoc_parser.py
index 6fe2fa032900..32a30851db08 100644
--- a/tools/lib/python/kdoc/kdoc_parser.py
+++ b/tools/lib/python/kdoc/kdoc_parser.py
@@ -152,6 +152,7 @@ struct_xforms = [
             struct_args_pattern + r'\)', re.S), r'\1 \2[]'),
     (KernRe(r'DEFINE_DMA_UNMAP_ADDR\s*\(' + struct_args_pattern + r'\)', re.S), r'dma_addr_t \1'),
     (KernRe(r'DEFINE_DMA_UNMAP_LEN\s*\(' + struct_args_pattern + r'\)', re.S), r'__u32 \1'),
+    (KernRe(r'VIRTIO_DECLARE_FEATURES\(([\w_]+)\)'), r'union { u64 \1; u64 \1_array[VIRTIO_FEATURES_U64S]; }'),
 ]
 #
 # Struct regexes here are guaranteed to have the end delimiter matching
-- 
2.52.0



* [PATCH 15/38] docs: kdoc_re: properly handle strings and escape chars on it
  2026-02-18 10:12 [PATCH 00/38] docs: several improvements to kernel-doc Mauro Carvalho Chehab
                   ` (13 preceding siblings ...)
  2026-02-18 10:12 ` [PATCH 14/38] docs: kdoc_parser: handle struct member macro VIRTIO_DECLARE_FEATURES(name) Mauro Carvalho Chehab
@ 2026-02-18 10:12 ` Mauro Carvalho Chehab
  2026-02-18 10:12 ` [PATCH 16/38] docs: kdoc_re: better show KernRe() at documentation Mauro Carvalho Chehab
                   ` (25 subsequent siblings)
  40 siblings, 0 replies; 55+ messages in thread
From: Mauro Carvalho Chehab @ 2026-02-18 10:12 UTC (permalink / raw)
  To: Jonathan Corbet, Linux Doc Mailing List
  Cc: Mauro Carvalho Chehab, intel-wired-lan, linux-hardening,
	linux-kernel, netdev, Aleksandr Loktionov, Randy Dunlap

The logic inside NestedMatch currently doesn't consider that
function arguments may contain character and string literals,
which may themselves contain delimiters.

Add logic to handle strings and the escape characters inside them.

Signed-off-by: Mauro Carvalho Chehab <mchehab+huawei@kernel.org>
Reviewed-by: Aleksandr Loktionov <aleksandr.loktionov@intel.com>
---
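Illustration only, not part of the patch: a simplified sketch of the
same state machine (escape flag, current string delimiter, stack of
expected closing delimiters). Unlike the loop changed in this patch,
the hypothetical find_close() helper scans every character of the
line rather than iterating over regex matches.

```python
PAIRS = {'{': '}', '(': ')', '[': ']'}

def find_close(line, open_pos):
    """Return the index of the delimiter closing line[open_pos], or -1."""
    stack = [PAIRS[line[open_pos]]]
    string_char = None
    escape = False

    for pos in range(open_pos + 1, len(line)):
        d = line[pos]

        # The character right after a backslash is never special.
        if escape:
            escape = False
            continue

        # Inside a literal, only a backslash or the closing quote matter.
        if string_char:
            if d == '\\':
                escape = True
            elif d == string_char:
                string_char = None
            continue

        if d in ('"', "'"):
            string_char = d
            continue

        if d in PAIRS:
            stack.append(PAIRS[d])
            continue

        if d == stack[-1]:
            stack.pop()
            if not stack:
                return pos

    return -1

line = 'FOO("a)b", \')\', x) + 1'
end = find_close(line, 3)
```

Here line[:end + 1] is the whole FOO(...) call: the parentheses
inside the string and character literals are skipped.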
 tools/lib/python/kdoc/kdoc_re.py | 18 ++++++++++++++++++
 1 file changed, 18 insertions(+)

diff --git a/tools/lib/python/kdoc/kdoc_re.py b/tools/lib/python/kdoc/kdoc_re.py
index 6c44fcce0415..420cb8879ba3 100644
--- a/tools/lib/python/kdoc/kdoc_re.py
+++ b/tools/lib/python/kdoc/kdoc_re.py
@@ -195,6 +195,8 @@ class NestedMatch:
         for match_re in regex.finditer(line):
             start = match_re.start()
             offset = match_re.end()
+            string_char = None
+            escape = False
 
             d = line[offset - 1]
             if d not in self.DELIMITER_PAIRS:
@@ -208,6 +210,22 @@ class NestedMatch:
 
                 d = line[pos]
 
+                if escape:
+                    escape = False
+                    continue
+
+                if string_char:
+                    if d == '\\':
+                        escape = True
+                    elif d == string_char:
+                        string_char = None
+
+                    continue
+
+                if d in ('"', "'"):
+                    string_char = d
+                    continue
+
                 if d in self.DELIMITER_PAIRS:
                     end = self.DELIMITER_PAIRS[d]
 
-- 
2.52.0



* [PATCH 16/38] docs: kdoc_re: better show KernRe() at documentation
  2026-02-18 10:12 [PATCH 00/38] docs: several improvements to kernel-doc Mauro Carvalho Chehab
                   ` (14 preceding siblings ...)
  2026-02-18 10:12 ` [PATCH 15/38] docs: kdoc_re: properly handle strings and escape chars on it Mauro Carvalho Chehab
@ 2026-02-18 10:12 ` Mauro Carvalho Chehab
  2026-02-18 10:12 ` [PATCH 17/38] docs: kdoc_re: don't recompile NestedMatch regex every time Mauro Carvalho Chehab
                   ` (24 subsequent siblings)
  40 siblings, 0 replies; 55+ messages in thread
From: Mauro Carvalho Chehab @ 2026-02-18 10:12 UTC (permalink / raw)
  To: Jonathan Corbet, Linux Doc Mailing List
  Cc: Mauro Carvalho Chehab, intel-wired-lan, linux-hardening,
	linux-kernel, netdev, Aleksandr Loktionov, Randy Dunlap

The __repr__() function is used by autodoc to document macro
initialization.

Add a better representation for KernRe objects.

Signed-off-by: Mauro Carvalho Chehab <mchehab+huawei@kernel.org>
Reviewed-by: Aleksandr Loktionov <aleksandr.loktionov@intel.com>
---
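Illustration only, not part of the patch: a stand-alone sketch of the
flag-aware representation, using a hypothetical ReprDemo class so it
runs without kdoc_re.

```python
import re

class ReprDemo:
    """Hypothetical stand-in for KernRe, reduced to __init__ and __repr__."""

    FLAG_MAP = {
        re.IGNORECASE: "re.I",
        re.MULTILINE: "re.M",
        re.DOTALL: "re.S",
        re.VERBOSE: "re.X",
    }

    def __init__(self, pattern, flags=0):
        self.regex = re.compile(pattern, flags)

    def __repr__(self):
        # Keep only the flags that are set, in the order of FLAG_MAP.
        names = [name for flag, name in self.FLAG_MAP.items()
                 if self.regex.flags & flag]
        flags_name = " | ".join(names)

        if flags_name:
            return f'ReprDemo("{self.regex.pattern}", {flags_name})'
        return f'ReprDemo("{self.regex.pattern}")'

with_flags = repr(ReprDemo("abc", re.S | re.I))
without_flags = repr(ReprDemo("abc"))
```

The implicit re.UNICODE flag of str patterns is not in the map, so it
never shows up in the output.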
 tools/lib/python/kdoc/kdoc_re.py | 23 ++++++++++++++++++++++-
 1 file changed, 22 insertions(+), 1 deletion(-)

diff --git a/tools/lib/python/kdoc/kdoc_re.py b/tools/lib/python/kdoc/kdoc_re.py
index 420cb8879ba3..0a7f12616f9f 100644
--- a/tools/lib/python/kdoc/kdoc_re.py
+++ b/tools/lib/python/kdoc/kdoc_re.py
@@ -52,7 +52,28 @@ class KernRe:
         return self.regex.pattern
 
     def __repr__(self):
-        return f're.compile("{self.regex.pattern}")'
+        """
+        Returns a displayable version of the class init.
+        """
+
+        flag_map = {
+            re.IGNORECASE: "re.I",
+            re.MULTILINE: "re.M",
+            re.DOTALL: "re.S",
+            re.VERBOSE: "re.X",
+        }
+
+        flags = []
+        for flag, name in flag_map.items():
+            if self.regex.flags & flag:
+                flags.append(name)
+
+        flags_name = " | ".join(flags)
+
+        if flags_name:
+            return f'KernRe("{self.regex.pattern}", {flags_name})'
+        else:
+            return f'KernRe("{self.regex.pattern}")'
 
     def __add__(self, other):
         """
-- 
2.52.0



* [PATCH 17/38] docs: kdoc_re: don't recompile NestedMatch regex every time
  2026-02-18 10:12 [PATCH 00/38] docs: several improvements to kernel-doc Mauro Carvalho Chehab
                   ` (15 preceding siblings ...)
  2026-02-18 10:12 ` [PATCH 16/38] docs: kdoc_re: better show KernRe() at documentation Mauro Carvalho Chehab
@ 2026-02-18 10:12 ` Mauro Carvalho Chehab
  2026-02-18 10:12 ` [PATCH 18/38] docs: kdoc_re: Change NestedMatch args replacement to \0 Mauro Carvalho Chehab
                   ` (23 subsequent siblings)
  40 siblings, 0 replies; 55+ messages in thread
From: Mauro Carvalho Chehab @ 2026-02-18 10:12 UTC (permalink / raw)
  To: Jonathan Corbet, Linux Doc Mailing List
  Cc: Mauro Carvalho Chehab, intel-wired-lan, linux-hardening,
	linux-kernel, netdev, Aleksandr Loktionov, Randy Dunlap

Store the delimiters and their compiled regex as module-level constants.

Signed-off-by: Mauro Carvalho Chehab <mchehab+huawei@kernel.org>
Reviewed-by: Aleksandr Loktionov <aleksandr.loktionov@intel.com>
---
 tools/lib/python/kdoc/kdoc_re.py | 35 ++++++++++++++++++++------------
 1 file changed, 22 insertions(+), 13 deletions(-)

diff --git a/tools/lib/python/kdoc/kdoc_re.py b/tools/lib/python/kdoc/kdoc_re.py
index 0a7f12616f9f..00afa5bccd6d 100644
--- a/tools/lib/python/kdoc/kdoc_re.py
+++ b/tools/lib/python/kdoc/kdoc_re.py
@@ -99,6 +99,13 @@ class KernRe:
         self.last_match = self.regex.search(string)
         return self.last_match
 
+    def finditer(self,  string):
+        """
+        Alias to re.finditer.
+        """
+
+        return self.regex.finditer(string)
+
     def findall(self, string):
         """
         Alias to re.findall.
@@ -134,6 +141,16 @@ class KernRe:
 
         return self.last_match.groups()
 
+#: Nested delimited pairs (brackets and parenthesis)
+DELIMITER_PAIRS = {
+    '{': '}',
+    '(': ')',
+    '[': ']',
+}
+
+#: compiled delimiters
+RE_DELIM = KernRe(r'[\{\}\[\]\(\)]')
+
 
 class NestedMatch:
     """
@@ -183,14 +200,6 @@ class NestedMatch:
     #
     #   FOO(arg1, arg2, arg3)
 
-    DELIMITER_PAIRS = {
-        '{': '}',
-        '(': ')',
-        '[': ']',
-    }
-
-    RE_DELIM = re.compile(r'[\{\}\[\]\(\)]')
-
     def _search(self, regex, line):
         """
         Finds paired blocks for a regex that ends with a delimiter.
@@ -220,13 +229,13 @@ class NestedMatch:
             escape = False
 
             d = line[offset - 1]
-            if d not in self.DELIMITER_PAIRS:
+            if d not in DELIMITER_PAIRS:
                 continue
 
-            end = self.DELIMITER_PAIRS[d]
+            end = DELIMITER_PAIRS[d]
             stack.append(end)
 
-            for match in self.RE_DELIM.finditer(line[offset:]):
+            for match in RE_DELIM.finditer(line[offset:]):
                 pos = match.start() + offset
 
                 d = line[pos]
@@ -247,8 +256,8 @@ class NestedMatch:
                     string_char = d
                     continue
 
-                if d in self.DELIMITER_PAIRS:
-                    end = self.DELIMITER_PAIRS[d]
+                if d in DELIMITER_PAIRS:
+                    end = DELIMITER_PAIRS[d]
 
                     stack.append(end)
                     continue
-- 
2.52.0



* [PATCH 18/38] docs: kdoc_re: Change NestedMatch args replacement to \0
  2026-02-18 10:12 [PATCH 00/38] docs: several improvements to kernel-doc Mauro Carvalho Chehab
                   ` (16 preceding siblings ...)
  2026-02-18 10:12 ` [PATCH 17/38] docs: kdoc_re: don't recompile NestedMatch regex every time Mauro Carvalho Chehab
@ 2026-02-18 10:12 ` Mauro Carvalho Chehab
  2026-02-18 10:12 ` [PATCH 19/38] docs: kdoc_re: make NestedMatch use KernRe Mauro Carvalho Chehab
                   ` (22 subsequent siblings)
  40 siblings, 0 replies; 55+ messages in thread
From: Mauro Carvalho Chehab @ 2026-02-18 10:12 UTC (permalink / raw)
  To: Jonathan Corbet, Linux Doc Mailing List
  Cc: Mauro Carvalho Chehab, intel-wired-lan, linux-hardening,
	linux-kernel, netdev, Aleksandr Loktionov, Randy Dunlap

Future patches will allow parsing each argument instead of the
whole set. Prepare for that by changing the placeholder that stands
for all arguments from \1 to \0.

No functional changes.

Signed-off-by: Mauro Carvalho Chehab <mchehab+huawei@kernel.org>
Reviewed-by: Aleksandr Loktionov <aleksandr.loktionov@intel.com>
---
 tools/lib/python/kdoc/kdoc_parser.py | 2 +-
 tools/lib/python/kdoc/kdoc_re.py     | 6 +++---
 2 files changed, 4 insertions(+), 4 deletions(-)

diff --git a/tools/lib/python/kdoc/kdoc_parser.py b/tools/lib/python/kdoc/kdoc_parser.py
index 32a30851db08..3ee169b505d3 100644
--- a/tools/lib/python/kdoc/kdoc_parser.py
+++ b/tools/lib/python/kdoc/kdoc_parser.py
@@ -160,7 +160,7 @@ struct_xforms = [
 # is allowed.
 #
 struct_nested_prefixes = [
-    (re.compile(r'\bSTRUCT_GROUP\('), r'\1'),
+    (re.compile(r'\bSTRUCT_GROUP\('), r'\0'),
 ]
 
 #
diff --git a/tools/lib/python/kdoc/kdoc_re.py b/tools/lib/python/kdoc/kdoc_re.py
index 00afa5bccd6d..2d83b6fb1cd6 100644
--- a/tools/lib/python/kdoc/kdoc_re.py
+++ b/tools/lib/python/kdoc/kdoc_re.py
@@ -291,7 +291,7 @@ class NestedMatch:
 
         if the sub argument contains::
 
-            r'\1'
+            r'\0'
 
         it will work just like re: it places there the matched paired data
         with the delimiter stripped.
@@ -310,9 +310,9 @@ class NestedMatch:
             # Value, ignoring start/end delimiters
             value = line[end:pos - 1]
 
-            # replaces \1 at the sub string, if \1 is used there
+            # replaces \0 at the sub string, if \0 is used there
             new_sub = sub
-            new_sub = new_sub.replace(r'\1', value)
+            new_sub = new_sub.replace(r'\0', value)
 
             out += new_sub
 
-- 
2.52.0



* [PATCH 19/38] docs: kdoc_re: make NestedMatch use KernRe
  2026-02-18 10:12 [PATCH 00/38] docs: several improvements to kernel-doc Mauro Carvalho Chehab
                   ` (17 preceding siblings ...)
  2026-02-18 10:12 ` [PATCH 18/38] docs: kdoc_re: Change NestedMatch args replacement to \0 Mauro Carvalho Chehab
@ 2026-02-18 10:12 ` Mauro Carvalho Chehab
  2026-02-18 10:12 ` [PATCH 20/38] docs: kdoc_re: add support on NestedMatch for argument replacement Mauro Carvalho Chehab
                   ` (21 subsequent siblings)
  40 siblings, 0 replies; 55+ messages in thread
From: Mauro Carvalho Chehab @ 2026-02-18 10:12 UTC (permalink / raw)
  To: Jonathan Corbet, Linux Doc Mailing List
  Cc: Mauro Carvalho Chehab, intel-wired-lan, linux-hardening,
	linux-kernel, netdev, Aleksandr Loktionov, Randy Dunlap

Instead of using re.compile(), create the class with the
regex and use KernRe to keep it cached.

Signed-off-by: Mauro Carvalho Chehab <mchehab+huawei@kernel.org>
Reviewed-by: Aleksandr Loktionov <aleksandr.loktionov@intel.com>
---
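Illustration only, not part of the patch: a sketch of why a single
transform list is enough once every entry exposes sub(subst, text).
RegexXform and BalancedXform are hypothetical stand-ins for KernRe
and NestedMatch; BalancedXform only handles the first NAME(...) call
and only parentheses.

```python
import re

class RegexXform:
    """Hypothetical stand-in for KernRe."""
    def __init__(self, pattern):
        self.regex = re.compile(pattern)

    def sub(self, subst, text):
        return self.regex.sub(subst, text)

class BalancedXform:
    """Hypothetical stand-in for NestedMatch: replaces NAME(...)."""
    def __init__(self, name):
        self.name = name

    def sub(self, subst, text):
        start = text.find(self.name + "(")
        if start < 0:
            return text

        depth = 0
        for pos in range(start + len(self.name), len(text)):
            if text[pos] == "(":
                depth += 1
            elif text[pos] == ")":
                depth -= 1
                if depth == 0:
                    return text[:start] + subst + text[pos + 1:]

        # Unbalanced parentheses: leave the text untouched.
        return text

def apply_transforms(xforms, text):
    """Same loop shape as KernelDoc.apply_transforms() after this patch."""
    for search, subst in xforms:
        text = search.sub(subst, text)
    return text.strip()

xforms = [
    (RegexXform(r"__attribute_const__ +"), ""),
    (BalancedXform("__acquires"), ""),
]
proto = apply_transforms(
    xforms, "__attribute_const__ int f(int a) __acquires(lock(a))")
```

Because both kinds of entries share the same method signature, the
caller no longer needs a separate list for nested matches.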
 tools/lib/python/kdoc/kdoc_parser.py | 55 ++++++++--------------------
 tools/lib/python/kdoc/kdoc_re.py     | 24 ++++++++----
 2 files changed, 33 insertions(+), 46 deletions(-)

diff --git a/tools/lib/python/kdoc/kdoc_parser.py b/tools/lib/python/kdoc/kdoc_parser.py
index 3ee169b505d3..06a7af4bfa57 100644
--- a/tools/lib/python/kdoc/kdoc_parser.py
+++ b/tools/lib/python/kdoc/kdoc_parser.py
@@ -153,32 +153,7 @@ struct_xforms = [
     (KernRe(r'DEFINE_DMA_UNMAP_ADDR\s*\(' + struct_args_pattern + r'\)', re.S), r'dma_addr_t \1'),
     (KernRe(r'DEFINE_DMA_UNMAP_LEN\s*\(' + struct_args_pattern + r'\)', re.S), r'__u32 \1'),
     (KernRe(r'VIRTIO_DECLARE_FEATURES\(([\w_]+)\)'), r'union { u64 \1; u64 \1_array[VIRTIO_FEATURES_U64S]; }'),
-]
-#
-# Struct regexes here are guaranteed to have the end delimiter matching
-# the start delimiter. Yet, right now, only one replace group
-# is allowed.
-#
-struct_nested_prefixes = [
-    (re.compile(r'\bSTRUCT_GROUP\('), r'\0'),
-]
-
-#
-# Function Regexes here are guaranteed to have the end delimiter matching
-# the start delimiter.
-#
-function_nested_prefixes = [
-    (re.compile(r"__cond_acquires\s*\("), ""),
-    (re.compile(r"__cond_releases\s*\("), ""),
-    (re.compile(r"__acquires\s*\("), ""),
-    (re.compile(r"__releases\s*\("), ""),
-    (re.compile(r"__must_hold\s*\("), ""),
-    (re.compile(r"__must_not_hold\s*\("), ""),
-    (re.compile(r"__must_hold_shared\s*\("), ""),
-    (re.compile(r"__cond_acquires_shared\s*\("), ""),
-    (re.compile(r"__acquires_shared\s*\("), ""),
-    (re.compile(r"__releases_shared\s*\("), ""),
-    (re.compile(r"__attribute__\s*\("), ""),
+    (NestedMatch(r'\bSTRUCT_GROUP\('), r'\0'),
 ]
 
 #
@@ -211,6 +186,17 @@ function_xforms = [
     (KernRe(r"DECL_BUCKET_PARAMS\s*\(\s*(\S+)\s*,\s*(\S+)\s*\)"), r"\1, \2"),
     (KernRe(r"__no_context_analysis\s*"), ""),
     (KernRe(r"__attribute_const__ +"), ""),
+    (NestedMatch(r"__cond_acquires\s*\("), ""),
+    (NestedMatch(r"__cond_releases\s*\("), ""),
+    (NestedMatch(r"__acquires\s*\("), ""),
+    (NestedMatch(r"__releases\s*\("), ""),
+    (NestedMatch(r"__must_hold\s*\("), ""),
+    (NestedMatch(r"__must_not_hold\s*\("), ""),
+    (NestedMatch(r"__must_hold_shared\s*\("), ""),
+    (NestedMatch(r"__cond_acquires_shared\s*\("), ""),
+    (NestedMatch(r"__acquires_shared\s*\("), ""),
+    (NestedMatch(r"__releases_shared\s*\("), ""),
+    (NestedMatch(r"__attribute__\s*\("), ""),
 ]
 
 #
@@ -231,7 +217,6 @@ var_xforms = [
 # Ancillary functions
 #
 
-
 multi_space = KernRe(r'\s\s+')
 def trim_whitespace(s):
     """
@@ -425,8 +410,6 @@ class KernelDoc:
         # Place all potential outputs into an array
         self.entries = []
 
-        self.nested = NestedMatch()
-
         #
         # We need Python 3.7 for its "dicts remember the insertion
         # order" guarantee
@@ -524,14 +507,11 @@ class KernelDoc:
         # State flags
         self.state = state.NORMAL
 
-    def apply_transforms(self, regex_xforms, nested_xforms, text):
+    def apply_transforms(self, xforms, text):
         """Apply a set of transforms to a block of text."""
-        for search, subst in regex_xforms:
+        for search, subst in xforms:
             text = search.sub(subst, text)
 
-        for search, sub in nested_xforms:
-            text = self.nested.sub(search, sub, text)
-
         return text.strip()
 
     def push_parameter(self, ln, decl_type, param, dtype,
@@ -910,8 +890,7 @@ class KernelDoc:
         # Go through the list of members applying all of our transformations.
         #
         members = trim_private_members(members)
-        members = self.apply_transforms(struct_xforms, struct_nested_prefixes,
-                                        members)
+        members = self.apply_transforms(struct_xforms, members)
 
         #
         # Deal with embedded struct and union members, and drop enums entirely.
@@ -1126,9 +1105,7 @@ class KernelDoc:
             #
             # Apply the initial transformations.
             #
-            prototype = self.apply_transforms(function_xforms,
-                                              function_nested_prefixes,
-                                              prototype)
+            prototype = self.apply_transforms(function_xforms, prototype)
 
         # Yes, this truly is vile.  We are looking for:
         # 1. Return type (may be nothing if we're looking at a macro)
diff --git a/tools/lib/python/kdoc/kdoc_re.py b/tools/lib/python/kdoc/kdoc_re.py
index 2d83b6fb1cd6..fed9894a5c71 100644
--- a/tools/lib/python/kdoc/kdoc_re.py
+++ b/tools/lib/python/kdoc/kdoc_re.py
@@ -200,7 +200,10 @@ class NestedMatch:
     #
     #   FOO(arg1, arg2, arg3)
 
-    def _search(self, regex, line):
+    def __init__(self, regex):
+        self.regex = KernRe(regex)
+
+    def _search(self, line):
         """
         Finds paired blocks for a regex that ends with a delimiter.
 
@@ -222,7 +225,7 @@ class NestedMatch:
 
         stack = []
 
-        for match_re in regex.finditer(line):
+        for match_re in self.regex.finditer(line):
             start = match_re.start()
             offset = match_re.end()
             string_char = None
@@ -270,7 +273,7 @@ class NestedMatch:
                         yield start, offset, pos + 1
                         break
 
-    def search(self, regex, line):
+    def search(self, line):
         """
         This is similar to re.search:
 
@@ -278,12 +281,12 @@ class NestedMatch:
         returning occurrences only if all delimiters are paired.
         """
 
-        for t in self._search(regex, line):
+        for t in self._search(line):
 
             yield line[t[0]:t[2]]
 
-    def sub(self, regex, sub, line, count=0):
-        r"""
+    def sub(self, sub, line, count=0):
+        """
         This is similar to re.sub:
 
         It matches a regex that it is followed by a delimiter,
@@ -304,7 +307,7 @@ class NestedMatch:
         cur_pos = 0
         n = 0
 
-        for start, end, pos in self._search(regex, line):
+        for start, end, pos in self._search(line):
             out += line[cur_pos:start]
 
             # Value, ignoring start/end delimiters
@@ -331,3 +334,10 @@ class NestedMatch:
         out += line[cur_pos:l]
 
         return out
+
+    def __repr__(self):
+        """
+        Returns a displayable version of the class init.
+        """
+
+        return f'NestedMatch("{self.regex.regex.pattern}")'
-- 
2.52.0



* [PATCH 20/38] docs: kdoc_re: add support on NestedMatch for argument replacement
  2026-02-18 10:12 [PATCH 00/38] docs: several improvements to kernel-doc Mauro Carvalho Chehab
                   ` (18 preceding siblings ...)
  2026-02-18 10:12 ` [PATCH 19/38] docs: kdoc_re: make NestedMatch use KernRe Mauro Carvalho Chehab
@ 2026-02-18 10:12 ` Mauro Carvalho Chehab
  2026-02-18 10:12 ` [PATCH 21/38] docs: kdoc_parser: better handle struct_group macros Mauro Carvalho Chehab
                   ` (20 subsequent siblings)
  40 siblings, 0 replies; 55+ messages in thread
From: Mauro Carvalho Chehab @ 2026-02-18 10:12 UTC (permalink / raw)
  To: Jonathan Corbet, Linux Doc Mailing List
  Cc: Mauro Carvalho Chehab, intel-wired-lan, linux-hardening,
	linux-kernel, netdev, Aleksandr Loktionov, Randy Dunlap

Currently, NestedMatch has very limited support for argument
replacement: it is all or nothing.

Add support to allow replacing individual arguments as well.

Signed-off-by: Mauro Carvalho Chehab <mchehab+huawei@kernel.org>
Reviewed-by: Aleksandr Loktionov <aleksandr.loktionov@intel.com>
---
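Illustration only, not part of the patch: the splitting logic of
_split_args() copied into a stand-alone function, to show what the
returned list looks like. Element 0 holds the unsplit text, so \0
keeps meaning "all arguments" and \1, \2, ... select single ones.

```python
DELIMITER_PAIRS = {'{': '}', '(': ')', '[': ']'}

def split_args(all_args, delim=","):
    """Split top-level arguments; element 0 keeps the unsplit text."""
    args = [all_args]
    stack = []
    arg_start = 0
    string_char = None
    escape = False

    for idx, d in enumerate(all_args):
        if escape:
            escape = False
            continue

        if string_char:
            if d == '\\':
                escape = True
            elif d == string_char:
                string_char = None
            continue

        if d in ('"', "'"):
            string_char = d
            continue

        if d in DELIMITER_PAIRS:
            stack.append(DELIMITER_PAIRS[d])
            continue

        if stack and d == stack[-1]:
            stack.pop()
            continue

        # Only a delimiter outside any nesting separates arguments.
        if d == delim and not stack:
            args.append(all_args[arg_start:idx].strip())
            arg_start = idx + 1

    last = all_args[arg_start:].strip()
    if last:
        args.append(last)

    return args

parts = split_args('a, b(c, d), "e,f"')
```

The comma inside b(c, d) and the one inside the string literal do not
split, so parts has three arguments after the leading full text.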
 tools/lib/python/kdoc/kdoc_re.py | 84 ++++++++++++++++++++++----------
 1 file changed, 59 insertions(+), 25 deletions(-)

diff --git a/tools/lib/python/kdoc/kdoc_re.py b/tools/lib/python/kdoc/kdoc_re.py
index fed9894a5c71..05f36d665b70 100644
--- a/tools/lib/python/kdoc/kdoc_re.py
+++ b/tools/lib/python/kdoc/kdoc_re.py
@@ -177,29 +177,6 @@ class NestedMatch:
     will ignore the search string.
     """
 
-    # TODO: make NestedMatch handle multiple match groups
-    #
-    # Right now, regular expressions to match it are defined only up to
-    #       the start delimiter, e.g.:
-    #
-    #       \bSTRUCT_GROUP\(
-    #
-    # is similar to: STRUCT_GROUP\((.*)\)
-    # except that the content inside the match group is delimiter-aligned.
-    #
-    # The content inside parentheses is converted into a single replace
-    # group (e.g. r`\1').
-    #
-    # It would be nice to change such definition to support multiple
-    # match groups, allowing a regex equivalent to:
-    #
-    #   FOO\((.*), (.*), (.*)\)
-    #
-    # it is probably easier to define it not as a regular expression, but
-    # with some lexical definition like:
-    #
-    #   FOO(arg1, arg2, arg3)
-
     def __init__(self, regex):
         self.regex = KernRe(regex)
 
@@ -285,6 +262,59 @@ class NestedMatch:
 
             yield line[t[0]:t[2]]
 
+    @staticmethod
+    def _split_args(all_args, delim=","):
+        """
+        Helper method to split comma-separated function arguments
+        or struct elements, if delim is set to ";".
+
+        It returns a list of arguments that can be used later on by
+        the sub() method.
+        """
+        args = [all_args]
+        stack = []
+        arg_start = 0
+        string_char = None
+        escape = False
+
+        for idx, d in enumerate(all_args):
+            if escape:
+                escape = False
+                continue
+
+            if string_char:
+                if d == '\\':
+                    escape = True
+                elif d == string_char:
+                    string_char = None
+
+                continue
+
+            if d in ('"', "'"):
+                string_char = d
+                continue
+
+            if d in DELIMITER_PAIRS:
+                end = DELIMITER_PAIRS[d]
+
+                stack.append(end)
+                continue
+
+            if stack and d == stack[-1]:
+                stack.pop()
+                continue
+
+            if d == delim and not stack:
+                args.append(all_args[arg_start:idx].strip())
+                arg_start = idx + 1
+
+        # Add the last argument (if any)
+        last = all_args[arg_start:].strip()
+        if last:
+            args.append(last)
+
+        return args
+
     def sub(self, sub, line, count=0):
         """
         This is similar to re.sub:
@@ -313,9 +343,13 @@ class NestedMatch:
             # Value, ignoring start/end delimiters
             value = line[end:pos - 1]
 
-            # replaces \0 at the sub string, if \0 is used there
+            # replace arguments
             new_sub = sub
-            new_sub = new_sub.replace(r'\0', value)
+            if "\\" in sub:
+                args = self._split_args(value)
+
+                new_sub = re.sub(r'\\(\d+)',
+                                 lambda m: args[int(m.group(1))], new_sub)
 
             out += new_sub
 
-- 
2.52.0



* [PATCH 21/38] docs: kdoc_parser: better handle struct_group macros
  2026-02-18 10:12 [PATCH 00/38] docs: several improvements to kernel-doc Mauro Carvalho Chehab
                   ` (19 preceding siblings ...)
  2026-02-18 10:12 ` [PATCH 20/38] docs: kdoc_re: add support on NestedMatch for argument replacement Mauro Carvalho Chehab
@ 2026-02-18 10:12 ` Mauro Carvalho Chehab
  2026-02-18 10:12 ` [PATCH 22/38] docs: kdoc_re: fix a parse bug on struct page_pool_params Mauro Carvalho Chehab
                   ` (19 subsequent siblings)
  40 siblings, 0 replies; 55+ messages in thread
From: Mauro Carvalho Chehab @ 2026-02-18 10:12 UTC (permalink / raw)
  To: Jonathan Corbet, Linux Doc Mailing List
  Cc: Mauro Carvalho Chehab, intel-wired-lan, linux-hardening,
	linux-kernel, netdev, Aleksandr Loktionov, Randy Dunlap

Instead of converting them in two steps, implement a single
pass that parses them using the new argument-replacement
functionality of NestedMatch.sub().

Signed-off-by: Mauro Carvalho Chehab <mchehab+huawei@kernel.org>
Reviewed-by: Aleksandr Loktionov <aleksandr.loktionov@intel.com>
---
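For illustration only (not meant for the commit log): a minimal,
self-contained sketch of the argument-indexed replacement idea. It
assumes that \0 stands for the whole argument list and \N for the N-th
comma-separated argument. The helpers split_args() and expand() are
hypothetical stand-ins, not the code in kdoc_re.py.

```python
import re

PAIRS = {'(': ')', '[': ']', '{': '}'}


def split_args(text, delim=','):
    """Split text on top-level delimiters, ignoring nested brackets."""
    args, stack, start = [], [], 0
    for idx, ch in enumerate(text):
        if ch in PAIRS:
            stack.append(PAIRS[ch])      # remember the expected closer
        elif stack and ch == stack[-1]:
            stack.pop()
        elif ch == delim and not stack:
            args.append(text[start:idx].strip())
            start = idx + 1
    last = text[start:].strip()
    if last:
        args.append(last)
    return args


def expand(template, value):
    """Replace backslash-digit references in template with arguments."""
    # Assumption: index 0 is the whole value, 1..N are the split arguments
    args = [value] + split_args(value)
    return re.sub(r'\\(\d+)', lambda m: args[int(m.group(1))], template)


# struct_group(NAME, MEMBERS): keep only the members
print(expand(r'\2', "hdr, u8 a; u8 b[2];"))
```

In this sketch, a member list that contains a top-level comma (for
example "int a, b;") would be split across several arguments, so a
single \N would not cover all the members.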
 tools/lib/python/kdoc/kdoc_parser.py | 10 +++++-----
 1 file changed, 5 insertions(+), 5 deletions(-)

diff --git a/tools/lib/python/kdoc/kdoc_parser.py b/tools/lib/python/kdoc/kdoc_parser.py
index 06a7af4bfa57..b63a70f184eb 100644
--- a/tools/lib/python/kdoc/kdoc_parser.py
+++ b/tools/lib/python/kdoc/kdoc_parser.py
@@ -124,10 +124,11 @@ struct_xforms = [
     # matched. So, the implementation to drop STRUCT_GROUP() will be
     # handled in separate.
     #
-    (KernRe(r'\bstruct_group\s*\(([^,]*,)', re.S), r'STRUCT_GROUP('),
-    (KernRe(r'\bstruct_group_attr\s*\(([^,]*,){2}', re.S), r'STRUCT_GROUP('),
-    (KernRe(r'\bstruct_group_tagged\s*\(([^,]*),([^,]*),', re.S), r'struct \1 \2; STRUCT_GROUP('),
-    (KernRe(r'\b__struct_group\s*\(([^,]*,){3}', re.S), r'STRUCT_GROUP('),
+    (NestedMatch(r'\bstruct_group\s*\('), r'\2'),
+    (NestedMatch(r'\bstruct_group_attr\s*\('), r'\3'),
+    (NestedMatch(r'\bstruct_group_tagged\s*\('), r'struct \1 { \3 } \2;'),
+    (NestedMatch(r'\b__struct_group\s*\('), r'\4'),
+
     #
     # Replace macros
     #
@@ -153,7 +154,6 @@ struct_xforms = [
     (KernRe(r'DEFINE_DMA_UNMAP_ADDR\s*\(' + struct_args_pattern + r'\)', re.S), r'dma_addr_t \1'),
     (KernRe(r'DEFINE_DMA_UNMAP_LEN\s*\(' + struct_args_pattern + r'\)', re.S), r'__u32 \1'),
     (KernRe(r'VIRTIO_DECLARE_FEATURES\(([\w_]+)\)'), r'union { u64 \1; u64 \1_array[VIRTIO_FEATURES_U64S]; }'),
-    (NestedMatch(r'\bSTRUCT_GROUP\('), r'\0'),
 ]
 
 #
-- 
2.52.0


^ permalink raw reply related	[flat|nested] 55+ messages in thread

* [PATCH 22/38] docs: kdoc_re: fix a parse bug on struct page_pool_params
  2026-02-18 10:12 [PATCH 00/38] docs: several improvements to kernel-doc Mauro Carvalho Chehab
                   ` (20 preceding siblings ...)
  2026-02-18 10:12 ` [PATCH 21/38] docs: kdoc_parser: better handle struct_group macros Mauro Carvalho Chehab
@ 2026-02-18 10:12 ` Mauro Carvalho Chehab
  2026-02-18 10:12 ` [PATCH 23/38] docs: kdoc_re: add a helper class to declare C function matches Mauro Carvalho Chehab
                   ` (18 subsequent siblings)
  40 siblings, 0 replies; 55+ messages in thread
From: Mauro Carvalho Chehab @ 2026-02-18 10:12 UTC (permalink / raw)
  To: Jonathan Corbet, Linux Doc Mailing List
  Cc: Mauro Carvalho Chehab, intel-wired-lan, linux-hardening,
	linux-kernel, netdev, Aleksandr Loktionov, Randy Dunlap

The struct page_pool_params definition has a "private:" comment
inside one of its struct groups:

    struct page_pool_params {
	struct_group_tagged(page_pool_params_fast, fast,
		unsigned int	order;
		unsigned int	pool_size;
		int		nid;
		struct device	*dev;
		struct napi_struct *napi;
		enum dma_data_direction dma_dir;
		unsigned int	max_len;
		unsigned int	offset;
	);
	struct_group_tagged(page_pool_params_slow, slow,
		struct net_device *netdev;
		unsigned int queue_idx;
		unsigned int	flags;
    /* private: used by test code only */
		void (*init_callback)(netmem_ref netmem, void *arg);
		void *init_arg;
	);
   };

This makes the kernel-doc parser miss the closing parenthesis of
the second struct_group_tagged(), as the private members are trimmed
before the transforms run, causing documentation issues.

Address it by ensuring that, if there is anything left on the stack,
the rest of the line is used as the last part of the argument.

Signed-off-by: Mauro Carvalho Chehab <mchehab+huawei@kernel.org>
Reviewed-by: Aleksandr Loktionov <aleksandr.loktionov@intel.com>
---
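For illustration only (not meant for the commit log): a simplified
sketch of the fallback, assuming a stack-based delimiter matcher. The
function match_span() is hypothetical. Unlike the patch, it returns a
plain end index instead of yielding (start, offset, pos) tuples.

```python
def match_span(line, open_pos):
    """Return the index just past the delimiter closing line[open_pos].

    If the closing delimiter is missing, fall back to the end of line.
    """
    pairs = {'(': ')', '[': ']', '{': '}'}
    stack = []
    for pos in range(open_pos, len(line)):
        ch = line[pos]
        if ch in pairs:
            stack.append(pairs[ch])
        elif stack and ch == stack[-1]:
            stack.pop()
            if not stack:
                return pos + 1
    # Unbalanced input: treat everything up to the end as the argument
    return len(line)


balanced = "f(a, (b))"
trimmed = "f(a; b;"          # closing ")" was lost, e.g. by trimming

print(match_span(balanced, 1), match_span(trimmed, 1))
```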
 tools/lib/python/kdoc/kdoc_re.py | 8 ++++++++
 1 file changed, 8 insertions(+)

diff --git a/tools/lib/python/kdoc/kdoc_re.py b/tools/lib/python/kdoc/kdoc_re.py
index 05f36d665b70..cdc842f5fc8f 100644
--- a/tools/lib/python/kdoc/kdoc_re.py
+++ b/tools/lib/python/kdoc/kdoc_re.py
@@ -201,6 +201,9 @@ class NestedMatch:
         """
 
         stack = []
+        start = 0
+        offset = 0
+        pos = 0
 
         for match_re in self.regex.finditer(line):
             start = match_re.start()
@@ -250,6 +253,11 @@ class NestedMatch:
                         yield start, offset, pos + 1
                         break
 
+        # When /* private */ is used, it may remove the end delimiter
+        if stack:
+            stack.pop()
+            yield start, offset, len(line) + 1
+
     def search(self, line):
         """
         This is similar to re.search:
-- 
2.52.0


^ permalink raw reply related	[flat|nested] 55+ messages in thread

* [PATCH 23/38] docs: kdoc_re: add a helper class to declare C function matches
  2026-02-18 10:12 [PATCH 00/38] docs: several improvements to kernel-doc Mauro Carvalho Chehab
                   ` (21 preceding siblings ...)
  2026-02-18 10:12 ` [PATCH 22/38] docs: kdoc_re: fix a parse bug on struct page_pool_params Mauro Carvalho Chehab
@ 2026-02-18 10:12 ` Mauro Carvalho Chehab
  2026-02-18 10:12 ` [PATCH 24/38] docs: kdoc_parser: use the new CFunction class Mauro Carvalho Chehab
                   ` (17 subsequent siblings)
  40 siblings, 0 replies; 55+ messages in thread
From: Mauro Carvalho Chehab @ 2026-02-18 10:12 UTC (permalink / raw)
  To: Jonathan Corbet, Linux Doc Mailing List
  Cc: Mauro Carvalho Chehab, intel-wired-lan, linux-hardening,
	linux-kernel, netdev, Aleksandr Loktionov, Randy Dunlap

Add a more convenient class to match C functions, avoiding
mistakes at the beginning and at the end of the regular expressions
passed to NestedMatch.

Signed-off-by: Mauro Carvalho Chehab <mchehab+huawei@kernel.org>
Reviewed-by: Aleksandr Loktionov <aleksandr.loktionov@intel.com>
---
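For illustration only (not meant for the commit log): a sketch of what
the wrapped expression matches, using the standard "re" module in place
of KernRe. The helper c_function_regex() is hypothetical. As in the
patch, the name is used as a regular expression fragment and is not
escaped.

```python
import re


def c_function_regex(name):
    """Build a pattern matching NAME followed by an open parenthesis."""
    return re.compile(r"\b" + name + r"\s*\(")


pattern = c_function_regex("__acquires")

# Matches the macro call, even with spaces before the parenthesis
print(bool(pattern.search("void f(void) __acquires (lock)")))

# The leading \b rejects names that merely end with the same text
print(bool(pattern.search("my__acquires(lock)")))

# The trailing "(" rejects longer names sharing the same prefix
print(bool(pattern.search("__acquires_shared(lock)")))
```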
 tools/lib/python/kdoc/kdoc_re.py | 11 +++++++++++
 1 file changed, 11 insertions(+)

diff --git a/tools/lib/python/kdoc/kdoc_re.py b/tools/lib/python/kdoc/kdoc_re.py
index cdc842f5fc8f..f72b80ea4f1b 100644
--- a/tools/lib/python/kdoc/kdoc_re.py
+++ b/tools/lib/python/kdoc/kdoc_re.py
@@ -383,3 +383,14 @@ class NestedMatch:
         """
 
         return f'NestedMatch("{self.regex.regex.pattern}")'
+
+
+class CFunction(NestedMatch):
+    r"""
+    Variant of NestedMatch.
+
+    It overrides the init method to ensure that the regular expression will
+    start with a ``\b`` and end with a C function delimiter (open parenthesis).
+    """
+    def __init__(self, regex):
+        self.regex = KernRe(r"\b" + regex + r"\s*\(")
-- 
2.52.0


^ permalink raw reply related	[flat|nested] 55+ messages in thread

* [PATCH 24/38] docs: kdoc_parser: use the new CFunction class
  2026-02-18 10:12 [PATCH 00/38] docs: several improvements to kernel-doc Mauro Carvalho Chehab
                   ` (22 preceding siblings ...)
  2026-02-18 10:12 ` [PATCH 23/38] docs: kdoc_re: add a helper class to declare C function matches Mauro Carvalho Chehab
@ 2026-02-18 10:12 ` Mauro Carvalho Chehab
  2026-02-18 10:12 ` [PATCH 25/38] docs: kdoc_parser: minimize differences with struct_group_tagged Mauro Carvalho Chehab
                   ` (16 subsequent siblings)
  40 siblings, 0 replies; 55+ messages in thread
From: Mauro Carvalho Chehab @ 2026-02-18 10:12 UTC (permalink / raw)
  To: Jonathan Corbet, Linux Doc Mailing List
  Cc: Mauro Carvalho Chehab, intel-wired-lan, linux-hardening,
	linux-kernel, netdev, Aleksandr Loktionov, Randy Dunlap

The match logic for transforms becomes a lot clearer if we use
the convenient CFunction alias class instead of NestedMatch.

Signed-off-by: Mauro Carvalho Chehab <mchehab+huawei@kernel.org>
Reviewed-by: Aleksandr Loktionov <aleksandr.loktionov@intel.com>
---
 tools/lib/python/kdoc/kdoc_parser.py | 38 ++++++++++++++--------------
 1 file changed, 19 insertions(+), 19 deletions(-)

diff --git a/tools/lib/python/kdoc/kdoc_parser.py b/tools/lib/python/kdoc/kdoc_parser.py
index b63a70f184eb..e1914b2a6ab7 100644
--- a/tools/lib/python/kdoc/kdoc_parser.py
+++ b/tools/lib/python/kdoc/kdoc_parser.py
@@ -13,7 +13,7 @@ import sys
 import re
 from pprint import pformat
 
-from kdoc.kdoc_re import NestedMatch, KernRe
+from kdoc.kdoc_re import CFunction, KernRe
 from kdoc.kdoc_item import KdocItem
 
 #
@@ -119,22 +119,22 @@ struct_xforms = [
     #
     # As it doesn't properly match the end parenthesis on some cases.
     #
-    # So, a better solution was crafted: there's now a NestedMatch
+    # So, a better solution was crafted: there's now a CFunction
     # class that ensures that delimiters after a search are properly
     # matched. So, the implementation to drop STRUCT_GROUP() will be
     # handled in separate.
     #
-    (NestedMatch(r'\bstruct_group\s*\('), r'\2'),
-    (NestedMatch(r'\bstruct_group_attr\s*\('), r'\3'),
-    (NestedMatch(r'\bstruct_group_tagged\s*\('), r'struct \1 { \3 } \2;'),
-    (NestedMatch(r'\b__struct_group\s*\('), r'\4'),
+    (CFunction('struct_group'), r'\2'),
+    (CFunction('struct_group_attr'), r'\3'),
+    (CFunction('struct_group_tagged'), r'struct \1 { \3 } \2;'),
+    (CFunction('__struct_group'), r'\4'),
 
     #
     # Replace macros
     #
-    # TODO: use NestedMatch for FOO($1, $2, ...) matches
+    # TODO: use CFunction on all FOO($1, $2, ...) matches
     #
-    # it is better to also move those to the NestedMatch logic,
+    # it is better to also move those to the CFunction logic,
     # to ensure that parentheses will be properly matched.
     #
     (KernRe(r'__ETHTOOL_DECLARE_LINK_MODE_MASK\s*\(([^\)]+)\)', re.S),
@@ -186,17 +186,17 @@ function_xforms = [
     (KernRe(r"DECL_BUCKET_PARAMS\s*\(\s*(\S+)\s*,\s*(\S+)\s*\)"), r"\1, \2"),
     (KernRe(r"__no_context_analysis\s*"), ""),
     (KernRe(r"__attribute_const__ +"), ""),
-    (NestedMatch(r"__cond_acquires\s*\("), ""),
-    (NestedMatch(r"__cond_releases\s*\("), ""),
-    (NestedMatch(r"__acquires\s*\("), ""),
-    (NestedMatch(r"__releases\s*\("), ""),
-    (NestedMatch(r"__must_hold\s*\("), ""),
-    (NestedMatch(r"__must_not_hold\s*\("), ""),
-    (NestedMatch(r"__must_hold_shared\s*\("), ""),
-    (NestedMatch(r"__cond_acquires_shared\s*\("), ""),
-    (NestedMatch(r"__acquires_shared\s*\("), ""),
-    (NestedMatch(r"__releases_shared\s*\("), ""),
-    (NestedMatch(r"__attribute__\s*\("), ""),
+    (CFunction("__cond_acquires"), ""),
+    (CFunction("__cond_releases"), ""),
+    (CFunction("__acquires"), ""),
+    (CFunction("__releases"), ""),
+    (CFunction("__must_hold"), ""),
+    (CFunction("__must_not_hold"), ""),
+    (CFunction("__must_hold_shared"), ""),
+    (CFunction("__cond_acquires_shared"), ""),
+    (CFunction("__acquires_shared"), ""),
+    (CFunction("__releases_shared"), ""),
+    (CFunction("__attribute__"), ""),
 ]
 
 #
-- 
2.52.0


^ permalink raw reply related	[flat|nested] 55+ messages in thread

* [PATCH 25/38] docs: kdoc_parser: minimize differences with struct_group_tagged
  2026-02-18 10:12 [PATCH 00/38] docs: several improvements to kernel-doc Mauro Carvalho Chehab
                   ` (23 preceding siblings ...)
  2026-02-18 10:12 ` [PATCH 24/38] docs: kdoc_parser: use the new CFunction class Mauro Carvalho Chehab
@ 2026-02-18 10:12 ` Mauro Carvalho Chehab
  2026-02-18 10:12 ` [PATCH 26/38] docs: kdoc_parser: move transform lists to a separate file Mauro Carvalho Chehab
                   ` (15 subsequent siblings)
  40 siblings, 0 replies; 55+ messages in thread
From: Mauro Carvalho Chehab @ 2026-02-18 10:12 UTC (permalink / raw)
  To: Alexander Lobakin, Jonathan Corbet, Linux Doc Mailing List
  Cc: Mauro Carvalho Chehab, intel-wired-lan, linux-hardening,
	linux-kernel, netdev, Aleksandr Loktionov, Randy Dunlap

While the previous version does a better job of representing
the actual struct, it ends up losing the documentation of each
member.

Change the replacement to minimize such changes. With that,
the only differences before/after using the new NestedMatch
replacement logic are (at man page output):

    --- before.log  2026-01-29 06:14:20.163592584 +0100
    +++ after.log   2026-01-29 06:32:04.811370234 +0100
    @@ -1573701 +1573701 @@
    -.BI "    struct ice_health_tx_hang_buf  tx_hang_buf;"
    +.BI "    struct ice_health_tx_hang_buf tx_hang_buf;"
    @@ -4156451 +4156451 @@
    -.BI "    struct libeth_fq_fp  fp;"
    +.BI "    struct libeth_fq_fp fp;"
    @@ -4164041 +4164041 @@
    -.BI "    struct libeth_xskfq_fp  fp;"
    +.BI "    struct libeth_xskfq_fp fp;"
    @@ -4269434 +4269434 @@
    -.BI "    struct page_pool_params_fast  fast;"
    +.BI "    struct page_pool_params_fast fast;"
    @@ -4269452 +4269452 @@
    -.BI "    struct page_pool_params_slow  slow;"
    +.BI "    struct page_pool_params_slow slow;"
    @@ -4269454 +4269454 @@
    -.BI "    STRUCT_GROUP( struct net_device *netdev;"
    +.BI "    struct net_device *netdev;"

i.e. basically whitespace changes, plus a fix in NestedMatch to
better handle /* private */ comments.

Signed-off-by: Mauro Carvalho Chehab <mchehab+huawei@kernel.org>
Reviewed-by: Aleksandr Loktionov <aleksandr.loktionov@intel.com>
---
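For illustration only (not meant for the commit log): a sketch
comparing the two replacement templates for struct_group_tagged(). The
expand() helper and the sample argument list are hypothetical. Index 0
is assumed to hold the whole argument string.

```python
import re


def expand(template, args):
    """Replace backslash-digit references in template with args[N]."""
    return re.sub(r'\\(\d+)', lambda m: args[int(m.group(1))], template)


# Hypothetical arguments of struct_group_tagged(TAG, NAME, MEMBERS)
args = [
    "fast_tag, fast, int order; int nid;",   # whole argument string
    "fast_tag",                              # TAG
    "fast",                                  # NAME
    "int order; int nid;",                   # MEMBERS
]

nested = expand(r'struct \1 { \3 } \2;', args)   # previous template
flat = expand(r'struct \1 \2; \3', args)         # template in this patch

print(nested)
print(flat)
```

With the flat form, each member stays at the top level of the outer
struct, next to the tagged field.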
 tools/lib/python/kdoc/kdoc_parser.py | 2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)

diff --git a/tools/lib/python/kdoc/kdoc_parser.py b/tools/lib/python/kdoc/kdoc_parser.py
index e1914b2a6ab7..e735e79b5061 100644
--- a/tools/lib/python/kdoc/kdoc_parser.py
+++ b/tools/lib/python/kdoc/kdoc_parser.py
@@ -126,7 +126,7 @@ struct_xforms = [
     #
     (CFunction('struct_group'), r'\2'),
     (CFunction('struct_group_attr'), r'\3'),
-    (CFunction('struct_group_tagged'), r'struct \1 { \3 } \2;'),
+    (CFunction('struct_group_tagged'), r'struct \1 \2; \3'),
     (CFunction('__struct_group'), r'\4'),
 
     #
-- 
2.52.0


^ permalink raw reply related	[flat|nested] 55+ messages in thread

* [PATCH 26/38] docs: kdoc_parser: move transform lists to a separate file
  2026-02-18 10:12 [PATCH 00/38] docs: several improvements to kernel-doc Mauro Carvalho Chehab
                   ` (24 preceding siblings ...)
  2026-02-18 10:12 ` [PATCH 25/38] docs: kdoc_parser: minimize differences with struct_group_tagged Mauro Carvalho Chehab
@ 2026-02-18 10:12 ` Mauro Carvalho Chehab
  2026-02-18 10:12 ` [PATCH 27/38] docs: kdoc_re: don't remove the trailing ";" with NestedMatch Mauro Carvalho Chehab
                   ` (14 subsequent siblings)
  40 siblings, 0 replies; 55+ messages in thread
From: Mauro Carvalho Chehab @ 2026-02-18 10:12 UTC (permalink / raw)
  To: Jonathan Corbet, Linux Doc Mailing List
  Cc: Mauro Carvalho Chehab, intel-wired-lan, linux-hardening,
	linux-kernel, netdev, Gustavo A. R. Silva, Aleksandr Loktionov,
	Kees Cook, Randy Dunlap, Shuah Khan

Over time, most of the changes to kernel-doc have been related
to maintaining a list of transforms that convert macros into pure
C code.

Place such transforms in a separate module, to clean up the
parser module.

While here, drop the now-obsolete comment about the two-step
logic to handle struct_group macros.

There is another advantage to that: QEMU also uses our kernel-doc,
but its xforms list is different. By placing the list in a
separate module, we can minimize the differences and make it
easier to keep QEMU in sync with the upstream kernel.

Signed-off-by: Mauro Carvalho Chehab <mchehab+huawei@kernel.org>
Reviewed-by: Aleksandr Loktionov <aleksandr.loktionov@intel.com>
---
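For illustration only (not meant for the commit log): a sketch of how
passing the transforms container to the parser lets a downstream
project swap the lists. All class names and the MY_ATTR macro below are
hypothetical. Plain "re" patterns stand in for KernRe.

```python
import re


class BaseTransforms:
    """Hypothetical stand-in for a transforms container."""
    var_xforms = [
        (re.compile(r"__read_mostly"), ""),
        (re.compile(r";$"), ""),
    ]


class DownstreamTransforms(BaseTransforms):
    """A downstream project could extend or replace the lists."""
    var_xforms = BaseTransforms.var_xforms + [
        (re.compile(r"\bMY_ATTR\b\s*"), ""),
    ]


class Parser:
    """Minimal parser that receives its transforms at init time."""
    def __init__(self, xforms):
        self.xforms = xforms

    def clean_var(self, proto):
        # Apply each (regex, replacement) pair in order
        for regex, sub in self.xforms.var_xforms:
            proto = regex.sub(sub, proto)
        return proto.strip()


print(Parser(BaseTransforms).clean_var("int x __read_mostly;"))
print(Parser(DownstreamTransforms).clean_var("MY_ATTR int y;"))
```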
 Documentation/tools/kdoc_parser.rst   |   8 ++
 tools/lib/python/kdoc/kdoc_files.py   |   3 +-
 tools/lib/python/kdoc/kdoc_parser.py  | 148 ++------------------------
 tools/lib/python/kdoc/xforms_lists.py | 118 ++++++++++++++++++++
 4 files changed, 134 insertions(+), 143 deletions(-)
 create mode 100644 tools/lib/python/kdoc/xforms_lists.py

diff --git a/Documentation/tools/kdoc_parser.rst b/Documentation/tools/kdoc_parser.rst
index 03ee54a1b1cc..55b202173195 100644
--- a/Documentation/tools/kdoc_parser.rst
+++ b/Documentation/tools/kdoc_parser.rst
@@ -4,6 +4,14 @@
 Kernel-doc parser stage
 =======================
 
+C replacement rules used by the parser
+======================================
+
+.. automodule:: lib.python.kdoc.xforms_lists
+   :members:
+   :show-inheritance:
+   :undoc-members:
+
 File handler classes
 ====================
 
diff --git a/tools/lib/python/kdoc/kdoc_files.py b/tools/lib/python/kdoc/kdoc_files.py
index 022487ea2cc6..7357c97a4b01 100644
--- a/tools/lib/python/kdoc/kdoc_files.py
+++ b/tools/lib/python/kdoc/kdoc_files.py
@@ -15,6 +15,7 @@ import os
 import re
 
 from kdoc.kdoc_parser import KernelDoc
+from kdoc.xforms_lists import CTransforms
 from kdoc.kdoc_output import OutputFormat
 
 
@@ -117,7 +118,7 @@ class KernelFiles():
         if fname in self.files:
             return
 
-        doc = KernelDoc(self.config, fname)
+        doc = KernelDoc(self.config, fname, CTransforms)
         export_table, entries = doc.parse_kdoc()
 
         self.export_table[fname] = export_table
diff --git a/tools/lib/python/kdoc/kdoc_parser.py b/tools/lib/python/kdoc/kdoc_parser.py
index e735e79b5061..a280fe581937 100644
--- a/tools/lib/python/kdoc/kdoc_parser.py
+++ b/tools/lib/python/kdoc/kdoc_parser.py
@@ -75,143 +75,6 @@ doc_begin_func = KernRe(str(doc_com) +			# initial " * '
 #
 struct_args_pattern = r'([^,)]+)'
 
-struct_xforms = [
-    # Strip attributes
-    (KernRe(r"__attribute__\s*\(\([a-z0-9,_\*\s\(\)]*\)\)", flags=re.I | re.S, cache=False), ' '),
-    (KernRe(r'\s*__aligned\s*\([^;]*\)', re.S), ' '),
-    (KernRe(r'\s*__counted_by\s*\([^;]*\)', re.S), ' '),
-    (KernRe(r'\s*__counted_by_(le|be)\s*\([^;]*\)', re.S), ' '),
-    (KernRe(r'\s*__guarded_by\s*\([^\)]*\)', re.S), ' '),
-    (KernRe(r'\s*__pt_guarded_by\s*\([^\)]*\)', re.S), ' '),
-    (KernRe(r'\s*__packed\s*', re.S), ' '),
-    (KernRe(r'\s*CRYPTO_MINALIGN_ATTR', re.S), ' '),
-    (KernRe(r'\s*__private', re.S), ' '),
-    (KernRe(r'\s*__rcu', re.S), ' '),
-    (KernRe(r'\s*____cacheline_aligned_in_smp', re.S), ' '),
-    (KernRe(r'\s*____cacheline_aligned', re.S), ' '),
-    (KernRe(r'\s*__cacheline_group_(begin|end)\([^\)]+\);'), ''),
-    #
-    # Unwrap struct_group macros based on this definition:
-    # __struct_group(TAG, NAME, ATTRS, MEMBERS...)
-    # which has variants like: struct_group(NAME, MEMBERS...)
-    # Only MEMBERS arguments require documentation.
-    #
-    # Parsing them happens on two steps:
-    #
-    # 1. drop struct group arguments that aren't at MEMBERS,
-    #    storing them as STRUCT_GROUP(MEMBERS)
-    #
-    # 2. remove STRUCT_GROUP() ancillary macro.
-    #
-    # The original logic used to remove STRUCT_GROUP() using an
-    # advanced regex:
-    #
-    #   \bSTRUCT_GROUP(\(((?:(?>[^)(]+)|(?1))*)\))[^;]*;
-    #
-    # with two patterns that are incompatible with
-    # Python re module, as it has:
-    #
-    #   - a recursive pattern: (?1)
-    #   - an atomic grouping: (?>...)
-    #
-    # I tried a simpler version: but it didn't work either:
-    #   \bSTRUCT_GROUP\(([^\)]+)\)[^;]*;
-    #
-    # As it doesn't properly match the end parenthesis on some cases.
-    #
-    # So, a better solution was crafted: there's now a CFunction
-    # class that ensures that delimiters after a search are properly
-    # matched. So, the implementation to drop STRUCT_GROUP() will be
-    # handled in separate.
-    #
-    (CFunction('struct_group'), r'\2'),
-    (CFunction('struct_group_attr'), r'\3'),
-    (CFunction('struct_group_tagged'), r'struct \1 \2; \3'),
-    (CFunction('__struct_group'), r'\4'),
-
-    #
-    # Replace macros
-    #
-    # TODO: use CFunction on all FOO($1, $2, ...) matches
-    #
-    # it is better to also move those to the CFunction logic,
-    # to ensure that parentheses will be properly matched.
-    #
-    (KernRe(r'__ETHTOOL_DECLARE_LINK_MODE_MASK\s*\(([^\)]+)\)', re.S),
-     r'DECLARE_BITMAP(\1, __ETHTOOL_LINK_MODE_MASK_NBITS)'),
-    (KernRe(r'DECLARE_PHY_INTERFACE_MASK\s*\(([^\)]+)\)', re.S),
-     r'DECLARE_BITMAP(\1, PHY_INTERFACE_MODE_MAX)'),
-    (KernRe(r'DECLARE_BITMAP\s*\(' + struct_args_pattern + r',\s*' + struct_args_pattern + r'\)',
-            re.S), r'unsigned long \1[BITS_TO_LONGS(\2)]'),
-    (KernRe(r'DECLARE_HASHTABLE\s*\(' + struct_args_pattern + r',\s*' + struct_args_pattern + r'\)',
-            re.S), r'unsigned long \1[1 << ((\2) - 1)]'),
-    (KernRe(r'DECLARE_KFIFO\s*\(' + struct_args_pattern + r',\s*' + struct_args_pattern +
-            r',\s*' + struct_args_pattern + r'\)', re.S), r'\2 *\1'),
-    (KernRe(r'DECLARE_KFIFO_PTR\s*\(' + struct_args_pattern + r',\s*' +
-            struct_args_pattern + r'\)', re.S), r'\2 *\1'),
-    (KernRe(r'(?:__)?DECLARE_FLEX_ARRAY\s*\(' + struct_args_pattern + r',\s*' +
-            struct_args_pattern + r'\)', re.S), r'\1 \2[]'),
-    (KernRe(r'DEFINE_DMA_UNMAP_ADDR\s*\(' + struct_args_pattern + r'\)', re.S), r'dma_addr_t \1'),
-    (KernRe(r'DEFINE_DMA_UNMAP_LEN\s*\(' + struct_args_pattern + r'\)', re.S), r'__u32 \1'),
-    (KernRe(r'VIRTIO_DECLARE_FEATURES\(([\w_]+)\)'), r'union { u64 \1; u64 \1_array[VIRTIO_FEATURES_U64S]; }'),
-]
-
-#
-# Transforms for function prototypes
-#
-function_xforms = [
-    (KernRe(r"^static +"), ""),
-    (KernRe(r"^extern +"), ""),
-    (KernRe(r"^asmlinkage +"), ""),
-    (KernRe(r"^inline +"), ""),
-    (KernRe(r"^__inline__ +"), ""),
-    (KernRe(r"^__inline +"), ""),
-    (KernRe(r"^__always_inline +"), ""),
-    (KernRe(r"^noinline +"), ""),
-    (KernRe(r"^__FORTIFY_INLINE +"), ""),
-    (KernRe(r"__init +"), ""),
-    (KernRe(r"__init_or_module +"), ""),
-    (KernRe(r"__exit +"), ""),
-    (KernRe(r"__deprecated +"), ""),
-    (KernRe(r"__flatten +"), ""),
-    (KernRe(r"__meminit +"), ""),
-    (KernRe(r"__must_check +"), ""),
-    (KernRe(r"__weak +"), ""),
-    (KernRe(r"__sched +"), ""),
-    (KernRe(r"_noprof"), ""),
-    (KernRe(r"__always_unused *"), ""),
-    (KernRe(r"__printf\s*\(\s*\d*\s*,\s*\d*\s*\) +"), ""),
-    (KernRe(r"__(?:re)?alloc_size\s*\(\s*\d+\s*(?:,\s*\d+\s*)?\) +"), ""),
-    (KernRe(r"__diagnose_as\s*\(\s*\S+\s*(?:,\s*\d+\s*)*\) +"), ""),
-    (KernRe(r"DECL_BUCKET_PARAMS\s*\(\s*(\S+)\s*,\s*(\S+)\s*\)"), r"\1, \2"),
-    (KernRe(r"__no_context_analysis\s*"), ""),
-    (KernRe(r"__attribute_const__ +"), ""),
-    (CFunction("__cond_acquires"), ""),
-    (CFunction("__cond_releases"), ""),
-    (CFunction("__acquires"), ""),
-    (CFunction("__releases"), ""),
-    (CFunction("__must_hold"), ""),
-    (CFunction("__must_not_hold"), ""),
-    (CFunction("__must_hold_shared"), ""),
-    (CFunction("__cond_acquires_shared"), ""),
-    (CFunction("__acquires_shared"), ""),
-    (CFunction("__releases_shared"), ""),
-    (CFunction("__attribute__"), ""),
-]
-
-#
-# Transforms for variable prototypes
-#
-var_xforms = [
-    (KernRe(r"__read_mostly"), ""),
-    (KernRe(r"__ro_after_init"), ""),
-    (KernRe(r'\s*__guarded_by\s*\([^\)]*\)', re.S), ""),
-    (KernRe(r'\s*__pt_guarded_by\s*\([^\)]*\)', re.S), ""),
-    (KernRe(r"LIST_HEAD\(([\w_]+)\)"), r"struct list_head \1"),
-    (KernRe(r"(?://.*)$"), ""),
-    (KernRe(r"(?:/\*.*\*/)"), ""),
-    (KernRe(r";$"), ""),
-]
 
 #
 # Ancillary functions
@@ -395,11 +258,12 @@ class KernelDoc:
     #: String to write when a parameter is not described.
     undescribed = "-- undescribed --"
 
-    def __init__(self, config, fname):
+    def __init__(self, config, fname, xforms):
         """Initialize internal variables"""
 
         self.fname = fname
         self.config = config
+        self.xforms = xforms
 
         # Initial state for the state machines
         self.state = state.NORMAL
@@ -890,7 +754,7 @@ class KernelDoc:
         # Go through the list of members applying all of our transformations.
         #
         members = trim_private_members(members)
-        members = self.apply_transforms(struct_xforms, members)
+        members = self.apply_transforms(self.xforms.struct_xforms, members)
 
         #
         # Deal with embedded struct and union members, and drop enums entirely.
@@ -1012,8 +876,7 @@ class KernelDoc:
         # Drop comments and macros to have a pure C prototype
         #
         if not declaration_name:
-            for r, sub in var_xforms:
-                proto = r.sub(sub, proto)
+            proto = self.apply_transforms(self.xforms.var_xforms, proto)
 
         proto = proto.rstrip()
 
@@ -1105,7 +968,8 @@ class KernelDoc:
             #
             # Apply the initial transformations.
             #
-            prototype = self.apply_transforms(function_xforms, prototype)
+            prototype = self.apply_transforms(self.xforms.function_xforms,
+                                              prototype)
 
         # Yes, this truly is vile.  We are looking for:
         # 1. Return type (may be nothing if we're looking at a macro)
diff --git a/tools/lib/python/kdoc/xforms_lists.py b/tools/lib/python/kdoc/xforms_lists.py
new file mode 100644
index 000000000000..2e7b470c4e65
--- /dev/null
+++ b/tools/lib/python/kdoc/xforms_lists.py
@@ -0,0 +1,118 @@
+#!/usr/bin/env python3
+# SPDX-License-Identifier: GPL-2.0
+# Copyright(c) 2026: Mauro Carvalho Chehab <mchehab@kernel.org>.
+
+import re
+
+from kdoc.kdoc_re import CFunction, KernRe
+
+struct_args_pattern = r'([^,)]+)'
+
+class CTransforms:
+    """
+    Data class containing a long set of transformations to turn
+    structure member prefixes, and macro invocations and variables
+    into something we can parse and generate kdoc for.
+    """
+
+    #: Transforms for structs and unions.
+    struct_xforms = [
+        # Strip attributes
+        (KernRe(r"__attribute__\s*\(\([a-z0-9,_\*\s\(\)]*\)\)", flags=re.I | re.S, cache=False), ' '),
+        (KernRe(r'\s*__aligned\s*\([^;]*\)', re.S), ' '),
+        (KernRe(r'\s*__counted_by\s*\([^;]*\)', re.S), ' '),
+        (KernRe(r'\s*__counted_by_(le|be)\s*\([^;]*\)', re.S), ' '),
+        (KernRe(r'\s*__guarded_by\s*\([^\)]*\)', re.S), ' '),
+        (KernRe(r'\s*__pt_guarded_by\s*\([^\)]*\)', re.S), ' '),
+        (KernRe(r'\s*__packed\s*', re.S), ' '),
+        (KernRe(r'\s*CRYPTO_MINALIGN_ATTR', re.S), ' '),
+        (KernRe(r'\s*__private', re.S), ' '),
+        (KernRe(r'\s*__rcu', re.S), ' '),
+        (KernRe(r'\s*____cacheline_aligned_in_smp', re.S), ' '),
+        (KernRe(r'\s*____cacheline_aligned', re.S), ' '),
+        (KernRe(r'\s*__cacheline_group_(begin|end)\([^\)]+\);'), ''),
+
+        (CFunction('struct_group'), r'\2'),
+        (CFunction('struct_group_attr'), r'\3'),
+        (CFunction('struct_group_tagged'), r'struct \1 \2; \3'),
+        (CFunction('__struct_group'), r'\4'),
+
+        #
+        # Replace macros
+        #
+        # TODO: use CFunction on all FOO($1, $2, ...) matches
+        #
+        # it is better to also move those to the CFunction logic,
+        # to ensure that parentheses will be properly matched.
+        #
+        (KernRe(r'__ETHTOOL_DECLARE_LINK_MODE_MASK\s*\(([^\)]+)\)', re.S),
+        r'DECLARE_BITMAP(\1, __ETHTOOL_LINK_MODE_MASK_NBITS)'),
+        (KernRe(r'DECLARE_PHY_INTERFACE_MASK\s*\(([^\)]+)\)', re.S),
+        r'DECLARE_BITMAP(\1, PHY_INTERFACE_MODE_MAX)'),
+        (KernRe(r'DECLARE_BITMAP\s*\(' + struct_args_pattern + r',\s*' + struct_args_pattern + r'\)',
+                re.S), r'unsigned long \1[BITS_TO_LONGS(\2)]'),
+        (KernRe(r'DECLARE_HASHTABLE\s*\(' + struct_args_pattern + r',\s*' + struct_args_pattern + r'\)',
+                re.S), r'unsigned long \1[1 << ((\2) - 1)]'),
+        (KernRe(r'DECLARE_KFIFO\s*\(' + struct_args_pattern + r',\s*' + struct_args_pattern +
+                r',\s*' + struct_args_pattern + r'\)', re.S), r'\2 *\1'),
+        (KernRe(r'DECLARE_KFIFO_PTR\s*\(' + struct_args_pattern + r',\s*' +
+                struct_args_pattern + r'\)', re.S), r'\2 *\1'),
+        (KernRe(r'(?:__)?DECLARE_FLEX_ARRAY\s*\(' + struct_args_pattern + r',\s*' +
+                struct_args_pattern + r'\)', re.S), r'\1 \2[]'),
+        (KernRe(r'DEFINE_DMA_UNMAP_ADDR\s*\(' + struct_args_pattern + r'\)', re.S), r'dma_addr_t \1'),
+        (KernRe(r'DEFINE_DMA_UNMAP_LEN\s*\(' + struct_args_pattern + r'\)', re.S), r'__u32 \1'),
+        (KernRe(r'VIRTIO_DECLARE_FEATURES\(([\w_]+)\)'), r'union { u64 \1; u64 \1_array[VIRTIO_FEATURES_U64S]; }'),
+    ]
+
+    #: Transforms for function prototypes.
+    function_xforms = [
+        (KernRe(r"^static +"), ""),
+        (KernRe(r"^extern +"), ""),
+        (KernRe(r"^asmlinkage +"), ""),
+        (KernRe(r"^inline +"), ""),
+        (KernRe(r"^__inline__ +"), ""),
+        (KernRe(r"^__inline +"), ""),
+        (KernRe(r"^__always_inline +"), ""),
+        (KernRe(r"^noinline +"), ""),
+        (KernRe(r"^__FORTIFY_INLINE +"), ""),
+        (KernRe(r"__init +"), ""),
+        (KernRe(r"__init_or_module +"), ""),
+        (KernRe(r"__exit +"), ""),
+        (KernRe(r"__deprecated +"), ""),
+        (KernRe(r"__flatten +"), ""),
+        (KernRe(r"__meminit +"), ""),
+        (KernRe(r"__must_check +"), ""),
+        (KernRe(r"__weak +"), ""),
+        (KernRe(r"__sched +"), ""),
+        (KernRe(r"_noprof"), ""),
+        (KernRe(r"__always_unused *"), ""),
+        (KernRe(r"__printf\s*\(\s*\d*\s*,\s*\d*\s*\) +"), ""),
+        (KernRe(r"__(?:re)?alloc_size\s*\(\s*\d+\s*(?:,\s*\d+\s*)?\) +"), ""),
+        (KernRe(r"__diagnose_as\s*\(\s*\S+\s*(?:,\s*\d+\s*)*\) +"), ""),
+        (KernRe(r"DECL_BUCKET_PARAMS\s*\(\s*(\S+)\s*,\s*(\S+)\s*\)"), r"\1, \2"),
+        (KernRe(r"__no_context_analysis\s*"), ""),
+        (KernRe(r"__attribute_const__ +"), ""),
+        (CFunction("__cond_acquires"), ""),
+        (CFunction("__cond_releases"), ""),
+        (CFunction("__acquires"), ""),
+        (CFunction("__releases"), ""),
+        (CFunction("__must_hold"), ""),
+        (CFunction("__must_not_hold"), ""),
+        (CFunction("__must_hold_shared"), ""),
+        (CFunction("__cond_acquires_shared"), ""),
+        (CFunction("__acquires_shared"), ""),
+        (CFunction("__releases_shared"), ""),
+        (CFunction("__attribute__"), ""),
+    ]
+
+    #: Transforms for variables.
+    var_xforms = [
+        (KernRe(r"__read_mostly"), ""),
+        (KernRe(r"__ro_after_init"), ""),
+        (KernRe(r'\s*__guarded_by\s*\([^\)]*\)', re.S), ""),
+        (KernRe(r'\s*__pt_guarded_by\s*\([^\)]*\)', re.S), ""),
+        (KernRe(r"LIST_HEAD\(([\w_]+)\)"), r"struct list_head \1"),
+        (KernRe(r"(?://.*)$"), ""),
+        (KernRe(r"(?:/\*.*\*/)"), ""),
+        (KernRe(r";$"), ""),
+    ]
-- 
2.52.0


^ permalink raw reply related	[flat|nested] 55+ messages in thread

* [PATCH 27/38] docs: kdoc_re: don't remove the trailing ";" with NestedMatch
  2026-02-18 10:12 [PATCH 00/38] docs: several improvements to kernel-doc Mauro Carvalho Chehab
                   ` (25 preceding siblings ...)
  2026-02-18 10:12 ` [PATCH 26/38] docs: kdoc_parser: move transform lists to a separate file Mauro Carvalho Chehab
@ 2026-02-18 10:12 ` Mauro Carvalho Chehab
  2026-02-18 10:12 ` [PATCH 28/38] docs: kdoc_re: prevent adding whitespaces on sub replacements Mauro Carvalho Chehab
                   ` (13 subsequent siblings)
  40 siblings, 0 replies; 55+ messages in thread
From: Mauro Carvalho Chehab @ 2026-02-18 10:12 UTC (permalink / raw)
  To: Jonathan Corbet, Linux Doc Mailing List
  Cc: Mauro Carvalho Chehab, intel-wired-lan, linux-hardening,
	linux-kernel, netdev, Aleksandr Loktionov, Randy Dunlap

Removing it causes the parser to break some conversions when
NestedMatch is used on macros like __attribute__().

Signed-off-by: Mauro Carvalho Chehab <mchehab+huawei@kernel.org>
Reviewed-by: Aleksandr Loktionov <aleksandr.loktionov@intel.com>
---
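For illustration only (not meant for the commit log): a sketch of why
consuming the ";" is harmful when the macro is an attribute at the end
of a declaration. The strip_call() helper is hypothetical and much
simpler than NestedMatch.sub().

```python
import re


def strip_call(line, name, eat_semicolon):
    """Remove the first NAME(...) call, honoring nested parentheses."""
    m = re.search(r"\b" + name + r"\s*\(", line)
    if not m:
        return line
    depth, pos = 1, m.end()
    while pos < len(line) and depth:
        depth += {'(': 1, ')': -1}.get(line[pos], 0)
        pos += 1
    if eat_semicolon and pos < len(line) and line[pos] == ';':
        pos += 1                       # old behavior: also drop ';'
    return line[:m.start()] + line[pos:]


decl = "int a __attribute__((unused)); int b;"

# Dropping ';' glues the two declarations together
print(repr(strip_call(decl, "__attribute__", eat_semicolon=True)))

# Keeping ';' preserves the statement boundary
print(repr(strip_call(decl, "__attribute__", eat_semicolon=False)))
```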
 tools/lib/python/kdoc/kdoc_re.py | 4 ----
 1 file changed, 4 deletions(-)

diff --git a/tools/lib/python/kdoc/kdoc_re.py b/tools/lib/python/kdoc/kdoc_re.py
index f72b80ea4f1b..e3809aaa0310 100644
--- a/tools/lib/python/kdoc/kdoc_re.py
+++ b/tools/lib/python/kdoc/kdoc_re.py
@@ -361,10 +361,6 @@ class NestedMatch:
 
             out += new_sub
 
-            # Drop end ';' if any
-            if pos < len(line) and line[pos] == ';':
-                pos += 1
-
             cur_pos = pos
             n += 1
 
-- 
2.52.0


^ permalink raw reply related	[flat|nested] 55+ messages in thread

* [PATCH 28/38] docs: kdoc_re: prevent adding whitespaces on sub replacements
  2026-02-18 10:12 [PATCH 00/38] docs: several improvements to kernel-doc Mauro Carvalho Chehab
                   ` (26 preceding siblings ...)
  2026-02-18 10:12 ` [PATCH 27/38] docs: kdoc_re: don't remove the trailing ";" with NestedMatch Mauro Carvalho Chehab
@ 2026-02-18 10:12 ` Mauro Carvalho Chehab
  2026-02-18 10:12 ` [PATCH 29/38] docs: xforms_lists.py: use CFuntion to handle all function macros Mauro Carvalho Chehab
                   ` (12 subsequent siblings)
  40 siblings, 0 replies; 55+ messages in thread
From: Mauro Carvalho Chehab @ 2026-02-18 10:12 UTC (permalink / raw)
  To: Jonathan Corbet, Linux Doc Mailing List
  Cc: Mauro Carvalho Chehab, intel-wired-lan, linux-hardening,
	linux-kernel, netdev, Aleksandr Loktionov, Randy Dunlap

When NestedMatch is used, extra whitespace may be left after
substitutions. As such spaces are part of the C syntax, we can
safely drop them, improving the quality of the output.
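
A minimal sketch of the idea, assuming the match positions are already
known. The helper name replace_spans() and its signature are
hypothetical; it is not the NestedMatch.sub() code, only an
illustration of skipping the spaces found at the current position
before copying the next chunk of the line:

```python
def replace_spans(line, spans, repl):
    """Replace each (start, end) span of line with repl, skipping the
    spaces found at the current position before copying text."""
    out = ""
    cur_pos = 0
    length = len(line)

    for start, end in spans:
        # Skip spaces at the current position (start of the line, or
        # right after the previous replacement)
        while cur_pos < length and line[cur_pos] == ' ':
            cur_pos += 1

        out += line[cur_pos:start]
        out += repl
        cur_pos = end

    # Same treatment for the text that follows the last replacement
    while cur_pos < length and line[cur_pos] == ' ':
        cur_pos += 1

    out += line[cur_pos:]
    return out


line = "int a __aligned(8) ;"
start = line.index("__aligned(8)")
spans = [(start, start + len("__aligned(8)"))]
result = replace_spans(line, spans, "")
print(repr(result))
```

Without the two while loops, the space between the removed macro and
the ";" would be kept, giving a doubled space in the output.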

Signed-off-by: Mauro Carvalho Chehab <mchehab+huawei@kernel.org>
---
 tools/lib/python/kdoc/kdoc_re.py | 8 +++++++-
 1 file changed, 7 insertions(+), 1 deletion(-)

diff --git a/tools/lib/python/kdoc/kdoc_re.py b/tools/lib/python/kdoc/kdoc_re.py
index e3809aaa0310..44af43aa1e93 100644
--- a/tools/lib/python/kdoc/kdoc_re.py
+++ b/tools/lib/python/kdoc/kdoc_re.py
@@ -344,8 +344,12 @@ class NestedMatch:
 
         cur_pos = 0
         n = 0
+        l = len(line)
 
         for start, end, pos in self._search(line):
+            while cur_pos < l and line[cur_pos] == ' ':
+                cur_pos += 1
+
             out += line[cur_pos:start]
 
             # Value, ignoring start/end delimiters
@@ -368,7 +372,9 @@ class NestedMatch:
                 break
 
         # Append the remaining string
-        l = len(line)
+        while cur_pos < l and line[cur_pos] == ' ':
+            cur_pos += 1
+
         out += line[cur_pos:l]
 
         return out
-- 
2.52.0


^ permalink raw reply related	[flat|nested] 55+ messages in thread

* [PATCH 29/38] docs: xforms_lists.py: use CFuntion to handle all function macros
  2026-02-18 10:12 [PATCH 00/38] docs: several improvements to kernel-doc Mauro Carvalho Chehab
                   ` (27 preceding siblings ...)
  2026-02-18 10:12 ` [PATCH 28/38] docs: kdoc_re: prevent adding whitespaces on sub replacements Mauro Carvalho Chehab
@ 2026-02-18 10:12 ` Mauro Carvalho Chehab
  2026-02-18 10:13 ` [PATCH 30/38] docs: kdoc_files: allows the caller to use a different xforms class Mauro Carvalho Chehab
                   ` (11 subsequent siblings)
  40 siblings, 0 replies; 55+ messages in thread
From: Mauro Carvalho Chehab @ 2026-02-18 10:12 UTC (permalink / raw)
  To: Jonathan Corbet, Kees Cook, Linux Doc Mailing List
  Cc: Mauro Carvalho Chehab, intel-wired-lan, linux-hardening,
	linux-kernel, netdev, Gustavo A. R. Silva, Aleksandr Loktionov

The new CFunction class handles macros better, as it works the same
way C compilers do, handling delimiters the right way.

This allows removing complex regular expressions, placing instead
just a simple one with the name(s) of the functions to be replaced.
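
To illustrate the difference, here is a small self-contained sketch.
It is not the CFunction/NestedMatch implementation: find_call() is a
hypothetical helper that only tracks parentheses, and the input line
(with a made-up lock_of() call) is just an example. It compares a
"[^\)]*" style regex with a scanner that pairs delimiters:

```python
import re


def find_call(name, text):
    """Return (start, end, args) for the first name(...) call whose
    parentheses are balanced, or None if there is no such call."""
    match = re.search(r'\b' + name + r'\s*\(', text)
    if not match:
        return None

    depth = 1
    pos = match.end()
    while pos < len(text) and depth:
        if text[pos] == '(':
            depth += 1
        elif text[pos] == ')':
            depth -= 1
        pos += 1

    if depth:
        return None     # unbalanced: do not touch the text

    return match.start(), pos, text[match.end():pos - 1]


src = "int x __guarded_by(lock_of(dev));"

# Regex approach: stops at the first ")", leaving a stray one behind
regex_out = re.sub(r'\s*__guarded_by\s*\([^\)]*\)', "", src)

# Delimiter-aware approach: removes the whole macro call
start, end, args = find_call("__guarded_by", src)
nested_out = src[:start] + src[end:]

print(repr(regex_out))
print(repr(nested_out), repr(args))
```

The regex version leaves an unmatched ")" in the output once the
argument contains a nested call, while the scanner only needs to know
the macro name.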

Doing a before/after check using "kernel-doc -man ." shows only
cosmetic changes (whitespaces, mostly).

Signed-off-by: Mauro Carvalho Chehab <mchehab+huawei@kernel.org>
---
 tools/lib/python/kdoc/xforms_lists.py | 71 ++++++++++++---------------
 1 file changed, 31 insertions(+), 40 deletions(-)

diff --git a/tools/lib/python/kdoc/xforms_lists.py b/tools/lib/python/kdoc/xforms_lists.py
index 2e7b470c4e65..6455850fee2d 100644
--- a/tools/lib/python/kdoc/xforms_lists.py
+++ b/tools/lib/python/kdoc/xforms_lists.py
@@ -17,51 +17,38 @@ class CTransforms:
 
     #: Transforms for structs and unions.
     struct_xforms = [
-        # Strip attributes
-        (KernRe(r"__attribute__\s*\(\([a-z0-9,_\*\s\(\)]*\)\)", flags=re.I | re.S, cache=False), ' '),
-        (KernRe(r'\s*__aligned\s*\([^;]*\)', re.S), ' '),
-        (KernRe(r'\s*__counted_by\s*\([^;]*\)', re.S), ' '),
-        (KernRe(r'\s*__counted_by_(le|be)\s*\([^;]*\)', re.S), ' '),
-        (KernRe(r'\s*__guarded_by\s*\([^\)]*\)', re.S), ' '),
-        (KernRe(r'\s*__pt_guarded_by\s*\([^\)]*\)', re.S), ' '),
+        (CFunction("__attribute__"), ' '),
+        (CFunction('__aligned'), ' '),
+        (CFunction('__counted_by'), ' '),
+        (CFunction('__counted_by_(le|be)'), ' '),
+        (CFunction('__guarded_by'), ' '),
+        (CFunction('__pt_guarded_by'), ' '),
+
         (KernRe(r'\s*__packed\s*', re.S), ' '),
         (KernRe(r'\s*CRYPTO_MINALIGN_ATTR', re.S), ' '),
         (KernRe(r'\s*__private', re.S), ' '),
         (KernRe(r'\s*__rcu', re.S), ' '),
         (KernRe(r'\s*____cacheline_aligned_in_smp', re.S), ' '),
         (KernRe(r'\s*____cacheline_aligned', re.S), ' '),
-        (KernRe(r'\s*__cacheline_group_(begin|end)\([^\)]+\);'), ''),
+
+        (CFunction('__cacheline_group_(begin|end)'), ''),
 
         (CFunction('struct_group'), r'\2'),
         (CFunction('struct_group_attr'), r'\3'),
         (CFunction('struct_group_tagged'), r'struct \1 \2; \3'),
         (CFunction('__struct_group'), r'\4'),
 
-        #
-        # Replace macros
-        #
-        # TODO: use CFunction on all FOO($1, $2, ...) matches
-        #
-        # it is better to also move those to the CFunction logic,
-        # to ensure that parentheses will be properly matched.
-        #
-        (KernRe(r'__ETHTOOL_DECLARE_LINK_MODE_MASK\s*\(([^\)]+)\)', re.S),
-        r'DECLARE_BITMAP(\1, __ETHTOOL_LINK_MODE_MASK_NBITS)'),
-        (KernRe(r'DECLARE_PHY_INTERFACE_MASK\s*\(([^\)]+)\)', re.S),
-        r'DECLARE_BITMAP(\1, PHY_INTERFACE_MODE_MAX)'),
-        (KernRe(r'DECLARE_BITMAP\s*\(' + struct_args_pattern + r',\s*' + struct_args_pattern + r'\)',
-                re.S), r'unsigned long \1[BITS_TO_LONGS(\2)]'),
-        (KernRe(r'DECLARE_HASHTABLE\s*\(' + struct_args_pattern + r',\s*' + struct_args_pattern + r'\)',
-                re.S), r'unsigned long \1[1 << ((\2) - 1)]'),
-        (KernRe(r'DECLARE_KFIFO\s*\(' + struct_args_pattern + r',\s*' + struct_args_pattern +
-                r',\s*' + struct_args_pattern + r'\)', re.S), r'\2 *\1'),
-        (KernRe(r'DECLARE_KFIFO_PTR\s*\(' + struct_args_pattern + r',\s*' +
-                struct_args_pattern + r'\)', re.S), r'\2 *\1'),
-        (KernRe(r'(?:__)?DECLARE_FLEX_ARRAY\s*\(' + struct_args_pattern + r',\s*' +
-                struct_args_pattern + r'\)', re.S), r'\1 \2[]'),
-        (KernRe(r'DEFINE_DMA_UNMAP_ADDR\s*\(' + struct_args_pattern + r'\)', re.S), r'dma_addr_t \1'),
-        (KernRe(r'DEFINE_DMA_UNMAP_LEN\s*\(' + struct_args_pattern + r'\)', re.S), r'__u32 \1'),
-        (KernRe(r'VIRTIO_DECLARE_FEATURES\(([\w_]+)\)'), r'union { u64 \1; u64 \1_array[VIRTIO_FEATURES_U64S]; }'),
+        (CFunction('__ETHTOOL_DECLARE_LINK_MODE_MASK'), r'DECLARE_BITMAP(\1, __ETHTOOL_LINK_MODE_MASK_NBITS)'),
+        (CFunction('DECLARE_PHY_INTERFACE_MASK',), r'DECLARE_BITMAP(\1, PHY_INTERFACE_MODE_MAX)'),
+        (CFunction('DECLARE_BITMAP'), r'unsigned long \1[BITS_TO_LONGS(\2)]'),
+
+        (CFunction('DECLARE_HASHTABLE'), r'unsigned long \1[1 << ((\2) - 1)]'),
+        (CFunction('DECLARE_KFIFO'), r'\2 *\1'),
+        (CFunction('DECLARE_KFIFO_PTR'), r'\2 *\1'),
+        (CFunction('(?:__)?DECLARE_FLEX_ARRAY'), r'\1 \2[]'),
+        (CFunction('DEFINE_DMA_UNMAP_ADDR'), r'dma_addr_t \1'),
+        (CFunction('DEFINE_DMA_UNMAP_LEN'), r'__u32 \1'),
+        (CFunction('VIRTIO_DECLARE_FEATURES'), r'union { u64 \1; u64 \1_array[VIRTIO_FEATURES_U64S]; }'),
     ]
 
     #: Transforms for function prototypes.
@@ -86,12 +73,14 @@ class CTransforms:
         (KernRe(r"__sched +"), ""),
         (KernRe(r"_noprof"), ""),
         (KernRe(r"__always_unused *"), ""),
-        (KernRe(r"__printf\s*\(\s*\d*\s*,\s*\d*\s*\) +"), ""),
-        (KernRe(r"__(?:re)?alloc_size\s*\(\s*\d+\s*(?:,\s*\d+\s*)?\) +"), ""),
-        (KernRe(r"__diagnose_as\s*\(\s*\S+\s*(?:,\s*\d+\s*)*\) +"), ""),
-        (KernRe(r"DECL_BUCKET_PARAMS\s*\(\s*(\S+)\s*,\s*(\S+)\s*\)"), r"\1, \2"),
         (KernRe(r"__no_context_analysis\s*"), ""),
         (KernRe(r"__attribute_const__ +"), ""),
+
+        (CFunction('__printf'), ""),
+        (CFunction('__(?:re)?alloc_size'), ""),
+        (CFunction("__diagnose_as"), ""),
+        (CFunction("DECL_BUCKET_PARAMS"), r"\1, \2"),
+
         (CFunction("__cond_acquires"), ""),
         (CFunction("__cond_releases"), ""),
         (CFunction("__acquires"), ""),
@@ -109,9 +98,11 @@ class CTransforms:
     var_xforms = [
         (KernRe(r"__read_mostly"), ""),
         (KernRe(r"__ro_after_init"), ""),
-        (KernRe(r'\s*__guarded_by\s*\([^\)]*\)', re.S), ""),
-        (KernRe(r'\s*__pt_guarded_by\s*\([^\)]*\)', re.S), ""),
-        (KernRe(r"LIST_HEAD\(([\w_]+)\)"), r"struct list_head \1"),
+
+        (CFunction('__guarded_by'), ""),
+        (CFunction('__pt_guarded_by'), ""),
+        (CFunction("LIST_HEAD"), r"struct list_head \1"),
+
         (KernRe(r"(?://.*)$"), ""),
         (KernRe(r"(?:/\*.*\*/)"), ""),
         (KernRe(r";$"), ""),
-- 
2.52.0


^ permalink raw reply related	[flat|nested] 55+ messages in thread

* [PATCH 30/38] docs: kdoc_files: allows the caller to use a different xforms class
  2026-02-18 10:12 [PATCH 00/38] docs: several improvements to kernel-doc Mauro Carvalho Chehab
                   ` (28 preceding siblings ...)
  2026-02-18 10:12 ` [PATCH 29/38] docs: xforms_lists.py: use CFuntion to handle all function macros Mauro Carvalho Chehab
@ 2026-02-18 10:13 ` Mauro Carvalho Chehab
  2026-02-18 10:13 ` [PATCH 31/38] docs: kdoc_re: Fix NestedMatch.sub() which causes PDF builds to break Mauro Carvalho Chehab
                   ` (10 subsequent siblings)
  40 siblings, 0 replies; 55+ messages in thread
From: Mauro Carvalho Chehab @ 2026-02-18 10:13 UTC (permalink / raw)
  To: Jonathan Corbet, Linux Doc Mailing List
  Cc: Mauro Carvalho Chehab, intel-wired-lan, linux-hardening,
	linux-kernel, netdev, Randy Dunlap

While the main goal for kernel-doc is to be used inside the Linux
Kernel, other open source projects could benefit from it. That's
currently the case of QEMU, which has a fork, mainly due to two
reasons:

  - they need an extra C function transform rule;
  - they handle the html output a little bit differently.

Add an extra optional argument to make it easier for the code to be
shared, as, with that, QEMU can just create a new derived class
that will contain its specific rulesets, and just copy the
remaining kernel-doc files as-is.
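
The pattern can be sketched as below. All classes here are stand-ins
written for this example (BaseTransforms, ProjectTransforms, Files and
the MY_PROJECT_ATTR macro are hypothetical); they only mimic the shape
of a transforms class with a function_xforms list and of a caller that
takes an optional xforms argument:

```python
import re


class BaseTransforms:
    """Stand-in for a default set of (regex, replacement) rules."""
    function_xforms = [
        (re.compile(r"__always_unused *"), ""),
    ]


class ProjectTransforms(BaseTransforms):
    """Derived class adding one project-specific rule."""
    function_xforms = BaseTransforms.function_xforms + [
        (re.compile(r"MY_PROJECT_ATTR *"), ""),
    ]


class Files:
    """Stand-in for a caller taking an optional xforms object."""
    def __init__(self, xforms=None):
        # Fall back to the default rules when nothing is passed
        self.xforms = xforms if xforms else BaseTransforms()

    def clean_prototype(self, proto):
        for regex, repl in self.xforms.function_xforms:
            proto = regex.sub(repl, proto)
        return proto


proto = "MY_PROJECT_ATTR __always_unused int f(void)"

default_out = Files().clean_prototype(proto)
project_out = Files(xforms=ProjectTransforms()).clean_prototype(proto)

print(default_out)
print(project_out)
```

With this shape, a downstream project only has to maintain the derived
class; the rest of the code can be copied unchanged.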

Signed-off-by: Mauro Carvalho Chehab <mchehab+huawei@kernel.org>
---
 tools/lib/python/kdoc/kdoc_files.py | 9 +++++++--
 1 file changed, 7 insertions(+), 2 deletions(-)

diff --git a/tools/lib/python/kdoc/kdoc_files.py b/tools/lib/python/kdoc/kdoc_files.py
index 7357c97a4b01..c35e033cf123 100644
--- a/tools/lib/python/kdoc/kdoc_files.py
+++ b/tools/lib/python/kdoc/kdoc_files.py
@@ -118,7 +118,7 @@ class KernelFiles():
         if fname in self.files:
             return
 
-        doc = KernelDoc(self.config, fname, CTransforms)
+        doc = KernelDoc(self.config, fname, self.xforms)
         export_table, entries = doc.parse_kdoc()
 
         self.export_table[fname] = export_table
@@ -154,7 +154,7 @@ class KernelFiles():
 
         self.error(f"Cannot find file {fname}")
 
-    def __init__(self, verbose=False, out_style=None,
+    def __init__(self, verbose=False, out_style=None, xforms=None,
                  werror=False, wreturn=False, wshort_desc=False,
                  wcontents_before_sections=False,
                  logger=None):
@@ -193,6 +193,11 @@ class KernelFiles():
         self.config.wshort_desc = wshort_desc
         self.config.wcontents_before_sections = wcontents_before_sections
 
+        if xforms:
+            self.xforms = xforms
+        else:
+            self.xforms = CTransforms()
+
         if not logger:
             self.config.log = logging.getLogger("kernel-doc")
         else:
-- 
2.52.0


^ permalink raw reply related	[flat|nested] 55+ messages in thread

* [PATCH 31/38] docs: kdoc_re: Fix NestedMatch.sub() which causes PDF builds to break
  2026-02-18 10:12 [PATCH 00/38] docs: several improvements to kernel-doc Mauro Carvalho Chehab
                   ` (29 preceding siblings ...)
  2026-02-18 10:13 ` [PATCH 30/38] docs: kdoc_files: allows the caller to use a different xforms class Mauro Carvalho Chehab
@ 2026-02-18 10:13 ` Mauro Carvalho Chehab
  2026-02-18 10:13 ` [PATCH 32/38] docs: kdoc_files: document KernelFiles() ABI Mauro Carvalho Chehab
                   ` (9 subsequent siblings)
  40 siblings, 0 replies; 55+ messages in thread
From: Mauro Carvalho Chehab @ 2026-02-18 10:13 UTC (permalink / raw)
  To: Jonathan Corbet, Linux Doc Mailing List
  Cc: Mauro Carvalho Chehab, intel-wired-lan, linux-hardening,
	linux-kernel, netdev, Aleksandr Loktionov, Randy Dunlap,
	Akira Yokosawa

Having a "\digit" inside a docstring with normal strings causes
PDF output to break, as it will add a weird character inside the
string. It should be using a raw string instead.

Yet, having r"\0" won't solve it, as this would be rendered by
Sphinx as "0". So, this has to be inside preformatted text.

That said, the comment itself is probably not the best one.

Rewrite the entire comment to properly document each parameter
and add a "delim" parameter that will be passed to the
ancillary function.
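
The string-escaping part of the problem can be seen with plain Python,
independently of kernel-doc. In a normal string literal, "\0" and "\1"
are octal escapes that produce control characters, while a raw string
keeps the backslash and the digit as two separate characters:

```python
plain = "\0 and \1"     # octal escapes: NUL and chr(1)
raw = r"\0 and \1"      # backslash + digit kept as written

print(len(plain), [hex(ord(c)) for c in plain[:1] + plain[-1:]])
print(len(raw), raw)
```

The control characters from the first form are what ends up inside the
generated documentation when a docstring is not a raw string.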

Reported-by: Akira Yokosawa <akiyks@gmail.com>
Closes: https://lore.kernel.org/linux-doc/63e99049-cc72-4156-83af-414fdde34312@gmail.com/
Signed-off-by: Mauro Carvalho Chehab <mchehab+huawei@kernel.org>
---
 tools/lib/python/kdoc/kdoc_re.py | 30 ++++++++++++++++++------------
 1 file changed, 18 insertions(+), 12 deletions(-)

diff --git a/tools/lib/python/kdoc/kdoc_re.py b/tools/lib/python/kdoc/kdoc_re.py
index 44af43aa1e93..f67ebe86c458 100644
--- a/tools/lib/python/kdoc/kdoc_re.py
+++ b/tools/lib/python/kdoc/kdoc_re.py
@@ -323,22 +323,28 @@ class NestedMatch:
 
         return args
 
-    def sub(self, sub, line, count=0):
-        """
-        This is similar to re.sub:
+    def sub(self, sub, line, delim=",", count=0):
+        r"""
+        Perform a regex-based replacement on ``line`` for all matches with
+        the ``self.regex`` pattern. It uses the following parameters:
 
-        It matches a regex that it is followed by a delimiter,
-        replacing occurrences only if all delimiters are paired.
+        ``sub``
+            Replacement string that may contain placeholders in the form
+            ``\{digit}``, where  ``digit`` is an integer referring to the regex
+            capture group number.
 
-        if the sub argument contains::
+            ``\{0}`` is a special case that expands to the entire matched text.
 
-            r'\0'
+        ``line``
+            The string to operate on.
 
-        it will work just like re: it places there the matched paired data
-        with the delimiter stripped.
+        ``delim``
+            The delimiter used to identify the placeholder groups
+            (defaults to ",").
 
-        If count is different than zero, it will replace at most count
-        items.
+        ``count``
+            Maximum number of replacements per match.  If 0 or omitted,
+            all matches are replaced.
         """
         out = ""
 
@@ -358,7 +364,7 @@ class NestedMatch:
             # replace arguments
             new_sub = sub
             if "\\" in sub:
-                args = self._split_args(value)
+                args = self._split_args(value, delim=delim)
 
                 new_sub = re.sub(r'\\(\d+)',
                                  lambda m: args[int(m.group(1))], new_sub)
-- 
2.52.0


^ permalink raw reply related	[flat|nested] 55+ messages in thread

* [PATCH 32/38] docs: kdoc_files: document KernelFiles() ABI
  2026-02-18 10:12 [PATCH 00/38] docs: several improvements to kernel-doc Mauro Carvalho Chehab
                   ` (30 preceding siblings ...)
  2026-02-18 10:13 ` [PATCH 31/38] docs: kdoc_re: Fix NestedMatch.sub() which causes PDF builds to break Mauro Carvalho Chehab
@ 2026-02-18 10:13 ` Mauro Carvalho Chehab
  2026-02-18 10:13 ` [PATCH 33/38] docs: kdoc_output: add optional args to ManOutput class Mauro Carvalho Chehab
                   ` (8 subsequent siblings)
  40 siblings, 0 replies; 55+ messages in thread
From: Mauro Carvalho Chehab @ 2026-02-18 10:13 UTC (permalink / raw)
  To: Jonathan Corbet, Linux Doc Mailing List
  Cc: Mauro Carvalho Chehab, intel-wired-lan, linux-hardening,
	linux-kernel, netdev, Randy Dunlap

The KernelFiles class is the main entry point to run kernel-doc,
being used by both tools/docs/kernel-doc and
Documentation/sphinx/kerneldoc.py.

It is also used by QEMU, which also uses the kernel-doc
libraries from tools/lib/python/kdoc.

Properly describe its ABI contract.

Signed-off-by: Mauro Carvalho Chehab <mchehab+huawei@kernel.org>
---
 tools/lib/python/kdoc/kdoc_files.py | 44 ++++++++++++++++++++++++++++-
 1 file changed, 43 insertions(+), 1 deletion(-)

diff --git a/tools/lib/python/kdoc/kdoc_files.py b/tools/lib/python/kdoc/kdoc_files.py
index c35e033cf123..8c2059623949 100644
--- a/tools/lib/python/kdoc/kdoc_files.py
+++ b/tools/lib/python/kdoc/kdoc_files.py
@@ -91,7 +91,49 @@ class KernelFiles():
     """
     Parse kernel-doc tags on multiple kernel source files.
 
-    There are two type of parsers defined here:
+    This is the main entry point to run kernel-doc. This class is initialized
+    using a series of optional arguments:
+
+    ``verbose``
+        If True, enables kernel-doc verbosity. Default: False.
+
+    ``out_style``
+        Class to be used to format output. If None (default),
+        only report errors.
+
+    ``xforms``
+        Transforms to be applied to C prototypes and data structs.
+        If not specified, defaults to xforms = CTransforms()
+
+    ``werror``
+        If True, treat warnings as errors, returning an error code on warnings.
+
+        Default: False.
+
+    ``wreturn``
+        If True, warns about the lack of a return markup on functions.
+
+        Default: False.
+    ``wshort_desc``
+        If True, warns if initial short description is missing.
+
+        Default: False.
+
+    ``wcontents_before_sections``
+        If True, warn if there are contents before sections (deprecated).
+        This option is kept just for backward-compatibility, but it does
+        nothing, neither here nor at the original Perl script.
+
+        Default: False.
+
+    ``logger``
+        Optional logger class instance.
+
+        If not specified, defaults to use: ``logging.getLogger("kernel-doc")``
+
+    Note:
+        There are two types of parsers defined here:
+
         - self.parse_file(): parses both kernel-doc markups and
           ``EXPORT_SYMBOL*`` macros;
         - self.process_export_file(): parses only ``EXPORT_SYMBOL*`` macros.
-- 
2.52.0


^ permalink raw reply related	[flat|nested] 55+ messages in thread

* [PATCH 33/38] docs: kdoc_output: add optional args to ManOutput class
  2026-02-18 10:12 [PATCH 00/38] docs: several improvements to kernel-doc Mauro Carvalho Chehab
                   ` (31 preceding siblings ...)
  2026-02-18 10:13 ` [PATCH 32/38] docs: kdoc_files: document KernelFiles() ABI Mauro Carvalho Chehab
@ 2026-02-18 10:13 ` Mauro Carvalho Chehab
  2026-02-18 10:13 ` [PATCH 34/38] docs: sphinx-build-wrapper: better handle troff .TH markups Mauro Carvalho Chehab
                   ` (7 subsequent siblings)
  40 siblings, 0 replies; 55+ messages in thread
From: Mauro Carvalho Chehab @ 2026-02-18 10:13 UTC (permalink / raw)
  To: Jonathan Corbet, Linux Doc Mailing List
  Cc: Mauro Carvalho Chehab, intel-wired-lan, linux-hardening,
	linux-kernel, netdev

The current logic hardcodes several values that are placed
inside troff's title header (.TH). Place them as parameters
to make the class more flexible.

While here, remove the extra unused "LINUX" parameter at the
end of the .TH header.

Signed-off-by: Mauro Carvalho Chehab <mchehab+huawei@kernel.org>
---
 tools/lib/python/kdoc/kdoc_output.py | 27 +++++++++++++++++++--------
 1 file changed, 19 insertions(+), 8 deletions(-)

diff --git a/tools/lib/python/kdoc/kdoc_output.py b/tools/lib/python/kdoc/kdoc_output.py
index 4210b91dde5f..fe3fc0dfd02b 100644
--- a/tools/lib/python/kdoc/kdoc_output.py
+++ b/tools/lib/python/kdoc/kdoc_output.py
@@ -607,7 +607,14 @@ class ManFormat(OutputFormat):
         "%m %d %Y",
     ]
 
-    def __init__(self, modulename):
+    def emit_th(self, name):
+        """Emit a title header line."""
+        name = name.strip()
+
+        self.data += f'.TH "{self.modulename}" {self.section} "{name}" '
+        self.data += f' "{self.date}" "{self.manual}"\n'
+
+    def __init__(self, modulename, section="9", manual="Kernel API Manual"):
         """
         Creates class variables.
 
@@ -616,7 +623,11 @@ class ManFormat(OutputFormat):
         """
 
         super().__init__()
+
         self.modulename = modulename
+        self.section = section
+        self.manual = manual
+
         self.symbols = []
 
         dt = None
@@ -632,7 +643,7 @@ class ManFormat(OutputFormat):
         if not dt:
             dt = datetime.now()
 
-        self.man_date = dt.strftime("%B %Y")
+        self.date = dt.strftime("%B %Y")
 
     def arg_name(self, args, name):
         """
@@ -724,7 +735,7 @@ class ManFormat(OutputFormat):
 
         out_name = self.arg_name(args, name)
 
-        self.data += f'.TH "{self.modulename}" 9 "{out_name}" "{self.man_date}" "API Manual" LINUX' + "\n"
+        self.emit_th(out_name)
 
         for section, text in args.sections.items():
             self.data += f'.SH "{section}"' + "\n"
@@ -734,7 +745,7 @@ class ManFormat(OutputFormat):
 
         out_name = self.arg_name(args, name)
 
-        self.data += f'.TH "{name}" 9 "{out_name}" "{self.man_date}" "Kernel Hacker\'s Manual" LINUX' + "\n"
+        self.emit_th(out_name)
 
         self.data += ".SH NAME\n"
         self.data += f"{name} \\- {args['purpose']}\n"
@@ -780,7 +791,7 @@ class ManFormat(OutputFormat):
     def out_enum(self, fname, name, args):
         out_name = self.arg_name(args, name)
 
-        self.data += f'.TH "{self.modulename}" 9 "{out_name}" "{self.man_date}" "API Manual" LINUX' + "\n"
+        self.emit_th(out_name)
 
         self.data += ".SH NAME\n"
         self.data += f"enum {name} \\- {args['purpose']}\n"
@@ -813,7 +824,7 @@ class ManFormat(OutputFormat):
         out_name = self.arg_name(args, name)
         full_proto = args.other_stuff["full_proto"]
 
-        self.data += f'.TH "{self.modulename}" 9 "{out_name}" "{self.man_date}" "API Manual" LINUX' + "\n"
+        self.emit_th(out_name)
 
         self.data += ".SH NAME\n"
         self.data += f"{name} \\- {args['purpose']}\n"
@@ -834,7 +845,7 @@ class ManFormat(OutputFormat):
         purpose = args.get('purpose')
         out_name = self.arg_name(args, name)
 
-        self.data += f'.TH "{module}" 9 "{out_name}" "{self.man_date}" "API Manual" LINUX' + "\n"
+        self.emit_th(out_name)
 
         self.data += ".SH NAME\n"
         self.data += f"typedef {name} \\- {purpose}\n"
@@ -849,7 +860,7 @@ class ManFormat(OutputFormat):
         definition = args.get('definition')
         out_name = self.arg_name(args, name)
 
-        self.data += f'.TH "{module}" 9 "{out_name}" "{self.man_date}" "API Manual" LINUX' + "\n"
+        self.emit_th(out_name)
 
         self.data += ".SH NAME\n"
         self.data += f"{args.type} {name} \\- {purpose}\n"
-- 
2.52.0


^ permalink raw reply related	[flat|nested] 55+ messages in thread

* [PATCH 34/38] docs: sphinx-build-wrapper: better handle troff .TH markups
  2026-02-18 10:12 [PATCH 00/38] docs: several improvements to kernel-doc Mauro Carvalho Chehab
                   ` (32 preceding siblings ...)
  2026-02-18 10:13 ` [PATCH 33/38] docs: kdoc_output: add optional args to ManOutput class Mauro Carvalho Chehab
@ 2026-02-18 10:13 ` Mauro Carvalho Chehab
  2026-02-18 10:13 ` [PATCH 35/38] docs: kdoc_output: use a more standard order for .TH on man pages Mauro Carvalho Chehab
                   ` (6 subsequent siblings)
  40 siblings, 0 replies; 55+ messages in thread
From: Mauro Carvalho Chehab @ 2026-02-18 10:13 UTC (permalink / raw)
  To: Jonathan Corbet, Linux Doc Mailing List, Mauro Carvalho Chehab
  Cc: Mauro Carvalho Chehab, intel-wired-lan, linux-hardening,
	linux-kernel, netdev, Shuah Khan

Using a regular expression to match .TH is problematic, as it
doesn't handle quotation marks well.

Use shlex instead.
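
A short sketch of the difference, using made-up .TH lines. The regex
is the one being removed from the wrapper; the second input, with
unquoted fields, is only an assumption about what a different
generator could emit:

```python
import re
import shlex

re_man = re.compile(r'^\.TH "[^"]*" (\d+) "([^"]*)"')

quoted = '.TH "kernel" 9 "foo" "February 2026" "API Manual"'
unquoted = '.TH kernel 9 foo "February 2026" "API Manual"'

# The regex only accepts the fully quoted form
print(bool(re_man.match(quoted)), bool(re_man.match(unquoted)))

# shlex.split() applies shell-like quoting rules, so both forms
# produce the same list of fields
args_quoted = shlex.split(quoted)
args_unquoted = shlex.split(unquoted)
print(args_quoted)
print(args_unquoted)
```

As fields are addressed by position after the split, quoted arguments
containing spaces (like the date) stay as a single list element.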

Signed-off-by: Mauro Carvalho Chehab <mchehab+huawei@kernel.org>
---
 tools/docs/sphinx-build-wrapper | 8 ++++----
 1 file changed, 4 insertions(+), 4 deletions(-)

diff --git a/tools/docs/sphinx-build-wrapper b/tools/docs/sphinx-build-wrapper
index b7c149dff06b..e6418e22e2ff 100755
--- a/tools/docs/sphinx-build-wrapper
+++ b/tools/docs/sphinx-build-wrapper
@@ -576,7 +576,6 @@ class SphinxBuilder:
         """
 
         re_kernel_doc = re.compile(r"^\.\.\s+kernel-doc::\s*(\S+)")
-        re_man = re.compile(r'^\.TH "[^"]*" (\d+) "([^"]*)"')
 
         if docs_dir == src_dir:
             #
@@ -616,8 +615,7 @@ class SphinxBuilder:
         fp = None
         try:
             for line in result.stdout.split("\n"):
-                match = re_man.match(line)
-                if not match:
+                if not line.startswith(".TH"):
                     if fp:
                         fp.write(line + '\n')
                     continue
@@ -625,7 +623,9 @@ class SphinxBuilder:
                 if fp:
                     fp.close()
 
-                fname = f"{output_dir}/{match.group(2)}.{match.group(1)}"
+                # Use shlex here, as it handles well parameters with commas
+                args = shlex.split(line)
+                fname = f"{output_dir}/{args[3]}.{args[2]}"
 
                 if self.verbose:
                     print(f"Creating {fname}")
-- 
2.52.0


^ permalink raw reply related	[flat|nested] 55+ messages in thread

* [PATCH 35/38] docs: kdoc_output: use a more standard order for .TH on man pages
  2026-02-18 10:12 [PATCH 00/38] docs: several improvements to kernel-doc Mauro Carvalho Chehab
                   ` (33 preceding siblings ...)
  2026-02-18 10:13 ` [PATCH 34/38] docs: sphinx-build-wrapper: better handle troff .TH markups Mauro Carvalho Chehab
@ 2026-02-18 10:13 ` Mauro Carvalho Chehab
  2026-02-18 10:13 ` [PATCH 36/38] docs: sphinx-build-wrapper: don't allow "/" on file names Mauro Carvalho Chehab
                   ` (5 subsequent siblings)
  40 siblings, 0 replies; 55+ messages in thread
From: Mauro Carvalho Chehab @ 2026-02-18 10:13 UTC (permalink / raw)
  To: Jonathan Corbet, Linux Doc Mailing List
  Cc: Mauro Carvalho Chehab, intel-wired-lan, linux-hardening,
	linux-kernel, netdev, Mauro Carvalho Chehab, Shuah Khan

The generated man pages are not following the current standards
for Linux documentation. Reorder .TH fields for them to look
like other Linux man pages.
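
As a rough sketch of the resulting layout, the helper below builds a
title header in the order name, section, date, module name and manual.
format_th() and its sample values are hypothetical and do not
reproduce the exact spacing emitted by kdoc_output.py:

```python
def format_th(name, section, date, modulename, manual):
    """Build a troff .TH line: title, section, date, source, manual."""
    name = name.strip()
    return f'.TH "{name}" {section} "{date}" "{modulename}" "{manual}"\n'


header = format_th(" foo ", "9", "February 2026", "kernel",
                   "Kernel API Manual")
print(header, end="")
```

With this order, the first argument after .TH is the page title, which
is why the wrapper has to pick the file name from args[1] instead of
args[3].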

Signed-off-by: Mauro Carvalho Chehab <mchehab+huawei@kernel.org>
---
 tools/docs/sphinx-build-wrapper      | 2 +-
 tools/lib/python/kdoc/kdoc_output.py | 4 ++--
 2 files changed, 3 insertions(+), 3 deletions(-)

diff --git a/tools/docs/sphinx-build-wrapper b/tools/docs/sphinx-build-wrapper
index e6418e22e2ff..ac6852e3dd8c 100755
--- a/tools/docs/sphinx-build-wrapper
+++ b/tools/docs/sphinx-build-wrapper
@@ -625,7 +625,7 @@ class SphinxBuilder:
 
                 # Use shlex here, as it handles well parameters with commas
                 args = shlex.split(line)
-                fname = f"{output_dir}/{args[3]}.{args[2]}"
+                fname = f"{output_dir}/{args[1]}.{args[2]}"
 
                 if self.verbose:
                     print(f"Creating {fname}")
diff --git a/tools/lib/python/kdoc/kdoc_output.py b/tools/lib/python/kdoc/kdoc_output.py
index fe3fc0dfd02b..fb44cc8e0770 100644
--- a/tools/lib/python/kdoc/kdoc_output.py
+++ b/tools/lib/python/kdoc/kdoc_output.py
@@ -611,8 +611,8 @@ class ManFormat(OutputFormat):
         """Emit a title header line."""
         name = name.strip()
 
-        self.data += f'.TH "{self.modulename}" {self.section} "{name}" '
-        self.data += f' "{self.date}" "{self.manual}"\n'
+        self.data += f'.TH "{name}" {self.section} "{self.date}" '
+        self.data += f' "{self.modulename}" "{self.manual}"\n'
 
     def __init__(self, modulename, section="9", manual="Kernel API Manual"):
         """
-- 
2.52.0


^ permalink raw reply related	[flat|nested] 55+ messages in thread

* [PATCH 36/38] docs: sphinx-build-wrapper: don't allow "/" on file names
  2026-02-18 10:12 [PATCH 00/38] docs: several improvements to kernel-doc Mauro Carvalho Chehab
                   ` (34 preceding siblings ...)
  2026-02-18 10:13 ` [PATCH 35/38] docs: kdoc_output: use a more standard order for .TH on man pages Mauro Carvalho Chehab
@ 2026-02-18 10:13 ` Mauro Carvalho Chehab
  2026-02-18 10:13 ` [PATCH 37/38] docs: kdoc_output: describe the class init parameters Mauro Carvalho Chehab
                   ` (4 subsequent siblings)
  40 siblings, 0 replies; 55+ messages in thread
From: Mauro Carvalho Chehab @ 2026-02-18 10:13 UTC (permalink / raw)
  To: Jonathan Corbet, Linux Doc Mailing List, Mauro Carvalho Chehab
  Cc: Mauro Carvalho Chehab, intel-wired-lan, linux-hardening,
	linux-kernel, netdev, Shuah Khan

When handling "DOC:" sections, slash characters may be present.
Prevent using them in file names, as a slash is used as the directory
separator.
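
A minimal sketch of the file name derivation, assuming a .TH line in
the name/section/date order. man_filename() is a hypothetical helper
and the "DOC:" title is a made-up example:

```python
import shlex


def man_filename(th_line, output_dir):
    """Derive the man page file name from a troff .TH line."""
    args = shlex.split(th_line)

    # A "/" in the title would be taken as a directory separator
    name = args[1].replace("/", " ")

    return f"{output_dir}/{name}.{args[2]}"


line = '.TH "DOC: I/O helpers" 9 "February 2026" "kernel" "Kernel API Manual"'
fname = man_filename(line, "man")
print(fname)
```

Without the replace() call, the title above would point to a file
inside a non-existing "DOC: I" subdirectory.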

Signed-off-by: Mauro Carvalho Chehab <mchehab+huawei@kernel.org>
---
 tools/docs/sphinx-build-wrapper | 3 ++-
 1 file changed, 2 insertions(+), 1 deletion(-)

diff --git a/tools/docs/sphinx-build-wrapper b/tools/docs/sphinx-build-wrapper
index ac6852e3dd8c..d4bb1175fe32 100755
--- a/tools/docs/sphinx-build-wrapper
+++ b/tools/docs/sphinx-build-wrapper
@@ -625,7 +625,8 @@ class SphinxBuilder:
 
                 # Use shlex here, as it handles well parameters with commas
                 args = shlex.split(line)
-                fname = f"{output_dir}/{args[1]}.{args[2]}"
+                name = args[1].replace("/", " ")
+                fname = f"{output_dir}/{name}.{args[2]}"
 
                 if self.verbose:
                     print(f"Creating {fname}")
-- 
2.52.0


^ permalink raw reply related	[flat|nested] 55+ messages in thread

* [PATCH 37/38] docs: kdoc_output: describe the class init parameters
  2026-02-18 10:12 [PATCH 00/38] docs: several improvements to kernel-doc Mauro Carvalho Chehab
                   ` (35 preceding siblings ...)
  2026-02-18 10:13 ` [PATCH 36/38] docs: sphinx-build-wrapper: don't allow "/" on file names Mauro Carvalho Chehab
@ 2026-02-18 10:13 ` Mauro Carvalho Chehab
  2026-02-18 10:13 ` [PATCH 38/38] docs: kdoc_output: pick a better default for modulename Mauro Carvalho Chehab
                   ` (3 subsequent siblings)
  40 siblings, 0 replies; 55+ messages in thread
From: Mauro Carvalho Chehab @ 2026-02-18 10:13 UTC (permalink / raw)
  To: Jonathan Corbet, Linux Doc Mailing List
  Cc: Mauro Carvalho Chehab, intel-wired-lan, linux-hardening,
	linux-kernel, netdev

As this class is part of the ABI used by both the Sphinx kerneldoc
extension and tools/docs/kernel-doc, better describe what
parameters are used to initialize the ManFormat class.

Signed-off-by: Mauro Carvalho Chehab <mchehab+huawei@kernel.org>
---
 tools/lib/python/kdoc/kdoc_output.py | 29 +++++++++++++++++++++++++++-
 1 file changed, 28 insertions(+), 1 deletion(-)

diff --git a/tools/lib/python/kdoc/kdoc_output.py b/tools/lib/python/kdoc/kdoc_output.py
index fb44cc8e0770..1e3dc47bc696 100644
--- a/tools/lib/python/kdoc/kdoc_output.py
+++ b/tools/lib/python/kdoc/kdoc_output.py
@@ -580,7 +580,34 @@ class RestFormat(OutputFormat):
 
 
 class ManFormat(OutputFormat):
-    """Consts and functions used by man pages output."""
+    """
+    Consts and functions used by man pages output.
+
+    This class has one mandatory parameter and some optional ones, which
+    are needed to define the title header contents:
+
+    ``modulename``
+        Defines the module name to be used at the troff ``.TH`` output.
+
+        This argument is mandatory.
+
+    ``section``
+        Usually a numeric value from 0 to 9, but man pages also accept
+        some strings like "p".
+
+        Defaults to ``9``.
+
+    ``manual``
+        Defaults to ``Kernel API Manual``.
+
+    The above controls the output of the corresponding fields on troff
+    title headers, which will be filled like this::
+
+        .TH "{name}" {section} "{date}" "{modulename}" "{manual}"
+
+    where ``name`` will match the API symbol name, and ``date`` will be
+    either the date when the Kernel was compiled or the current date.
+    """
 
     highlights = (
         (type_constant, r"\1"),
-- 
2.52.0
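
For reference, a small sketch of how the three parameters documented above
could end up in the title header. It follows the template shown in the
docstring; format_th() and the fixed date value are assumptions made for
illustration, not the code in kdoc_output.py:

```python
def format_th(name, modulename, section="9", manual="Kernel API Manual",
              date="February 2026"):
    """Return a troff ``.TH`` line following the documented template."""
    # name is the API symbol; the other fields come from the class parameters
    return f'.TH "{name.strip()}" {section} "{date}" "{modulename}" "{manual}"\n'


print(format_th("kmalloc", "mm"), end="")
print(format_th("printf", "libc", section="3p", manual="POSIX Manual"), end="")
```

The section is left unquoted because it is usually a number, while the other
fields may contain spaces and therefore need the quotes.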



* [PATCH 38/38] docs: kdoc_output: pick a better default for modulename
  2026-02-18 10:12 [PATCH 00/38] docs: several improvements to kernel-doc Mauro Carvalho Chehab
                   ` (36 preceding siblings ...)
  2026-02-18 10:13 ` [PATCH 37/38] docs: kdoc_output: describe the class init parameters Mauro Carvalho Chehab
@ 2026-02-18 10:13 ` Mauro Carvalho Chehab
  2026-02-21  1:24 ` [PATCH 00/38] docs: several improvements to kernel-doc Randy Dunlap
                   ` (2 subsequent siblings)
  40 siblings, 0 replies; 55+ messages in thread
From: Mauro Carvalho Chehab @ 2026-02-18 10:13 UTC (permalink / raw)
  To: Jonathan Corbet, Linux Doc Mailing List, Mauro Carvalho Chehab
  Cc: Mauro Carvalho Chehab, intel-wired-lan, linux-hardening,
	linux-kernel, netdev, Shuah Khan

Instead of using the same modulename for all generated man pages,
default to the directory of the file that is being documented.

Signed-off-by: Mauro Carvalho Chehab <mchehab+huawei@kernel.org>
---
 tools/docs/kernel-doc                |  1 -
 tools/lib/python/kdoc/kdoc_output.py | 41 +++++++++++++++++-----------
 2 files changed, 25 insertions(+), 17 deletions(-)

diff --git a/tools/docs/kernel-doc b/tools/docs/kernel-doc
index aed09f9a54dd..3a932f95bdf5 100755
--- a/tools/docs/kernel-doc
+++ b/tools/docs/kernel-doc
@@ -210,7 +210,6 @@ def main():
                         help="Enable debug messages")
 
     parser.add_argument("-M", "-modulename", "--modulename",
-                        default="Kernel API",
                         help="Allow setting a module name at the output.")
 
     parser.add_argument("-l", "-enable-lineno", "--enable_lineno",
diff --git a/tools/lib/python/kdoc/kdoc_output.py b/tools/lib/python/kdoc/kdoc_output.py
index 1e3dc47bc696..44e40a6e8ca6 100644
--- a/tools/lib/python/kdoc/kdoc_output.py
+++ b/tools/lib/python/kdoc/kdoc_output.py
@@ -589,7 +589,8 @@ class ManFormat(OutputFormat):
     ``modulename``
         Defines the module name to be used at the troff ``.TH`` output.
 
-        This argument is mandatory.
+        This argument is optional. If not specified, it will be filled
+        with the directory which contains the documented file.
 
     ``section``
         Usually a numeric value from 0 to 9, but man pages also accept
@@ -634,14 +635,21 @@ class ManFormat(OutputFormat):
         "%m %d %Y",
     ]
 
-    def emit_th(self, name):
+    def emit_th(self, name, args):
         """Emit a title header line."""
-        name = name.strip()
+        title = name.strip()
+        module = self.modulename(args)
 
-        self.data += f'.TH "{name}" {self.section} "{self.date}" '
-        self.data += f' "{self.modulename}" "{self.manual}"\n'
+        self.data += f'.TH "{title}" {self.section} "{self.date}" '
+        self.data += f' "{module}" "{self.manual}"\n'
 
-    def __init__(self, modulename, section="9", manual="Kernel API Manual"):
+    def modulename(self, args):
+        if self._modulename:
+            return self._modulename
+
+        return os.path.dirname(args.fname)
+
+    def __init__(self, modulename=None, section="9", manual="Kernel API Manual"):
         """
         Creates class variables.
 
@@ -651,7 +659,7 @@ class ManFormat(OutputFormat):
 
         super().__init__()
 
-        self.modulename = modulename
+        self._modulename = modulename
         self.section = section
         self.manual = manual
 
@@ -685,7 +693,8 @@ class ManFormat(OutputFormat):
         dtype = args.type
 
         if dtype == "doc":
-            return self.modulename
+            return name
+#            return os.path.basename(self.modulename(args))
 
         if dtype in ["function", "typedef"]:
             return name
@@ -762,7 +771,7 @@ class ManFormat(OutputFormat):
 
         out_name = self.arg_name(args, name)
 
-        self.emit_th(out_name)
+        self.emit_th(out_name, args)
 
         for section, text in args.sections.items():
             self.data += f'.SH "{section}"' + "\n"
@@ -772,7 +781,7 @@ class ManFormat(OutputFormat):
 
         out_name = self.arg_name(args, name)
 
-        self.emit_th(out_name)
+        self.emit_th(out_name, args)
 
         self.data += ".SH NAME\n"
         self.data += f"{name} \\- {args['purpose']}\n"
@@ -818,7 +827,7 @@ class ManFormat(OutputFormat):
     def out_enum(self, fname, name, args):
         out_name = self.arg_name(args, name)
 
-        self.emit_th(out_name)
+        self.emit_th(out_name, args)
 
         self.data += ".SH NAME\n"
         self.data += f"enum {name} \\- {args['purpose']}\n"
@@ -851,7 +860,7 @@ class ManFormat(OutputFormat):
         out_name = self.arg_name(args, name)
         full_proto = args.other_stuff["full_proto"]
 
-        self.emit_th(out_name)
+        self.emit_th(out_name, args)
 
         self.data += ".SH NAME\n"
         self.data += f"{name} \\- {args['purpose']}\n"
@@ -868,11 +877,11 @@ class ManFormat(OutputFormat):
             self.output_highlight(text)
 
     def out_typedef(self, fname, name, args):
-        module = self.modulename
+        module = self.modulename(args)
         purpose = args.get('purpose')
         out_name = self.arg_name(args, name)
 
-        self.emit_th(out_name)
+        self.emit_th(out_name, args)
 
         self.data += ".SH NAME\n"
         self.data += f"typedef {name} \\- {purpose}\n"
@@ -882,12 +891,12 @@ class ManFormat(OutputFormat):
             self.output_highlight(text)
 
     def out_struct(self, fname, name, args):
-        module = self.modulename
+        module = self.modulename(args)
         purpose = args.get('purpose')
         definition = args.get('definition')
         out_name = self.arg_name(args, name)
 
-        self.emit_th(out_name)
+        self.emit_th(out_name, args)
 
         self.data += ".SH NAME\n"
         self.data += f"{args.type} {name} \\- {purpose}\n"
-- 
2.52.0
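
The fallback introduced by this patch can be sketched in isolation as below.
ManHeader is a hypothetical stand-in for ManFormat, and it takes the file
name directly instead of reading it from args.fname:

```python
import os


class ManHeader:
    """Illustrative stand-in for the modulename logic in ManFormat."""

    def __init__(self, modulename=None):
        # None means "derive the module name from the documented file"
        self._modulename = modulename

    def modulename(self, fname):
        # An explicit module name (e.g. from -M) always wins
        if self._modulename:
            return self._modulename

        # Otherwise use the directory that contains the documented file
        return os.path.dirname(fname)


print(ManHeader().modulename("drivers/net/foo.c"))
print(ManHeader("Kernel API").modulename("drivers/net/foo.c"))
```

One thing worth keeping in mind with this approach: os.path.dirname()
returns an empty string for a bare file name such as "foo.c", so the header
field would be empty when kernel-doc is run on a file in the current
directory.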



* Re: [PATCH 00/38] docs: several improvements to kernel-doc
  2026-02-18 10:12 [PATCH 00/38] docs: several improvements to kernel-doc Mauro Carvalho Chehab
                   ` (37 preceding siblings ...)
  2026-02-18 10:13 ` [PATCH 38/38] docs: kdoc_output: pick a better default for modulename Mauro Carvalho Chehab
@ 2026-02-21  1:24 ` Randy Dunlap
  2026-02-22  1:24   ` Randy Dunlap
  2026-02-23 13:47 ` Jani Nikula
  2026-02-23 21:58 ` Jonathan Corbet
  40 siblings, 1 reply; 55+ messages in thread
From: Randy Dunlap @ 2026-02-21  1:24 UTC (permalink / raw)
  To: Mauro Carvalho Chehab, Alexander Lobakin, Jonathan Corbet,
	Kees Cook, Mauro Carvalho Chehab
  Cc: intel-wired-lan, linux-doc, linux-hardening, linux-kernel, netdev,
	Gustavo A. R. Silva, Aleksandr Loktionov, Shuah Khan



On 2/18/26 2:12 AM, Mauro Carvalho Chehab wrote:
> Hi Jon,
> 
> This series contain several improvements for kernel-doc.
> 
> Most of the patches came from v4 of this series:
> 	https://lore.kernel.org/linux-doc/cover.1769867953.git.mchehab+huawei@kernel.org/
> 

Mauro,
Is this series available as a git tree/branch?

Or what is the base for applying this series?

Thanks.

-- 
~Randy



* Re: [PATCH 00/38] docs: several improvements to kernel-doc
  2026-02-21  1:24 ` [PATCH 00/38] docs: several improvements to kernel-doc Randy Dunlap
@ 2026-02-22  1:24   ` Randy Dunlap
  0 siblings, 0 replies; 55+ messages in thread
From: Randy Dunlap @ 2026-02-22  1:24 UTC (permalink / raw)
  To: Mauro Carvalho Chehab, Alexander Lobakin, Jonathan Corbet,
	Kees Cook, Mauro Carvalho Chehab
  Cc: intel-wired-lan, linux-doc, linux-hardening, linux-kernel, netdev,
	Gustavo A. R. Silva, Aleksandr Loktionov, Shuah Khan

[-- Attachment #1: Type: text/plain, Size: 1421 bytes --]


On 2/20/26 5:24 PM, Randy Dunlap wrote:
> 
> 
> On 2/18/26 2:12 AM, Mauro Carvalho Chehab wrote:
>> Hi Jon,
>>
>> This series contain several improvements for kernel-doc.
>>
>> Most of the patches came from v4 of this series:
>> 	https://lore.kernel.org/linux-doc/cover.1769867953.git.mchehab+huawei@kernel.org/
>>
> 
> Mauro,
> Is this series available as a git tree/branch?
> 
> Or what is the base for applying this series?

I applied the series to linux-next-20260220. It applies cleanly
except for one gotcha (using 'patch'):

  In patch 25, in the commit description, I had to change the
  example before/after diff to have leading "//" ('patch' was
  treating them as part of the diff).

I am still seeing kernel-doc warnings being duplicated.
Seems like there's a patch for that, but it's not applied yet and not part
of this series...?

The results on linux-next-20260220 look good.
I do have one issue on a test file that I had sent to you (Mauro)
earlier:  kdoc-nested.c

In struct super_struct, the fields of nested struct tlv are not
described but there is no warning about that.
Likewise for the fields of the nested structs header, gen_descr,
and data.

Does this series address when /* private: */ is turned off at the
end of a struct/union?  If so, I don't see it working.
See struct nla_policy for where the final struct member should be
public.

kdoc-nested.c test file is attached.


thanks.
-- 
~Randy

[-- Attachment #2: kdoc-nested.c --]
[-- Type: text/x-csrc, Size: 18826 bytes --]

#include <linux/types.h>
#include <linux/errno.h>
#include <linux/init.h>
#include <linux/module.h>

// from Documentation/doc-guide/kernel-doc.rst:

/**
 * struct nested_foobar - a struct with nested unions and structs
 * @memb1: first member of anonymous union/anonymous struct
 * @memb2: second member of anonymous union/anonymous struct
 * @memb3: third member of anonymous union/anonymous struct
 * @memb4: fourth member of anonymous union/anonymous struct
 * @bar: non-anonymous union
 * @bar.st1: struct st1 inside @bar
 * @bar.st2: struct st2 inside @bar
 * @bar.st1.memb1: first member of struct st1 on union bar
 * @bar.st1.memb2: second member of struct st1 on union bar
 * @bar.st2.memb1: first member of struct st2 on union bar
 * @bar.st2.memb2: second member of struct st2 on union bar
 */
struct nested_foobar {
        /* Anonymous union/struct*/
        union {
          struct {
            int memb1;
            int memb2;
          };
          struct {
            void *memb3;
            int memb4;
          };
        };
        union {
          struct {
            int memb1;
            int memb2;
          } st1;
          struct {
            void *memb1;
            int memb2;
          } st2;
        } bar;
};

/*
 * struct knav_dma_tx_cfg:	Tx channel configuration
 * @filt_einfo:			Filter extended packet info
 * @filt_pswords:		Filter PS words present
 * @priority:			Tx channel scheduling priority
 */
struct knav_dma_tx_cfg {
	bool				filt_einfo;
	bool				filt_pswords;
	u32				priority;
};

#define KNAV_DMA_FDQ_PER_CHAN			4
/*
 * struct knav_dma_rx_cfg:	Rx flow configuration
 * @einfo_present:		Extended packet info present
 * @psinfo_present:		PS words present
 * @err_mode:			Error during buffer starvation
 * @desc_type:			Host or Monolithic desc
 * @psinfo_at_sop:		PS word located at start of packet
 * @sop_offset:			Start of packet offset
 * @dst_q:			Destination queue for a given flow
 * @thresh:			Rx flow size threshold
 * @fdq:			Free desc Queue array
 * @sz_thresh0:			RX packet size threshold 0
 * @sz_thresh1:			RX packet size threshold 1
 * @sz_thresh2:			RX packet size threshold 2
 */
struct knav_dma_rx_cfg {
	bool				einfo_present;
	bool				psinfo_present;
	u32				err_mode;
	u32				desc_type;
	bool				psinfo_at_sop;
	unsigned int			sop_offset;
	unsigned int			dst_q;
	u32				thresh;
	unsigned int			fdq[KNAV_DMA_FDQ_PER_CHAN];
	unsigned int			sz_thresh0;
	unsigned int			sz_thresh1;
	unsigned int			sz_thresh2;
};

// from include/linux/soc/ti/knav_dma.h: struct knav_dma_cfg (w/ 2 warnings)
/**
 * struct knav_dma_cfg:	Pktdma channel configuration
 * @u.sl_cfg:			Slave configuration
 * @u.tx:				Tx channel configuration
 * @u.rx:				Rx flow configuration
 */
struct knav_dma_cfg {
	u32			 direction;
	union {
		struct knav_dma_tx_cfg	tx;
		struct knav_dma_rx_cfg	rx;
	} u;
};
// Above Fixed: (no warnings)
/**
 * struct knav_dma_cfg2:	Pktdma channel configuration
 * @direction:			DMA direction info
 * @u:				@tx or @rx configuration
 * @u.tx:			Tx channel configuration
 * @u.rx:			Rx flow configuration
 */
struct knav_dma_cfg2 {
	u32			 direction;
	union {
		struct knav_dma_tx_cfg	tx;
		struct knav_dma_rx_cfg	rx;
	} u;
};

enum dev_prop_type {
	DEV_PROP_U8,
	DEV_PROP_U16,
	DEV_PROP_U32,
	DEV_PROP_U64,
	DEV_PROP_STRING,
	DEV_PROP_REF,
};

// from include/linux/property.h: struct property_entry (w/ 5 warnings)
/**
 * struct property_entry - "Built-in" device property representation.
 * @name: Name of the property.
 * @length: Length of data making up the value.
 * @is_inline: True when the property value is stored inline.
 * @type: Type of the data in unions.
 * @pointer: Pointer to the property when it is not stored inline.
 * @value: Value of the property when it is stored inline.
 */
struct property_entry {
	const char *name;
	size_t length;
	bool is_inline;
	enum dev_prop_type type;
	union {
		const void *pointer;
		union {
			u8 u8_data[sizeof(u64) / sizeof(u8)];
			u16 u16_data[sizeof(u64) / sizeof(u16)];
			u32 u32_data[sizeof(u64) / sizeof(u32)];
			u64 u64_data[sizeof(u64) / sizeof(u64)];
			const char *str[sizeof(u64) / sizeof(char *)];
		} value;
	};
};
// Above, Modified (partially fixed due to confusion about requirements)
/**
 * struct property_entry2 - "Built-in" device property representation.
 * @name: Name of the property.
 * @length: Length of data making up the value.
 * @is_inline: True when the property value is stored inline.
 * @type: Type of the data in unions.
 * @pointer: Pointer to the property when it is not stored inline.
 * @value: Value of the property when it is stored inline.
 * @u8_data: @value as an array of u8 numbers
 * @u16_data: @value as an array of u16 numbers
 * @u32_data: @value as an array of u32 numbers
 * @u64_data: @value as an array of u64 numbers
 * @str: @value as an array of char pointers
 */
struct property_entry2 {
	const char *name;
	size_t length;
	bool is_inline;
	enum dev_prop_type type;
	union {
		const void *pointer;
		union {
			u8 u8_data[sizeof(u64) / sizeof(u8)];
			u16 u16_data[sizeof(u64) / sizeof(u16)];
			u32 u32_data[sizeof(u64) / sizeof(u32)];
			u64 u64_data[sizeof(u64) / sizeof(u64)];
			const char *str[sizeof(u64) / sizeof(char *)];
		} value;
	};
};

# define NUM_MSI_ALLOC_SCRATCHPAD_REGS	2

// from include/asm-generic/msi.h: struct msi_alloc_info (added: @flags)
// Should there be any warnings when @ul and @ptr are omitted?
/**
 * struct msi_alloc_info - Default structure for MSI interrupt allocation.
 * @desc:	Pointer to msi descriptor
 * @hwirq:	Associated hw interrupt number in the domain
 * @flags:	Bits from MSI_ALLOC_FLAGS_...
 * @scratchpad:	Storage for implementation specific scratch data
 * @scratchpad.ul:	Scratchpad data as unsigned long
 * @scratchpad.ptr:	Scratchpad data as a pointer
 *
 * Architectures can provide their own implementation by not including
 * asm-generic/msi.h into their arch specific header file.
 */
typedef struct msi_alloc_info {
	struct msi_desc			*desc;
	irq_hw_number_t			hwirq;
	unsigned long			flags;
	union {
		unsigned long		ul;
		void			*ptr;
	} scratchpad[NUM_MSI_ALLOC_SCRATCHPAD_REGS];
} msi_alloc_info_t;

#define NAND_MAX_ID_LEN 8
// from include/linux/mtd/rawnand.h: struct nand_flash_dev[ice] (2 versions)
// WITHOUT named nested unions/structs:
/**
 * struct nand_flash_dev - NAND Flash Device ID Structure
 * @name: a human-readable name of the NAND chip
 * @dev_id: the device ID (the second byte of the full chip ID array)
 * @mfr_id: manufacturer ID part of the full chip ID array (refers the same
 *          memory address as ``id[0]``)
 * @dev_id: device ID part of the full chip ID array (refers the same memory
 *          address as ``id[1]``)
 * @id: full device ID array
 * @pagesize: size of the NAND page in bytes; if 0, then the real page size (as
 *            well as the eraseblock size) is determined from the extended NAND
 *            chip ID array)
 * @chipsize: total chip size in MiB
 * @erasesize: eraseblock size in bytes (determined from the extended ID if 0)
 * @options: stores various chip bit options
 * @id_len: The valid length of the @id.
 * @oobsize: OOB size
 * @ecc: ECC correctability and step information from the datasheet.
 * @ecc.strength_ds: The ECC correctability from the datasheet, same as the
 *                   @ecc_strength_ds in nand_chip{}.
 * @ecc.step_ds: The ECC step required by the @ecc.strength_ds, same as the
 *               @ecc_step_ds in nand_chip{}, also from the datasheet.
 *               For example, the "4bit ECC for each 512Byte" can be set with
 *               NAND_ECC_INFO(4, 512).
 */
struct nand_flash_dev {
	char *name;
	union {
		struct {
			uint8_t mfr_id;
			uint8_t dev_id;
		};
		uint8_t id[NAND_MAX_ID_LEN];
	};
	unsigned int pagesize;
	unsigned int chipsize;
	unsigned int erasesize;
	unsigned int options;
	uint16_t id_len;
	uint16_t oobsize;
	struct {
		uint16_t strength_ds;
		uint16_t step_ds;
	} ecc;
};
// WITH named nested unions/structs:
/**
 * struct nand_flash_device - NAND Flash Device ID Structure
 * @name: a human-readable name of the NAND chip
 * @ids_ary: union containing all nand id info
 * @ids_ary.ids: nand ids (@dev_id, @mfr_id)
 * @ids_ary.ids.mfr_id: manufacturer ID part of the full chip ID array
 *   (refers to the same memory address as ``id[0]``)
 * @ids_ary.ids.dev_id: device ID part of the full chip ID array (refers to the
 *   same memory address as ``id[1]``)
 * @ids_ary.id: full device ID array
 * @pagesize: size of the NAND page in bytes; if 0, then the real page size (as
 *            well as the eraseblock size) is determined from the extended NAND
 *            chip ID array)
 * @chipsize: total chip size in MiB
 * @erasesize: eraseblock size in bytes (determined from the extended ID if 0)
 * @options: stores various chip bit options
 * @id_len: The valid length of the @id.
 * @oobsize: OOB size
 * @ecc: ECC correctability and step information from the datasheet.
 * @ecc.strength_ds: The ECC correctability from the datasheet, same as the
 *                   @ecc_strength_ds in nand_chip{}.
 * @ecc.step_ds: The ECC step required by the @ecc.strength_ds, same as the
 *               @ecc_step_ds in nand_chip{}, also from the datasheet.
 *               For example, the "4bit ECC for each 512Byte" can be set with
 *               NAND_ECC_INFO(4, 512).
 */
struct nand_flash_device {
	char *name;
	union u_ids_ary {
		struct nand_ids {
			uint8_t mfr_id;
			uint8_t dev_id;
		} ids;
		uint8_t id[NAND_MAX_ID_LEN];
	} ids_ary;
	unsigned int pagesize;
	unsigned int chipsize;
	unsigned int erasesize;
	unsigned int options;
	uint16_t id_len;
	uint16_t oobsize;
	struct ecc_ds {
		uint16_t strength_ds;
		uint16_t step_ds;
	} ecc;
};

// from include/linux/spi/spi-mem.h: struct spi_mem_op (unpatched; 2 warnings)
/**
 * struct spi_mem_op - describes a SPI memory operation
 * @cmd: the command structure
 * @cmd.nbytes: number of opcode bytes (only 1 or 2 are valid). The opcode is
 *		sent MSB-first.
 * @cmd.buswidth: number of IO lines used to transmit the command
 * @cmd.opcode: operation opcode
 * @cmd.dtr: whether the command opcode should be sent in DTR mode or not
 * @addr: address information
 * @addr.nbytes: number of address bytes to send. Can be zero if the operation
 *		 does not need to send an address
 * @addr.buswidth: number of IO lines used to transmit the address cycles
 * @addr.dtr: whether the address should be sent in DTR mode or not
 * @addr.val: address value. This value is always sent MSB first on the bus.
 *	      Note that only @addr.nbytes are taken into account in this
 *	      address value, so users should make sure the value fits in the
 *	      assigned number of bytes.
 * @dummy: dummy data information
 * @dummy.nbytes: number of dummy bytes to send after an opcode or address. Can
 *		  be zero if the operation does not require dummy bytes
 * @dummy.buswidth: number of IO lanes used to transmit the dummy bytes
 * @dummy.dtr: whether the dummy bytes should be sent in DTR mode or not
 * @data: data information
 * @data.buswidth: number of IO lanes used to send/receive the data
 * @data.dtr: whether the data should be sent in DTR mode or not
 * @data.ecc: whether error correction is required or not
 * @data.swap16: whether the byte order of 16-bit words is swapped when read
 *		 or written in Octal DTR mode compared to STR mode.
 * @data.dir: direction of the transfer
 * @data.nbytes: number of data bytes to send/receive. Can be zero if the
 *		 operation does not involve transferring data
 * @data.buf.in: input buffer (must be DMA-able)
 * @data.buf.out: output buffer (must be DMA-able)
 * @max_freq: frequency limitation wrt this operation. 0 means there is no
 *	      specific constraint and the highest achievable frequency can be
 *	      attempted.
 */
struct spi_mem_op {
	struct {
		u8 nbytes;
		u8 buswidth;
		u8 dtr : 1;
		u8 __pad : 7;
		u16 opcode;
	} cmd;

	struct {
		u8 nbytes;
		u8 buswidth;
		u8 dtr : 1;
		u8 __pad : 7;
		u64 val;
	} addr;

	struct {
		u8 nbytes;
		u8 buswidth;
		u8 dtr : 1;
		u8 __pad : 7;
	} dummy;

	struct {
		u8 buswidth;
		u8 dtr : 1;
		u8 ecc : 1;
		u8 swap16 : 1;
		u8 __pad : 5;
		u8 dir;
		unsigned int nbytes;
		union {
			void *in;
			const void *out;
		} buf;
	} data;

	unsigned int max_freq;
};

// from include/soc/fsl/dpaa2-fd.h: struct dpaa2_fd (@simple not described)
/**
 * struct dpaa2_fd - Struct describing FDs
 * @words:         for easier/faster copying the whole FD structure
 * @simple:        simple frame descriptor
 * @simple.addr:   address in the FD
 * @simple.len:    length in the FD
 * @simple.bpid:   buffer pool ID
 * @simple.format_offset: format, offset, and short-length fields
 * @simple.frc:    frame context
 * @simple.ctrl:   control bits...including dd, sc, va, err, etc
 * @simple.flc:    flow context address
 *
 * This structure represents the basic Frame Descriptor used in the system.
 */
struct dpaa2_fd {
	union {
		u32 words[8];
		struct dpaa2_fd_simple {
			__le64 addr;
			__le32 len;
			__le16 bpid;
			__le16 format_offset;
			__le32 frc;
			__le32 ctrl;
			__le64 flc;
		} simple;
	};
};

// from include/net/netlink.h:
struct nlattr;
struct netlink_ext_ack;

/**
 * struct nla_policy - attribute validation policy
 * @type: Type of attribute or NLA_UNSPEC
 * @validation_type: type of attribute validation done in addition to
 *	type-specific validation (e.g. range, function call), see
 *	&enum nla_policy_validation
 * @len: Type specific length of payload
 * @strict_start_type: first attribute to validate strictly
 * @pflags: policy flags (should be public:)
 */
struct nla_policy {
	u8		type;
	u8		validation_type;
	u16		len;
	union {
		u16 strict_start_type;

		/* private: use NLA_POLICY_*() to set */
		const u32 bitfield32_valid;
		const u32 mask;
		const char *reject_message;
		const struct nla_policy *nested_policy;
		const struct netlink_range_validation *range;
		const struct netlink_range_validation_signed *range_signed;
		struct {
			s16 min, max;
		};
		int (*validate)(const struct nlattr *attr,
				struct netlink_ext_ack *extack);
	};
	/* Should be public; */
	u32	pflags;
};

// Contrived nested struct/union tests:

/**
 * struct super_struct - nested structs and unions
 * @super_type_flags: type of data in this struct plus some common flags
 * @uhdr: union for @tlv or @hdr
 * @uhdr.tlv: type-length-value descriptor
 * @uhdr.hdr: common command/reply header
 * @udata: union for @descr or @data
 * @udata.descr: generic super descriptor
 * @udata.data: flags + pointer or inline data of various basic types
 */
struct super_struct {
	u64			super_type_flags;

	union ss_header {
		struct tlv {
			unsigned long	type;
			unsigned long	length;
			void		*value;
		} tlv;

		struct header {
			unsigned int	type;
			unsigned int	length;
			unsigned long	flags;
			unsigned char	req_rep[16];
			unsigned char	data[];
		} hdr;
	} uhdr;

	union ss_data {
		struct gen_descr { // phys_addr, virt_addr, io_addr, dma_addr, io_addr
			unsigned int	flags;		// which addr type
			unsigned int	addr_flags;	// addr-specific flags
			u64		base_addr;
			u64		end_addr;
			unsigned int	align;
			unsigned int	offset;
			u64		length;
			u64		handle;
			void		*func;
			unsigned char	*name;
			void		*data;		// arbitrary data
		} descr;

		struct data {
			u64		dflags;	// tells what is in @fun
			union actual {
				void		*pointer;
				union values {		// inline data
					u8 u8_data[sizeof(u64) / sizeof(u8)];
					u16 u16_data[sizeof(u64) / sizeof(u16)];
					u32 u32_data[sizeof(u64) / sizeof(u32)];
					u64 u64_data[sizeof(u64) / sizeof(u64)];
					char str[sizeof(u64) / sizeof(char *)];
				} value;
			} actual;
		} data;
	} udata;
};

struct part_name {
	void		*name;
	unsigned long	name_len;
};

union part_model {
	char		*inv_model;
	char		*mfr_model;
} partmodel;

/**
 * struct inv_union - nested unions and structs for inventory
 *
 * (use inline kernel-doc comments here)
 */
union inv_union {
	/** @invpart: all local inventory info for a part */
	struct inv_part {
		/**
		 * @invpart.type_flags: type of data in this union plus
		 * some common flags
		 */
		u64		type_flags;

		/** @invpart.part_price: the item's usual selling price */
		u64		part_price;

		/** @invpart.inv_partnr: the local inventory part number */
		char		*inv_partnr;

		/** @invpart.inv_count: number in inventory */
		u64		inv_count;

		/** @invpart.partname: local part name (descriptor) */
		struct part_name	partname;

		/** @invpart.partmodel: local part model (pointer) */
		union part_model	partmodel;
	} invpart;

	/** @mfrpart: all mfr. info for a part */
	struct mfr_part {
		/**
		 * @mfrpart.type_flags: type of data in this union plus
		 * some common flags
		 */
		u64		type_flags;

		/** @mfrpart.part_price: the item's mfr's cost */
		u64		part_price;

		/**
		 * @mfrpart.mfr_partnr: the manufacturer part number
		 * (descriptor)
		 */
		char		*mfr_partnr;

		/** @mfrpart.partname: mfr. part name (descriptor) */
		struct part_name	partname;

		/** @mfrpart.partmodel: mfr. part model (pointer) */
		union part_model	partmodel;
	} mfrpart;
};

// from n2310.pdf - ISO/IEC 9899:202x (E);
// or n1570.pdf - ISO/IEC 9899:201x
// or n1124.pdf - ISO/IEC 9899:TC2
// or ISO/IEC 9899:1999 (E)
/**
 * union u - union of all data basic data types
 * @n: integer type struct
 * @n.alltypes: arbitrary value
 * @ni: integer type & value struct
 * @ni.type: the type of the value
 * @ni.intnode: the int's value
 * @nf: double type and value struct
 * @nf.type: the type of the value
 * @nf.doublenode: the double's value
 */
union u {
	struct {
		int alltypes;
	} n;
	struct {
		int type;
		int intnode;
	} ni;
	struct {
		int type;
		double doublenode;
	} nf;
} u;

// from n2310.pdf - ISO/IEC 9899:202x (E)
// or n1570.pdf - ISO/IEC 9899:201x
// or n1124.pdf - ISO/IEC 9899:TC2
// or ISO/IEC 9899:1999 (E)
struct s { double i; };

/**
 * union g - composite of (int + struct) or (struct + int)
 *
 * (use inline kernel-doc comments here)
 */
union g {
	/** @u1: struct with members in int/struct order */
	struct {
		int f1;
		struct s f2;
	} u1;
	/** @u2: struct with members in struct/int order */
	struct {
		struct s f3;
		int f4;
	} u2;
} g;

// from n1570.pdf - ISO/IEC 9899:201x
/** struct v - anonymous struct & union example
 * @i: number i
 * @j: number j
 * @w: anon. &struct w
 * @w.k: number w.k
 * @w.l: number w.l
 * @m: number m
 */
struct v {
	union { // anonymous union
		struct { int i, j; }; // anonymous structure
		struct { long k, l; } w;
	};
	int m;
} v1;

#ifdef __KERNEL__

static int __init init_data(void)
{
	return -ENODATA;
}
static void __exit exit_data(void)
{
}
module_init(init_data);
module_exit(exit_data);

MODULE_LICENSE("GPL");
MODULE_AUTHOR("Randy Dunlap");
MODULE_DESCRIPTION("Nested structs/unions test");
#endif

//EOF//


* Re: [PATCH 00/38] docs: several improvements to kernel-doc
  2026-02-18 10:12 [PATCH 00/38] docs: several improvements to kernel-doc Mauro Carvalho Chehab
                   ` (38 preceding siblings ...)
  2026-02-21  1:24 ` [PATCH 00/38] docs: several improvements to kernel-doc Randy Dunlap
@ 2026-02-23 13:47 ` Jani Nikula
  2026-02-23 15:02   ` Jonathan Corbet
  2026-03-03 14:53   ` Mauro Carvalho Chehab
  2026-02-23 21:58 ` Jonathan Corbet
  40 siblings, 2 replies; 55+ messages in thread
From: Jani Nikula @ 2026-02-23 13:47 UTC (permalink / raw)
  To: Mauro Carvalho Chehab, Alexander Lobakin, Jonathan Corbet,
	Kees Cook, Mauro Carvalho Chehab
  Cc: Mauro Carvalho Chehab, intel-wired-lan, linux-doc,
	linux-hardening, linux-kernel, netdev, Gustavo A. R. Silva,
	Aleksandr Loktionov, Randy Dunlap, Shuah Khan

On Wed, 18 Feb 2026, Mauro Carvalho Chehab <mchehab+huawei@kernel.org> wrote:
> As anyone that worked before with kernel-doc are aware, using regex to
> handle C input is not great. Instead, we need something closer to how
> C statements and declarations are handled.
>
> Yet, to avoid breaking  docs, I avoided touching the regex-based algorithms
> inside it with one exception: struct_group logic was using very complex
> regexes that are incompatible with Python internal "re" module.
>
> So, I came up with a different approach: NestedMatch. The logic inside
> it is meant to properly handle brackets, square brackets and parenthesis,
> which is closer to what C lexical parser does. On that time, I added
> a TODO about the need to extend that.

There's always the question, if you're putting a lot of effort into
making kernel-doc closer to an actual C parser, why not put all that
effort into using and adapting to, you know, an actual C parser?


BR,
Jani.

-- 
Jani Nikula, Intel


* Re: [PATCH 00/38] docs: several improvements to kernel-doc
  2026-02-23 13:47 ` Jani Nikula
@ 2026-02-23 15:02   ` Jonathan Corbet
  2026-02-24 13:25     ` [Intel-wired-lan] " Mauro Carvalho Chehab
  2026-03-04 10:07     ` Jani Nikula
  2026-03-03 14:53   ` Mauro Carvalho Chehab
  1 sibling, 2 replies; 55+ messages in thread
From: Jonathan Corbet @ 2026-02-23 15:02 UTC (permalink / raw)
  To: Jani Nikula, Mauro Carvalho Chehab, Alexander Lobakin, Kees Cook,
	Mauro Carvalho Chehab
  Cc: Mauro Carvalho Chehab, intel-wired-lan, linux-doc,
	linux-hardening, linux-kernel, netdev, Gustavo A. R. Silva,
	Aleksandr Loktionov, Randy Dunlap, Shuah Khan

Jani Nikula <jani.nikula@linux.intel.com> writes:

> There's always the question, if you're putting a lot of effort into
> making kernel-doc closer to an actual C parser, why not put all that
> effort into using and adapting to, you know, an actual C parser?

Not speaking to the current effort but ... in the past, when I have
contemplated this (using, say, tree-sitter), the real problem is that
those parsers simply strip out the comments.  Kerneldoc without comments
... doesn't work very well.  If there were a parser without those
problems, and which could be made to do the right thing with all of our
weird macro usage, it would certainly be worth considering.

jon

^ permalink raw reply	[flat|nested] 55+ messages in thread

* Re: [PATCH 00/38] docs: several improvements to kernel-doc
  2026-02-18 10:12 [PATCH 00/38] docs: several improvements to kernel-doc Mauro Carvalho Chehab
                   ` (39 preceding siblings ...)
  2026-02-23 13:47 ` Jani Nikula
@ 2026-02-23 21:58 ` Jonathan Corbet
  2026-03-02 15:54   ` [Intel-wired-lan] " Mauro Carvalho Chehab
  40 siblings, 1 reply; 55+ messages in thread
From: Jonathan Corbet @ 2026-02-23 21:58 UTC (permalink / raw)
  To: Mauro Carvalho Chehab, Alexander Lobakin, Kees Cook,
	Mauro Carvalho Chehab
  Cc: Mauro Carvalho Chehab, intel-wired-lan, linux-doc,
	linux-hardening, linux-kernel, netdev, Gustavo A. R. Silva,
	Aleksandr Loktionov, Randy Dunlap, Shuah Khan

Mauro Carvalho Chehab <mchehab+huawei@kernel.org> writes:

> Hi Jon,
>
> This series contain several improvements for kernel-doc.
>
> Most of the patches came from v4 of this series:
> 	https://lore.kernel.org/linux-doc/cover.1769867953.git.mchehab+huawei@kernel.org/

So I will freely confess to having lost the plot with this stuff; I'm
now trying to get back up to speed.  But, before I dig into this big
series, can you say whether you think it's ready, or whether there's
another one on the horizon that I should wait for?

Thanks,

jon

^ permalink raw reply	[flat|nested] 55+ messages in thread

* Re: [Intel-wired-lan] [PATCH 00/38] docs: several improvements to kernel-doc
  2026-02-23 15:02   ` Jonathan Corbet
@ 2026-02-24 13:25     ` Mauro Carvalho Chehab
  2026-03-04 10:07     ` Jani Nikula
  1 sibling, 0 replies; 55+ messages in thread
From: Mauro Carvalho Chehab @ 2026-02-24 13:25 UTC (permalink / raw)
  To: Jonathan Corbet
  Cc: Jani Nikula, Alexander Lobakin, Kees Cook, Mauro Carvalho Chehab,
	intel-wired-lan, linux-doc, linux-hardening, linux-kernel, netdev,
	Gustavo A. R. Silva, Aleksandr Loktionov, Randy Dunlap,
	Shuah Khan

On Mon, 23 Feb 2026 08:02:11 -0700
Jonathan Corbet <corbet@lwn.net> wrote:

> Jani Nikula <jani.nikula@linux.intel.com> writes:
> 
> > There's always the question, if you're putting a lot of effort into
> > making kernel-doc closer to an actual C parser, why not put all that
> > effort into using and adapting to, you know, an actual C parser?  
> 
> Not speaking to the current effort but ... in the past, when I have
> contemplated this (using, say, tree-sitter), the real problem is that
> those parsers simply strip out the comments.  Kerneldoc without comments
> ... doesn't work very well.  If there were a parser without those
> problems, and which could be made to do the right thing with all of our
> weird macro usage, it would certainly be worth considering.

A parser is only needed for statement prototypes. There, stripping
comments (after we parse public/private) should be OK. Still, we
want a Python library to do the parsing, and to use it only for the
things we want parsed.

Assuming we have something like that, we'll still need to teach
the parser about the macro transforms, as those are very Linux
specific.

Maybe something like https://github.com/eliben/pycparser would
help (I haven't tested it, nor checked whether it does what we want).

There is an additional problem: this would add an extra
dependency to the kernel build itself, because kernel-doc can
run at kernel build time.

-- 
Thanks,
Mauro

^ permalink raw reply	[flat|nested] 55+ messages in thread

* Re: [Intel-wired-lan] [PATCH 00/38] docs: several improvements to kernel-doc
  2026-02-23 21:58 ` Jonathan Corbet
@ 2026-03-02 15:54   ` Mauro Carvalho Chehab
  2026-03-02 16:14     ` Jonathan Corbet
  0 siblings, 1 reply; 55+ messages in thread
From: Mauro Carvalho Chehab @ 2026-03-02 15:54 UTC (permalink / raw)
  To: Jonathan Corbet
  Cc: Alexander Lobakin, Kees Cook, Mauro Carvalho Chehab,
	intel-wired-lan, linux-doc, linux-hardening, linux-kernel, netdev,
	Gustavo A. R. Silva, Aleksandr Loktionov, Randy Dunlap,
	Shuah Khan

On Mon, 23 Feb 2026 14:58:53 -0700
Jonathan Corbet <corbet@lwn.net> wrote:

> Mauro Carvalho Chehab <mchehab+huawei@kernel.org> writes:
> 
> > Hi Jon,
> >
> > This series contain several improvements for kernel-doc.
> >
> > Most of the patches came from v4 of this series:
> > 	https://lore.kernel.org/linux-doc/cover.1769867953.git.mchehab+huawei@kernel.org/  
> 
> So I will freely confess to having lost the plot with this stuff; I'm
> now trying to get back up to speed. 

Yeah, I kinda figured that out ;-)

> But, before I dig into this big
> series, can you say whether you think it's ready, or whether there's
> another one on the horizon that I should wait for?

There is more work in progress, but I need some time to reorganize
the patchset... currently, there are 60+ patches on my pile.

So, instead of merging this patchset, I'll be sending you
a smaller series with the basic stuff, in a way that should
be easier to review. My plan is to send patches during this week
in smaller chunks, after checking the before/after differences
in man/rst/error output.

-- 
Thanks,
Mauro

^ permalink raw reply	[flat|nested] 55+ messages in thread

* Re: [Intel-wired-lan] [PATCH 00/38] docs: several improvements to kernel-doc
  2026-03-02 15:54   ` [Intel-wired-lan] " Mauro Carvalho Chehab
@ 2026-03-02 16:14     ` Jonathan Corbet
  0 siblings, 0 replies; 55+ messages in thread
From: Jonathan Corbet @ 2026-03-02 16:14 UTC (permalink / raw)
  To: Mauro Carvalho Chehab
  Cc: Alexander Lobakin, Kees Cook, Mauro Carvalho Chehab,
	intel-wired-lan, linux-doc, linux-hardening, linux-kernel, netdev,
	Gustavo A. R. Silva, Aleksandr Loktionov, Randy Dunlap,
	Shuah Khan

Mauro Carvalho Chehab <mchehab+huawei@kernel.org> writes:

> So, instead of merging this patchset, I'll be sending you
> a smaller series with the basic stuff, in a way that it would
> be easier to review. My plan is to send patches along this week
> on smaller chunks, and after checking the differences before/after,
> in terms of man/rst/error output.

OK... *whew* ... that sounds like a better way to proceed :)

Thanks,

jon

^ permalink raw reply	[flat|nested] 55+ messages in thread

* Re: [PATCH 00/38] docs: several improvements to kernel-doc
  2026-02-23 13:47 ` Jani Nikula
  2026-02-23 15:02   ` Jonathan Corbet
@ 2026-03-03 14:53   ` Mauro Carvalho Chehab
  2026-03-03 15:12     ` Loktionov, Aleksandr
  2026-03-04  9:51     ` Jani Nikula
  1 sibling, 2 replies; 55+ messages in thread
From: Mauro Carvalho Chehab @ 2026-03-03 14:53 UTC (permalink / raw)
  To: Jani Nikula
  Cc: Alexander Lobakin, Jonathan Corbet, Kees Cook,
	Mauro Carvalho Chehab, intel-wired-lan, linux-doc,
	linux-hardening, linux-kernel, netdev, Gustavo A. R. Silva,
	Aleksandr Loktionov, Randy Dunlap, Shuah Khan

On Mon, 23 Feb 2026 15:47:00 +0200
Jani Nikula <jani.nikula@linux.intel.com> wrote:

> On Wed, 18 Feb 2026, Mauro Carvalho Chehab <mchehab+huawei@kernel.org> wrote:
> > As anyone that worked before with kernel-doc are aware, using regex to
> > handle C input is not great. Instead, we need something closer to how
> > C statements and declarations are handled.
> >
> > Yet, to avoid breaking  docs, I avoided touching the regex-based algorithms
> > inside it with one exception: struct_group logic was using very complex
> > regexes that are incompatible with Python internal "re" module.
> >
> > So, I came up with a different approach: NestedMatch. The logic inside
> > it is meant to properly handle brackets, square brackets and parenthesis,
> > which is closer to what C lexical parser does. On that time, I added
> > a TODO about the need to extend that.  
> 
> There's always the question, if you're putting a lot of effort into
> making kernel-doc closer to an actual C parser, why not put all that
> effort into using and adapting to, you know, an actual C parser?

Playing with this idea, it is not that hard to write an actual C
parser - or at least a tokenizer. There is already an example of it
at:

	https://docs.python.org/3/library/re.html

I did a quick implementation, and it seems to be able to do its job:

    $ ./tokenizer.py ./include/net/netlink.h
      1:  0  COMMENT       '/* SPDX-License-Identifier: GPL-2.0 */'
      2:  0  CPP           '#ifndef'
      2:  8  ID            '__NET_NETLINK_H'
      3:  0  CPP           '#define'
      3:  8  ID            '__NET_NETLINK_H'
      5:  0  CPP           '#include'
      5:  9  OP            '<'
      5: 10  ID            'linux'
      5: 15  OP            '/'
      5: 16  ID            'types'
      5: 21  PUNC          '.'
      5: 22  ID            'h'
      5: 23  OP            '>'
      6:  0  CPP           '#include'
      6:  9  OP            '<'
      6: 10  ID            'linux'
      6: 15  OP            '/'
      6: 16  ID            'netlink'
      6: 23  PUNC          '.'
      6: 24  ID            'h'
      6: 25  OP            '>'
      7:  0  CPP           '#include'
      7:  9  OP            '<'
      7: 10  ID            'linux'
      7: 15  OP            '/'
      7: 16  ID            'jiffies'
      7: 23  PUNC          '.'
      7: 24  ID            'h'
      7: 25  OP            '>'
      8:  0  CPP           '#include'
      8:  9  OP            '<'
      8: 10  ID            'linux'
      8: 15  OP            '/'
      8: 16  ID            'in6'
...
     12:  1  COMMENT       '/**\n  * Standard attribute types to specify validation policy\n  */'
     13:  0  ENUM          'enum'
     13:  5  PUNC          '{'
     14:  1  ID            'NLA_UNSPEC'
     14: 11  PUNC          ','
     15:  1  ID            'NLA_U8'
     15:  7  PUNC          ','
     16:  1  ID            'NLA_U16'
     16:  8  PUNC          ','
     17:  1  ID            'NLA_U32'
     17:  8  PUNC          ','
     18:  1  ID            'NLA_U64'
     18:  8  PUNC          ','
     19:  1  ID            'NLA_STRING'
     19: 11  PUNC          ','
     20:  1  ID            'NLA_FLAG'
...
     41:  0  STRUCT        'struct'
     41:  7  ID            'netlink_range_validation'
     41: 32  PUNC          '{'
     42:  1  ID            'u64'
     42:  5  ID            'min'
     42:  8  PUNC          ','
     42: 10  ID            'max'
     42: 13  PUNC          ';'
     43:  0  PUNC          '}'
     43:  1  PUNC          ';'
     45:  0  STRUCT        'struct'
     45:  7  ID            'netlink_range_validation_signed'
     45: 39  PUNC          '{'
     46:  1  ID            's64'
     46:  5  ID            'min'
     46:  8  PUNC          ','
     46: 10  ID            'max'
     46: 13  PUNC          ';'
     47:  0  PUNC          '}'
     47:  1  PUNC          ';'
     49:  0  ENUM          'enum'
     49:  5  ID            'nla_policy_validation'
     49: 27  PUNC          '{'
     50:  1  ID            'NLA_VALIDATE_NONE'
     50: 18  PUNC          ','
     51:  1  ID            'NLA_VALIDATE_RANGE'
     51: 19  PUNC          ','
     52:  1  ID            'NLA_VALIDATE_RANGE_WARN_TOO_LONG'
     52: 33  PUNC          ','
     53:  1  ID            'NLA_VALIDATE_MIN'
     53: 17  PUNC          ','
     54:  1  ID            'NLA_VALIDATE_MAX'
     54: 17  PUNC          ','
     55:  1  ID            'NLA_VALIDATE_MASK'
     55: 18  PUNC          ','
     56:  1  ID            'NLA_VALIDATE_RANGE_PTR'
     56: 23  PUNC          ','
     57:  1  ID            'NLA_VALIDATE_FUNCTION'
     57: 22  PUNC          ','
     58:  0  PUNC          '}'
     58:  1  PUNC          ';'

It sounds doable to use it and, at least in this example, it
properly picked up the IDs.

On the other hand, using it would require lots of changes in
kernel-doc. So, I guess I'll add a tokenizer to kernel-doc, but
we should likely start using it gradually.

Maybe starting with NestedSearch and with public/private
comment handling (which is currently half-broken).

As a reference, the output above was generated with the code below,
which is based on the Python re documentation.

Comments?

---

One side note: right now, we're not using typing in kernel-doc,
nor really following a proper coding style.

I wanted to use typing during the conversion, and to place constants
in uppercase, as this is the current best practice, but doing
it while converting from Perl was very annoying. So, I opted
to keep things simpler. Now that we have it coded, perhaps it
is time to define a coding style and apply it to kernel-doc.

-- 
Thanks,
Mauro

#!/usr/bin/env python3

import sys
import re

class Token():
    def __init__(self, type, value, line, column):
        self.type = type
        self.value = value
        self.line = line
        self.column = column

class CTokenizer():
    C_KEYWORDS = {
        "struct", "union", "enum",
    }

    TOKEN_LIST = [
        ("COMMENT", r"//[^\n]*|/\*[\s\S]*?\*/"),

        ("STRING",  r'"(?:\\.|[^"\\])*"'),
        ("CHAR",    r"'(?:\\.|[^'\\])'"),

        ("NUMBER",  r"0[xX][0-9a-fA-F]+[uUlL]*|0[0-7]+[uUlL]*|"
                    r"[0-9]+(\.[0-9]*)?([eE][+-]?[0-9]+)?[fFlL]*"),

        ("ID",      r"[A-Za-z_][A-Za-z0-9_]*"),

        ("OP",      r"\+\+|\-\-|\->|==|\!=|<=|>=|&&|\|\||<<|>>|\+=|\-=|\*=|/=|%="
                    r"|&=|\|=|\^=|=|\+|\-|\*|/|%|<|>|&|\||\^|~|!|\?|\:"),

        ("PUNC",    r"[;,\.\[\]\(\)\{\}]"),

        ("CPP",     r"#\s*(define|include|ifdef|ifndef|if|else|elif|endif|undef|pragma)"),

        ("HASH",    r"#"),

        ("NEWLINE", r"\n"),

        ("SKIP",    r"[\s]+"),

        ("MISMATCH",r"."),
    ]

    def __init__(self):
        re_tokens = []

        for name, pattern in self.TOKEN_LIST:
            re_tokens.append(f"(?P<{name}>{pattern})")

        self.re_scanner = re.compile("|".join(re_tokens),
                                     re.MULTILINE | re.DOTALL)

    def tokenize(self, code):
        # Handle continuation lines
        code = re.sub(r"\\\n", "", code)

        line_num = 1
        line_start = 0

        for match in self.re_scanner.finditer(code):
            kind   = match.lastgroup
            value  = match.group()
            column = match.start() - line_start

            if kind == "NEWLINE":
                line_start = match.end()
                line_num += 1
                continue

            if kind in {"SKIP"}:
                continue

            if kind == "MISMATCH":
                raise RuntimeError(f"Unexpected character {value!r} on line {line_num}")

            if kind == "ID" and value in self.C_KEYWORDS:
                kind = value.upper()

            # For all other tokens we keep the raw string value
            yield Token(kind, value, line_num, column)

if __name__ == "__main__":
    if len(sys.argv) != 2:
        print(f"Usage: python {sys.argv[0]} <fname>")
        sys.exit(1)

    fname = sys.argv[1]

    try:
        with open(fname, 'r', encoding='utf-8') as file:
            sample = file.read()
    except FileNotFoundError:
        print(f"Error: The file '{fname}' was not found.")
        sys.exit(1)
    except Exception as e:
        print(f"An error occurred while reading the file: {str(e)}")
        sys.exit(1)

    print(f"Tokens from {fname}:")

    for tok in CTokenizer().tokenize(sample):
        print(f"{tok.line:3d}:{tok.column:3d}  {tok.type:12}  {tok.value!r}")
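
A note on the line numbers printed by the code above: a /* ... */ comment
that spans several lines is matched as a single COMMENT token, and a
whitespace run that crosses a newline can be matched as a single SKIP
token, so neither goes through the NEWLINE branch and line_num can fall
behind the real file. A minimal sketch of one way to keep the counters in
sync, assuming it is acceptable to count the newlines inside every matched
token. The reduced token table and the SPACE name are illustrative only,
not part of kernel-doc:

```python
import re

# Reduced token table, just enough to show the bookkeeping (assumption:
# the real table would keep all the entries from the tokenizer above).
TOKEN_LIST = [
    ("COMMENT", r"//[^\n]*|/\*[\s\S]*?\*/"),
    ("ID",      r"[A-Za-z_][A-Za-z0-9_]*"),
    ("PUNC",    r"[;,\.\[\]\(\)\{\}]"),
    ("SPACE",   r"\s+"),
    ("MISMATCH", r"."),
]

SCANNER = re.compile("|".join(f"(?P<{n}>{p})" for n, p in TOKEN_LIST))


def tokenize(code):
    """Yield (kind, value, line, column), counting lines inside tokens."""
    line_num = 1
    line_start = 0

    for match in SCANNER.finditer(code):
        kind = match.lastgroup
        value = match.group()
        line = line_num
        column = match.start() - line_start

        # Any token may contain newlines (comments, whitespace runs):
        # advance the counters by what the token actually consumed.
        newlines = value.count("\n")
        if newlines:
            line_num += newlines
            line_start = match.start() + value.rfind("\n") + 1

        if kind == "SPACE":
            continue
        if kind == "MISMATCH":
            raise RuntimeError(f"Unexpected character {value!r} on line {line}")

        yield kind, value, line, column


sample = "/**\n * doc\n */\nenum {\n\tNLA_UNSPEC,\n};\n"
tokens = list(tokenize(sample))
```

With this bookkeeping the enum keyword after the three-line comment is
reported on line 4 of the sample instead of line 2. Removing "\\\n"
continuations before scanning would still shift the numbers, so that
step would need similar care.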


^ permalink raw reply	[flat|nested] 55+ messages in thread

* RE: [PATCH 00/38] docs: several improvements to kernel-doc
  2026-03-03 14:53   ` Mauro Carvalho Chehab
@ 2026-03-03 15:12     ` Loktionov, Aleksandr
  2026-03-03 16:09       ` [Intel-wired-lan] " Mauro Carvalho Chehab
  2026-03-04  9:51     ` Jani Nikula
  1 sibling, 1 reply; 55+ messages in thread
From: Loktionov, Aleksandr @ 2026-03-03 15:12 UTC (permalink / raw)
  To: Mauro Carvalho Chehab, Jani Nikula
  Cc: Lobakin, Aleksander, Jonathan Corbet, Kees Cook,
	Mauro Carvalho Chehab, intel-wired-lan@lists.osuosl.org,
	linux-doc@vger.kernel.org, linux-hardening@vger.kernel.org,
	linux-kernel@vger.kernel.org, netdev@vger.kernel.org,
	Gustavo A. R. Silva, Randy Dunlap, Shuah Khan



> -----Original Message-----
> From: Mauro Carvalho Chehab <mchehab+huawei@kernel.org>
> Sent: Tuesday, March 3, 2026 3:53 PM
> To: Jani Nikula <jani.nikula@linux.intel.com>
> Cc: Lobakin, Aleksander <aleksander.lobakin@intel.com>; Jonathan
> Corbet <corbet@lwn.net>; Kees Cook <kees@kernel.org>; Mauro Carvalho
> Chehab <mchehab@kernel.org>; intel-wired-lan@lists.osuosl.org; linux-
> doc@vger.kernel.org; linux-hardening@vger.kernel.org; linux-
> kernel@vger.kernel.org; netdev@vger.kernel.org; Gustavo A. R. Silva
> <gustavoars@kernel.org>; Loktionov, Aleksandr
> <aleksandr.loktionov@intel.com>; Randy Dunlap <rdunlap@infradead.org>;
> Shuah Khan <skhan@linuxfoundation.org>
> Subject: Re: [PATCH 00/38] docs: several improvements to kernel-doc
> 
> On Mon, 23 Feb 2026 15:47:00 +0200
> Jani Nikula <jani.nikula@linux.intel.com> wrote:
> 
> > On Wed, 18 Feb 2026, Mauro Carvalho Chehab <mchehab+huawei@kernel.org> wrote:
> > > As anyone that worked before with kernel-doc are aware, using regex to
> > > handle C input is not great. Instead, we need something closer to how
> > > C statements and declarations are handled.
> > >
> > > Yet, to avoid breaking  docs, I avoided touching the regex-based algorithms
> > > inside it with one exception: struct_group logic was using very complex
> > > regexes that are incompatible with Python internal "re" module.
> > >
> > > So, I came up with a different approach: NestedMatch. The logic inside
> > > it is meant to properly handle brackets, square brackets and parenthesis,
> > > which is closer to what C lexical parser does. On that time, I added
> > > a TODO about the need to extend that.
> >
> > There's always the question, if you're putting a lot of effort into
> > making kernel-doc closer to an actual C parser, why not put all that
> > effort into using and adapting to, you know, an actual C parser?
> 
> Playing with this idea, it is not that hard to write an actual C
> parser - or at least a tokenizer. There is already an example of it
> at:
> 
> 	https://docs.python.org/3/library/re.html
> 
> I did a quick implementation, and it seems to be able to do its job:
>
> [...]

As a hobby C compiler writer, I must say that you need to implement the C preprocessor first, because the C preprocessor influences/changes the syntax.
In your tokenizer I see right away that any line which begins with '#' must be treated as a single C preprocessor command, without further tokenizing.
But the real pain is C preprocessor substitutions, IMHO.
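
A minimal sketch of the second point, assuming a directive is only
recognised when '#' is the first non-blank character on its line and that
backslash-newline continuations belong to the same directive. The CPP
pattern and the reduced token table below are illustrative assumptions,
not the ones from the posted tokenizer:

```python
import re

TOKEN_LIST = [
    ("COMMENT", r"//[^\n]*|/\*[\s\S]*?\*/"),
    # A whole directive: optional blanks, '#', then everything up to the
    # end of the line, following backslash-newline continuations.
    ("CPP",     r"^[ \t]*#(?:\\\n|[^\n])*"),
    ("ID",      r"[A-Za-z_][A-Za-z0-9_]*"),
    ("NUMBER",  r"[0-9]+"),
    ("PUNC",    r"[;,\[\]\(\)\{\}]"),
    # NEWLINE and SPACE are kept apart so that the blanks in front of an
    # indented '#' are still available to the CPP rule.
    ("NEWLINE", r"\n"),
    ("SPACE",   r"[ \t]+"),
    ("MISMATCH", r"."),
]

# re.MULTILINE makes '^' match right after every newline.
SCANNER = re.compile("|".join(f"(?P<{n}>{p})" for n, p in TOKEN_LIST),
                     re.MULTILINE)


def tokenize(code):
    """Yield (kind, value), with each directive as one CPP token."""
    for match in SCANNER.finditer(code):
        kind = match.lastgroup
        if kind in ("NEWLINE", "SPACE"):
            continue
        if kind == "MISMATCH":
            raise RuntimeError(f"Unexpected character {match.group()!r}")
        yield kind, match.group()


sample = (
    "#include <linux/types.h>\n"
    "  #define MAX(a, b) \\\n"
    "\t((a) > (b) ? (a) : (b))\n"
    "struct foo;\n"
)
tokens = list(tokenize(sample))
```

Here the #include line and the two-line #define each come out as a single
CPP token, and only "struct foo;" is tokenized further. A comment that
starts inside a directive and continues on the next line is not handled
by this sketch.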



^ permalink raw reply	[flat|nested] 55+ messages in thread

* Re: [Intel-wired-lan] [PATCH 00/38] docs: several improvements to kernel-doc
  2026-03-03 15:12     ` Loktionov, Aleksandr
@ 2026-03-03 16:09       ` Mauro Carvalho Chehab
  0 siblings, 0 replies; 55+ messages in thread
From: Mauro Carvalho Chehab @ 2026-03-03 16:09 UTC (permalink / raw)
  To: Loktionov, Aleksandr
  Cc: Jani Nikula, Lobakin, Aleksander, Jonathan Corbet, Kees Cook,
	Mauro Carvalho Chehab, intel-wired-lan@lists.osuosl.org,
	linux-doc@vger.kernel.org, linux-hardening@vger.kernel.org,
	linux-kernel@vger.kernel.org, netdev@vger.kernel.org,
	Gustavo A. R. Silva, Randy Dunlap, Shuah Khan

On Tue, 3 Mar 2026 15:12:30 +0000
"Loktionov, Aleksandr" <aleksandr.loktionov@intel.com> wrote:

> > -----Original Message-----
> > From: Mauro Carvalho Chehab <mchehab+huawei@kernel.org>
> > Sent: Tuesday, March 3, 2026 3:53 PM
> > To: Jani Nikula <jani.nikula@linux.intel.com>
> > Cc: Lobakin, Aleksander <aleksander.lobakin@intel.com>; Jonathan
> > Corbet <corbet@lwn.net>; Kees Cook <kees@kernel.org>; Mauro Carvalho
> > Chehab <mchehab@kernel.org>; intel-wired-lan@lists.osuosl.org; linux-
> > doc@vger.kernel.org; linux-hardening@vger.kernel.org; linux-
> > kernel@vger.kernel.org; netdev@vger.kernel.org; Gustavo A. R. Silva
> > <gustavoars@kernel.org>; Loktionov, Aleksandr
> > <aleksandr.loktionov@intel.com>; Randy Dunlap <rdunlap@infradead.org>;
> > Shuah Khan <skhan@linuxfoundation.org>
> > Subject: Re: [PATCH 00/38] docs: several improvements to kernel-doc
> > 
> > On Mon, 23 Feb 2026 15:47:00 +0200
> > Jani Nikula <jani.nikula@linux.intel.com> wrote:
> >   
> > > There's always the question, if you're putting a lot of effort into
> > > making kernel-doc closer to an actual C parser, why not put all that
> > > effort into using and adapting to, you know, an actual C parser?  
> > 
> > Playing with this idea, it is not that hard to write an actual C
> > parser - or at least a tokenizer. There is already an example of it
> > at:
> > 
> > 	https://docs.python.org/3/library/re.html
> > 
> > I did a quick implementation, and it seems to be able to do its job:

...

> 
> As hobby C compiler writer, I must say that you need to implement C preprocessor first, because C preprocessor influences/changes the syntax.
> In your tokenizer I see right away that any line which begins from '#' must be just as C preprocessor command without further tokenizing.

Yeah, we may need to implement a C preprocessor parser in the future,
but this will require handling #include, which could be somewhat
complex. It is also tricky to handle conditional preprocessor directives,
as kernel-doc would either require a file with at least some defines
or would have to guess how to evaluate them to produce the right
documentation, as ifdefs interfere with C macros.

For now, I want to solve some specific problems:

- fix the trim_private_members() function, which is meant to handle
  /* private: */ and /* public: */ comments, as it currently has
  bugs when used on nested structs/unions, related to where the
  "private" scope finishes;

- properly parse nested structs/unions and properly pick nested
  identifiers;

- detect and replace function arguments when macros with multiple
  arguments are used in the same prototype.
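
As an illustration of the first item, here is a sketch of token-based
private-member trimming that tracks brace depth. It assumes that a
/* private: */ scope ends either at the next /* public: */ comment or at
the closing brace of the struct/union where it was opened. That rule,
the trim_private() name and the tiny token regex are assumptions made for
the example, not the current kernel-doc code:

```python
import re

TOKEN_RE = re.compile(
    r"(?P<COMMENT>//[^\n]*|/\*[\s\S]*?\*/)"
    r"|(?P<ID>[A-Za-z_][A-Za-z0-9_]*)"
    r"|(?P<SPACE>\s+)"
    r"|(?P<OTHER>.)"          # braces, punctuation, anything else
)


def trim_private(code):
    """Return the tokens of 'code', space-joined, minus private members."""
    out = []
    depth = 0                 # current brace nesting level
    private_depth = None      # level where "private:" was seen, if any

    for match in TOKEN_RE.finditer(code):
        kind, value = match.lastgroup, match.group()

        if kind == "SPACE":
            continue

        if kind == "COMMENT":
            text = value.lstrip("/*").strip()
            if text.startswith("private:"):
                if private_depth is None:
                    private_depth = depth
            elif text.startswith("public:"):
                private_depth = None
            continue          # comments are never emitted

        if value == "{":
            depth += 1
        elif value == "}":
            # Leaving the block that opened the private scope ends it
            if private_depth is not None and depth <= private_depth:
                private_depth = None
            depth -= 1

        if private_depth is None:
            out.append(value)

    return " ".join(out)


sample = """
struct outer {
    int a;
    struct {
        int b;
        /* private: */
        int c;
    } inner;
    int d;
    /* private: */
    int e;
};
"""
result = trim_private(sample)
```

In this sample the private scope opened inside the nested struct stops at
its own closing brace, so "inner" and "d" are kept while "c" and "e" are
dropped.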

Plus, kernel-doc already has a table of transforms to "convert"
the C preprocessor macros that affect documentation into something
that will work.

So, I'm considering starting simple: ignoring cpp for now and
addressing the existing issues.

> But the real pain make C preprocessor substitutions IMHO

Agreed. For now, we're using a transforms list inside kernel-doc for
that purpose. So, those macros are manually "evaluated" there, like:

	(KernRe(r'DEFINE_DMA_UNMAP_ADDR\s*\(' + struct_args_pattern + r'\)', re.S), r'dma_addr_t \1'),

This works fine in trivial cases, where the argument is just an ID,
but there are cases where we use macros like this:

    struct page_pool_params {
	struct_group_tagged(page_pool_params_fast, fast,
		unsigned int	order;
		unsigned int	pool_size;
		int		nid;
		struct device	*dev;
		struct napi_struct *napi;
		enum dma_data_direction dma_dir;
		unsigned int	max_len;
		unsigned int	offset;
	);
	struct_group_tagged(page_pool_params_slow, slow,
		struct net_device *netdev;
		unsigned int queue_idx;
		unsigned int	flags;
    /* private: used by test code only */
		void (*init_callback)(netmem_ref netmem, void *arg);
		void *init_arg;
	);
    };

To handle it, I'm thinking of using something like this(*):

	(CFunction('struct_group_tagged'), r'struct \1 { \3 } \2;')

E.g. teaching kernel-doc that, when:

	struct_group_tagged(a, b, c)

is used, it should convert it into:

	struct a { c } b;

which is basically what this macro does. In other words, hardcoding
in kernel-doc some rules to handle the cases where CPP macros
need to be evaluated. As there aren't many cases where such macros affect
documentation (in lots of cases, just dropping the macros is enough), such
an approach kinda works.

(*) I already wrote a patch for it but, as Jani pointed out, perhaps
    using a tokenizer will make the logic simpler and easier to
    understand/maintain.

-- 
Thanks,
Mauro

^ permalink raw reply	[flat|nested] 55+ messages in thread

* Re: [PATCH 00/38] docs: several improvements to kernel-doc
  2026-03-03 14:53   ` Mauro Carvalho Chehab
  2026-03-03 15:12     ` Loktionov, Aleksandr
@ 2026-03-04  9:51     ` Jani Nikula
  1 sibling, 0 replies; 55+ messages in thread
From: Jani Nikula @ 2026-03-04  9:51 UTC (permalink / raw)
  To: Mauro Carvalho Chehab
  Cc: Alexander Lobakin, Jonathan Corbet, Kees Cook,
	Mauro Carvalho Chehab, intel-wired-lan, linux-doc,
	linux-hardening, linux-kernel, netdev, Gustavo A. R. Silva,
	Aleksandr Loktionov, Randy Dunlap, Shuah Khan

On Tue, 03 Mar 2026, Mauro Carvalho Chehab <mchehab+huawei@kernel.org> wrote:
> On Mon, 23 Feb 2026 15:47:00 +0200
> Jani Nikula <jani.nikula@linux.intel.com> wrote:
>> There's always the question, if you're putting a lot of effort into
>> making kernel-doc closer to an actual C parser, why not put all that
>> effort into using and adapting to, you know, an actual C parser?
>
> Playing with this idea, it is not that hard to write an actual C
> parser - or at least a tokenizer.

Just for the record, I suggested using an existing parser, not going all
NIH and writing your own.

BR,
Jani.

-- 
Jani Nikula, Intel

* Re: [PATCH 00/38] docs: several improvements to kernel-doc
  2026-02-23 15:02   ` Jonathan Corbet
  2026-02-24 13:25     ` [Intel-wired-lan] " Mauro Carvalho Chehab
@ 2026-03-04 10:07     ` Jani Nikula
  2026-03-04 12:20       ` [Intel-wired-lan] " Mauro Carvalho Chehab
                         ` (2 more replies)
  1 sibling, 3 replies; 55+ messages in thread
From: Jani Nikula @ 2026-03-04 10:07 UTC (permalink / raw)
  To: Jonathan Corbet, Mauro Carvalho Chehab, Alexander Lobakin,
	Kees Cook, Mauro Carvalho Chehab
  Cc: Mauro Carvalho Chehab, intel-wired-lan, linux-doc,
	linux-hardening, linux-kernel, netdev, Gustavo A. R. Silva,
	Aleksandr Loktionov, Randy Dunlap, Shuah Khan

On Mon, 23 Feb 2026, Jonathan Corbet <corbet@lwn.net> wrote:
> Jani Nikula <jani.nikula@linux.intel.com> writes:
>
>> There's always the question, if you're putting a lot of effort into
>> making kernel-doc closer to an actual C parser, why not put all that
>> effort into using and adapting to, you know, an actual C parser?
>
> Not speaking to the current effort but ... in the past, when I have
> contemplated this (using, say, tree-sitter), the real problem is that
> those parsers simply strip out the comments.  Kerneldoc without comments
> ... doesn't work very well.  If there were a parser without those
> problems, and which could be made to do the right thing with all of our
> weird macro usage, it would certainly be worth considering.

I think e.g. libclang and its Python bindings can be made to work. The
main problems with that are passing proper compiler options (because
it'll need to include stuff to know about types etc. because it is a
proper parser), preprocessing everything is going to take time, you need
to invest a bunch into it to know how slow exactly compared to the
current thing and whether it's prohibitive, and it introduces an extra
dependency.

So yeah, there are definitely tradeoffs there. But it's not like this
constant patching of kernel-doc is exactly burden free either. I don't
know, is it just me, but I'd like to think as a profession we'd be past
writing ad hoc C parsers by now.

BR,
Jani.

-- 
Jani Nikula, Intel

* Re: [Intel-wired-lan] [PATCH 00/38] docs: several improvements to kernel-doc
  2026-03-04 10:07     ` Jani Nikula
@ 2026-03-04 12:20       ` Mauro Carvalho Chehab
  2026-03-04 22:34       ` Jonathan Corbet
  2026-03-13 10:48       ` [Intel-wired-lan] " Mauro Carvalho Chehab
  2 siblings, 0 replies; 55+ messages in thread
From: Mauro Carvalho Chehab @ 2026-03-04 12:20 UTC (permalink / raw)
  To: Jani Nikula
  Cc: Jonathan Corbet, Alexander Lobakin, Kees Cook,
	Mauro Carvalho Chehab, intel-wired-lan, linux-doc,
	linux-hardening, linux-kernel, netdev, Gustavo A. R. Silva,
	Aleksandr Loktionov, Randy Dunlap, Shuah Khan


On Wed, 04 Mar 2026 12:07:45 +0200
Jani Nikula <jani.nikula@linux.intel.com> wrote:

> On Mon, 23 Feb 2026, Jonathan Corbet <corbet@lwn.net> wrote:
> > Jani Nikula <jani.nikula@linux.intel.com> writes:
> >  
> >> There's always the question, if you're putting a lot of effort into
> >> making kernel-doc closer to an actual C parser, why not put all that
> >> effort into using and adapting to, you know, an actual C parser?  
> >
> > Not speaking to the current effort but ... in the past, when I have
> > contemplated this (using, say, tree-sitter), the real problem is that
> > those parsers simply strip out the comments.  Kerneldoc without comments
> > ... doesn't work very well.  If there were a parser without those
> > problems, and which could be made to do the right thing with all of our
> > weird macro usage, it would certainly be worth considering.  
> 
> I think e.g. libclang and its Python bindings can be made to work. The
> main problems with that are passing proper compiler options (because
> it'll need to include stuff to know about types etc. because it is a
> proper parser), preprocessing everything is going to take time, you need
> to invest a bunch into it to know how slow exactly compared to the
> current thing and whether it's prohibitive, and it introduces an extra
> dependency.

It is not just that. Assume we're parsing something like this:

	static __always_inline int _raw_read_trylock(rwlock_t *lock)
		__cond_acquires_shared(true, lock);


using a cpp (or libclang). We would need to define/undefine 3 symbols:

	#if defined(WARN_CONTEXT_ANALYSIS) && !defined(__CHECKER__) && !defined(__GENKSYMS__)

(in this particular case, the default is OK, but in others, it may not
be)

This is far more complex than just writing logic that would
convert the above into:

	static int _raw_read_trylock(rwlock_t *lock);

which is the current kernel-doc approach.
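
As a rough sketch of that approach (not the actual kernel-doc code:
simplify_prototype() and both patterns are made up for this example, and
the call pattern only copes with arguments that contain no nested
parentheses):

```python
import re

# Hypothetical ignore rules, for illustration only.
DROP_WORDS = re.compile(r"\b__always_inline\b")
DROP_CALLS = re.compile(r"\b__cond_acquires_shared\s*\([^()]*\)")

def simplify_prototype(proto):
    """Drop annotations that do not matter for documentation."""
    proto = DROP_CALLS.sub("", proto)    # function-like annotations
    proto = DROP_WORDS.sub("", proto)    # keyword-like annotations
    proto = re.sub(r"\s+", " ", proto).strip()
    return re.sub(r"\s+;", ";", proto)   # no blank before the final ";"

proto = ("static __always_inline int _raw_read_trylock(rwlock_t *lock)\n"
         "\t__cond_acquires_shared(true, lock);")
print(simplify_prototype(proto))
# prints: static int _raw_read_trylock(rwlock_t *lock);
```

No #if evaluation or include path is needed here, which is the point
being made above.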

-

Using a C preprocessor, we might have a very big prototype - and even have
arch-specific defines affecting it, as some includes may be inside 
arch/*/include.

So, we would need a kernel-doc ".config" file with a set of defines
that can be hard to maintain.

> So yeah, there are definitely tradeoffs there. But it's not like this
> constant patching of kernel-doc is exactly burden free either. I don't
> know, is it just me, but I'd like to think as a profession we'd be past
> writing ad hoc C parsers by now.

I'd say that the binding logic and the ".config" kernel-doc defines will
be complex to maintain. Maybe more complex than kernel-doc patching and
a simple C parser, like the one in my test.

> > On Mon, 23 Feb 2026 15:47:00 +0200
> > Jani Nikula <jani.nikula@linux.intel.com> wrote:  
> >> There's always the question, if you're putting a lot of effort into
> >> making kernel-doc closer to an actual C parser, why not put all that
> >> effort into using and adapting to, you know, an actual C parser?  
> >
> > Playing with this idea, it is not that hard to write an actual C
> > parser - or at least a tokenizer.  
> 
> Just for the record, I suggested using an existing parser, not going all
> NIH and writing your own.

I know, but I suspect that a simple tokenizer similar to my example might
do the job without any major impact, but yeah, tests are needed.


-- 
Thanks,
Mauro

* Re: [PATCH 00/38] docs: several improvements to kernel-doc
  2026-03-04 10:07     ` Jani Nikula
  2026-03-04 12:20       ` [Intel-wired-lan] " Mauro Carvalho Chehab
@ 2026-03-04 22:34       ` Jonathan Corbet
  2026-03-13 10:48       ` [Intel-wired-lan] " Mauro Carvalho Chehab
  2 siblings, 0 replies; 55+ messages in thread
From: Jonathan Corbet @ 2026-03-04 22:34 UTC (permalink / raw)
  To: Jani Nikula, Mauro Carvalho Chehab, Alexander Lobakin, Kees Cook,
	Mauro Carvalho Chehab
  Cc: Mauro Carvalho Chehab, intel-wired-lan, linux-doc,
	linux-hardening, linux-kernel, netdev, Gustavo A. R. Silva,
	Aleksandr Loktionov, Randy Dunlap, Shuah Khan

Jani Nikula <jani.nikula@linux.intel.com> writes:

> So yeah, there are definitely tradeoffs there. But it's not like this
> constant patching of kernel-doc is exactly burden free either. I don't
> know, is it just me, but I'd like to think as a profession we'd be past
> writing ad hoc C parsers by now.

I don't think that having a "real" parser is going to free us from the
need to patch kernel-doc.  The kernel uses a weird form of C, and
kernel-doc is expected to evolve as our dialect of the language does.
It *might* make that patching job easier -- that is to be seen -- but it
won't make it go away.

Thanks,

jon

* Re: [Intel-wired-lan] [PATCH 00/38] docs: several improvements to kernel-doc
  2026-03-04 10:07     ` Jani Nikula
  2026-03-04 12:20       ` [Intel-wired-lan] " Mauro Carvalho Chehab
  2026-03-04 22:34       ` Jonathan Corbet
@ 2026-03-13 10:48       ` Mauro Carvalho Chehab
  2 siblings, 0 replies; 55+ messages in thread
From: Mauro Carvalho Chehab @ 2026-03-13 10:48 UTC (permalink / raw)
  To: Jani Nikula
  Cc: Jonathan Corbet, Alexander Lobakin, Kees Cook,
	Mauro Carvalho Chehab, intel-wired-lan, linux-doc,
	linux-hardening, linux-kernel, netdev, Gustavo A. R. Silva,
	Aleksandr Loktionov, Randy Dunlap, Shuah Khan

On Wed, 04 Mar 2026 12:07:45 +0200
Jani Nikula <jani.nikula@linux.intel.com> wrote:

> On Mon, 23 Feb 2026, Jonathan Corbet <corbet@lwn.net> wrote:
> > Jani Nikula <jani.nikula@linux.intel.com> writes:
> >  
> >> There's always the question, if you're putting a lot of effort into
> >> making kernel-doc closer to an actual C parser, why not put all that
> >> effort into using and adapting to, you know, an actual C parser?  
> >
> > Not speaking to the current effort but ... in the past, when I have
> > contemplated this (using, say, tree-sitter), the real problem is that
> > those parsers simply strip out the comments.  Kerneldoc without comments
> > ... doesn't work very well.  If there were a parser without those
> > problems, and which could be made to do the right thing with all of our
> > weird macro usage, it would certainly be worth considering.  
> 
> I think e.g. libclang and its Python bindings can be made to work. The
> main problems with that are passing proper compiler options (because
> it'll need to include stuff to know about types etc. because it is a
> proper parser), preprocessing everything is going to take time, you need
> to invest a bunch into it to know how slow exactly compared to the
> current thing and whether it's prohibitive, and it introduces an extra
> dependency.
> 
> So yeah, there are definitely tradeoffs there. But it's not like this
> constant patching of kernel-doc is exactly burden free either. 

In my tests with a simple C tokenizer:

	https://lore.kernel.org/linux-doc/cover.1773326442.git.mchehab+huawei@kernel.org/

The tokenizer is working fine and didn't make it much slower: it
increases the time to parse the entire kernel tree from 37s to 47s
for man pages generation, but should not change the time for
htmldocs much, as right now only ~4 seconds are needed to read the files
pointed to by Documentation kernel-doc tags and parse them.

The code can still be cleaned up, as there are still some things
hardcoded in the various dump_* functions that could be better
implemented (*).

The advantage of the approach I'm using is that it allows
migrating gradually to the tokenized code, as it can be done
incrementally.

(*) for instance, __attribute__ and a couple of other macros are parsed
    twice in the dump_struct() logic, in different places.

> I don't
> know, is it just me, but I'd like to think as a profession we'd be past
> writing ad hoc C parsers by now.

Probably not, but I don't think we need a C parser, as kernel-doc
just needs to understand data types (enum, struct, typedef, union,
vars) and function/macro prototypes.

For that purpose, a tokenizer sounds like enough.
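
For illustration only, a tokenizer of that kind can be sketched in a few
lines. This is not the code from the series linked above: Token,
TOKEN_RE and tokenize() are hypothetical names, and the token classes are
a simplification assumed to be good enough for data types and
prototypes. Note that comments are kept as tokens rather than stripped:

```python
import re
from typing import Iterator, NamedTuple

class Token(NamedTuple):
    kind: str
    text: str

# One named group per token class; earlier alternatives win, so comments
# and strings are recognized before single-character operators.
TOKEN_RE = re.compile(r"""
    (?P<COMMENT> /\*.*?\*/ | //[^\n]* )
  | (?P<STRING>  "(?:\\.|[^"\\])*" )
  | (?P<CHAR>    '(?:\\.|[^'\\])*' )
  | (?P<NUMBER>  \d[\w.]* )
  | (?P<NAME>    [^\W\d]\w* )
  | (?P<PUNCT>   [{}()\[\];,] )
  | (?P<OP>      [^\s\w] )
  | (?P<SPACE>   \s+ )
""", re.VERBOSE | re.DOTALL)

def tokenize(source: str) -> Iterator[Token]:
    """Yield every token except whitespace, comments included."""
    for m in TOKEN_RE.finditer(source):
        if m.lastgroup != "SPACE":
            yield Token(m.lastgroup, m.group())

for tok in tokenize("/** doc */ int foo(char *s); // tail"):
    print(tok.kind, repr(tok.text))
```

With the input as a flat token stream, balancing brackets or dropping an
ignored identifier becomes a loop over tokens instead of a regex over
raw text.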

Now, there is the code that currently lives in:
	https://github.com/mchehab/linux/blob/tokenizer-v5/tools/lib/python/kdoc/xforms_lists.py

which contains a list of C/gcc/clang keywords that will
be ignored, like:

	__attribute__
	static
	extern
	inline

Together with a sanitized version of the kernel macros it needs
to handle or ignore:

	DECLARE_BITMAP
	DECLARE_HASHTABLE
	__acquires
	__init
	__exit
	struct_group
	...

Once we finish cleaning up kdoc_parser.py to rely only
on it for prototype transformations, this will be the only file
that will require changes when more macros start affecting 
kernel-doc.

As this is complex, and may require manual adjustments, it
is probably better not to try to auto-generate the xforms list
at runtime. A better approach is, IMO, to have C pre-processor
code to help update it periodically, using a target like:

	make kdoc-xforms

that would use either cpp or clang to generate a patch to
update xforms_list content after adding new macros that
affect docs generation.

-- 
Thanks,
Mauro

end of thread, other threads:[~2026-03-13 10:48 UTC | newest]

Thread overview: 55+ messages
2026-02-18 10:12 [PATCH 00/38] docs: several improvements to kernel-doc Mauro Carvalho Chehab
2026-02-18 10:12 ` [PATCH 01/38] docs: kdoc_re: add support for groups() Mauro Carvalho Chehab
2026-02-18 10:12 ` [PATCH 02/38] docs: kdoc_re: don't go past the end of a line Mauro Carvalho Chehab
2026-02-18 10:12 ` [PATCH 03/38] docs: kdoc_parser: move var transformers to the beginning Mauro Carvalho Chehab
2026-02-18 10:12 ` [PATCH 04/38] docs: kdoc_parser: don't mangle with function defines Mauro Carvalho Chehab
2026-02-18 10:12 ` [PATCH 05/38] docs: kdoc_parser: add functions support for NestedMatch Mauro Carvalho Chehab
2026-02-18 10:12 ` [PATCH 06/38] docs: kdoc_parser: use NestedMatch to handle __attribute__ on functions Mauro Carvalho Chehab
2026-02-18 10:12 ` [PATCH 07/38] docs: kdoc_parser: fix variable regexes to work with size_t Mauro Carvalho Chehab
2026-02-18 10:12 ` [PATCH 08/38] docs: kdoc_parser: fix the default_value logic for variables Mauro Carvalho Chehab
2026-02-18 10:12 ` [PATCH 09/38] docs: kdoc_parser: add some debug for variable parsing Mauro Carvalho Chehab
2026-02-18 10:12 ` [PATCH 10/38] docs: kdoc_parser: don't exclude defaults from prototype Mauro Carvalho Chehab
2026-02-18 10:12 ` [PATCH 11/38] docs: kdoc_parser: fix parser to support multi-word types Mauro Carvalho Chehab
2026-02-18 10:12 ` [PATCH 12/38] docs: kdoc_parser: ignore context analysis and lock attributes Mauro Carvalho Chehab
2026-02-18 10:12 ` [PATCH 13/38] docs: kdoc_parser: add support for LIST_HEAD Mauro Carvalho Chehab
2026-02-18 10:12 ` [PATCH 14/38] docs: kdoc_parser: handle struct member macro VIRTIO_DECLARE_FEATURES(name) Mauro Carvalho Chehab
2026-02-18 10:12 ` [PATCH 15/38] docs: kdoc_re: properly handle strings and escape chars on it Mauro Carvalho Chehab
2026-02-18 10:12 ` [PATCH 16/38] docs: kdoc_re: better show KernRe() at documentation Mauro Carvalho Chehab
2026-02-18 10:12 ` [PATCH 17/38] docs: kdoc_re: don't recompile NestedMatch regex every time Mauro Carvalho Chehab
2026-02-18 10:12 ` [PATCH 18/38] docs: kdoc_re: Change NestedMath args replacement to \0 Mauro Carvalho Chehab
2026-02-18 10:12 ` [PATCH 19/38] docs: kdoc_re: make NestedMatch use KernRe Mauro Carvalho Chehab
2026-02-18 10:12 ` [PATCH 20/38] docs: kdoc_re: add support on NestedMatch for argument replacement Mauro Carvalho Chehab
2026-02-18 10:12 ` [PATCH 21/38] docs: kdoc_parser: better handle struct_group macros Mauro Carvalho Chehab
2026-02-18 10:12 ` [PATCH 22/38] docs: kdoc_re: fix a parse bug on struct page_pool_params Mauro Carvalho Chehab
2026-02-18 10:12 ` [PATCH 23/38] docs: kdoc_re: add a helper class to declare C function matches Mauro Carvalho Chehab
2026-02-18 10:12 ` [PATCH 24/38] docs: kdoc_parser: use the new CFunction class Mauro Carvalho Chehab
2026-02-18 10:12 ` [PATCH 25/38] docs: kdoc_parser: minimize differences with struct_group_tagged Mauro Carvalho Chehab
2026-02-18 10:12 ` [PATCH 26/38] docs: kdoc_parser: move transform lists to a separate file Mauro Carvalho Chehab
2026-02-18 10:12 ` [PATCH 27/38] docs: kdoc_re: don't remove the trailing ";" with NestedMatch Mauro Carvalho Chehab
2026-02-18 10:12 ` [PATCH 28/38] docs: kdoc_re: prevent adding whitespaces on sub replacements Mauro Carvalho Chehab
2026-02-18 10:12 ` [PATCH 29/38] docs: xforms_lists.py: use CFuntion to handle all function macros Mauro Carvalho Chehab
2026-02-18 10:13 ` [PATCH 30/38] docs: kdoc_files: allows the caller to use a different xforms class Mauro Carvalho Chehab
2026-02-18 10:13 ` [PATCH 31/38] docs: kdoc_re: Fix NestedMatch.sub() which causes PDF builds to break Mauro Carvalho Chehab
2026-02-18 10:13 ` [PATCH 32/38] docs: kdoc_files: document KernelFiles() ABI Mauro Carvalho Chehab
2026-02-18 10:13 ` [PATCH 33/38] docs: kdoc_output: add optional args to ManOutput class Mauro Carvalho Chehab
2026-02-18 10:13 ` [PATCH 34/38] docs: sphinx-build-wrapper: better handle troff .TH markups Mauro Carvalho Chehab
2026-02-18 10:13 ` [PATCH 35/38] docs: kdoc_output: use a more standard order for .TH on man pages Mauro Carvalho Chehab
2026-02-18 10:13 ` [PATCH 36/38] docs: sphinx-build-wrapper: don't allow "/" on file names Mauro Carvalho Chehab
2026-02-18 10:13 ` [PATCH 37/38] docs: kdoc_output: describe the class init parameters Mauro Carvalho Chehab
2026-02-18 10:13 ` [PATCH 38/38] docs: kdoc_output: pick a better default for modulename Mauro Carvalho Chehab
2026-02-21  1:24 ` [PATCH 00/38] docs: several improvements to kernel-doc Randy Dunlap
2026-02-22  1:24   ` Randy Dunlap
2026-02-23 13:47 ` Jani Nikula
2026-02-23 15:02   ` Jonathan Corbet
2026-02-24 13:25     ` [Intel-wired-lan] " Mauro Carvalho Chehab
2026-03-04 10:07     ` Jani Nikula
2026-03-04 12:20       ` [Intel-wired-lan] " Mauro Carvalho Chehab
2026-03-04 22:34       ` Jonathan Corbet
2026-03-13 10:48       ` [Intel-wired-lan] " Mauro Carvalho Chehab
2026-03-03 14:53   ` Mauro Carvalho Chehab
2026-03-03 15:12     ` Loktionov, Aleksandr
2026-03-03 16:09       ` [Intel-wired-lan] " Mauro Carvalho Chehab
2026-03-04  9:51     ` Jani Nikula
2026-02-23 21:58 ` Jonathan Corbet
2026-03-02 15:54   ` [Intel-wired-lan] " Mauro Carvalho Chehab
2026-03-02 16:14     ` Jonathan Corbet
