[PATCH 00/12] [PATCH 00/11] Thrash up the parser/output interface

linux-doc.vger.kernel.org archive mirror

* [PATCH 00/12] [PATCH 00/11] Thrash up the parser/output interface
@ 2025-07-02 22:35 Jonathan Corbet
  2025-07-02 22:35 ` [PATCH 01/12] docs: kdoc; Add a rudimentary class to represent output items Jonathan Corbet
                   ` (13 more replies)
  0 siblings, 14 replies; 41+ messages in thread
From: Jonathan Corbet @ 2025-07-02 22:35 UTC (permalink / raw)
  To: linux-doc
  Cc: linux-kernel, Mauro Carvalho Chehab, Akira Yokosawa,
	Jonathan Corbet

[I'll slow down soon, honest - real work is piling up...]

The kerneldoc parsing phase gathers all of the information about the
declarations of interest, then passes it through to the output phase as a
dict that is an unstructured blob of information; this organization has its
origins in the Perl version of the program.  It results in an interface
that is difficult to reason about, dozen-parameter function calls, and
other ills.

Introduce a new class (KdocItem) to carry this information between the
parser and the output modules, and, step by step, modify the system to use
this class in a more structured way.  This could be taken further by
creating a subclass of KdocItem for each declaration type (function,
struct, ...), but that is probably more structure than we need.

As a final step, add some structure for the accumulation of the output
text.

The result is (I hope) clearer code, the removal of a bunch of boilerplate,
and no changes to the generated output.

Jonathan Corbet (12):
  docs: kdoc; Add a rudimentary class to represent output items
  docs: kdoc: simplify the output-item passing
  docs: kdoc: drop "sectionlist"
  docs: kdoc: Centralize handling of the item section list
  docs: kdoc: remove the "struct_actual" machinery
  docs: kdoc: use self.entry.parameterlist directly in check_sections()
  docs: kdoc: Coalesce parameter-list handling
  docs: kdoc: Regularize the use of the declaration name
  docs: kdoc: straighten up dump_declaration()
  docs: kdoc: directly access the always-there KdocItem fields
  docs: kdoc: clean up check_sections()
  docs: kdoc: Improve the output text accumulation

 scripts/lib/kdoc/kdoc_files.py  |   4 +-
 scripts/lib/kdoc/kdoc_item.py   |  39 ++++
 scripts/lib/kdoc/kdoc_output.py | 331 ++++++++++++++------------------
 scripts/lib/kdoc/kdoc_parser.py | 162 +++-------------
 4 files changed, 218 insertions(+), 318 deletions(-)
 create mode 100644 scripts/lib/kdoc/kdoc_item.py

-- 
2.49.0


* [PATCH 01/12] docs: kdoc; Add a rudimentary class to represent output items
  2025-07-02 22:35 [PATCH 00/12] [PATCH 00/11] Thrash up the parser/output interface Jonathan Corbet
@ 2025-07-02 22:35 ` Jonathan Corbet
  2025-07-10  5:28   ` Mauro Carvalho Chehab
  2025-07-02 22:35 ` [PATCH 02/12] docs: kdoc: simplify the output-item passing Jonathan Corbet
                   ` (12 subsequent siblings)
  13 siblings, 1 reply; 41+ messages in thread
From: Jonathan Corbet @ 2025-07-02 22:35 UTC (permalink / raw)
  To: linux-doc
  Cc: linux-kernel, Mauro Carvalho Chehab, Akira Yokosawa,
	Jonathan Corbet

This class is intended to replace the unstructured dict used to accumulate
an entry to pass to an output module.  For now, it remains unstructured,
but it works well enough that the output classes don't notice the
difference.

Signed-off-by: Jonathan Corbet <corbet@lwn.net>
---
 scripts/lib/kdoc/kdoc_item.py   | 26 ++++++++++++++++++++++++++
 scripts/lib/kdoc/kdoc_parser.py | 30 +++++++++---------------------
 2 files changed, 35 insertions(+), 21 deletions(-)
 create mode 100644 scripts/lib/kdoc/kdoc_item.py

diff --git a/scripts/lib/kdoc/kdoc_item.py b/scripts/lib/kdoc/kdoc_item.py
new file mode 100644
index 000000000000..add2cc772fec
--- /dev/null
+++ b/scripts/lib/kdoc/kdoc_item.py
@@ -0,0 +1,26 @@
+# SPDX-License-Identifier: GPL-2.0
+#
+# A class that will, eventually, encapsulate all of the parsed data that we
+# then pass into the output modules.
+#
+
+class KdocItem:
+    def __init__(self, name, type, start_line, **other_stuff):
+        self.name = name
+        self.type = type
+        self.declaration_start_line = start_line
+        #
+        # Just save everything else into our own dict so that the output
+        # side can grab it directly as before.  As we move things into more
+        # structured data, this will, hopefully, fade away.
+        #
+        self.other_stuff = other_stuff
+
+    def get(self, key, default = None):
+        ret = self.other_stuff.get(key, default)
+        if ret == default:
+            return self.__dict__.get(key, default)
+        return ret
+
+    def __getitem__(self, key):
+        return self.get(key)
diff --git a/scripts/lib/kdoc/kdoc_parser.py b/scripts/lib/kdoc/kdoc_parser.py
index 831f061f61b8..a5a59b97a444 100644
--- a/scripts/lib/kdoc/kdoc_parser.py
+++ b/scripts/lib/kdoc/kdoc_parser.py
@@ -16,7 +16,7 @@ import re
 from pprint import pformat
 
 from kdoc_re import NestedMatch, KernRe
-
+from kdoc_item import KdocItem
 
 #
 # Regular expressions used to parse kernel-doc markups at KernelDoc class.
@@ -271,32 +271,20 @@ class KernelDoc:
         The actual output and output filters will be handled elsewhere
         """
 
-        # The implementation here is different than the original kernel-doc:
-        # instead of checking for output filters or actually output anything,
-        # it just stores the declaration content at self.entries, as the
-        # output will happen on a separate class.
-        #
-        # For now, we're keeping the same name of the function just to make
-        # easier to compare the source code of both scripts
-
-        args["declaration_start_line"] = self.entry.declaration_start_line
-        args["type"] = dtype
-        args["warnings"] = self.entry.warnings
-
-        # TODO: use colletions.OrderedDict to remove sectionlist
+        item = KdocItem(name, dtype, self.entry.declaration_start_line, **args)
+        item.warnings = self.entry.warnings
 
-        sections = args.get('sections', {})
-        sectionlist = args.get('sectionlist', [])
+        sections = item.get('sections', {})
+        sectionlist = item.get('sectionlist', [])
 
         # Drop empty sections
         # TODO: improve empty sections logic to emit warnings
         for section in ["Description", "Return"]:
-            if section in sectionlist:
-                if not sections[section].rstrip():
-                    del sections[section]
-                    sectionlist.remove(section)
+            if section in sectionlist and not sections[section].rstrip():
+                del sections[section]
+                sectionlist.remove(section)
 
-        self.entries.append((name, args))
+        self.entries.append((name, item))
 
         self.config.log.debug("Output: %s:%s = %s", dtype, name, pformat(args))
 
-- 
2.49.0
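
Editorial illustration (not part of the patch): a minimal sketch of how the
dict-style accessors let existing output code keep calling args.get() and
args[...] on a KdocItem.  The class body is copied from the diff above; the
sample item and its values ("kfree", the purpose text, and so on) are made up
for the example.

```python
# Standalone copy of KdocItem as added by this patch, followed by a small
# demonstration.  All sample values below are hypothetical.

class KdocItem:
    def __init__(self, name, type, start_line, **other_stuff):
        self.name = name
        self.type = type
        self.declaration_start_line = start_line
        # Everything else is kept in a plain dict, as before.
        self.other_stuff = other_stuff

    def get(self, key, default=None):
        # Look in the keyword extras first, then fall back to attributes.
        ret = self.other_stuff.get(key, default)
        if ret == default:
            return self.__dict__.get(key, default)
        return ret

    def __getitem__(self, key):
        return self.get(key)


item = KdocItem("kfree", "function", 42,
                purpose="free previously allocated memory",
                parameterlist=["objp"])

# Keyword extras are found in other_stuff ...
assert item.get("purpose") == "free previously allocated memory"
assert item["parameterlist"] == ["objp"]

# ... while the fixed fields are reached through the __dict__ fallback.
assert item["type"] == "function"
assert item.get("declaration_start_line") == 42

# Unlike a real dict, indexing a missing key yields None instead of raising.
assert item["no_such_key"] is None
assert item.get("no_such_key", []) == []
```

One consequence worth keeping in mind: because __getitem__() goes through
get(), a misspelled key is silently None rather than a KeyError.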



* [PATCH 02/12] docs: kdoc: simplify the output-item passing
  2025-07-02 22:35 [PATCH 00/12] [PATCH 00/11] Thrash up the parser/output interface Jonathan Corbet
  2025-07-02 22:35 ` [PATCH 01/12] docs: kdoc; Add a rudimentary class to represent output items Jonathan Corbet
@ 2025-07-02 22:35 ` Jonathan Corbet
  2025-07-10  5:29   ` Mauro Carvalho Chehab
  2025-07-02 22:35 ` [PATCH 03/12] docs: kdoc: drop "sectionlist" Jonathan Corbet
                   ` (11 subsequent siblings)
  13 siblings, 1 reply; 41+ messages in thread
From: Jonathan Corbet @ 2025-07-02 22:35 UTC (permalink / raw)
  To: linux-doc
  Cc: linux-kernel, Mauro Carvalho Chehab, Akira Yokosawa,
	Jonathan Corbet

Since our output items contain their name, we don't need to pass it
separately.

Signed-off-by: Jonathan Corbet <corbet@lwn.net>
---
 scripts/lib/kdoc/kdoc_files.py  | 4 ++--
 scripts/lib/kdoc/kdoc_parser.py | 2 +-
 2 files changed, 3 insertions(+), 3 deletions(-)

diff --git a/scripts/lib/kdoc/kdoc_files.py b/scripts/lib/kdoc/kdoc_files.py
index 9be4a64df71d..9e09b45b02fa 100644
--- a/scripts/lib/kdoc/kdoc_files.py
+++ b/scripts/lib/kdoc/kdoc_files.py
@@ -275,8 +275,8 @@ class KernelFiles():
                 self.config.log.warning("No kernel-doc for file %s", fname)
                 continue
 
-            for name, arg in self.results[fname]:
-                m = self.out_msg(fname, name, arg)
+            for arg in self.results[fname]:
+                m = self.out_msg(fname, arg.name, arg)
 
                 if m is None:
                     ln = arg.get("ln", 0)
diff --git a/scripts/lib/kdoc/kdoc_parser.py b/scripts/lib/kdoc/kdoc_parser.py
index a5a59b97a444..97380ff30a0d 100644
--- a/scripts/lib/kdoc/kdoc_parser.py
+++ b/scripts/lib/kdoc/kdoc_parser.py
@@ -284,7 +284,7 @@ class KernelDoc:
                 del sections[section]
                 sectionlist.remove(section)
 
-        self.entries.append((name, item))
+        self.entries.append(item)
 
         self.config.log.debug("Output: %s:%s = %s", dtype, name, pformat(args))
 
-- 
2.49.0



* [PATCH 03/12] docs: kdoc: drop "sectionlist"
  2025-07-02 22:35 [PATCH 00/12] [PATCH 00/11] Thrash up the parser/output interface Jonathan Corbet
  2025-07-02 22:35 ` [PATCH 01/12] docs: kdoc; Add a rudimentary class to represent output items Jonathan Corbet
  2025-07-02 22:35 ` [PATCH 02/12] docs: kdoc: simplify the output-item passing Jonathan Corbet
@ 2025-07-02 22:35 ` Jonathan Corbet
  2025-07-09 16:27   ` Mauro Carvalho Chehab
  2025-07-02 22:35 ` [PATCH 04/12] docs: kdoc: Centralize handling of the item section list Jonathan Corbet
                   ` (10 subsequent siblings)
  13 siblings, 1 reply; 41+ messages in thread
From: Jonathan Corbet @ 2025-07-02 22:35 UTC (permalink / raw)
  To: linux-doc
  Cc: linux-kernel, Mauro Carvalho Chehab, Akira Yokosawa,
	Jonathan Corbet

Python dicts (as of 3.7) are guaranteed to remember the insertion order of
items, so we do not need a separate list for that purpose.  Drop the
per-entry sectionlist variable and just rely on native dict ordering.

Signed-off-by: Jonathan Corbet <corbet@lwn.net>
---
 scripts/lib/kdoc/kdoc_output.py | 18 ++++++------------
 scripts/lib/kdoc/kdoc_parser.py | 13 +------------
 2 files changed, 7 insertions(+), 24 deletions(-)

diff --git a/scripts/lib/kdoc/kdoc_output.py b/scripts/lib/kdoc/kdoc_output.py
index 86102e628d91..4895c80e4b81 100644
--- a/scripts/lib/kdoc/kdoc_output.py
+++ b/scripts/lib/kdoc/kdoc_output.py
@@ -339,11 +339,10 @@ class RestFormat(OutputFormat):
         tends to duplicate a header already in the template file.
         """
 
-        sectionlist = args.get('sectionlist', [])
         sections = args.get('sections', {})
         section_start_lines = args.get('section_start_lines', {})
 
-        for section in sectionlist:
+        for section in sections:
             # Skip sections that are in the nosymbol_table
             if section in self.nosymbol:
                 continue
@@ -636,7 +635,6 @@ class ManFormat(OutputFormat):
                 self.data += line + "\n"
 
     def out_doc(self, fname, name, args):
-        sectionlist = args.get('sectionlist', [])
         sections = args.get('sections', {})
 
         if not self.check_doc(name, args):
@@ -644,7 +642,7 @@ class ManFormat(OutputFormat):
 
         self.data += f'.TH "{self.modulename}" 9 "{self.modulename}" "{self.man_date}" "API Manual" LINUX' + "\n"
 
-        for section in sectionlist:
+        for section in sections:
             self.data += f'.SH "{section}"' + "\n"
             self.output_highlight(sections.get(section))
 
@@ -653,7 +651,6 @@ class ManFormat(OutputFormat):
 
         parameterlist = args.get('parameterlist', [])
         parameterdescs = args.get('parameterdescs', {})
-        sectionlist = args.get('sectionlist', [])
         sections = args.get('sections', {})
 
         self.data += f'.TH "{args["function"]}" 9 "{args["function"]}" "{self.man_date}" "Kernel Hacker\'s Manual" LINUX' + "\n"
@@ -695,7 +692,7 @@ class ManFormat(OutputFormat):
             self.data += f'.IP "{parameter}" 12' + "\n"
             self.output_highlight(parameterdescs.get(parameter_name, ""))
 
-        for section in sectionlist:
+        for section in sections:
             self.data += f'.SH "{section.upper()}"' + "\n"
             self.output_highlight(sections[section])
 
@@ -703,7 +700,6 @@ class ManFormat(OutputFormat):
 
         name = args.get('enum', '')
         parameterlist = args.get('parameterlist', [])
-        sectionlist = args.get('sectionlist', [])
         sections = args.get('sections', {})
 
         self.data += f'.TH "{self.modulename}" 9 "enum {args["enum"]}" "{self.man_date}" "API Manual" LINUX' + "\n"
@@ -731,7 +727,7 @@ class ManFormat(OutputFormat):
             self.data += f'.IP "{parameter}" 12' + "\n"
             self.output_highlight(args['parameterdescs'].get(parameter_name, ""))
 
-        for section in sectionlist:
+        for section in sections:
             self.data += f'.SH "{section}"' + "\n"
             self.output_highlight(sections[section])
 
@@ -739,7 +735,6 @@ class ManFormat(OutputFormat):
         module = self.modulename
         typedef = args.get('typedef')
         purpose = args.get('purpose')
-        sectionlist = args.get('sectionlist', [])
         sections = args.get('sections', {})
 
         self.data += f'.TH "{module}" 9 "{typedef}" "{self.man_date}" "API Manual" LINUX' + "\n"
@@ -747,7 +742,7 @@ class ManFormat(OutputFormat):
         self.data += ".SH NAME\n"
         self.data += f"typedef {typedef} \\- {purpose}\n"
 
-        for section in sectionlist:
+        for section in sections:
             self.data += f'.SH "{section}"' + "\n"
             self.output_highlight(sections.get(section))
 
@@ -757,7 +752,6 @@ class ManFormat(OutputFormat):
         struct_name = args.get('struct')
         purpose = args.get('purpose')
         definition = args.get('definition')
-        sectionlist = args.get('sectionlist', [])
         parameterlist = args.get('parameterlist', [])
         sections = args.get('sections', {})
         parameterdescs = args.get('parameterdescs', {})
@@ -788,6 +782,6 @@ class ManFormat(OutputFormat):
             self.data += f'.IP "{parameter}" 12' + "\n"
             self.output_highlight(parameterdescs.get(parameter_name))
 
-        for section in sectionlist:
+        for section in sections:
             self.data += f'.SH "{section}"' + "\n"
             self.output_highlight(sections.get(section))
diff --git a/scripts/lib/kdoc/kdoc_parser.py b/scripts/lib/kdoc/kdoc_parser.py
index 97380ff30a0d..2e00c8b3a5f2 100644
--- a/scripts/lib/kdoc/kdoc_parser.py
+++ b/scripts/lib/kdoc/kdoc_parser.py
@@ -127,7 +127,6 @@ class KernelEntry:
         self.parameterdesc_start_lines = {}
 
         self.section_start_lines = {}
-        self.sectionlist = []
         self.sections = {}
 
         self.anon_struct_union = False
@@ -202,7 +201,6 @@ class KernelEntry:
                 self.sections[name] += '\n' + contents
             else:
                 self.sections[name] = contents
-                self.sectionlist.append(name)
                 self.section_start_lines[name] = self.new_start_line
                 self.new_start_line = 0
 
@@ -275,14 +273,12 @@ class KernelDoc:
         item.warnings = self.entry.warnings
 
         sections = item.get('sections', {})
-        sectionlist = item.get('sectionlist', [])
 
         # Drop empty sections
         # TODO: improve empty sections logic to emit warnings
         for section in ["Description", "Return"]:
-            if section in sectionlist and not sections[section].rstrip():
+            if section in sections and not sections[section].rstrip():
                 del sections[section]
-                sectionlist.remove(section)
 
         self.entries.append(item)
 
@@ -828,7 +824,6 @@ class KernelDoc:
                                 parameterdescs=self.entry.parameterdescs,
                                 parametertypes=self.entry.parametertypes,
                                 parameterdesc_start_lines=self.entry.parameterdesc_start_lines,
-                                sectionlist=self.entry.sectionlist,
                                 sections=self.entry.sections,
                                 section_start_lines=self.entry.section_start_lines,
                                 purpose=self.entry.declaration_purpose)
@@ -913,7 +908,6 @@ class KernelDoc:
                                 parameterlist=self.entry.parameterlist,
                                 parameterdescs=self.entry.parameterdescs,
                                 parameterdesc_start_lines=self.entry.parameterdesc_start_lines,
-                                sectionlist=self.entry.sectionlist,
                                 sections=self.entry.sections,
                                 section_start_lines=self.entry.section_start_lines,
                                 purpose=self.entry.declaration_purpose)
@@ -1085,7 +1079,6 @@ class KernelDoc:
                                     parameterdescs=self.entry.parameterdescs,
                                     parametertypes=self.entry.parametertypes,
                                     parameterdesc_start_lines=self.entry.parameterdesc_start_lines,
-                                    sectionlist=self.entry.sectionlist,
                                     sections=self.entry.sections,
                                     section_start_lines=self.entry.section_start_lines,
                                     purpose=self.entry.declaration_purpose,
@@ -1099,7 +1092,6 @@ class KernelDoc:
                                     parameterdescs=self.entry.parameterdescs,
                                     parametertypes=self.entry.parametertypes,
                                     parameterdesc_start_lines=self.entry.parameterdesc_start_lines,
-                                    sectionlist=self.entry.sectionlist,
                                     sections=self.entry.sections,
                                     section_start_lines=self.entry.section_start_lines,
                                     purpose=self.entry.declaration_purpose,
@@ -1145,7 +1137,6 @@ class KernelDoc:
                                     parameterdescs=self.entry.parameterdescs,
                                     parametertypes=self.entry.parametertypes,
                                     parameterdesc_start_lines=self.entry.parameterdesc_start_lines,
-                                    sectionlist=self.entry.sectionlist,
                                     sections=self.entry.sections,
                                     section_start_lines=self.entry.section_start_lines,
                                     purpose=self.entry.declaration_purpose)
@@ -1168,7 +1159,6 @@ class KernelDoc:
 
             self.output_declaration('typedef', declaration_name,
                                     typedef=declaration_name,
-                                    sectionlist=self.entry.sectionlist,
                                     sections=self.entry.sections,
                                     section_start_lines=self.entry.section_start_lines,
                                     purpose=self.entry.declaration_purpose)
@@ -1653,7 +1643,6 @@ class KernelDoc:
         if doc_end.search(line):
             self.dump_section()
             self.output_declaration("doc", self.entry.identifier,
-                                    sectionlist=self.entry.sectionlist,
                                     sections=self.entry.sections,
                                     section_start_lines=self.entry.section_start_lines)
             self.reset_state(ln)
-- 
2.49.0
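
Editorial illustration (not part of the patch): a small sketch of the dict
behavior this change relies on.  Iterating a dict visits keys in insertion
order, appending to an existing key does not move it, and deleting a key
leaves the order of the remaining keys intact.  The add_section() helper and
the sample section text are hypothetical stand-ins for the parser logic, not
the kernel code itself.

```python
# Hypothetical stand-in for the parser's section accumulation.
sections = {}
section_start_lines = {}

def add_section(name, contents, line):
    """Record a section; repeated names append to the existing text."""
    if name in sections:
        sections[name] += '\n' + contents
    else:
        sections[name] = contents
        section_start_lines[name] = line

add_section("Description", "   ", 10)          # whitespace only
add_section("Context", "Process context.", 12)
add_section("Return", "0 on success.", 14)
add_section("Context", "May sleep.", 13)       # appended, keeps its position

# Insertion order is preserved, so no separate sectionlist is needed.
assert list(sections) == ["Description", "Context", "Return"]

# Dropping empty sections, as output_declaration() does in this patch.
for section in ["Description", "Return"]:
    if section in sections and not sections[section].rstrip():
        del sections[section]

assert list(sections) == ["Context", "Return"]
assert sections["Context"] == "Process context.\nMay sleep."
```

The ordering guarantee is part of the language from Python 3.7 onward, which
is the version the commit message cites.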



* [PATCH 04/12] docs: kdoc: Centralize handling of the item section list
  2025-07-02 22:35 [PATCH 00/12] [PATCH 00/11] Thrash up the parser/output interface Jonathan Corbet
                   ` (2 preceding siblings ...)
  2025-07-02 22:35 ` [PATCH 03/12] docs: kdoc: drop "sectionlist" Jonathan Corbet
@ 2025-07-02 22:35 ` Jonathan Corbet
  2025-07-10  5:45   ` Mauro Carvalho Chehab
  2025-07-02 22:35 ` [PATCH 05/12] docs: kdoc: remove the "struct_actual" machinery Jonathan Corbet
                   ` (9 subsequent siblings)
  13 siblings, 1 reply; 41+ messages in thread
From: Jonathan Corbet @ 2025-07-02 22:35 UTC (permalink / raw)
  To: linux-doc
  Cc: linux-kernel, Mauro Carvalho Chehab, Akira Yokosawa,
	Jonathan Corbet

The section list always comes directly from the under-construction entry
and is used uniformly.  Formalize section handling in the KdocItem class,
and have output_declaration() load the sections directly from the entry,
eliminating a lot of duplicated, verbose parameters.

Signed-off-by: Jonathan Corbet <corbet@lwn.net>
---
 scripts/lib/kdoc/kdoc_item.py   |  8 ++++++++
 scripts/lib/kdoc/kdoc_output.py | 36 ++++++++++++---------------------
 scripts/lib/kdoc/kdoc_parser.py | 20 +++---------------
 3 files changed, 24 insertions(+), 40 deletions(-)

diff --git a/scripts/lib/kdoc/kdoc_item.py b/scripts/lib/kdoc/kdoc_item.py
index add2cc772fec..c8329019a219 100644
--- a/scripts/lib/kdoc/kdoc_item.py
+++ b/scripts/lib/kdoc/kdoc_item.py
@@ -9,6 +9,7 @@ class KdocItem:
         self.name = name
         self.type = type
         self.declaration_start_line = start_line
+        self.sections, self.section_start_lines = {}, {}
         #
         # Just save everything else into our own dict so that the output
         # side can grab it directly as before.  As we move things into more
@@ -24,3 +25,10 @@ class KdocItem:
 
     def __getitem__(self, key):
         return self.get(key)
+
+    #
+    # Tracking of section information.
+    #
+    def set_sections(self, sections, start_lines):
+        self.sections = sections
+        self.section_start_lines = start_lines
diff --git a/scripts/lib/kdoc/kdoc_output.py b/scripts/lib/kdoc/kdoc_output.py
index 4895c80e4b81..15cb89f91987 100644
--- a/scripts/lib/kdoc/kdoc_output.py
+++ b/scripts/lib/kdoc/kdoc_output.py
@@ -338,11 +338,7 @@ class RestFormat(OutputFormat):
         starts by putting out the name of the doc section itself, but that
         tends to duplicate a header already in the template file.
         """
-
-        sections = args.get('sections', {})
-        section_start_lines = args.get('section_start_lines', {})
-
-        for section in sections:
+        for section, text in args.sections.items():
             # Skip sections that are in the nosymbol_table
             if section in self.nosymbol:
                 continue
@@ -354,8 +350,8 @@ class RestFormat(OutputFormat):
             else:
                 self.data += f'{self.lineprefix}**{section}**\n\n'
 
-            self.print_lineno(section_start_lines.get(section, 0))
-            self.output_highlight(sections[section])
+            self.print_lineno(args.section_start_lines.get(section, 0))
+            self.output_highlight(text)
             self.data += "\n"
         self.data += "\n"
 
@@ -635,23 +631,20 @@ class ManFormat(OutputFormat):
                 self.data += line + "\n"
 
     def out_doc(self, fname, name, args):
-        sections = args.get('sections', {})
-
         if not self.check_doc(name, args):
             return
 
         self.data += f'.TH "{self.modulename}" 9 "{self.modulename}" "{self.man_date}" "API Manual" LINUX' + "\n"
 
-        for section in sections:
+        for section, text in args.sections.items():
             self.data += f'.SH "{section}"' + "\n"
-            self.output_highlight(sections.get(section))
+            self.output_highlight(text)
 
     def out_function(self, fname, name, args):
         """output function in man"""
 
         parameterlist = args.get('parameterlist', [])
         parameterdescs = args.get('parameterdescs', {})
-        sections = args.get('sections', {})
 
         self.data += f'.TH "{args["function"]}" 9 "{args["function"]}" "{self.man_date}" "Kernel Hacker\'s Manual" LINUX' + "\n"
 
@@ -692,15 +685,14 @@ class ManFormat(OutputFormat):
             self.data += f'.IP "{parameter}" 12' + "\n"
             self.output_highlight(parameterdescs.get(parameter_name, ""))
 
-        for section in sections:
+        for section, text in args.sections.items():
             self.data += f'.SH "{section.upper()}"' + "\n"
-            self.output_highlight(sections[section])
+            self.output_highlight(text)
 
     def out_enum(self, fname, name, args):
 
         name = args.get('enum', '')
         parameterlist = args.get('parameterlist', [])
-        sections = args.get('sections', {})
 
         self.data += f'.TH "{self.modulename}" 9 "enum {args["enum"]}" "{self.man_date}" "API Manual" LINUX' + "\n"
 
@@ -727,24 +719,23 @@ class ManFormat(OutputFormat):
             self.data += f'.IP "{parameter}" 12' + "\n"
             self.output_highlight(args['parameterdescs'].get(parameter_name, ""))
 
-        for section in sections:
+        for section, text in args.sections.items():
             self.data += f'.SH "{section}"' + "\n"
-            self.output_highlight(sections[section])
+            self.output_highlight(text)
 
     def out_typedef(self, fname, name, args):
         module = self.modulename
         typedef = args.get('typedef')
         purpose = args.get('purpose')
-        sections = args.get('sections', {})
 
         self.data += f'.TH "{module}" 9 "{typedef}" "{self.man_date}" "API Manual" LINUX' + "\n"
 
         self.data += ".SH NAME\n"
         self.data += f"typedef {typedef} \\- {purpose}\n"
 
-        for section in sections:
+        for section, text in args.sections.items():
             self.data += f'.SH "{section}"' + "\n"
-            self.output_highlight(sections.get(section))
+            self.output_highlight(text)
 
     def out_struct(self, fname, name, args):
         module = self.modulename
@@ -753,7 +744,6 @@ class ManFormat(OutputFormat):
         purpose = args.get('purpose')
         definition = args.get('definition')
         parameterlist = args.get('parameterlist', [])
-        sections = args.get('sections', {})
         parameterdescs = args.get('parameterdescs', {})
 
         self.data += f'.TH "{module}" 9 "{struct_type} {struct_name}" "{self.man_date}" "API Manual" LINUX' + "\n"
@@ -782,6 +772,6 @@ class ManFormat(OutputFormat):
             self.data += f'.IP "{parameter}" 12' + "\n"
             self.output_highlight(parameterdescs.get(parameter_name))
 
-        for section in sections:
+        for section, text in args.sections.items():
             self.data += f'.SH "{section}"' + "\n"
-            self.output_highlight(sections.get(section))
+            self.output_highlight(text)
diff --git a/scripts/lib/kdoc/kdoc_parser.py b/scripts/lib/kdoc/kdoc_parser.py
index 2e00c8b3a5f2..608f3a1045dc 100644
--- a/scripts/lib/kdoc/kdoc_parser.py
+++ b/scripts/lib/kdoc/kdoc_parser.py
@@ -272,13 +272,13 @@ class KernelDoc:
         item = KdocItem(name, dtype, self.entry.declaration_start_line, **args)
         item.warnings = self.entry.warnings
 
-        sections = item.get('sections', {})
-
         # Drop empty sections
         # TODO: improve empty sections logic to emit warnings
+        sections = self.entry.sections
         for section in ["Description", "Return"]:
             if section in sections and not sections[section].rstrip():
                 del sections[section]
+        item.set_sections(sections, self.entry.section_start_lines)
 
         self.entries.append(item)
 
@@ -824,8 +824,6 @@ class KernelDoc:
                                 parameterdescs=self.entry.parameterdescs,
                                 parametertypes=self.entry.parametertypes,
                                 parameterdesc_start_lines=self.entry.parameterdesc_start_lines,
-                                sections=self.entry.sections,
-                                section_start_lines=self.entry.section_start_lines,
                                 purpose=self.entry.declaration_purpose)
 
     def dump_enum(self, ln, proto):
@@ -908,8 +906,6 @@ class KernelDoc:
                                 parameterlist=self.entry.parameterlist,
                                 parameterdescs=self.entry.parameterdescs,
                                 parameterdesc_start_lines=self.entry.parameterdesc_start_lines,
-                                sections=self.entry.sections,
-                                section_start_lines=self.entry.section_start_lines,
                                 purpose=self.entry.declaration_purpose)
 
     def dump_declaration(self, ln, prototype):
@@ -1079,8 +1075,6 @@ class KernelDoc:
                                     parameterdescs=self.entry.parameterdescs,
                                     parametertypes=self.entry.parametertypes,
                                     parameterdesc_start_lines=self.entry.parameterdesc_start_lines,
-                                    sections=self.entry.sections,
-                                    section_start_lines=self.entry.section_start_lines,
                                     purpose=self.entry.declaration_purpose,
                                     func_macro=func_macro)
         else:
@@ -1092,8 +1086,6 @@ class KernelDoc:
                                     parameterdescs=self.entry.parameterdescs,
                                     parametertypes=self.entry.parametertypes,
                                     parameterdesc_start_lines=self.entry.parameterdesc_start_lines,
-                                    sections=self.entry.sections,
-                                    section_start_lines=self.entry.section_start_lines,
                                     purpose=self.entry.declaration_purpose,
                                     func_macro=func_macro)
 
@@ -1137,8 +1129,6 @@ class KernelDoc:
                                     parameterdescs=self.entry.parameterdescs,
                                     parametertypes=self.entry.parametertypes,
                                     parameterdesc_start_lines=self.entry.parameterdesc_start_lines,
-                                    sections=self.entry.sections,
-                                    section_start_lines=self.entry.section_start_lines,
                                     purpose=self.entry.declaration_purpose)
             return
 
@@ -1159,8 +1149,6 @@ class KernelDoc:
 
             self.output_declaration('typedef', declaration_name,
                                     typedef=declaration_name,
-                                    sections=self.entry.sections,
-                                    section_start_lines=self.entry.section_start_lines,
                                     purpose=self.entry.declaration_purpose)
             return
 
@@ -1642,9 +1630,7 @@ class KernelDoc:
 
         if doc_end.search(line):
             self.dump_section()
-            self.output_declaration("doc", self.entry.identifier,
-                                    sections=self.entry.sections,
-                                    section_start_lines=self.entry.section_start_lines)
+            self.output_declaration("doc", self.entry.identifier)
             self.reset_state(ln)
 
         elif doc_content.search(line):
-- 
2.49.0
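
Editorial illustration (not part of the patch): a minimal sketch of the
pattern the output classes move to here, where the sections hang off the item
and are walked with items() instead of being looked up by key.  The KdocItem
below is a cut-down assumption (no other_stuff, two independent empty dicts
as defaults), and render_man_sections() is a hypothetical helper that only
mimics the shape of the ManFormat loops.

```python
# Cut-down, hypothetical version of KdocItem for illustration only.
class KdocItem:
    def __init__(self, name, type, start_line):
        self.name = name
        self.type = type
        self.declaration_start_line = start_line
        self.sections = {}
        self.section_start_lines = {}

    def set_sections(self, sections, start_lines):
        self.sections = sections
        self.section_start_lines = start_lines


def render_man_sections(item):
    """Emit one .SH header plus body per section, in insertion order."""
    out = []
    for section, text in item.sections.items():
        out.append(f'.SH "{section}"')
        out.append(text)
    return "\n".join(out) + "\n"


item = KdocItem("overview", "doc", 1)
item.set_sections({"Overview": "Some text.", "Notes": "More text."},
                  {"Overview": 3, "Notes": 7})
man = render_man_sections(item)
assert man == '.SH "Overview"\nSome text.\n.SH "Notes"\nMore text.\n'
```

With the sections stored on the item, callers such as the "doc" case can
invoke output_declaration() with only the type and name.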



* [PATCH 05/12] docs: kdoc: remove the "struct_actual" machinery
  2025-07-02 22:35 [PATCH 00/12] [PATCH 00/11] Thrash up the parser/output interface Jonathan Corbet
                   ` (3 preceding siblings ...)
  2025-07-02 22:35 ` [PATCH 04/12] docs: kdoc: Centralize handling of the item section list Jonathan Corbet
@ 2025-07-02 22:35 ` Jonathan Corbet
  2025-07-10  6:11   ` Mauro Carvalho Chehab
  2025-07-02 22:35 ` [PATCH 06/12] docs: kdoc: use self.entry.parameterlist directly in check_sections() Jonathan Corbet
                   ` (8 subsequent siblings)
  13 siblings, 1 reply; 41+ messages in thread
From: Jonathan Corbet @ 2025-07-02 22:35 UTC (permalink / raw)
  To: linux-doc
  Cc: linux-kernel, Mauro Carvalho Chehab, Akira Yokosawa,
	Jonathan Corbet

The code goes out of its way to create a special list of parameters in
entry.struct_actual that is just like entry.parameterlist, but with extra
junk.  The only use of that information, in check_sections(), promptly
strips all the extra junk back out.  Drop all that extra work and just use
parameterlist.
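
To illustrate why the detour is redundant, here is a minimal stand-alone
sketch, not the kernel code itself.  The helper names and sample data are
hypothetical, attribute stripping is left out, and it assumes that
parameterlist already holds clean names.

```python
import re

def names_via_joined_string(raw_params):
    """Model of the old path: build one string, then split and re-clean it."""
    joined = ""
    for param in raw_params:
        joined += param.lstrip() + " "
    names = set()
    for token in joined.split():
        token = re.sub(r'\[.*\]', '', token)   # drop a complete "[...]"
        token = re.sub(r'\[.*', '', token)     # drop a truncated "[..."
        names.add(token)
    return names

def unknown_sections(documented, known_names):
    """Return the documented names that match no known parameter."""
    return [name for name in documented if name not in known_names]

raw_params = ["id", "addr[6 + 2]", "flags"]   # hypothetical raw member strings
parameterlist = ["id", "addr", "flags"]       # assumed to be already clean
documented = ["id", "addr", "flags", "bogus"]

old_result = unknown_sections(documented, names_via_joined_string(raw_params))
new_result = unknown_sections(documented, set(parameterlist))
print(old_result, new_result)
```

Splitting "addr[6 + 2]" at spaces leaves stray tokens behind, but they never
match a section name, so both paths flag the same entries.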

No output changes.

Signed-off-by: Jonathan Corbet <corbet@lwn.net>
---
 scripts/lib/kdoc/kdoc_parser.py | 32 ++------------------------------
 1 file changed, 2 insertions(+), 30 deletions(-)

diff --git a/scripts/lib/kdoc/kdoc_parser.py b/scripts/lib/kdoc/kdoc_parser.py
index 608f3a1045dc..b28f056365cb 100644
--- a/scripts/lib/kdoc/kdoc_parser.py
+++ b/scripts/lib/kdoc/kdoc_parser.py
@@ -116,7 +116,6 @@ class KernelEntry:
 
         self._contents = []
         self.sectcheck = ""
-        self.struct_actual = ""
         self.prototype = ""
 
         self.warnings = []
@@ -366,15 +365,6 @@ class KernelDoc:
         org_arg = KernRe(r'\s\s+').sub(' ', org_arg)
         self.entry.parametertypes[param] = org_arg
 
-    def save_struct_actual(self, actual):
-        """
-        Strip all spaces from the actual param so that it looks like
-        one string item.
-        """
-
-        actual = KernRe(r'\s*').sub("", actual, count=1)
-
-        self.entry.struct_actual += actual + " "
 
     def create_parameter_list(self, ln, decl_type, args,
                               splitter, declaration_name):
@@ -420,7 +410,6 @@ class KernelDoc:
                     param = arg
 
                 dtype = KernRe(r'([^\(]+\(\*?)\s*' + re.escape(param)).sub(r'\1', arg)
-                self.save_struct_actual(param)
                 self.push_parameter(ln, decl_type, param, dtype,
                                     arg, declaration_name)
 
@@ -437,7 +426,6 @@ class KernelDoc:
 
                 dtype = KernRe(r'([^\(]+\(\*?)\s*' + re.escape(param)).sub(r'\1', arg)
 
-                self.save_struct_actual(param)
                 self.push_parameter(ln, decl_type, param, dtype,
                                     arg, declaration_name)
 
@@ -470,7 +458,6 @@ class KernelDoc:
 
                         param = r.group(1)
 
-                        self.save_struct_actual(r.group(2))
                         self.push_parameter(ln, decl_type, r.group(2),
                                             f"{dtype} {r.group(1)}",
                                             arg, declaration_name)
@@ -482,12 +469,10 @@ class KernelDoc:
                             continue
 
                         if dtype != "":  # Skip unnamed bit-fields
-                            self.save_struct_actual(r.group(1))
                             self.push_parameter(ln, decl_type, r.group(1),
                                                 f"{dtype}:{r.group(2)}",
                                                 arg, declaration_name)
                     else:
-                        self.save_struct_actual(param)
                         self.push_parameter(ln, decl_type, param, dtype,
                                             arg, declaration_name)
 
@@ -499,24 +484,11 @@ class KernelDoc:
 
         sects = sectcheck.split()
         prms = prmscheck.split()
-        err = False
 
         for sx in range(len(sects)):                  # pylint: disable=C0200
             err = True
             for px in range(len(prms)):               # pylint: disable=C0200
-                prm_clean = prms[px]
-                prm_clean = KernRe(r'\[.*\]').sub('', prm_clean)
-                prm_clean = attribute.sub('', prm_clean)
-
-                # ignore array size in a parameter string;
-                # however, the original param string may contain
-                # spaces, e.g.:  addr[6 + 2]
-                # and this appears in @prms as "addr[6" since the
-                # parameter list is split at spaces;
-                # hence just ignore "[..." for the sections check;
-                prm_clean = KernRe(r'\[.*').sub('', prm_clean)
-
-                if prm_clean == sects[sx]:
+                if prms[px] == sects[sx]:
                     err = False
                     break
 
@@ -782,7 +754,7 @@ class KernelDoc:
         self.create_parameter_list(ln, decl_type, members, ';',
                                    declaration_name)
         self.check_sections(ln, declaration_name, decl_type,
-                            self.entry.sectcheck, self.entry.struct_actual)
+                            self.entry.sectcheck, ' '.join(self.entry.parameterlist))
 
         # Adjust declaration for better display
         declaration = KernRe(r'([\{;])').sub(r'\1\n', declaration)
-- 
2.49.0



* [PATCH 06/12] docs: kdoc: use self.entry.parameterlist directly in check_sections()
  2025-07-02 22:35 [PATCH 00/12] [PATCH 00/11] Thrash up the parser/output interface Jonathan Corbet
                   ` (4 preceding siblings ...)
  2025-07-02 22:35 ` [PATCH 05/12] docs: kdoc: remove the "struct_actual" machinery Jonathan Corbet
@ 2025-07-02 22:35 ` Jonathan Corbet
  2025-07-10  6:12   ` Mauro Carvalho Chehab
  2025-07-02 22:35 ` [PATCH 07/12] docs: kdoc: Coalesce parameter-list handling Jonathan Corbet
                   ` (7 subsequent siblings)
  13 siblings, 1 reply; 41+ messages in thread
From: Jonathan Corbet @ 2025-07-02 22:35 UTC (permalink / raw)
  To: linux-doc
  Cc: linux-kernel, Mauro Carvalho Chehab, Akira Yokosawa,
	Jonathan Corbet

Callers of check_sections() join parameterlist into a single string, which
is then immediately split back into the original list.  Rather than do all
that, just use parameterlist directly in check_sections().
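
As a rough sketch of the idea (a simplified free function with made-up
data, not the actual method, which also emits warnings), joining and
re-splitting a list of whitespace-free names is a no-op, so the check can
walk the list itself:

```python
def check_sections(sectcheck, parameterlist):
    """Return the section names that match no entry in parameterlist."""
    missing = []
    for section in sectcheck.split():
        if section not in parameterlist:
            missing.append(section)
    return missing

parameterlist = ["dev", "flags"]              # hypothetical parameter names

# The round trip gives back the same list as long as no name has whitespace.
round_trip = " ".join(parameterlist).split()

excess = check_sections("dev flags mode", parameterlist)
print(round_trip == parameterlist, excess)
```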

Signed-off-by: Jonathan Corbet <corbet@lwn.net>
---
 scripts/lib/kdoc/kdoc_parser.py | 14 +++++---------
 1 file changed, 5 insertions(+), 9 deletions(-)

diff --git a/scripts/lib/kdoc/kdoc_parser.py b/scripts/lib/kdoc/kdoc_parser.py
index b28f056365cb..ffd49f9395ae 100644
--- a/scripts/lib/kdoc/kdoc_parser.py
+++ b/scripts/lib/kdoc/kdoc_parser.py
@@ -476,19 +476,18 @@ class KernelDoc:
                         self.push_parameter(ln, decl_type, param, dtype,
                                             arg, declaration_name)
 
-    def check_sections(self, ln, decl_name, decl_type, sectcheck, prmscheck):
+    def check_sections(self, ln, decl_name, decl_type, sectcheck):
         """
         Check for errors inside sections, emitting warnings if not found
         parameters are described.
         """
 
         sects = sectcheck.split()
-        prms = prmscheck.split()
 
         for sx in range(len(sects)):                  # pylint: disable=C0200
             err = True
-            for px in range(len(prms)):               # pylint: disable=C0200
-                if prms[px] == sects[sx]:
+            for param in self.entry.parameterlist:
+                if param == sects[sx]:
                     err = False
                     break
 
@@ -753,8 +752,7 @@ class KernelDoc:
 
         self.create_parameter_list(ln, decl_type, members, ';',
                                    declaration_name)
-        self.check_sections(ln, declaration_name, decl_type,
-                            self.entry.sectcheck, ' '.join(self.entry.parameterlist))
+        self.check_sections(ln, declaration_name, decl_type, self.entry.sectcheck)
 
         # Adjust declaration for better display
         declaration = KernRe(r'([\{;])').sub(r'\1\n', declaration)
@@ -1032,9 +1030,7 @@ class KernelDoc:
                           f"expecting prototype for {self.entry.identifier}(). Prototype was for {declaration_name}() instead")
             return
 
-        prms = " ".join(self.entry.parameterlist)
-        self.check_sections(ln, declaration_name, "function",
-                            self.entry.sectcheck, prms)
+        self.check_sections(ln, declaration_name, "function", self.entry.sectcheck)
 
         self.check_return_section(ln, declaration_name, return_type)
 
-- 
2.49.0



* [PATCH 07/12] docs: kdoc: Coalesce parameter-list handling
  2025-07-02 22:35 [PATCH 00/12] [PATCH 00/11] Thrash up the parser/output interface Jonathan Corbet
                   ` (5 preceding siblings ...)
  2025-07-02 22:35 ` [PATCH 06/12] docs: kdoc: use self.entry.parameterlist directly in check_sections() Jonathan Corbet
@ 2025-07-02 22:35 ` Jonathan Corbet
  2025-07-10  6:20   ` Mauro Carvalho Chehab
  2025-07-02 22:35 ` [PATCH 08/12] docs: kdoc: Regularize the use of the declaration name Jonathan Corbet
                   ` (6 subsequent siblings)
  13 siblings, 1 reply; 41+ messages in thread
From: Jonathan Corbet @ 2025-07-02 22:35 UTC (permalink / raw)
  To: linux-doc
  Cc: linux-kernel, Mauro Carvalho Chehab, Akira Yokosawa,
	Jonathan Corbet

Callers of output_declaration() always pass the parameter information from
self.entry; remove all of the boilerplate arguments and just get at that
information directly.  Formalize its placement in the KdocItem class.
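
The shape of that change can be sketched with a small stand-in class.
Everything here (the Item class, the signature() helper, the sample data)
is hypothetical and much reduced from the real KdocItem; it only shows
consumers reading attributes instead of fishing keys out of a dict.

```python
class Item:
    """Hypothetical stand-in for KdocItem, reduced to parameter data."""

    def __init__(self, name):
        self.name = name
        # Each attribute gets its own empty container.
        self.parameterlist = []
        self.parameterdescs = {}
        self.parametertypes = {}
        self.parameterdesc_start_lines = {}

    def set_params(self, names, descs, types, starts):
        self.parameterlist = names
        self.parameterdescs = descs
        self.parametertypes = types
        self.parameterdesc_start_lines = starts

def signature(item):
    """Build a C-like signature from the item's attributes."""
    parts = [f"{item.parametertypes.get(p, '')} {p}".strip()
             for p in item.parameterlist]
    return f"{item.name}({', '.join(parts)})"

item = Item("probe")
item.set_params(["dev", "flags"],
                {"dev": "the device", "flags": "option bits"},
                {"dev": "struct device *", "flags": "int"},
                {"dev": 12, "flags": 13})
print(signature(item))
```

Giving every attribute a separate container in __init__ is a deliberate
choice in this sketch: a chained assignment such as "a = b = []" would bind
both names to the same list object.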

It would be nice to get rid of parameterlist as well, but that has the
effect of reordering the output of function parameters and struct fields to
match the order in the kerneldoc comment rather than in the declaration.
One could argue about which is more correct, but the ordering has been left
unchanged for now.
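
A small illustration of the ordering issue, using invented names: a Python
dict iterates in insertion order, so descriptions collected while reading
the comment come back in comment order, while parameterlist keeps the order
of the declaration.

```python
# Declaration order, as recorded while parsing the prototype (sample data).
parameterlist = ["dev", "flags", "mode"]

# Descriptions are stored as the comment is read; here the comment
# happens to document the parameters in a different order.
parameterdescs = {}
for name, text in [("mode", "access mode"),
                   ("dev", "the device"),
                   ("flags", "option bits")]:
    parameterdescs[name] = text

by_declaration = list(parameterlist)
by_comment = list(parameterdescs)       # dicts iterate in insertion order

print(by_declaration)
print(by_comment)
```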

Signed-off-by: Jonathan Corbet <corbet@lwn.net>
---
 scripts/lib/kdoc/kdoc_item.py   | 10 ++++-
 scripts/lib/kdoc/kdoc_output.py | 75 +++++++++++++--------------------
 scripts/lib/kdoc/kdoc_parser.py | 23 ++--------
 3 files changed, 41 insertions(+), 67 deletions(-)

diff --git a/scripts/lib/kdoc/kdoc_item.py b/scripts/lib/kdoc/kdoc_item.py
index c8329019a219..51e8669b9a6e 100644
--- a/scripts/lib/kdoc/kdoc_item.py
+++ b/scripts/lib/kdoc/kdoc_item.py
@@ -10,6 +10,8 @@ class KdocItem:
         self.type = type
         self.declaration_start_line = start_line
         self.sections = self.sections_start_lines = { }
+        self.parameterlist = self.parameterdesc_start_lines = []
+        self.parameterdescs = self.parametertypes = { }
         #
         # Just save everything else into our own dict so that the output
         # side can grab it directly as before.  As we move things into more
@@ -27,8 +29,14 @@ class KdocItem:
         return self.get(key)
 
     #
-    # Tracking of section information.
+    # Tracking of section and parameter information.
     #
     def set_sections(self, sections, start_lines):
         self.sections = sections
         self.section_start_lines = start_lines
+
+    def set_params(self, names, descs, types, starts):
+        self.parameterlist = names
+        self.parameterdescs = descs
+        self.parametertypes = types
+        self.parameterdesc_start_lines = starts
diff --git a/scripts/lib/kdoc/kdoc_output.py b/scripts/lib/kdoc/kdoc_output.py
index 15cb89f91987..d6f4d9e7173b 100644
--- a/scripts/lib/kdoc/kdoc_output.py
+++ b/scripts/lib/kdoc/kdoc_output.py
@@ -373,18 +373,13 @@ class RestFormat(OutputFormat):
                 signature = args['functiontype'] + " "
             signature += args['function'] + " ("
 
-        parameterlist = args.get('parameterlist', [])
-        parameterdescs = args.get('parameterdescs', {})
-        parameterdesc_start_lines = args.get('parameterdesc_start_lines', {})
-
         ln = args.get('declaration_start_line', 0)
-
         count = 0
-        for parameter in parameterlist:
+        for parameter in args.parameterlist:
             if count != 0:
                 signature += ", "
             count += 1
-            dtype = args['parametertypes'].get(parameter, "")
+            dtype = args.parametertypes.get(parameter, "")
 
             if function_pointer.search(dtype):
                 signature += function_pointer.group(1) + parameter + function_pointer.group(3)
@@ -419,26 +414,26 @@ class RestFormat(OutputFormat):
         # function prototypes apart
         self.lineprefix = "  "
 
-        if parameterlist:
+        if args.parameterlist:
             self.data += ".. container:: kernelindent\n\n"
             self.data += f"{self.lineprefix}**Parameters**\n\n"
 
-        for parameter in parameterlist:
+        for parameter in args.parameterlist:
             parameter_name = KernRe(r'\[.*').sub('', parameter)
-            dtype = args['parametertypes'].get(parameter, "")
+            dtype = args.parametertypes.get(parameter, "")
 
             if dtype:
                 self.data += f"{self.lineprefix}``{dtype}``\n"
             else:
                 self.data += f"{self.lineprefix}``{parameter}``\n"
 
-            self.print_lineno(parameterdesc_start_lines.get(parameter_name, 0))
+            self.print_lineno(args.parameterdesc_start_lines.get(parameter_name, 0))
 
             self.lineprefix = "    "
-            if parameter_name in parameterdescs and \
-               parameterdescs[parameter_name] != KernelDoc.undescribed:
+            if parameter_name in args.parameterdescs and \
+               args.parameterdescs[parameter_name] != KernelDoc.undescribed:
 
-                self.output_highlight(parameterdescs[parameter_name])
+                self.output_highlight(args.parameterdescs[parameter_name])
                 self.data += "\n"
             else:
                 self.data += f"{self.lineprefix}*undescribed*\n\n"
@@ -451,8 +446,6 @@ class RestFormat(OutputFormat):
 
         oldprefix = self.lineprefix
         name = args.get('enum', '')
-        parameterlist = args.get('parameterlist', [])
-        parameterdescs = args.get('parameterdescs', {})
         ln = args.get('declaration_start_line', 0)
 
         self.data += f"\n\n.. c:enum:: {name}\n\n"
@@ -467,11 +460,11 @@ class RestFormat(OutputFormat):
         self.lineprefix = outer + "  "
         self.data += f"{outer}**Constants**\n\n"
 
-        for parameter in parameterlist:
+        for parameter in args.parameterlist:
             self.data += f"{outer}``{parameter}``\n"
 
-            if parameterdescs.get(parameter, '') != KernelDoc.undescribed:
-                self.output_highlight(parameterdescs[parameter])
+            if args.parameterdescs.get(parameter, '') != KernelDoc.undescribed:
+                self.output_highlight(args.parameterdescs[parameter])
             else:
                 self.data += f"{self.lineprefix}*undescribed*\n\n"
             self.data += "\n"
@@ -505,10 +498,6 @@ class RestFormat(OutputFormat):
         dtype = args.get('type', "struct")
         ln = args.get('declaration_start_line', 0)
 
-        parameterlist = args.get('parameterlist', [])
-        parameterdescs = args.get('parameterdescs', {})
-        parameterdesc_start_lines = args.get('parameterdesc_start_lines', {})
-
         self.data += f"\n\n.. c:{dtype}:: {name}\n\n"
 
         self.print_lineno(ln)
@@ -531,21 +520,21 @@ class RestFormat(OutputFormat):
 
         self.lineprefix = "  "
         self.data += f"{self.lineprefix}**Members**\n\n"
-        for parameter in parameterlist:
+        for parameter in args.parameterlist:
             if not parameter or parameter.startswith("#"):
                 continue
 
             parameter_name = parameter.split("[", maxsplit=1)[0]
 
-            if parameterdescs.get(parameter_name) == KernelDoc.undescribed:
+            if args.parameterdescs.get(parameter_name) == KernelDoc.undescribed:
                 continue
 
-            self.print_lineno(parameterdesc_start_lines.get(parameter_name, 0))
+            self.print_lineno(args.parameterdesc_start_lines.get(parameter_name, 0))
 
             self.data += f"{self.lineprefix}``{parameter}``\n"
 
             self.lineprefix = "    "
-            self.output_highlight(parameterdescs[parameter_name])
+            self.output_highlight(args.parameterdescs[parameter_name])
             self.lineprefix = "  "
 
             self.data += "\n"
@@ -643,9 +632,6 @@ class ManFormat(OutputFormat):
     def out_function(self, fname, name, args):
         """output function in man"""
 
-        parameterlist = args.get('parameterlist', [])
-        parameterdescs = args.get('parameterdescs', {})
-
         self.data += f'.TH "{args["function"]}" 9 "{args["function"]}" "{self.man_date}" "Kernel Hacker\'s Manual" LINUX' + "\n"
 
         self.data += ".SH NAME\n"
@@ -661,11 +647,11 @@ class ManFormat(OutputFormat):
         parenth = "("
         post = ","
 
-        for parameter in parameterlist:
-            if count == len(parameterlist) - 1:
+        for parameter in args.parameterlist:
+            if count == len(args.parameterlist) - 1:
                 post = ");"
 
-            dtype = args['parametertypes'].get(parameter, "")
+            dtype = args.parametertypes.get(parameter, "")
             if function_pointer.match(dtype):
                 # Pointer-to-function
                 self.data += f'".BI "{parenth}{function_pointer.group(1)}" " ") ({function_pointer.group(2)}){post}"' + "\n"
@@ -676,14 +662,14 @@ class ManFormat(OutputFormat):
             count += 1
             parenth = ""
 
-        if parameterlist:
+        if args.parameterlist:
             self.data += ".SH ARGUMENTS\n"
 
-        for parameter in parameterlist:
+        for parameter in args.parameterlist:
             parameter_name = re.sub(r'\[.*', '', parameter)
 
             self.data += f'.IP "{parameter}" 12' + "\n"
-            self.output_highlight(parameterdescs.get(parameter_name, ""))
+            self.output_highlight(args.parameterdescs.get(parameter_name, ""))
 
         for section, text in args.sections.items():
             self.data += f'.SH "{section.upper()}"' + "\n"
@@ -692,7 +678,6 @@ class ManFormat(OutputFormat):
     def out_enum(self, fname, name, args):
 
         name = args.get('enum', '')
-        parameterlist = args.get('parameterlist', [])
 
         self.data += f'.TH "{self.modulename}" 9 "enum {args["enum"]}" "{self.man_date}" "API Manual" LINUX' + "\n"
 
@@ -703,9 +688,9 @@ class ManFormat(OutputFormat):
         self.data += f"enum {args['enum']}" + " {\n"
 
         count = 0
-        for parameter in parameterlist:
+        for parameter in args.parameterlist:
             self.data += f'.br\n.BI "    {parameter}"' + "\n"
-            if count == len(parameterlist) - 1:
+            if count == len(args.parameterlist) - 1:
                 self.data += "\n};\n"
             else:
                 self.data += ", \n.br\n"
@@ -714,10 +699,10 @@ class ManFormat(OutputFormat):
 
         self.data += ".SH Constants\n"
 
-        for parameter in parameterlist:
+        for parameter in args.parameterlist:
             parameter_name = KernRe(r'\[.*').sub('', parameter)
             self.data += f'.IP "{parameter}" 12' + "\n"
-            self.output_highlight(args['parameterdescs'].get(parameter_name, ""))
+            self.output_highlight(args.parameterdescs.get(parameter_name, ""))
 
         for section, text in args.sections.items():
             self.data += f'.SH "{section}"' + "\n"
@@ -743,8 +728,6 @@ class ManFormat(OutputFormat):
         struct_name = args.get('struct')
         purpose = args.get('purpose')
         definition = args.get('definition')
-        parameterlist = args.get('parameterlist', [])
-        parameterdescs = args.get('parameterdescs', {})
 
         self.data += f'.TH "{module}" 9 "{struct_type} {struct_name}" "{self.man_date}" "API Manual" LINUX' + "\n"
 
@@ -760,17 +743,17 @@ class ManFormat(OutputFormat):
         self.data += f'.BI "{declaration}\n' + "};\n.br\n\n"
 
         self.data += ".SH Members\n"
-        for parameter in parameterlist:
+        for parameter in args.parameterlist:
             if parameter.startswith("#"):
                 continue
 
             parameter_name = re.sub(r"\[.*", "", parameter)
 
-            if parameterdescs.get(parameter_name) == KernelDoc.undescribed:
+            if args.parameterdescs.get(parameter_name) == KernelDoc.undescribed:
                 continue
 
             self.data += f'.IP "{parameter}" 12' + "\n"
-            self.output_highlight(parameterdescs.get(parameter_name))
+            self.output_highlight(args.parameterdescs.get(parameter_name))
 
         for section, text in args.sections.items():
             self.data += f'.SH "{section}"' + "\n"
diff --git a/scripts/lib/kdoc/kdoc_parser.py b/scripts/lib/kdoc/kdoc_parser.py
index ffd49f9395ae..298abd260264 100644
--- a/scripts/lib/kdoc/kdoc_parser.py
+++ b/scripts/lib/kdoc/kdoc_parser.py
@@ -278,7 +278,9 @@ class KernelDoc:
             if section in sections and not sections[section].rstrip():
                 del sections[section]
         item.set_sections(sections, self.entry.section_start_lines)
-
+        item.set_params(self.entry.parameterlist, self.entry.parameterdescs,
+                        self.entry.parametertypes,
+                        self.entry.parameterdesc_start_lines)
         self.entries.append(item)
 
         self.config.log.debug("Output: %s:%s = %s", dtype, name, pformat(args))
@@ -790,10 +792,6 @@ class KernelDoc:
         self.output_declaration(decl_type, declaration_name,
                                 struct=declaration_name,
                                 definition=declaration,
-                                parameterlist=self.entry.parameterlist,
-                                parameterdescs=self.entry.parameterdescs,
-                                parametertypes=self.entry.parametertypes,
-                                parameterdesc_start_lines=self.entry.parameterdesc_start_lines,
                                 purpose=self.entry.declaration_purpose)
 
     def dump_enum(self, ln, proto):
@@ -873,9 +871,6 @@ class KernelDoc:
 
         self.output_declaration('enum', declaration_name,
                                 enum=declaration_name,
-                                parameterlist=self.entry.parameterlist,
-                                parameterdescs=self.entry.parameterdescs,
-                                parameterdesc_start_lines=self.entry.parameterdesc_start_lines,
                                 purpose=self.entry.declaration_purpose)
 
     def dump_declaration(self, ln, prototype):
@@ -1039,10 +1034,6 @@ class KernelDoc:
                                     function=declaration_name,
                                     typedef=True,
                                     functiontype=return_type,
-                                    parameterlist=self.entry.parameterlist,
-                                    parameterdescs=self.entry.parameterdescs,
-                                    parametertypes=self.entry.parametertypes,
-                                    parameterdesc_start_lines=self.entry.parameterdesc_start_lines,
                                     purpose=self.entry.declaration_purpose,
                                     func_macro=func_macro)
         else:
@@ -1050,10 +1041,6 @@ class KernelDoc:
                                     function=declaration_name,
                                     typedef=False,
                                     functiontype=return_type,
-                                    parameterlist=self.entry.parameterlist,
-                                    parameterdescs=self.entry.parameterdescs,
-                                    parametertypes=self.entry.parametertypes,
-                                    parameterdesc_start_lines=self.entry.parameterdesc_start_lines,
                                     purpose=self.entry.declaration_purpose,
                                     func_macro=func_macro)
 
@@ -1093,10 +1080,6 @@ class KernelDoc:
                                     function=declaration_name,
                                     typedef=True,
                                     functiontype=return_type,
-                                    parameterlist=self.entry.parameterlist,
-                                    parameterdescs=self.entry.parameterdescs,
-                                    parametertypes=self.entry.parametertypes,
-                                    parameterdesc_start_lines=self.entry.parameterdesc_start_lines,
                                     purpose=self.entry.declaration_purpose)
             return
 
-- 
2.49.0



* [PATCH 08/12] docs: kdoc: Regularize the use of the declaration name
  2025-07-02 22:35 [PATCH 00/12] [PATCH 00/11] Thrash up the parser/output interface Jonathan Corbet
                   ` (6 preceding siblings ...)
  2025-07-02 22:35 ` [PATCH 07/12] docs: kdoc: Coalesce parameter-list handling Jonathan Corbet
@ 2025-07-02 22:35 ` Jonathan Corbet
  2025-07-10  6:22   ` Mauro Carvalho Chehab
  2025-07-02 22:35 ` [PATCH 09/12] docs: kdoc: straighten up dump_declaration() Jonathan Corbet
                   ` (5 subsequent siblings)
  13 siblings, 1 reply; 41+ messages in thread
From: Jonathan Corbet @ 2025-07-02 22:35 UTC (permalink / raw)
  To: linux-doc
  Cc: linux-kernel, Mauro Carvalho Chehab, Akira Yokosawa,
	Jonathan Corbet

Each declaration type passes through the name in a unique field of the
"args" blob - even though we have always just passed the name separately.
Get rid of all the weird names and just use the common version.
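
Roughly, and with hypothetical helper names and data rather than the real
output methods, the difference looks like this: the old style digs the name
out of a per-type key, the new style uses the name it was handed.

```python
def name_line_old(dtype, args):
    """Old style (modeled): the name lives under a key named after the type."""
    return f"{dtype} {args[dtype]} \\- {args['purpose']}"

def name_line_new(dtype, name, args):
    """New style (modeled): the name is the argument already being passed."""
    return f"{dtype} {name} \\- {args['purpose']}"

old_args = {"enum": "foo_mode", "purpose": "operating modes"}
new_args = {"purpose": "operating modes"}

old_line = name_line_old("enum", old_args)
new_line = name_line_new("enum", "foo_mode", new_args)
print(old_line)
print(new_line)
```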

Signed-off-by: Jonathan Corbet <corbet@lwn.net>
---
 scripts/lib/kdoc/kdoc_output.py | 39 +++++++++++++--------------------
 scripts/lib/kdoc/kdoc_parser.py |  6 -----
 2 files changed, 15 insertions(+), 30 deletions(-)

diff --git a/scripts/lib/kdoc/kdoc_output.py b/scripts/lib/kdoc/kdoc_output.py
index d6f4d9e7173b..8a31b637ffd2 100644
--- a/scripts/lib/kdoc/kdoc_output.py
+++ b/scripts/lib/kdoc/kdoc_output.py
@@ -367,11 +367,11 @@ class RestFormat(OutputFormat):
 
         func_macro = args.get('func_macro', False)
         if func_macro:
-            signature = args['function']
+            signature = name
         else:
             if args.get('functiontype'):
                 signature = args['functiontype'] + " "
-            signature += args['function'] + " ("
+            signature += name + " ("
 
         ln = args.get('declaration_start_line', 0)
         count = 0
@@ -391,7 +391,7 @@ class RestFormat(OutputFormat):
 
         self.print_lineno(ln)
         if args.get('typedef') or not args.get('functiontype'):
-            self.data += f".. c:macro:: {args['function']}\n\n"
+            self.data += f".. c:macro:: {name}\n\n"
 
             if args.get('typedef'):
                 self.data += "   **Typedef**: "
@@ -445,7 +445,6 @@ class RestFormat(OutputFormat):
     def out_enum(self, fname, name, args):
 
         oldprefix = self.lineprefix
-        name = args.get('enum', '')
         ln = args.get('declaration_start_line', 0)
 
         self.data += f"\n\n.. c:enum:: {name}\n\n"
@@ -475,7 +474,6 @@ class RestFormat(OutputFormat):
     def out_typedef(self, fname, name, args):
 
         oldprefix = self.lineprefix
-        name = args.get('typedef', '')
         ln = args.get('declaration_start_line', 0)
 
         self.data += f"\n\n.. c:type:: {name}\n\n"
@@ -492,7 +490,6 @@ class RestFormat(OutputFormat):
 
     def out_struct(self, fname, name, args):
 
-        name = args.get('struct', "")
         purpose = args.get('purpose', "")
         declaration = args.get('definition', "")
         dtype = args.get('type', "struct")
@@ -632,16 +629,16 @@ class ManFormat(OutputFormat):
     def out_function(self, fname, name, args):
         """output function in man"""
 
-        self.data += f'.TH "{args["function"]}" 9 "{args["function"]}" "{self.man_date}" "Kernel Hacker\'s Manual" LINUX' + "\n"
+        self.data += f'.TH "{name}" 9 "{name}" "{self.man_date}" "Kernel Hacker\'s Manual" LINUX' + "\n"
 
         self.data += ".SH NAME\n"
-        self.data += f"{args['function']} \\- {args['purpose']}\n"
+        self.data += f"{name} \\- {args['purpose']}\n"
 
         self.data += ".SH SYNOPSIS\n"
         if args.get('functiontype', ''):
-            self.data += f'.B "{args["functiontype"]}" {args["function"]}' + "\n"
+            self.data += f'.B "{args["functiontype"]}" {name}' + "\n"
         else:
-            self.data += f'.B "{args["function"]}' + "\n"
+            self.data += f'.B "{name}' + "\n"
 
         count = 0
         parenth = "("
@@ -676,16 +673,13 @@ class ManFormat(OutputFormat):
             self.output_highlight(text)
 
     def out_enum(self, fname, name, args):
-
-        name = args.get('enum', '')
-
-        self.data += f'.TH "{self.modulename}" 9 "enum {args["enum"]}" "{self.man_date}" "API Manual" LINUX' + "\n"
+        self.data += f'.TH "{self.modulename}" 9 "enum {name}" "{self.man_date}" "API Manual" LINUX' + "\n"
 
         self.data += ".SH NAME\n"
-        self.data += f"enum {args['enum']} \\- {args['purpose']}\n"
+        self.data += f"enum {name} \\- {args['purpose']}\n"
 
         self.data += ".SH SYNOPSIS\n"
-        self.data += f"enum {args['enum']}" + " {\n"
+        self.data += f"enum {name}" + " {\n"
 
         count = 0
         for parameter in args.parameterlist:
@@ -710,13 +704,12 @@ class ManFormat(OutputFormat):
 
     def out_typedef(self, fname, name, args):
         module = self.modulename
-        typedef = args.get('typedef')
         purpose = args.get('purpose')
 
-        self.data += f'.TH "{module}" 9 "{typedef}" "{self.man_date}" "API Manual" LINUX' + "\n"
+        self.data += f'.TH "{module}" 9 "{name}" "{self.man_date}" "API Manual" LINUX' + "\n"
 
         self.data += ".SH NAME\n"
-        self.data += f"typedef {typedef} \\- {purpose}\n"
+        self.data += f"typedef {name} \\- {purpose}\n"
 
         for section, text in args.sections.items():
             self.data += f'.SH "{section}"' + "\n"
@@ -724,22 +717,20 @@ class ManFormat(OutputFormat):
 
     def out_struct(self, fname, name, args):
         module = self.modulename
-        struct_type = args.get('type')
-        struct_name = args.get('struct')
         purpose = args.get('purpose')
         definition = args.get('definition')
 
-        self.data += f'.TH "{module}" 9 "{struct_type} {struct_name}" "{self.man_date}" "API Manual" LINUX' + "\n"
+        self.data += f'.TH "{module}" 9 "{args.type} {name}" "{self.man_date}" "API Manual" LINUX' + "\n"
 
         self.data += ".SH NAME\n"
-        self.data += f"{struct_type} {struct_name} \\- {purpose}\n"
+        self.data += f"{args.type} {name} \\- {purpose}\n"
 
         # Replace tabs with two spaces and handle newlines
         declaration = definition.replace("\t", "  ")
         declaration = KernRe(r"\n").sub('"\n.br\n.BI "', declaration)
 
         self.data += ".SH SYNOPSIS\n"
-        self.data += f"{struct_type} {struct_name} " + "{" + "\n.br\n"
+        self.data += f"{args.type} {name} " + "{" + "\n.br\n"
         self.data += f'.BI "{declaration}\n' + "};\n.br\n\n"
 
         self.data += ".SH Members\n"
diff --git a/scripts/lib/kdoc/kdoc_parser.py b/scripts/lib/kdoc/kdoc_parser.py
index 298abd260264..6e35e508608b 100644
--- a/scripts/lib/kdoc/kdoc_parser.py
+++ b/scripts/lib/kdoc/kdoc_parser.py
@@ -790,7 +790,6 @@ class KernelDoc:
                 level += 1
 
         self.output_declaration(decl_type, declaration_name,
-                                struct=declaration_name,
                                 definition=declaration,
                                 purpose=self.entry.declaration_purpose)
 
@@ -870,7 +869,6 @@ class KernelDoc:
                               f"Excess enum value '%{k}' description in '{declaration_name}'")
 
         self.output_declaration('enum', declaration_name,
-                                enum=declaration_name,
                                 purpose=self.entry.declaration_purpose)
 
     def dump_declaration(self, ln, prototype):
@@ -1031,14 +1029,12 @@ class KernelDoc:
 
         if 'typedef' in return_type:
             self.output_declaration(decl_type, declaration_name,
-                                    function=declaration_name,
                                     typedef=True,
                                     functiontype=return_type,
                                     purpose=self.entry.declaration_purpose,
                                     func_macro=func_macro)
         else:
             self.output_declaration(decl_type, declaration_name,
-                                    function=declaration_name,
                                     typedef=False,
                                     functiontype=return_type,
                                     purpose=self.entry.declaration_purpose,
@@ -1077,7 +1073,6 @@ class KernelDoc:
             self.create_parameter_list(ln, decl_type, args, ',', declaration_name)
 
             self.output_declaration(decl_type, declaration_name,
-                                    function=declaration_name,
                                     typedef=True,
                                     functiontype=return_type,
                                     purpose=self.entry.declaration_purpose)
@@ -1099,7 +1094,6 @@ class KernelDoc:
                 return
 
             self.output_declaration('typedef', declaration_name,
-                                    typedef=declaration_name,
                                     purpose=self.entry.declaration_purpose)
             return
 
-- 
2.49.0



* [PATCH 09/12] docs: kdoc: straighten up dump_declaration()
  2025-07-02 22:35 [PATCH 00/12] [PATCH 00/11] Thrash up the parser/output interface Jonathan Corbet
                   ` (7 preceding siblings ...)
  2025-07-02 22:35 ` [PATCH 08/12] docs: kdoc: Regularize the use of the declaration name Jonathan Corbet
@ 2025-07-02 22:35 ` Jonathan Corbet
  2025-07-10  6:25   ` Mauro Carvalho Chehab
  2025-07-02 22:35 ` [PATCH 10/12] docs: kdoc: directly access the always-there KdocItem fields Jonathan Corbet
                   ` (4 subsequent siblings)
  13 siblings, 1 reply; 41+ messages in thread
From: Jonathan Corbet @ 2025-07-02 22:35 UTC (permalink / raw)
  To: linux-doc
  Cc: linux-kernel, Mauro Carvalho Chehab, Akira Yokosawa,
	Jonathan Corbet

Get rid of the excess "return" statements in dump_declaration(), along with
a line of never-executed dead code.

Signed-off-by: Jonathan Corbet <corbet@lwn.net>
---
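As a rough illustration of the resulting control flow, here is a minimal,
self-contained sketch.  The Dumper class and its recording lists are
hypothetical stand-ins for KernelDoc, and the declaration type is passed
as an argument rather than read from self.entry; only the if/elif/else
shape follows the change below.

```python
class Dumper:
    """Hypothetical stand-in for KernelDoc; only the dispatch matters."""

    def __init__(self):
        self.calls = []      # records which dump_*() handler ran
        self.messages = []   # records emitted diagnostics

    def dump_enum(self, ln, prototype):
        self.calls.append(("enum", prototype))

    def dump_typedef(self, ln, prototype):
        self.calls.append(("typedef", prototype))

    def dump_struct(self, ln, prototype):
        self.calls.append(("struct", prototype))

    def emit_msg(self, ln, msg):
        self.messages.append((ln, msg))

    def dump_declaration(self, ln, decl_type, prototype):
        # One chain, one exit point: no early "return" statements needed.
        if decl_type == "enum":
            self.dump_enum(ln, prototype)
        elif decl_type == "typedef":
            self.dump_typedef(ln, prototype)
        elif decl_type in ["union", "struct"]:
            self.dump_struct(ln, prototype)
        else:
            # Reaching this branch would indicate a bug in the caller.
            self.emit_msg(ln, f"Unknown declaration type: {decl_type}")
```

With this shape, an unexpected type ends in a diagnostic instead of
falling through to a generic output call.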
 scripts/lib/kdoc/kdoc_parser.py | 15 +++++----------
 1 file changed, 5 insertions(+), 10 deletions(-)

diff --git a/scripts/lib/kdoc/kdoc_parser.py b/scripts/lib/kdoc/kdoc_parser.py
index 6e35e508608b..7191fa94e17a 100644
--- a/scripts/lib/kdoc/kdoc_parser.py
+++ b/scripts/lib/kdoc/kdoc_parser.py
@@ -878,18 +878,13 @@ class KernelDoc:
 
         if self.entry.decl_type == "enum":
             self.dump_enum(ln, prototype)
-            return
-
-        if self.entry.decl_type == "typedef":
+        elif self.entry.decl_type == "typedef":
             self.dump_typedef(ln, prototype)
-            return
-
-        if self.entry.decl_type in ["union", "struct"]:
+        elif self.entry.decl_type in ["union", "struct"]:
             self.dump_struct(ln, prototype)
-            return
-
-        self.output_declaration(self.entry.decl_type, prototype,
-                                entry=self.entry)
+        else:
+            # This would be a bug
+            self.emit_message(ln, f'Unknown declaration type: {self.entry.decl_type}')
 
     def dump_function(self, ln, prototype):
         """
-- 
2.49.0



* [PATCH 10/12] docs: kdoc: directly access the always-there KdocItem fields
  2025-07-02 22:35 [PATCH 00/12] [PATCH 00/11] Thrash up the parser/output interface Jonathan Corbet
                   ` (8 preceding siblings ...)
  2025-07-02 22:35 ` [PATCH 09/12] docs: kdoc: straighten up dump_declaration() Jonathan Corbet
@ 2025-07-02 22:35 ` Jonathan Corbet
  2025-07-10  6:27   ` Mauro Carvalho Chehab
  2025-07-02 22:35 ` [PATCH 11/12] docs: kdoc: clean up check_sections() Jonathan Corbet
                   ` (3 subsequent siblings)
  13 siblings, 1 reply; 41+ messages in thread
From: Jonathan Corbet @ 2025-07-02 22:35 UTC (permalink / raw)
  To: linux-doc
  Cc: linux-kernel, Mauro Carvalho Chehab, Akira Yokosawa,
	Jonathan Corbet

They are part of the interface, so use them directly.  This allows the
removal of the transitional __dict__ hack in KdocItem.

Signed-off-by: Jonathan Corbet <corbet@lwn.net>
---
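To make the distinction concrete, here is a small sketch of the two access
styles.  The Item class is a hypothetical, simplified stand-in for
KdocItem; its constructor signature is an assumption made for the example,
and only get() and __getitem__() mirror the code in the diff below.

```python
class Item:
    """Hypothetical, simplified stand-in for KdocItem."""

    def __init__(self, name, type, start_line, **other_stuff):
        # Fields that every item has are plain attributes.
        self.name = name
        self.type = type
        self.declaration_start_line = start_line
        self.warnings = []
        # Optional, declaration-specific data stays in a dict.
        self.other_stuff = other_stuff

    def get(self, key, default=None):
        # Looks only in other_stuff; no fallback to self.__dict__.
        return self.other_stuff.get(key, default)

    def __getitem__(self, key):
        return self.get(key)


item = Item("foo", "struct", 42, purpose="an example")

dtype = item.type                  # always-there field: direct access
ln = item.declaration_start_line   # likewise
purpose = item.get("purpose", "")  # optional field: still via get()
```

Note that, in this sketch, item.get("type") returns None: once the
__dict__ fallback is gone, the always-there fields are reachable only as
attributes.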
 scripts/lib/kdoc/kdoc_item.py   |  5 +----
 scripts/lib/kdoc/kdoc_output.py | 16 +++++++---------
 2 files changed, 8 insertions(+), 13 deletions(-)

diff --git a/scripts/lib/kdoc/kdoc_item.py b/scripts/lib/kdoc/kdoc_item.py
index 51e8669b9a6e..807290678984 100644
--- a/scripts/lib/kdoc/kdoc_item.py
+++ b/scripts/lib/kdoc/kdoc_item.py
@@ -20,10 +20,7 @@ class KdocItem:
         self.other_stuff = other_stuff
 
     def get(self, key, default = None):
-        ret = self.other_stuff.get(key, default)
-        if ret == default:
-            return self.__dict__.get(key, default)
-        return ret
+        return self.other_stuff.get(key, default)
 
     def __getitem__(self, key):
         return self.get(key)
diff --git a/scripts/lib/kdoc/kdoc_output.py b/scripts/lib/kdoc/kdoc_output.py
index 8a31b637ffd2..ea8914537ba0 100644
--- a/scripts/lib/kdoc/kdoc_output.py
+++ b/scripts/lib/kdoc/kdoc_output.py
@@ -124,9 +124,7 @@ class OutputFormat:
         Output warnings for identifiers that will be displayed.
         """
 
-        warnings = args.get('warnings', [])
-
-        for log_msg in warnings:
+        for log_msg in args.warnings:
             self.config.warning(log_msg)
 
     def check_doc(self, name, args):
@@ -184,7 +182,7 @@ class OutputFormat:
 
         self.data = ""
 
-        dtype = args.get('type', "")
+        dtype = args.type
 
         if dtype == "doc":
             self.out_doc(fname, name, args)
@@ -373,7 +371,7 @@ class RestFormat(OutputFormat):
                 signature = args['functiontype'] + " "
             signature += name + " ("
 
-        ln = args.get('declaration_start_line', 0)
+        ln = args.declaration_start_line
         count = 0
         for parameter in args.parameterlist:
             if count != 0:
@@ -445,7 +443,7 @@ class RestFormat(OutputFormat):
     def out_enum(self, fname, name, args):
 
         oldprefix = self.lineprefix
-        ln = args.get('declaration_start_line', 0)
+        ln = args.declaration_start_line
 
         self.data += f"\n\n.. c:enum:: {name}\n\n"
 
@@ -474,7 +472,7 @@ class RestFormat(OutputFormat):
     def out_typedef(self, fname, name, args):
 
         oldprefix = self.lineprefix
-        ln = args.get('declaration_start_line', 0)
+        ln = args.declaration_start_line
 
         self.data += f"\n\n.. c:type:: {name}\n\n"
 
@@ -492,8 +490,8 @@ class RestFormat(OutputFormat):
 
         purpose = args.get('purpose', "")
         declaration = args.get('definition', "")
-        dtype = args.get('type', "struct")
-        ln = args.get('declaration_start_line', 0)
+        dtype = args.type
+        ln = args.declaration_start_line
 
         self.data += f"\n\n.. c:{dtype}:: {name}\n\n"
 
-- 
2.49.0



* [PATCH 11/12] docs: kdoc: clean up check_sections()
  2025-07-02 22:35 [PATCH 00/12] [PATCH 00/11] Thrash up the parser/output interface Jonathan Corbet
                   ` (9 preceding siblings ...)
  2025-07-02 22:35 ` [PATCH 10/12] docs: kdoc: directly access the always-there KdocItem fields Jonathan Corbet
@ 2025-07-02 22:35 ` Jonathan Corbet
  2025-07-10  6:29   ` Mauro Carvalho Chehab
  2025-07-02 22:35 ` [PATCH 12/12] docs: kdoc: Improve the output text accumulation Jonathan Corbet
                   ` (2 subsequent siblings)
  13 siblings, 1 reply; 41+ messages in thread
From: Jonathan Corbet @ 2025-07-02 22:35 UTC (permalink / raw)
  To: linux-doc
  Cc: linux-kernel, Mauro Carvalho Chehab, Akira Yokosawa,
	Jonathan Corbet

entry.sectcheck is just a duplicate of our list of sections that is only
passed to check_sections(); its main purpose seems to be to avoid checking
the special named sections.  Rework check_sections() to not use that field
(which is then deleted), to check for the known sections directly, and
tighten up the logic in general.

Signed-off-by: Jonathan Corbet <corbet@lwn.net>
---
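The filtering condition can be tried in isolation with the sketch below.
It uses the standard re module in place of KernRe, and excess_sections()
is a hypothetical helper that returns the messages rather than emitting
them; the pattern string and the test itself are taken from the diff.

```python
import re

# Same alternation as in the patch, compiled with plain "re".
known_section_names = 'description|context|returns?|notes?|examples?'
known_sections = re.compile(known_section_names, flags=re.I)


def excess_sections(sections, parameterlist, decl_type, decl_name):
    """Return a warning for each section that is neither a known
    parameter/member nor one of the special named sections."""
    if decl_type == 'function':
        dname = f"{decl_type} parameter"
    else:
        dname = f"{decl_type} member"

    return [f"Excess {dname} '{section}' description in '{decl_name}'"
            for section in sections
            if section not in parameterlist
            and not known_sections.search(section)]


msgs = excess_sections(["Description", "a", "bogus", "Return"],
                       ["a", "b"], "function", "foo")
```

Here only 'bogus' is reported.  Since search() is unanchored, any section
name that merely contains one of the known words is also accepted in this
sketch.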
 scripts/lib/kdoc/kdoc_parser.py | 31 +++++++++++--------------------
 1 file changed, 11 insertions(+), 20 deletions(-)

diff --git a/scripts/lib/kdoc/kdoc_parser.py b/scripts/lib/kdoc/kdoc_parser.py
index 7191fa94e17a..fdde14b045fe 100644
--- a/scripts/lib/kdoc/kdoc_parser.py
+++ b/scripts/lib/kdoc/kdoc_parser.py
@@ -42,9 +42,11 @@ doc_decl = doc_com + KernRe(r'(\w+)', cache=False)
 #         @{section-name}:
 # while trying to not match literal block starts like "example::"
 #
+known_section_names = 'description|context|returns?|notes?|examples?'
+known_sections = KernRe(known_section_names, flags = re.I)
 doc_sect = doc_com + \
-            KernRe(r'\s*(\@[.\w]+|\@\.\.\.|description|context|returns?|notes?|examples?)\s*:([^:].*)?$',
-                flags=re.I, cache=False)
+    KernRe(r'\s*(\@[.\w]+|\@\.\.\.|' + known_section_names + r')\s*:([^:].*)?$',
+           flags=re.I, cache=False)
 
 doc_content = doc_com_body + KernRe(r'(.*)', cache=False)
 doc_inline_start = KernRe(r'^\s*/\*\*\s*$', cache=False)
@@ -115,7 +117,6 @@ class KernelEntry:
         self.config = config
 
         self._contents = []
-        self.sectcheck = ""
         self.prototype = ""
 
         self.warnings = []
@@ -187,7 +188,6 @@ class KernelEntry:
             self.parameterdescs[name] = contents
             self.parameterdesc_start_lines[name] = self.new_start_line
 
-            self.sectcheck += name + " "
             self.new_start_line = 0
 
         else:
@@ -478,29 +478,20 @@ class KernelDoc:
                         self.push_parameter(ln, decl_type, param, dtype,
                                             arg, declaration_name)
 
-    def check_sections(self, ln, decl_name, decl_type, sectcheck):
+    def check_sections(self, ln, decl_name, decl_type):
         """
         Check for errors inside sections, emitting warnings if not found
         parameters are described.
         """
-
-        sects = sectcheck.split()
-
-        for sx in range(len(sects)):                  # pylint: disable=C0200
-            err = True
-            for param in self.entry.parameterlist:
-                if param == sects[sx]:
-                    err = False
-                    break
-
-            if err:
+        for section in self.entry.sections:
+            if section not in self.entry.parameterlist and \
+               not known_sections.search(section):
                 if decl_type == 'function':
                     dname = f"{decl_type} parameter"
                 else:
                     dname = f"{decl_type} member"
-
                 self.emit_msg(ln,
-                              f"Excess {dname} '{sects[sx]}' description in '{decl_name}'")
+                              f"Excess {dname} '{section}' description in '{decl_name}'")
 
     def check_return_section(self, ln, declaration_name, return_type):
         """
@@ -754,7 +745,7 @@ class KernelDoc:
 
         self.create_parameter_list(ln, decl_type, members, ';',
                                    declaration_name)
-        self.check_sections(ln, declaration_name, decl_type, self.entry.sectcheck)
+        self.check_sections(ln, declaration_name, decl_type)
 
         # Adjust declaration for better display
         declaration = KernRe(r'([\{;])').sub(r'\1\n', declaration)
@@ -1018,7 +1009,7 @@ class KernelDoc:
                           f"expecting prototype for {self.entry.identifier}(). Prototype was for {declaration_name}() instead")
             return
 
-        self.check_sections(ln, declaration_name, "function", self.entry.sectcheck)
+        self.check_sections(ln, declaration_name, "function")
 
         self.check_return_section(ln, declaration_name, return_type)
 
-- 
2.49.0



* [PATCH 12/12] docs: kdoc: Improve the output text accumulation
  2025-07-02 22:35 [PATCH 00/12] [PATCH 00/11] Thrash up the parser/output interface Jonathan Corbet
                   ` (10 preceding siblings ...)
  2025-07-02 22:35 ` [PATCH 11/12] docs: kdoc: clean up check_sections() Jonathan Corbet
@ 2025-07-02 22:35 ` Jonathan Corbet
  2025-07-10  6:41   ` Mauro Carvalho Chehab
  2025-07-03  2:07 ` [PATCH 00/12] [PATCH 00/11] Thrash up the parser/output interface Yanteng Si
  2025-07-09 15:29 ` Jonathan Corbet
  13 siblings, 1 reply; 41+ messages in thread
From: Jonathan Corbet @ 2025-07-02 22:35 UTC (permalink / raw)
  To: linux-doc
  Cc: linux-kernel, Mauro Carvalho Chehab, Akira Yokosawa,
	Jonathan Corbet

Building strings with repeated concatenation is somewhat inefficient in
Python; it is better to make a list and glom them all together at the end.
Add a small set of methods to the OutputFormat superclass to manage the
output string, and use them throughout.

Signed-off-by: Jonathan Corbet <corbet@lwn.net>
---
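The accumulation pattern can be sketched on its own as follows.  The
Accumulator class is a hypothetical stand-in for OutputFormat; the three
method names match those added in the diff below, and the emitted strings
are made-up sample data.

```python
class Accumulator:
    """Hypothetical stand-in for the OutputFormat text handling."""

    def reset_output(self):
        # Start a fresh list of fragments for the next item.
        self._output = []

    def emit(self, text):
        # Appending to a list avoids building a new string each time.
        self._output.append(text)

    def output(self):
        # Join all fragments once, at the end.
        return ''.join(self._output)


acc = Accumulator()
acc.reset_output()
acc.emit(".SH NAME\n")
acc.emit("enum foo \\- an example\n")
result = acc.output()
```

Callers no longer touch the backing store directly, so the representation
could change later without editing every output method.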
 scripts/lib/kdoc/kdoc_output.py | 185 +++++++++++++++++---------------
 1 file changed, 98 insertions(+), 87 deletions(-)

diff --git a/scripts/lib/kdoc/kdoc_output.py b/scripts/lib/kdoc/kdoc_output.py
index ea8914537ba0..d4aabdaa9c51 100644
--- a/scripts/lib/kdoc/kdoc_output.py
+++ b/scripts/lib/kdoc/kdoc_output.py
@@ -73,7 +73,19 @@ class OutputFormat:
         self.config = None
         self.no_doc_sections = False
 
-        self.data = ""
+    #
+    # Accumulation and management of the output text.
+    #
+    def reset_output(self):
+        self._output = []
+
+    def emit(self, text):
+        """Add a string to out output text"""
+        self._output.append(text)
+
+    def output(self):
+        """Obtain the accumulated output text"""
+        return ''.join(self._output)
 
     def set_config(self, config):
         """
@@ -180,32 +192,31 @@ class OutputFormat:
         Handles a single entry from kernel-doc parser
         """
 
-        self.data = ""
-
+        self.reset_output()
         dtype = args.type
 
         if dtype == "doc":
             self.out_doc(fname, name, args)
-            return self.data
+            return self.output()
 
         if not self.check_declaration(dtype, name, args):
-            return self.data
+            return self.output()
 
         if dtype == "function":
             self.out_function(fname, name, args)
-            return self.data
+            return self.output()
 
         if dtype == "enum":
             self.out_enum(fname, name, args)
-            return self.data
+            return self.output()
 
         if dtype == "typedef":
             self.out_typedef(fname, name, args)
-            return self.data
+            return self.output()
 
         if dtype in ["struct", "union"]:
             self.out_struct(fname, name, args)
-            return self.data
+            return self.output()
 
         # Warn if some type requires an output logic
         self.config.log.warning("doesn't now how to output '%s' block",
@@ -274,7 +285,7 @@ class RestFormat(OutputFormat):
 
         if self.enable_lineno and ln is not None:
             ln += 1
-            self.data += f".. LINENO {ln}\n"
+            self.emit(f".. LINENO {ln}\n")
 
     def output_highlight(self, args):
         """
@@ -326,7 +337,7 @@ class RestFormat(OutputFormat):
 
         # Print the output with the line prefix
         for line in output.strip("\n").split("\n"):
-            self.data += self.lineprefix + line + "\n"
+            self.emit(self.lineprefix + line + "\n")
 
     def out_section(self, args, out_docblock=False):
         """
@@ -343,15 +354,15 @@ class RestFormat(OutputFormat):
 
             if out_docblock:
                 if not self.out_mode == self.OUTPUT_INCLUDE:
-                    self.data += f".. _{section}:\n\n"
-                    self.data += f'{self.lineprefix}**{section}**\n\n'
+                    self.emit(f".. _{section}:\n\n")
+                    self.emit(f'{self.lineprefix}**{section}**\n\n')
             else:
-                self.data += f'{self.lineprefix}**{section}**\n\n'
+                self.emit(f'{self.lineprefix}**{section}**\n\n')
 
             self.print_lineno(args.section_start_lines.get(section, 0))
             self.output_highlight(text)
-            self.data += "\n"
-        self.data += "\n"
+            self.emit("\n")
+        self.emit("\n")
 
     def out_doc(self, fname, name, args):
         if not self.check_doc(name, args):
@@ -389,41 +400,41 @@ class RestFormat(OutputFormat):
 
         self.print_lineno(ln)
         if args.get('typedef') or not args.get('functiontype'):
-            self.data += f".. c:macro:: {name}\n\n"
+            self.emit(f".. c:macro:: {name}\n\n")
 
             if args.get('typedef'):
-                self.data += "   **Typedef**: "
+                self.emit("   **Typedef**: ")
                 self.lineprefix = ""
                 self.output_highlight(args.get('purpose', ""))
-                self.data += "\n\n**Syntax**\n\n"
-                self.data += f"  ``{signature}``\n\n"
+                self.emit("\n\n**Syntax**\n\n")
+                self.emit(f"  ``{signature}``\n\n")
             else:
-                self.data += f"``{signature}``\n\n"
+                self.emit(f"``{signature}``\n\n")
         else:
-            self.data += f".. c:function:: {signature}\n\n"
+            self.emit(f".. c:function:: {signature}\n\n")
 
         if not args.get('typedef'):
             self.print_lineno(ln)
             self.lineprefix = "   "
             self.output_highlight(args.get('purpose', ""))
-            self.data += "\n"
+            self.emit("\n")
 
         # Put descriptive text into a container (HTML <div>) to help set
         # function prototypes apart
         self.lineprefix = "  "
 
         if args.parameterlist:
-            self.data += ".. container:: kernelindent\n\n"
-            self.data += f"{self.lineprefix}**Parameters**\n\n"
+            self.emit(".. container:: kernelindent\n\n")
+            self.emit(f"{self.lineprefix}**Parameters**\n\n")
 
         for parameter in args.parameterlist:
             parameter_name = KernRe(r'\[.*').sub('', parameter)
             dtype = args.parametertypes.get(parameter, "")
 
             if dtype:
-                self.data += f"{self.lineprefix}``{dtype}``\n"
+                self.emit(f"{self.lineprefix}``{dtype}``\n")
             else:
-                self.data += f"{self.lineprefix}``{parameter}``\n"
+                self.emit(f"{self.lineprefix}``{parameter}``\n")
 
             self.print_lineno(args.parameterdesc_start_lines.get(parameter_name, 0))
 
@@ -432,9 +443,9 @@ class RestFormat(OutputFormat):
                args.parameterdescs[parameter_name] != KernelDoc.undescribed:
 
                 self.output_highlight(args.parameterdescs[parameter_name])
-                self.data += "\n"
+                self.emit("\n")
             else:
-                self.data += f"{self.lineprefix}*undescribed*\n\n"
+                self.emit(f"{self.lineprefix}*undescribed*\n\n")
             self.lineprefix = "  "
 
         self.out_section(args)
@@ -445,26 +456,26 @@ class RestFormat(OutputFormat):
         oldprefix = self.lineprefix
         ln = args.declaration_start_line
 
-        self.data += f"\n\n.. c:enum:: {name}\n\n"
+        self.emit(f"\n\n.. c:enum:: {name}\n\n")
 
         self.print_lineno(ln)
         self.lineprefix = "  "
         self.output_highlight(args.get('purpose', ''))
-        self.data += "\n"
+        self.emit("\n")
 
-        self.data += ".. container:: kernelindent\n\n"
+        self.emit(".. container:: kernelindent\n\n")
         outer = self.lineprefix + "  "
         self.lineprefix = outer + "  "
-        self.data += f"{outer}**Constants**\n\n"
+        self.emit(f"{outer}**Constants**\n\n")
 
         for parameter in args.parameterlist:
-            self.data += f"{outer}``{parameter}``\n"
+            self.emit(f"{outer}``{parameter}``\n")
 
             if args.parameterdescs.get(parameter, '') != KernelDoc.undescribed:
                 self.output_highlight(args.parameterdescs[parameter])
             else:
-                self.data += f"{self.lineprefix}*undescribed*\n\n"
-            self.data += "\n"
+                self.emit(f"{self.lineprefix}*undescribed*\n\n")
+            self.emit("\n")
 
         self.lineprefix = oldprefix
         self.out_section(args)
@@ -474,14 +485,14 @@ class RestFormat(OutputFormat):
         oldprefix = self.lineprefix
         ln = args.declaration_start_line
 
-        self.data += f"\n\n.. c:type:: {name}\n\n"
+        self.emit(f"\n\n.. c:type:: {name}\n\n")
 
         self.print_lineno(ln)
         self.lineprefix = "   "
 
         self.output_highlight(args.get('purpose', ''))
 
-        self.data += "\n"
+        self.emit("\n")
 
         self.lineprefix = oldprefix
         self.out_section(args)
@@ -493,7 +504,7 @@ class RestFormat(OutputFormat):
         dtype = args.type
         ln = args.declaration_start_line
 
-        self.data += f"\n\n.. c:{dtype}:: {name}\n\n"
+        self.emit(f"\n\n.. c:{dtype}:: {name}\n\n")
 
         self.print_lineno(ln)
 
@@ -501,20 +512,20 @@ class RestFormat(OutputFormat):
         self.lineprefix += "  "
 
         self.output_highlight(purpose)
-        self.data += "\n"
+        self.emit("\n")
 
-        self.data += ".. container:: kernelindent\n\n"
-        self.data += f"{self.lineprefix}**Definition**::\n\n"
+        self.emit(".. container:: kernelindent\n\n")
+        self.emit(f"{self.lineprefix}**Definition**::\n\n")
 
         self.lineprefix = self.lineprefix + "  "
 
         declaration = declaration.replace("\t", self.lineprefix)
 
-        self.data += f"{self.lineprefix}{dtype} {name}" + ' {' + "\n"
-        self.data += f"{declaration}{self.lineprefix}" + "};\n\n"
+        self.emit(f"{self.lineprefix}{dtype} {name}" + ' {' + "\n")
+        self.emit(f"{declaration}{self.lineprefix}" + "};\n\n")
 
         self.lineprefix = "  "
-        self.data += f"{self.lineprefix}**Members**\n\n"
+        self.emit(f"{self.lineprefix}**Members**\n\n")
         for parameter in args.parameterlist:
             if not parameter or parameter.startswith("#"):
                 continue
@@ -526,15 +537,15 @@ class RestFormat(OutputFormat):
 
             self.print_lineno(args.parameterdesc_start_lines.get(parameter_name, 0))
 
-            self.data += f"{self.lineprefix}``{parameter}``\n"
+            self.emit(f"{self.lineprefix}``{parameter}``\n")
 
             self.lineprefix = "    "
             self.output_highlight(args.parameterdescs[parameter_name])
             self.lineprefix = "  "
 
-            self.data += "\n"
+            self.emit("\n")
 
-        self.data += "\n"
+        self.emit("\n")
 
         self.lineprefix = oldprefix
         self.out_section(args)
@@ -610,33 +621,33 @@ class ManFormat(OutputFormat):
                 continue
 
             if line[0] == ".":
-                self.data += "\\&" + line + "\n"
+                self.emit("\\&" + line + "\n")
             else:
-                self.data += line + "\n"
+                self.emit(line + "\n")
 
     def out_doc(self, fname, name, args):
         if not self.check_doc(name, args):
             return
 
-        self.data += f'.TH "{self.modulename}" 9 "{self.modulename}" "{self.man_date}" "API Manual" LINUX' + "\n"
+        self.emit(f'.TH "{self.modulename}" 9 "{self.modulename}" "{self.man_date}" "API Manual" LINUX' + "\n")
 
         for section, text in args.sections.items():
-            self.data += f'.SH "{section}"' + "\n"
+            self.emit(f'.SH "{section}"' + "\n")
             self.output_highlight(text)
 
     def out_function(self, fname, name, args):
         """output function in man"""
 
-        self.data += f'.TH "{name}" 9 "{name}" "{self.man_date}" "Kernel Hacker\'s Manual" LINUX' + "\n"
+        self.emit(f'.TH "{name}" 9 "{name}" "{self.man_date}" "Kernel Hacker\'s Manual" LINUX' + "\n")
 
-        self.data += ".SH NAME\n"
-        self.data += f"{name} \\- {args['purpose']}\n"
+        self.emit(".SH NAME\n")
+        self.emit(f"{name} \\- {args['purpose']}\n")
 
-        self.data += ".SH SYNOPSIS\n"
+        self.emit(".SH SYNOPSIS\n")
         if args.get('functiontype', ''):
-            self.data += f'.B "{args["functiontype"]}" {name}' + "\n"
+            self.emit(f'.B "{args["functiontype"]}" {name}' + "\n")
         else:
-            self.data += f'.B "{name}' + "\n"
+            self.emit(f'.B "{name}' + "\n")
 
         count = 0
         parenth = "("
@@ -649,68 +660,68 @@ class ManFormat(OutputFormat):
             dtype = args.parametertypes.get(parameter, "")
             if function_pointer.match(dtype):
                 # Pointer-to-function
-                self.data += f'".BI "{parenth}{function_pointer.group(1)}" " ") ({function_pointer.group(2)}){post}"' + "\n"
+                self.emit(f'".BI "{parenth}{function_pointer.group(1)}" " ") ({function_pointer.group(2)}){post}"' + "\n")
             else:
                 dtype = KernRe(r'([^\*])$').sub(r'\1 ', dtype)
 
-                self.data += f'.BI "{parenth}{dtype}"  "{post}"' + "\n"
+                self.emit(f'.BI "{parenth}{dtype}"  "{post}"' + "\n")
             count += 1
             parenth = ""
 
         if args.parameterlist:
-            self.data += ".SH ARGUMENTS\n"
+            self.emit(".SH ARGUMENTS\n")
 
         for parameter in args.parameterlist:
             parameter_name = re.sub(r'\[.*', '', parameter)
 
-            self.data += f'.IP "{parameter}" 12' + "\n"
+            self.emit(f'.IP "{parameter}" 12' + "\n")
             self.output_highlight(args.parameterdescs.get(parameter_name, ""))
 
         for section, text in args.sections.items():
-            self.data += f'.SH "{section.upper()}"' + "\n"
+            self.emit(f'.SH "{section.upper()}"' + "\n")
             self.output_highlight(text)
 
     def out_enum(self, fname, name, args):
-        self.data += f'.TH "{self.modulename}" 9 "enum {name}" "{self.man_date}" "API Manual" LINUX' + "\n"
+        self.emit(f'.TH "{self.modulename}" 9 "enum {name}" "{self.man_date}" "API Manual" LINUX' + "\n")
 
-        self.data += ".SH NAME\n"
-        self.data += f"enum {name} \\- {args['purpose']}\n"
+        self.emit(".SH NAME\n")
+        self.emit(f"enum {name} \\- {args['purpose']}\n")
 
-        self.data += ".SH SYNOPSIS\n"
-        self.data += f"enum {name}" + " {\n"
+        self.emit(".SH SYNOPSIS\n")
+        self.emit(f"enum {name}" + " {\n")
 
         count = 0
         for parameter in args.parameterlist:
-            self.data += f'.br\n.BI "    {parameter}"' + "\n"
+            self.emit(f'.br\n.BI "    {parameter}"' + "\n")
             if count == len(args.parameterlist) - 1:
-                self.data += "\n};\n"
+                self.emit("\n};\n")
             else:
-                self.data += ", \n.br\n"
+                self.emit(", \n.br\n")
 
             count += 1
 
-        self.data += ".SH Constants\n"
+        self.emit(".SH Constants\n")
 
         for parameter in args.parameterlist:
             parameter_name = KernRe(r'\[.*').sub('', parameter)
-            self.data += f'.IP "{parameter}" 12' + "\n"
+            self.emit(f'.IP "{parameter}" 12' + "\n")
             self.output_highlight(args.parameterdescs.get(parameter_name, ""))
 
         for section, text in args.sections.items():
-            self.data += f'.SH "{section}"' + "\n"
+            self.emit(f'.SH "{section}"' + "\n")
             self.output_highlight(text)
 
     def out_typedef(self, fname, name, args):
         module = self.modulename
         purpose = args.get('purpose')
 
-        self.data += f'.TH "{module}" 9 "{name}" "{self.man_date}" "API Manual" LINUX' + "\n"
+        self.emit(f'.TH "{module}" 9 "{name}" "{self.man_date}" "API Manual" LINUX' + "\n")
 
-        self.data += ".SH NAME\n"
-        self.data += f"typedef {name} \\- {purpose}\n"
+        self.emit(".SH NAME\n")
+        self.emit(f"typedef {name} \\- {purpose}\n")
 
         for section, text in args.sections.items():
-            self.data += f'.SH "{section}"' + "\n"
+            self.emit(f'.SH "{section}"' + "\n")
             self.output_highlight(text)
 
     def out_struct(self, fname, name, args):
@@ -718,20 +729,20 @@ class ManFormat(OutputFormat):
         purpose = args.get('purpose')
         definition = args.get('definition')
 
-        self.data += f'.TH "{module}" 9 "{args.type} {name}" "{self.man_date}" "API Manual" LINUX' + "\n"
+        self.emit(f'.TH "{module}" 9 "{args.type} {name}" "{self.man_date}" "API Manual" LINUX' + "\n")
 
-        self.data += ".SH NAME\n"
-        self.data += f"{args.type} {name} \\- {purpose}\n"
+        self.emit(".SH NAME\n")
+        self.emit(f"{args.type} {name} \\- {purpose}\n")
 
         # Replace tabs with two spaces and handle newlines
         declaration = definition.replace("\t", "  ")
         declaration = KernRe(r"\n").sub('"\n.br\n.BI "', declaration)
 
-        self.data += ".SH SYNOPSIS\n"
-        self.data += f"{args.type} {name} " + "{" + "\n.br\n"
-        self.data += f'.BI "{declaration}\n' + "};\n.br\n\n"
+        self.emit(".SH SYNOPSIS\n")
+        self.emit(f"{args.type} {name} " + "{" + "\n.br\n")
+        self.emit(f'.BI "{declaration}\n' + "};\n.br\n\n")
 
-        self.data += ".SH Members\n"
+        self.emit(".SH Members\n")
         for parameter in args.parameterlist:
             if parameter.startswith("#"):
                 continue
@@ -741,9 +752,9 @@ class ManFormat(OutputFormat):
             if args.parameterdescs.get(parameter_name) == KernelDoc.undescribed:
                 continue
 
-            self.data += f'.IP "{parameter}" 12' + "\n"
+            self.emit(f'.IP "{parameter}" 12' + "\n")
             self.output_highlight(args.parameterdescs.get(parameter_name))
 
         for section, text in args.sections.items():
-            self.data += f'.SH "{section}"' + "\n"
+            self.emit(f'.SH "{section}"' + "\n")
             self.output_highlight(text)
-- 
2.49.0



* Re: [PATCH 00/12] [PATCH 00/11] Thrash up the parser/output interface
  2025-07-02 22:35 [PATCH 00/12] [PATCH 00/11] Thrash up the parser/output interface Jonathan Corbet
                   ` (11 preceding siblings ...)
  2025-07-02 22:35 ` [PATCH 12/12] docs: kdoc: Improve the output text accumulation Jonathan Corbet
@ 2025-07-03  2:07 ` Yanteng Si
  2025-07-09 15:29 ` Jonathan Corbet
  13 siblings, 0 replies; 41+ messages in thread
From: Yanteng Si @ 2025-07-03  2:07 UTC (permalink / raw)
  To: Jonathan Corbet, linux-doc
  Cc: linux-kernel, Mauro Carvalho Chehab, Akira Yokosawa


On 7/3/25 6:35 AM, Jonathan Corbet wrote:
> [I'll slow down soon, honest - real work is piling up...]
>
> The kerneldoc parsing phase gathers all of the information about the
> declarations of interest, then passes it through to the output phase as a
> dict that is an unstructured blob of information; this organization has its
> origins in the Perl version of the program.  It results in an interface
> that is difficult to reason about, dozen-parameter function calls, and
> other ills.
>
> Introduce a new class (KdocItem) to carry this information between the
> parser and the output modules, and, step by step, modify the system to use
> this class in a more structured way.  This could be taken further by
> creating a subclass of KdocItem for each declaration type (function,
> struct, ...), but that is probably more structure than we need.
>
> As a final step, add some structure for the accumulation of the output
> text.
>
> The result is (I hope) clearer code, the removal of a bunch of boilerplate,
> and no changes to the generated output.
>
> Jonathan Corbet (12):
>    docs: kdoc; Add a rudimentary class to represent output items
>    docs: kdoc: simplify the output-item passing
>    docs: kdoc: drop "sectionlist"
>    docs: kdoc: Centralize handling of the item section list
>    docs: kdoc: remove the "struct_actual" machinery
>    docs: kdoc: use self.entry.parameterlist directly in check_sections()
>    docs: kdoc: Coalesce parameter-list handling
>    docs: kdoc: Regularize the use of the declaration name
>    docs: kdoc: straighten up dump_declaration()
>    docs: kdoc: directly access the always-there KdocItem fields
>    docs: kdoc: clean up check_sections()
>    docs: kdoc: Improve the output text accumulation

Reviewed-by: Yanteng Si <siyanteng@cqsoftware.com.cn>


Thanks,

Yanteng




* Re: [PATCH 00/12] [PATCH 00/11] Thrash up the parser/output interface
  2025-07-02 22:35 [PATCH 00/12] [PATCH 00/11] Thrash up the parser/output interface Jonathan Corbet
                   ` (12 preceding siblings ...)
  2025-07-03  2:07 ` [PATCH 00/12] [PATCH 00/11] Thrash up the parser/output interface Yanteng Si
@ 2025-07-09 15:29 ` Jonathan Corbet
  2025-07-09 16:21   ` Mauro Carvalho Chehab
  13 siblings, 1 reply; 41+ messages in thread
From: Jonathan Corbet @ 2025-07-09 15:29 UTC (permalink / raw)
  To: linux-doc; +Cc: linux-kernel, Mauro Carvalho Chehab, Akira Yokosawa

Jonathan Corbet <corbet@lwn.net> writes:

> [I'll slow down soon, honest - real work is piling up...]
>
> The kerneldoc parsing phase gathers all of the information about the
> declarations of interest, then passes it through to the output phase as a
> dict that is an unstructured blob of information; this organization has its
> origins in the Perl version of the program.  It results in an interface
> that is difficult to reason about, dozen-parameter function calls, and
> other ills.
>
> Introduce a new class (KdocItem) to carry this information between the
> parser and the output modules, and, step by step, modify the system to use
> this class in a more structured way.  This could be taken further by
> creating a subclass of KdocItem for each declaration type (function,
> struct, ...), but that is probably more structure than we need.
>
> As a final step, add some structure for the accumulation of the output
> text.
>
> The result is (I hope) clearer code, the removal of a bunch of boilerplate,
> and no changes to the generated output.

Has anybody else had a chance to look at this?  Or should I assume it's
perfect? :)

Thanks,

jon


* Re: [PATCH 00/12] [PATCH 00/11] Thrash up the parser/output interface
  2025-07-09 15:29 ` Jonathan Corbet
@ 2025-07-09 16:21   ` Mauro Carvalho Chehab
  0 siblings, 0 replies; 41+ messages in thread
From: Mauro Carvalho Chehab @ 2025-07-09 16:21 UTC (permalink / raw)
  To: Jonathan Corbet; +Cc: linux-doc, linux-kernel, Akira Yokosawa

On Wed, 09 Jul 2025 09:29:28 -0600
Jonathan Corbet <corbet@lwn.net> wrote:

> Jonathan Corbet <corbet@lwn.net> writes:
> 
> > [I'll slow down soon, honest - real work is piling up...]
> >
> > The kerneldoc parsing phase gathers all of the information about the
> > declarations of interest, then passes it through to the output phase as a
> > dict that is an unstructured blob of information; this organization has its
> > origins in the Perl version of the program.  It results in an interface
> > that is difficult to reason about, dozen-parameter function calls, and
> > other ills.
> >
> > Introduce a new class (KdocItem) to carry this information between the
> > parser and the output modules, and, step by step, modify the system to use
> > this class in a more structured way.  This could be taken further by
> > creating a subclass of KdocItem for each declaration type (function,
> > struct, ...), but that is probably more structure than we need.
> >
> > As a final step, add some structure for the accumulation of the output
> > text.
> >
> > The result is (I hope) clearer code, the removal of a bunch of boilerplate,
> > and no changes to the generated output.  
> 
> Has anybody else had a chance to look at this?  Or should I assume it's
> perfect? :)

I haven't looked at it yet. I'll try to look at it during the week.

Regards,
Mauro


* Re: [PATCH 03/12] docs: kdoc: drop "sectionlist"
  2025-07-02 22:35 ` [PATCH 03/12] docs: kdoc: drop "sectionlist" Jonathan Corbet
@ 2025-07-09 16:27   ` Mauro Carvalho Chehab
  0 siblings, 0 replies; 41+ messages in thread
From: Mauro Carvalho Chehab @ 2025-07-09 16:27 UTC (permalink / raw)
  To: Jonathan Corbet; +Cc: linux-doc, linux-kernel, Akira Yokosawa

On Wed,  2 Jul 2025 16:35:15 -0600
Jonathan Corbet <corbet@lwn.net> wrote:

> Python dicts (as of 3.7) are guaranteed to remember the insertion order of
> items, so we do not need a separate list for that purpose.  Drop the
> per-entry sectionlist variable and just rely on native dict ordering.

I avoided making that assumption because, when I wrote the code, our
minimum supported Python version was below that ;-)

Sounds OK to me, but please add a notice somewhere to make it clear, or
raise an error if Python < 3.7. This is the type of API change that
scares me a lot, as there's no way to detect that the script relies
on the dict behavior of Python >= 3.7.
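
A minimal sketch of such a check, assuming it would run early in the
script's startup. The names MIN_PYTHON and check_python_version() are
hypothetical and not part of the kdoc code; the error message is
illustrative only. It avoids f-strings so that the check itself still
parses on interpreters older than 3.6:

```python
import sys

# Hypothetical constant: dict insertion order is guaranteed from Python 3.7.
MIN_PYTHON = (3, 7)


def check_python_version(version_info=sys.version_info, minimum=MIN_PYTHON):
    """Return True when version_info is at least the minimum (major, minor)."""
    return tuple(version_info[:2]) >= minimum


# Fail early with a clear message rather than silently misordering sections.
if not check_python_version():
    sys.exit("kernel-doc requires Python %d.%d or later" % MIN_PYTHON)

# With the guarantee in place, a plain dict remembers insertion order,
# which is what dropping the separate section list relies on.
sections = {}
sections["Description"] = "..."
sections["Return"] = "..."
section_order = list(sections)
```

Passing the version tuple as a parameter keeps the check testable without
having to run it under an old interpreter.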


> 
> Signed-off-by: Jonathan Corbet <corbet@lwn.net>
> ---
>  scripts/lib/kdoc/kdoc_output.py | 18 ++++++------------
>  scripts/lib/kdoc/kdoc_parser.py | 13 +------------
>  2 files changed, 7 insertions(+), 24 deletions(-)
> 
> diff --git a/scripts/lib/kdoc/kdoc_output.py b/scripts/lib/kdoc/kdoc_output.py
> index 86102e628d91..4895c80e4b81 100644
> --- a/scripts/lib/kdoc/kdoc_output.py
> +++ b/scripts/lib/kdoc/kdoc_output.py
> @@ -339,11 +339,10 @@ class RestFormat(OutputFormat):
>          tends to duplicate a header already in the template file.
>          """
>  
> -        sectionlist = args.get('sectionlist', [])
>          sections = args.get('sections', {})
>          section_start_lines = args.get('section_start_lines', {})
>  
> -        for section in sectionlist:
> +        for section in sections:
>              # Skip sections that are in the nosymbol_table
>              if section in self.nosymbol:
>                  continue
> @@ -636,7 +635,6 @@ class ManFormat(OutputFormat):
>                  self.data += line + "\n"
>  
>      def out_doc(self, fname, name, args):
> -        sectionlist = args.get('sectionlist', [])
>          sections = args.get('sections', {})
>  
>          if not self.check_doc(name, args):
> @@ -644,7 +642,7 @@ class ManFormat(OutputFormat):
>  
>          self.data += f'.TH "{self.modulename}" 9 "{self.modulename}" "{self.man_date}" "API Manual" LINUX' + "\n"
>  
> -        for section in sectionlist:
> +        for section in sections:
>              self.data += f'.SH "{section}"' + "\n"
>              self.output_highlight(sections.get(section))
>  
> @@ -653,7 +651,6 @@ class ManFormat(OutputFormat):
>  
>          parameterlist = args.get('parameterlist', [])
>          parameterdescs = args.get('parameterdescs', {})
> -        sectionlist = args.get('sectionlist', [])
>          sections = args.get('sections', {})
>  
>          self.data += f'.TH "{args["function"]}" 9 "{args["function"]}" "{self.man_date}" "Kernel Hacker\'s Manual" LINUX' + "\n"
> @@ -695,7 +692,7 @@ class ManFormat(OutputFormat):
>              self.data += f'.IP "{parameter}" 12' + "\n"
>              self.output_highlight(parameterdescs.get(parameter_name, ""))
>  
> -        for section in sectionlist:
> +        for section in sections:
>              self.data += f'.SH "{section.upper()}"' + "\n"
>              self.output_highlight(sections[section])
>  
> @@ -703,7 +700,6 @@ class ManFormat(OutputFormat):
>  
>          name = args.get('enum', '')
>          parameterlist = args.get('parameterlist', [])
> -        sectionlist = args.get('sectionlist', [])
>          sections = args.get('sections', {})
>  
>          self.data += f'.TH "{self.modulename}" 9 "enum {args["enum"]}" "{self.man_date}" "API Manual" LINUX' + "\n"
> @@ -731,7 +727,7 @@ class ManFormat(OutputFormat):
>              self.data += f'.IP "{parameter}" 12' + "\n"
>              self.output_highlight(args['parameterdescs'].get(parameter_name, ""))
>  
> -        for section in sectionlist:
> +        for section in sections:
>              self.data += f'.SH "{section}"' + "\n"
>              self.output_highlight(sections[section])
>  
> @@ -739,7 +735,6 @@ class ManFormat(OutputFormat):
>          module = self.modulename
>          typedef = args.get('typedef')
>          purpose = args.get('purpose')
> -        sectionlist = args.get('sectionlist', [])
>          sections = args.get('sections', {})
>  
>          self.data += f'.TH "{module}" 9 "{typedef}" "{self.man_date}" "API Manual" LINUX' + "\n"
> @@ -747,7 +742,7 @@ class ManFormat(OutputFormat):
>          self.data += ".SH NAME\n"
>          self.data += f"typedef {typedef} \\- {purpose}\n"
>  
> -        for section in sectionlist:
> +        for section in sections:
>              self.data += f'.SH "{section}"' + "\n"
>              self.output_highlight(sections.get(section))
>  
> @@ -757,7 +752,6 @@ class ManFormat(OutputFormat):
>          struct_name = args.get('struct')
>          purpose = args.get('purpose')
>          definition = args.get('definition')
> -        sectionlist = args.get('sectionlist', [])
>          parameterlist = args.get('parameterlist', [])
>          sections = args.get('sections', {})
>          parameterdescs = args.get('parameterdescs', {})
> @@ -788,6 +782,6 @@ class ManFormat(OutputFormat):
>              self.data += f'.IP "{parameter}" 12' + "\n"
>              self.output_highlight(parameterdescs.get(parameter_name))
>  
> -        for section in sectionlist:
> +        for section in sections:
>              self.data += f'.SH "{section}"' + "\n"
>              self.output_highlight(sections.get(section))
> diff --git a/scripts/lib/kdoc/kdoc_parser.py b/scripts/lib/kdoc/kdoc_parser.py
> index 97380ff30a0d..2e00c8b3a5f2 100644
> --- a/scripts/lib/kdoc/kdoc_parser.py
> +++ b/scripts/lib/kdoc/kdoc_parser.py
> @@ -127,7 +127,6 @@ class KernelEntry:
>          self.parameterdesc_start_lines = {}
>  
>          self.section_start_lines = {}
> -        self.sectionlist = []
>          self.sections = {}
>  
>          self.anon_struct_union = False
> @@ -202,7 +201,6 @@ class KernelEntry:
>                  self.sections[name] += '\n' + contents
>              else:
>                  self.sections[name] = contents
> -                self.sectionlist.append(name)
>                  self.section_start_lines[name] = self.new_start_line
>                  self.new_start_line = 0
>  
> @@ -275,14 +273,12 @@ class KernelDoc:
>          item.warnings = self.entry.warnings
>  
>          sections = item.get('sections', {})
> -        sectionlist = item.get('sectionlist', [])
>  
>          # Drop empty sections
>          # TODO: improve empty sections logic to emit warnings
>          for section in ["Description", "Return"]:
> -            if section in sectionlist and not sections[section].rstrip():
> +            if section in sections and not sections[section].rstrip():
>                  del sections[section]
> -                sectionlist.remove(section)
>  
>          self.entries.append(item)
>  
> @@ -828,7 +824,6 @@ class KernelDoc:
>                                  parameterdescs=self.entry.parameterdescs,
>                                  parametertypes=self.entry.parametertypes,
>                                  parameterdesc_start_lines=self.entry.parameterdesc_start_lines,
> -                                sectionlist=self.entry.sectionlist,
>                                  sections=self.entry.sections,
>                                  section_start_lines=self.entry.section_start_lines,
>                                  purpose=self.entry.declaration_purpose)
> @@ -913,7 +908,6 @@ class KernelDoc:
>                                  parameterlist=self.entry.parameterlist,
>                                  parameterdescs=self.entry.parameterdescs,
>                                  parameterdesc_start_lines=self.entry.parameterdesc_start_lines,
> -                                sectionlist=self.entry.sectionlist,
>                                  sections=self.entry.sections,
>                                  section_start_lines=self.entry.section_start_lines,
>                                  purpose=self.entry.declaration_purpose)
> @@ -1085,7 +1079,6 @@ class KernelDoc:
>                                      parameterdescs=self.entry.parameterdescs,
>                                      parametertypes=self.entry.parametertypes,
>                                      parameterdesc_start_lines=self.entry.parameterdesc_start_lines,
> -                                    sectionlist=self.entry.sectionlist,
>                                      sections=self.entry.sections,
>                                      section_start_lines=self.entry.section_start_lines,
>                                      purpose=self.entry.declaration_purpose,
> @@ -1099,7 +1092,6 @@ class KernelDoc:
>                                      parameterdescs=self.entry.parameterdescs,
>                                      parametertypes=self.entry.parametertypes,
>                                      parameterdesc_start_lines=self.entry.parameterdesc_start_lines,
> -                                    sectionlist=self.entry.sectionlist,
>                                      sections=self.entry.sections,
>                                      section_start_lines=self.entry.section_start_lines,
>                                      purpose=self.entry.declaration_purpose,
> @@ -1145,7 +1137,6 @@ class KernelDoc:
>                                      parameterdescs=self.entry.parameterdescs,
>                                      parametertypes=self.entry.parametertypes,
>                                      parameterdesc_start_lines=self.entry.parameterdesc_start_lines,
> -                                    sectionlist=self.entry.sectionlist,
>                                      sections=self.entry.sections,
>                                      section_start_lines=self.entry.section_start_lines,
>                                      purpose=self.entry.declaration_purpose)
> @@ -1168,7 +1159,6 @@ class KernelDoc:
>  
>              self.output_declaration('typedef', declaration_name,
>                                      typedef=declaration_name,
> -                                    sectionlist=self.entry.sectionlist,
>                                      sections=self.entry.sections,
>                                      section_start_lines=self.entry.section_start_lines,
>                                      purpose=self.entry.declaration_purpose)
> @@ -1653,7 +1643,6 @@ class KernelDoc:
>          if doc_end.search(line):
>              self.dump_section()
>              self.output_declaration("doc", self.entry.identifier,
> -                                    sectionlist=self.entry.sectionlist,
>                                      sections=self.entry.sections,
>                                      section_start_lines=self.entry.section_start_lines)
>              self.reset_state(ln)



Thanks,
Mauro


* Re: [PATCH 01/12] docs: kdoc; Add a rudimentary class to represent output items
  2025-07-02 22:35 ` [PATCH 01/12] docs: kdoc; Add a rudimentary class to represent output items Jonathan Corbet
@ 2025-07-10  5:28   ` Mauro Carvalho Chehab
  0 siblings, 0 replies; 41+ messages in thread
From: Mauro Carvalho Chehab @ 2025-07-10  5:28 UTC (permalink / raw)
  To: Jonathan Corbet; +Cc: linux-doc, linux-kernel, Akira Yokosawa

On Wed,  2 Jul 2025 16:35:13 -0600
Jonathan Corbet <corbet@lwn.net> wrote:

> This class is intended to replace the unstructured dict used to accumulate
> an entry to pass to an output module.  For now, it remains unstructured,
> but it works well enough that the output classes don't notice the
> difference.
> 
> Signed-off-by: Jonathan Corbet <corbet@lwn.net>

Reviewed-by: Mauro Carvalho Chehab <mchehab+huawei@kernel.org>
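
For reference, a small sketch of how the class looks from the output
side. The class body is copied from this patch (comments trimmed); the
item name, line number and keyword arguments are made up for
illustration:

```python
class KdocItem:
    def __init__(self, name, type, start_line, **other_stuff):
        self.name = name
        self.type = type
        self.declaration_start_line = start_line
        # Everything else is kept in a plain dict, as before.
        self.other_stuff = other_stuff

    def get(self, key, default=None):
        # Look in the unstructured dict first, then fall back to attributes.
        ret = self.other_stuff.get(key, default)
        if ret == default:
            return self.__dict__.get(key, default)
        return ret

    def __getitem__(self, key):
        return self.get(key)


# Hypothetical entry, only to exercise the dict-like interface.
item = KdocItem("my_func", "function", 42,
                purpose="do something",
                sections={"Return": "0 on success"})

purpose = item.get("purpose")                  # found in other_stuff
dtype = item["type"]                           # falls back to the attribute
missing = item.get("no_such_key", "fallback")  # default when found nowhere
```

Because both get() and item[key] keep working, output code written against
the old dict does not need to change at this step.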

> ---
>  scripts/lib/kdoc/kdoc_item.py   | 26 ++++++++++++++++++++++++++
>  scripts/lib/kdoc/kdoc_parser.py | 30 +++++++++---------------------
>  2 files changed, 35 insertions(+), 21 deletions(-)
>  create mode 100644 scripts/lib/kdoc/kdoc_item.py
> 
> diff --git a/scripts/lib/kdoc/kdoc_item.py b/scripts/lib/kdoc/kdoc_item.py
> new file mode 100644
> index 000000000000..add2cc772fec
> --- /dev/null
> +++ b/scripts/lib/kdoc/kdoc_item.py
> @@ -0,0 +1,26 @@
> +# SPDX-License-Identifier: GPL-2.0
> +#
> +# A class that will, eventually, encapsulate all of the parsed data that we
> +# then pass into the output modules.
> +#
> +
> +class KdocItem:
> +    def __init__(self, name, type, start_line, **other_stuff):
> +        self.name = name
> +        self.type = type
> +        self.declaration_start_line = start_line
> +        #
> +        # Just save everything else into our own dict so that the output
> +        # side can grab it directly as before.  As we move things into more
> +        # structured data, this will, hopefully, fade away.
> +        #
> +        self.other_stuff = other_stuff
> +
> +    def get(self, key, default = None):
> +        ret = self.other_stuff.get(key, default)
> +        if ret == default:
> +            return self.__dict__.get(key, default)
> +        return ret
> +
> +    def __getitem__(self, key):
> +        return self.get(key)
> diff --git a/scripts/lib/kdoc/kdoc_parser.py b/scripts/lib/kdoc/kdoc_parser.py
> index 831f061f61b8..a5a59b97a444 100644
> --- a/scripts/lib/kdoc/kdoc_parser.py
> +++ b/scripts/lib/kdoc/kdoc_parser.py
> @@ -16,7 +16,7 @@ import re
>  from pprint import pformat
>  
>  from kdoc_re import NestedMatch, KernRe
> -
> +from kdoc_item import KdocItem
>  
>  #
>  # Regular expressions used to parse kernel-doc markups at KernelDoc class.
> @@ -271,32 +271,20 @@ class KernelDoc:
>          The actual output and output filters will be handled elsewhere
>          """
>  
> -        # The implementation here is different than the original kernel-doc:
> -        # instead of checking for output filters or actually output anything,
> -        # it just stores the declaration content at self.entries, as the
> -        # output will happen on a separate class.
> -        #
> -        # For now, we're keeping the same name of the function just to make
> -        # easier to compare the source code of both scripts
> -
> -        args["declaration_start_line"] = self.entry.declaration_start_line
> -        args["type"] = dtype
> -        args["warnings"] = self.entry.warnings
> -
> -        # TODO: use colletions.OrderedDict to remove sectionlist
> +        item = KdocItem(name, dtype, self.entry.declaration_start_line, **args)
> +        item.warnings = self.entry.warnings
>  
> -        sections = args.get('sections', {})
> -        sectionlist = args.get('sectionlist', [])
> +        sections = item.get('sections', {})
> +        sectionlist = item.get('sectionlist', [])
>  
>          # Drop empty sections
>          # TODO: improve empty sections logic to emit warnings
>          for section in ["Description", "Return"]:
> -            if section in sectionlist:
> -                if not sections[section].rstrip():
> -                    del sections[section]
> -                    sectionlist.remove(section)
> +            if section in sectionlist and not sections[section].rstrip():
> +                del sections[section]
> +                sectionlist.remove(section)
>  
> -        self.entries.append((name, args))
> +        self.entries.append((name, item))
>  
>          self.config.log.debug("Output: %s:%s = %s", dtype, name, pformat(args))
>  



Thanks,
Mauro


* Re: [PATCH 02/12] docs: kdoc: simplify the output-item passing
  2025-07-02 22:35 ` [PATCH 02/12] docs: kdoc: simplify the output-item passing Jonathan Corbet
@ 2025-07-10  5:29   ` Mauro Carvalho Chehab
  0 siblings, 0 replies; 41+ messages in thread
From: Mauro Carvalho Chehab @ 2025-07-10  5:29 UTC (permalink / raw)
  To: Jonathan Corbet; +Cc: linux-doc, linux-kernel, Akira Yokosawa

On Wed,  2 Jul 2025 16:35:14 -0600
Jonathan Corbet <corbet@lwn.net> wrote:

> Since our output items contain their name, we don't need to pass it
> separately.
> 
> Signed-off-by: Jonathan Corbet <corbet@lwn.net>

Reviewed-by: Mauro Carvalho Chehab <mchehab+huawei@kernel.org>

> ---
>  scripts/lib/kdoc/kdoc_files.py  | 4 ++--
>  scripts/lib/kdoc/kdoc_parser.py | 2 +-
>  2 files changed, 3 insertions(+), 3 deletions(-)
> 
> diff --git a/scripts/lib/kdoc/kdoc_files.py b/scripts/lib/kdoc/kdoc_files.py
> index 9be4a64df71d..9e09b45b02fa 100644
> --- a/scripts/lib/kdoc/kdoc_files.py
> +++ b/scripts/lib/kdoc/kdoc_files.py
> @@ -275,8 +275,8 @@ class KernelFiles():
>                  self.config.log.warning("No kernel-doc for file %s", fname)
>                  continue
>  
> -            for name, arg in self.results[fname]:
> -                m = self.out_msg(fname, name, arg)
> +            for arg in self.results[fname]:
> +                m = self.out_msg(fname, arg.name, arg)
>  
>                  if m is None:
>                      ln = arg.get("ln", 0)
> diff --git a/scripts/lib/kdoc/kdoc_parser.py b/scripts/lib/kdoc/kdoc_parser.py
> index a5a59b97a444..97380ff30a0d 100644
> --- a/scripts/lib/kdoc/kdoc_parser.py
> +++ b/scripts/lib/kdoc/kdoc_parser.py
> @@ -284,7 +284,7 @@ class KernelDoc:
>                  del sections[section]
>                  sectionlist.remove(section)
>  
> -        self.entries.append((name, item))
> +        self.entries.append(item)
>  
>          self.config.log.debug("Output: %s:%s = %s", dtype, name, pformat(args))
>  



Thanks,
Mauro


* Re: [PATCH 04/12] docs: kdoc: Centralize handling of the item section list
  2025-07-02 22:35 ` [PATCH 04/12] docs: kdoc: Centralize handling of the item section list Jonathan Corbet
@ 2025-07-10  5:45   ` Mauro Carvalho Chehab
  2025-07-10 13:25     ` Jonathan Corbet
  0 siblings, 1 reply; 41+ messages in thread
From: Mauro Carvalho Chehab @ 2025-07-10  5:45 UTC (permalink / raw)
  To: Jonathan Corbet; +Cc: linux-doc, linux-kernel, Akira Yokosawa

On Wed,  2 Jul 2025 16:35:16 -0600
Jonathan Corbet <corbet@lwn.net> wrote:

> The section list always comes directly from the under-construction entry
> and is used uniformly.  Formalize section handling in the KdocItem class,
> and have output_declaration() load the sections directly from the entry,
> eliminating a lot of duplicated, verbose parameters.
> 
> Signed-off-by: Jonathan Corbet <corbet@lwn.net>
> ---
>  scripts/lib/kdoc/kdoc_item.py   |  8 ++++++++
>  scripts/lib/kdoc/kdoc_output.py | 36 ++++++++++++---------------------
>  scripts/lib/kdoc/kdoc_parser.py | 20 +++---------------
>  3 files changed, 24 insertions(+), 40 deletions(-)
> 
> diff --git a/scripts/lib/kdoc/kdoc_item.py b/scripts/lib/kdoc/kdoc_item.py
> index add2cc772fec..c8329019a219 100644
> --- a/scripts/lib/kdoc/kdoc_item.py
> +++ b/scripts/lib/kdoc/kdoc_item.py
> @@ -9,6 +9,7 @@ class KdocItem:
>          self.name = name
>          self.type = type
>          self.declaration_start_line = start_line
> +        self.sections = self.sections_start_lines = { }

Nitpicks:
- to keep the coding style uniform, please use "{}" without spaces;
- please place one statement per line, as we (usually) do in the kernel.

  In this specific case, I strongly suspect that what you coded does not
  implement the semantics you want. Which of these is intended?

	1. creating a single dict and binding it to both variables;
  or:
	2. initializing two different variables, each with its own empty
	   dict?

The subsequent code assumes (2), but a quick check on the python3 command
line:

	>>> a = b = {}
	>>> a["foo"] = "bar"
	>>> print(b)
	{'foo': 'bar'}

shows that Python does (1) here: it creates a single dict and binds
both self.sections and self.sections_start_lines to it. Clearly, this
is not what we want.

This is not a problem in practice with the current code, as
set_sections() replaces both at the same time, but if we ever
handle them in separate functions, it will become a problem.

The rest of the code looks sane to me.
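
A minimal sketch of the difference, using two hypothetical stand-in
classes rather than KdocItem itself. Note that the patch spells the
attribute sections_start_lines in __init__() but section_start_lines in
set_sections(); the sketch assumes the latter spelling is the intended
one:

```python
class SharedItem:
    """Chained assignment: both attributes refer to the same dict."""

    def __init__(self):
        self.sections = self.section_start_lines = {}


class SeparateItem:
    """One statement per line: each attribute gets its own dict."""

    def __init__(self):
        self.sections = {}
        self.section_start_lines = {}


shared = SharedItem()
shared.sections["Return"] = "0 on success"
# The write is visible through the other attribute as well.
leaked = "Return" in shared.section_start_lines

separate = SeparateItem()
separate.sections["Return"] = "0 on success"
# Here the start-line dict is untouched.
isolated = "Return" not in separate.section_start_lines
```

With separate dicts, a later change that updates only one of the two
attributes cannot silently affect the other.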

>          #
>          # Just save everything else into our own dict so that the output
>          # side can grab it directly as before.  As we move things into more
> @@ -24,3 +25,10 @@ class KdocItem:
>  
>      def __getitem__(self, key):
>          return self.get(key)
> +
> +    #
> +    # Tracking of section information.
> +    #
> +    def set_sections(self, sections, start_lines):
> +        self.sections = sections
> +        self.section_start_lines = start_lines
> diff --git a/scripts/lib/kdoc/kdoc_output.py b/scripts/lib/kdoc/kdoc_output.py
> index 4895c80e4b81..15cb89f91987 100644
> --- a/scripts/lib/kdoc/kdoc_output.py
> +++ b/scripts/lib/kdoc/kdoc_output.py
> @@ -338,11 +338,7 @@ class RestFormat(OutputFormat):
>          starts by putting out the name of the doc section itself, but that
>          tends to duplicate a header already in the template file.
>          """
> -
> -        sections = args.get('sections', {})
> -        section_start_lines = args.get('section_start_lines', {})
> -
> -        for section in sections:
> +        for section, text in args.sections.items():
>              # Skip sections that are in the nosymbol_table
>              if section in self.nosymbol:
>                  continue
> @@ -354,8 +350,8 @@ class RestFormat(OutputFormat):
>              else:
>                  self.data += f'{self.lineprefix}**{section}**\n\n'
>  
> -            self.print_lineno(section_start_lines.get(section, 0))
> -            self.output_highlight(sections[section])
> +            self.print_lineno(args.section_start_lines.get(section, 0))
> +            self.output_highlight(text)
>              self.data += "\n"
>          self.data += "\n"
>  
> @@ -635,23 +631,20 @@ class ManFormat(OutputFormat):
>                  self.data += line + "\n"
>  
>      def out_doc(self, fname, name, args):
> -        sections = args.get('sections', {})
> -
>          if not self.check_doc(name, args):
>              return
>  
>          self.data += f'.TH "{self.modulename}" 9 "{self.modulename}" "{self.man_date}" "API Manual" LINUX' + "\n"
>  
> -        for section in sections:
> +        for section, text in args.sections.items():
>              self.data += f'.SH "{section}"' + "\n"
> -            self.output_highlight(sections.get(section))
> +            self.output_highlight(text)
>  
>      def out_function(self, fname, name, args):
>          """output function in man"""
>  
>          parameterlist = args.get('parameterlist', [])
>          parameterdescs = args.get('parameterdescs', {})
> -        sections = args.get('sections', {})
>  
>          self.data += f'.TH "{args["function"]}" 9 "{args["function"]}" "{self.man_date}" "Kernel Hacker\'s Manual" LINUX' + "\n"
>  
> @@ -692,15 +685,14 @@ class ManFormat(OutputFormat):
>              self.data += f'.IP "{parameter}" 12' + "\n"
>              self.output_highlight(parameterdescs.get(parameter_name, ""))
>  
> -        for section in sections:
> +        for section, text in args.sections.items():
>              self.data += f'.SH "{section.upper()}"' + "\n"
> -            self.output_highlight(sections[section])
> +            self.output_highlight(text)
>  
>      def out_enum(self, fname, name, args):
>  
>          name = args.get('enum', '')
>          parameterlist = args.get('parameterlist', [])
> -        sections = args.get('sections', {})
>  
>          self.data += f'.TH "{self.modulename}" 9 "enum {args["enum"]}" "{self.man_date}" "API Manual" LINUX' + "\n"
>  
> @@ -727,24 +719,23 @@ class ManFormat(OutputFormat):
>              self.data += f'.IP "{parameter}" 12' + "\n"
>              self.output_highlight(args['parameterdescs'].get(parameter_name, ""))
>  
> -        for section in sections:
> +        for section, text in args.sections.items():
>              self.data += f'.SH "{section}"' + "\n"
> -            self.output_highlight(sections[section])
> +            self.output_highlight(text)
>  
>      def out_typedef(self, fname, name, args):
>          module = self.modulename
>          typedef = args.get('typedef')
>          purpose = args.get('purpose')
> -        sections = args.get('sections', {})
>  
>          self.data += f'.TH "{module}" 9 "{typedef}" "{self.man_date}" "API Manual" LINUX' + "\n"
>  
>          self.data += ".SH NAME\n"
>          self.data += f"typedef {typedef} \\- {purpose}\n"
>  
> -        for section in sections:
> +        for section, text in args.sections.items():
>              self.data += f'.SH "{section}"' + "\n"
> -            self.output_highlight(sections.get(section))
> +            self.output_highlight(text)
>  
>      def out_struct(self, fname, name, args):
>          module = self.modulename
> @@ -753,7 +744,6 @@ class ManFormat(OutputFormat):
>          purpose = args.get('purpose')
>          definition = args.get('definition')
>          parameterlist = args.get('parameterlist', [])
> -        sections = args.get('sections', {})
>          parameterdescs = args.get('parameterdescs', {})
>  
>          self.data += f'.TH "{module}" 9 "{struct_type} {struct_name}" "{self.man_date}" "API Manual" LINUX' + "\n"
> @@ -782,6 +772,6 @@ class ManFormat(OutputFormat):
>              self.data += f'.IP "{parameter}" 12' + "\n"
>              self.output_highlight(parameterdescs.get(parameter_name))
>  
> -        for section in sections:
> +        for section, text in args.sections.items():
>              self.data += f'.SH "{section}"' + "\n"
> -            self.output_highlight(sections.get(section))
> +            self.output_highlight(text)
> diff --git a/scripts/lib/kdoc/kdoc_parser.py b/scripts/lib/kdoc/kdoc_parser.py
> index 2e00c8b3a5f2..608f3a1045dc 100644
> --- a/scripts/lib/kdoc/kdoc_parser.py
> +++ b/scripts/lib/kdoc/kdoc_parser.py
> @@ -272,13 +272,13 @@ class KernelDoc:
>          item = KdocItem(name, dtype, self.entry.declaration_start_line, **args)
>          item.warnings = self.entry.warnings
>  
> -        sections = item.get('sections', {})
> -
>          # Drop empty sections
>          # TODO: improve empty sections logic to emit warnings
> +        sections = self.entry.sections
>          for section in ["Description", "Return"]:
>              if section in sections and not sections[section].rstrip():
>                  del sections[section]
> +        item.set_sections(sections, self.entry.section_start_lines)
>  
>          self.entries.append(item)
>  
> @@ -824,8 +824,6 @@ class KernelDoc:
>                                  parameterdescs=self.entry.parameterdescs,
>                                  parametertypes=self.entry.parametertypes,
>                                  parameterdesc_start_lines=self.entry.parameterdesc_start_lines,
> -                                sections=self.entry.sections,
> -                                section_start_lines=self.entry.section_start_lines,
>                                  purpose=self.entry.declaration_purpose)
>  
>      def dump_enum(self, ln, proto):
> @@ -908,8 +906,6 @@ class KernelDoc:
>                                  parameterlist=self.entry.parameterlist,
>                                  parameterdescs=self.entry.parameterdescs,
>                                  parameterdesc_start_lines=self.entry.parameterdesc_start_lines,
> -                                sections=self.entry.sections,
> -                                section_start_lines=self.entry.section_start_lines,
>                                  purpose=self.entry.declaration_purpose)
>  
>      def dump_declaration(self, ln, prototype):
> @@ -1079,8 +1075,6 @@ class KernelDoc:
>                                      parameterdescs=self.entry.parameterdescs,
>                                      parametertypes=self.entry.parametertypes,
>                                      parameterdesc_start_lines=self.entry.parameterdesc_start_lines,
> -                                    sections=self.entry.sections,
> -                                    section_start_lines=self.entry.section_start_lines,
>                                      purpose=self.entry.declaration_purpose,
>                                      func_macro=func_macro)
>          else:
> @@ -1092,8 +1086,6 @@ class KernelDoc:
>                                      parameterdescs=self.entry.parameterdescs,
>                                      parametertypes=self.entry.parametertypes,
>                                      parameterdesc_start_lines=self.entry.parameterdesc_start_lines,
> -                                    sections=self.entry.sections,
> -                                    section_start_lines=self.entry.section_start_lines,
>                                      purpose=self.entry.declaration_purpose,
>                                      func_macro=func_macro)
>  
> @@ -1137,8 +1129,6 @@ class KernelDoc:
>                                      parameterdescs=self.entry.parameterdescs,
>                                      parametertypes=self.entry.parametertypes,
>                                      parameterdesc_start_lines=self.entry.parameterdesc_start_lines,
> -                                    sections=self.entry.sections,
> -                                    section_start_lines=self.entry.section_start_lines,
>                                      purpose=self.entry.declaration_purpose)
>              return
>  
> @@ -1159,8 +1149,6 @@ class KernelDoc:
>  
>              self.output_declaration('typedef', declaration_name,
>                                      typedef=declaration_name,
> -                                    sections=self.entry.sections,
> -                                    section_start_lines=self.entry.section_start_lines,
>                                      purpose=self.entry.declaration_purpose)
>              return
>  
> @@ -1642,9 +1630,7 @@ class KernelDoc:
>  
>          if doc_end.search(line):
>              self.dump_section()
> -            self.output_declaration("doc", self.entry.identifier,
> -                                    sections=self.entry.sections,
> -                                    section_start_lines=self.entry.section_start_lines)
> +            self.output_declaration("doc", self.entry.identifier)
>              self.reset_state(ln)
>  
>          elif doc_content.search(line):



Thanks,
Mauro


* Re: [PATCH 05/12] docs: kdoc: remove the "struct_actual" machinery
  2025-07-02 22:35 ` [PATCH 05/12] docs: kdoc: remove the "struct_actual" machinery Jonathan Corbet
@ 2025-07-10  6:11   ` Mauro Carvalho Chehab
  0 siblings, 0 replies; 41+ messages in thread
From: Mauro Carvalho Chehab @ 2025-07-10  6:11 UTC (permalink / raw)
  To: Jonathan Corbet; +Cc: linux-doc, linux-kernel, Akira Yokosawa

On Wed,  2 Jul 2025 16:35:17 -0600
Jonathan Corbet <corbet@lwn.net> wrote:

> The code goes out of its way to create a special list of parameters in
> entry.struct_actual that is just like entry.parameterlist, but with extra
> junk.  The only use of that information, in check_sections(), promptly
> strips all the extra junk back out.  Drop all that extra work and just use
> parameterlist.
> 
> No output changes.

LGTM.

Reviewed-by: Mauro Carvalho Chehab <mchehab+huawei@kernel.org>
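
For anyone following the removed comment about "addr[6 + 2]", here is a
minimal sketch of why the old code had to strip "[..." after splitting at
spaces. It uses plain re rather than KernRe, and the member names and the
helper strip_array_suffix() are made up for illustration:

```python
import re

def strip_array_suffix(param):
    """Drop everything from the first '[' onward (hypothetical helper)."""
    return re.sub(r'\[.*', '', param)

# A space-joined string in the style of struct_actual (sample data).
struct_actual = "name addr[6 + 2] flags"

# Splitting at whitespace breaks the array size apart...
pieces = struct_actual.split()

# ...so only the "[..." tail can be dropped when matching names.
cleaned = [strip_array_suffix(p) for p in pieces]

print(pieces)
print(cleaned)
```

The leftover fragments of the array size never match a section name, so
comparing against names that are already clean gives the same answer with
less work.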

> 
> Signed-off-by: Jonathan Corbet <corbet@lwn.net>
> ---
>  scripts/lib/kdoc/kdoc_parser.py | 32 ++------------------------------
>  1 file changed, 2 insertions(+), 30 deletions(-)
> 
> diff --git a/scripts/lib/kdoc/kdoc_parser.py b/scripts/lib/kdoc/kdoc_parser.py
> index 608f3a1045dc..b28f056365cb 100644
> --- a/scripts/lib/kdoc/kdoc_parser.py
> +++ b/scripts/lib/kdoc/kdoc_parser.py
> @@ -116,7 +116,6 @@ class KernelEntry:
>  
>          self._contents = []
>          self.sectcheck = ""
> -        self.struct_actual = ""
>          self.prototype = ""
>  
>          self.warnings = []
> @@ -366,15 +365,6 @@ class KernelDoc:
>          org_arg = KernRe(r'\s\s+').sub(' ', org_arg)
>          self.entry.parametertypes[param] = org_arg
>  
> -    def save_struct_actual(self, actual):
> -        """
> -        Strip all spaces from the actual param so that it looks like
> -        one string item.
> -        """
> -
> -        actual = KernRe(r'\s*').sub("", actual, count=1)
> -
> -        self.entry.struct_actual += actual + " "
>  
>      def create_parameter_list(self, ln, decl_type, args,
>                                splitter, declaration_name):
> @@ -420,7 +410,6 @@ class KernelDoc:
>                      param = arg
>  
>                  dtype = KernRe(r'([^\(]+\(\*?)\s*' + re.escape(param)).sub(r'\1', arg)
> -                self.save_struct_actual(param)
>                  self.push_parameter(ln, decl_type, param, dtype,
>                                      arg, declaration_name)
>  
> @@ -437,7 +426,6 @@ class KernelDoc:
>  
>                  dtype = KernRe(r'([^\(]+\(\*?)\s*' + re.escape(param)).sub(r'\1', arg)
>  
> -                self.save_struct_actual(param)
>                  self.push_parameter(ln, decl_type, param, dtype,
>                                      arg, declaration_name)
>  
> @@ -470,7 +458,6 @@ class KernelDoc:
>  
>                          param = r.group(1)
>  
> -                        self.save_struct_actual(r.group(2))
>                          self.push_parameter(ln, decl_type, r.group(2),
>                                              f"{dtype} {r.group(1)}",
>                                              arg, declaration_name)
> @@ -482,12 +469,10 @@ class KernelDoc:
>                              continue
>  
>                          if dtype != "":  # Skip unnamed bit-fields
> -                            self.save_struct_actual(r.group(1))
>                              self.push_parameter(ln, decl_type, r.group(1),
>                                                  f"{dtype}:{r.group(2)}",
>                                                  arg, declaration_name)
>                      else:
> -                        self.save_struct_actual(param)
>                          self.push_parameter(ln, decl_type, param, dtype,
>                                              arg, declaration_name)
>  
> @@ -499,24 +484,11 @@ class KernelDoc:
>  
>          sects = sectcheck.split()
>          prms = prmscheck.split()
> -        err = False
>  
>          for sx in range(len(sects)):                  # pylint: disable=C0200
>              err = True
>              for px in range(len(prms)):               # pylint: disable=C0200
> -                prm_clean = prms[px]
> -                prm_clean = KernRe(r'\[.*\]').sub('', prm_clean)
> -                prm_clean = attribute.sub('', prm_clean)
> -
> -                # ignore array size in a parameter string;
> -                # however, the original param string may contain
> -                # spaces, e.g.:  addr[6 + 2]
> -                # and this appears in @prms as "addr[6" since the
> -                # parameter list is split at spaces;
> -                # hence just ignore "[..." for the sections check;
> -                prm_clean = KernRe(r'\[.*').sub('', prm_clean)
> -
> -                if prm_clean == sects[sx]:
> +                if prms[px] == sects[sx]:
>                      err = False
>                      break
>  
> @@ -782,7 +754,7 @@ class KernelDoc:
>          self.create_parameter_list(ln, decl_type, members, ';',
>                                     declaration_name)
>          self.check_sections(ln, declaration_name, decl_type,
> -                            self.entry.sectcheck, self.entry.struct_actual)
> +                            self.entry.sectcheck, ' '.join(self.entry.parameterlist))
>  
>          # Adjust declaration for better display
>          declaration = KernRe(r'([\{;])').sub(r'\1\n', declaration)



Thanks,
Mauro


* Re: [PATCH 06/12] docs: kdoc: use self.entry.parameterlist directly in check_sections()
  2025-07-02 22:35 ` [PATCH 06/12] docs: kdoc: use self.entry.parameterlist directly in check_sections() Jonathan Corbet
@ 2025-07-10  6:12   ` Mauro Carvalho Chehab
  0 siblings, 0 replies; 41+ messages in thread
From: Mauro Carvalho Chehab @ 2025-07-10  6:12 UTC (permalink / raw)
  To: Jonathan Corbet; +Cc: linux-doc, linux-kernel, Akira Yokosawa

On Wed,  2 Jul 2025 16:35:18 -0600
Jonathan Corbet <corbet@lwn.net> wrote:

> Callers of check_sections() join parameterlist into a single string, which
> is then immediately split back into the original list.  Rather than do all
> that, just use parameterlist directly in check_sections().
> 
> Signed-off-by: Jonathan Corbet <corbet@lwn.net>

LGTM.
Reviewed-by: Mauro Carvalho Chehab <mchehab+huawei@kernel.org>
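
As an illustration only, a small sketch of the round trip being removed
and of the direct check that replaces it. The function excess_sections()
and the sample names are hypothetical and do not appear in kdoc_parser.py:

```python
def excess_sections(sectcheck, parameterlist):
    """Return names listed in sectcheck that are not in parameterlist."""
    return [name for name in sectcheck.split() if name not in parameterlist]

parameterlist = ["dev", "flags", "count"]

# Joining and splitting again just reproduces the list, as long as
# no name contains whitespace.
round_trip = " ".join(parameterlist).split()

# The same check done directly against the list.
excess = excess_sections("dev flags bogus", parameterlist)

print(round_trip == parameterlist)
print(excess)
```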

> ---
>  scripts/lib/kdoc/kdoc_parser.py | 14 +++++---------
>  1 file changed, 5 insertions(+), 9 deletions(-)
> 
> diff --git a/scripts/lib/kdoc/kdoc_parser.py b/scripts/lib/kdoc/kdoc_parser.py
> index b28f056365cb..ffd49f9395ae 100644
> --- a/scripts/lib/kdoc/kdoc_parser.py
> +++ b/scripts/lib/kdoc/kdoc_parser.py
> @@ -476,19 +476,18 @@ class KernelDoc:
>                          self.push_parameter(ln, decl_type, param, dtype,
>                                              arg, declaration_name)
>  
> -    def check_sections(self, ln, decl_name, decl_type, sectcheck, prmscheck):
> +    def check_sections(self, ln, decl_name, decl_type, sectcheck):
>          """
>          Check for errors inside sections, emitting warnings if not found
>          parameters are described.
>          """
>  
>          sects = sectcheck.split()
> -        prms = prmscheck.split()
>  
>          for sx in range(len(sects)):                  # pylint: disable=C0200
>              err = True
> -            for px in range(len(prms)):               # pylint: disable=C0200
> -                if prms[px] == sects[sx]:
> +            for param in self.entry.parameterlist:
> +                if param == sects[sx]:
>                      err = False
>                      break
>  
> @@ -753,8 +752,7 @@ class KernelDoc:
>  
>          self.create_parameter_list(ln, decl_type, members, ';',
>                                     declaration_name)
> -        self.check_sections(ln, declaration_name, decl_type,
> -                            self.entry.sectcheck, ' '.join(self.entry.parameterlist))
> +        self.check_sections(ln, declaration_name, decl_type, self.entry.sectcheck)
>  
>          # Adjust declaration for better display
>          declaration = KernRe(r'([\{;])').sub(r'\1\n', declaration)
> @@ -1032,9 +1030,7 @@ class KernelDoc:
>                            f"expecting prototype for {self.entry.identifier}(). Prototype was for {declaration_name}() instead")
>              return
>  
> -        prms = " ".join(self.entry.parameterlist)
> -        self.check_sections(ln, declaration_name, "function",
> -                            self.entry.sectcheck, prms)
> +        self.check_sections(ln, declaration_name, "function", self.entry.sectcheck)
>  
>          self.check_return_section(ln, declaration_name, return_type)
>  



Thanks,
Mauro


* Re: [PATCH 07/12] docs: kdoc: Coalesce parameter-list handling
  2025-07-02 22:35 ` [PATCH 07/12] docs: kdoc: Coalesce parameter-list handling Jonathan Corbet
@ 2025-07-10  6:20   ` Mauro Carvalho Chehab
  0 siblings, 0 replies; 41+ messages in thread
From: Mauro Carvalho Chehab @ 2025-07-10  6:20 UTC (permalink / raw)
  To: Jonathan Corbet; +Cc: linux-doc, linux-kernel, Akira Yokosawa

On Wed,  2 Jul 2025 16:35:19 -0600
Jonathan Corbet <corbet@lwn.net> wrote:

> Callers to output_declaration() always pass the parameter information from
> self.entry; remove all of the boilerplate arguments and just get at that
> information directly.  Formalize its placement in the KdocItem class.
> 
> It would be nice to get rid of parameterlist as well, but that has the
> effect of reordering the output of function parameters and struct fields to
> match the order in the kerneldoc comment rather than in the declaration.
> One could argue about which is more correct, but the ordering has been left
> unchanged for now.
> 
> Signed-off-by: Jonathan Corbet <corbet@lwn.net>
> ---
>  scripts/lib/kdoc/kdoc_item.py   | 10 ++++-
>  scripts/lib/kdoc/kdoc_output.py | 75 +++++++++++++--------------------
>  scripts/lib/kdoc/kdoc_parser.py | 23 ++--------
>  3 files changed, 41 insertions(+), 67 deletions(-)
> 
> diff --git a/scripts/lib/kdoc/kdoc_item.py b/scripts/lib/kdoc/kdoc_item.py
> index c8329019a219..51e8669b9a6e 100644
> --- a/scripts/lib/kdoc/kdoc_item.py
> +++ b/scripts/lib/kdoc/kdoc_item.py
> @@ -10,6 +10,8 @@ class KdocItem:
>          self.type = type
>          self.declaration_start_line = start_line
>          self.sections = self.sections_start_lines = { }
> +        self.parameterlist = self.parameterdesc_start_lines = []
> +        self.parameterdescs = self.parametertypes = { }

See my comments in a previous e-mail:

- write "{}" without the inner space;
- don't assign the same object to multiple variables (*)

(*) In Python, assigning a dict or list never copies it: every name
    bound this way refers to the same object, unless you call copy()
    or deepcopy().
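
A minimal sketch of the problem, using stand-in classes (Item and
SaferItem are hypothetical, not the real KdocItem):

```python
class Item:
    """Stand-in for a class using chained assignment in __init__."""
    def __init__(self):
        # Both attributes are bound to the SAME dict object.
        self.sections = self.section_start_lines = {}

class SaferItem:
    """Same attributes, each with its own dict."""
    def __init__(self):
        self.sections = {}
        self.section_start_lines = {}

item = Item()
item.sections["Description"] = "some text"
# The entry shows up under the other attribute as well.
print(item.section_start_lines)

safer = SaferItem()
safer.sections["Description"] = "some text"
# Here the second dict stays empty.
print(safer.section_start_lines)
```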

>          #
>          # Just save everything else into our own dict so that the output
>          # side can grab it directly as before.  As we move things into more
> @@ -27,8 +29,14 @@ class KdocItem:
>          return self.get(key)
>  
>      #
> -    # Tracking of section information.
> +    # Tracking of section and parameter information.
>      #
>      def set_sections(self, sections, start_lines):
>          self.sections = sections
>          self.section_start_lines = start_lines
> +
> +    def set_params(self, names, descs, types, starts):
> +        self.parameterlist = names
> +        self.parameterdescs = descs
> +        self.parametertypes = types
> +        self.parameterdesc_start_lines = starts

The output is unchanged only because set_params() replaces all four
attributes at once. If the code is ever changed to store them one by
one, or to update previously set values in place, the objects shared
by the chained assignments in __init__() will cause trouble.
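
To make that concrete, a sketch with a stand-in class. Item and its
add_desc() method are hypothetical; only the chained assignment and the
rebinding in set_params() mirror the patch:

```python
class Item:
    """Stand-in showing rebinding versus in-place update."""
    def __init__(self):
        # Chained assignment: one dict shared by both attributes.
        self.parameterdescs = self.parametertypes = {}

    def set_params(self, descs, types):
        # Rebinding gives each attribute its own object again.
        self.parameterdescs = descs
        self.parametertypes = types

    def add_desc(self, name, text):
        # Hypothetical in-place update of whatever dict is bound.
        self.parameterdescs[name] = text

# Rebinding first hides the sharing.
rebound = Item()
rebound.set_params({"x": "first arg"}, {"x": "int"})
rebound.add_desc("y", "second arg")
print(rebound.parametertypes)

# Updating in place without rebinding leaks into the other attribute.
shared = Item()
shared.add_desc("y", "second arg")
print(shared.parametertypes)
```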

> diff --git a/scripts/lib/kdoc/kdoc_output.py b/scripts/lib/kdoc/kdoc_output.py
> index 15cb89f91987..d6f4d9e7173b 100644
> --- a/scripts/lib/kdoc/kdoc_output.py
> +++ b/scripts/lib/kdoc/kdoc_output.py
> @@ -373,18 +373,13 @@ class RestFormat(OutputFormat):
>                  signature = args['functiontype'] + " "
>              signature += args['function'] + " ("
>  
> -        parameterlist = args.get('parameterlist', [])
> -        parameterdescs = args.get('parameterdescs', {})
> -        parameterdesc_start_lines = args.get('parameterdesc_start_lines', {})
> -
>          ln = args.get('declaration_start_line', 0)
> -
>          count = 0
> -        for parameter in parameterlist:
> +        for parameter in args.parameterlist:
>              if count != 0:
>                  signature += ", "
>              count += 1
> -            dtype = args['parametertypes'].get(parameter, "")
> +            dtype = args.parametertypes.get(parameter, "")
>  
>              if function_pointer.search(dtype):
>                  signature += function_pointer.group(1) + parameter + function_pointer.group(3)
> @@ -419,26 +414,26 @@ class RestFormat(OutputFormat):
>          # function prototypes apart
>          self.lineprefix = "  "
>  
> -        if parameterlist:
> +        if args.parameterlist:
>              self.data += ".. container:: kernelindent\n\n"
>              self.data += f"{self.lineprefix}**Parameters**\n\n"
>  
> -        for parameter in parameterlist:
> +        for parameter in args.parameterlist:
>              parameter_name = KernRe(r'\[.*').sub('', parameter)
> -            dtype = args['parametertypes'].get(parameter, "")
> +            dtype = args.parametertypes.get(parameter, "")
>  
>              if dtype:
>                  self.data += f"{self.lineprefix}``{dtype}``\n"
>              else:
>                  self.data += f"{self.lineprefix}``{parameter}``\n"
>  
> -            self.print_lineno(parameterdesc_start_lines.get(parameter_name, 0))
> +            self.print_lineno(args.parameterdesc_start_lines.get(parameter_name, 0))
>  
>              self.lineprefix = "    "
> -            if parameter_name in parameterdescs and \
> -               parameterdescs[parameter_name] != KernelDoc.undescribed:
> +            if parameter_name in args.parameterdescs and \
> +               args.parameterdescs[parameter_name] != KernelDoc.undescribed:
>  
> -                self.output_highlight(parameterdescs[parameter_name])
> +                self.output_highlight(args.parameterdescs[parameter_name])
>                  self.data += "\n"
>              else:
>                  self.data += f"{self.lineprefix}*undescribed*\n\n"
> @@ -451,8 +446,6 @@ class RestFormat(OutputFormat):
>  
>          oldprefix = self.lineprefix
>          name = args.get('enum', '')
> -        parameterlist = args.get('parameterlist', [])
> -        parameterdescs = args.get('parameterdescs', {})
>          ln = args.get('declaration_start_line', 0)
>  
>          self.data += f"\n\n.. c:enum:: {name}\n\n"
> @@ -467,11 +460,11 @@ class RestFormat(OutputFormat):
>          self.lineprefix = outer + "  "
>          self.data += f"{outer}**Constants**\n\n"
>  
> -        for parameter in parameterlist:
> +        for parameter in args.parameterlist:
>              self.data += f"{outer}``{parameter}``\n"
>  
> -            if parameterdescs.get(parameter, '') != KernelDoc.undescribed:
> -                self.output_highlight(parameterdescs[parameter])
> +            if args.parameterdescs.get(parameter, '') != KernelDoc.undescribed:
> +                self.output_highlight(args.parameterdescs[parameter])
>              else:
>                  self.data += f"{self.lineprefix}*undescribed*\n\n"
>              self.data += "\n"
> @@ -505,10 +498,6 @@ class RestFormat(OutputFormat):
>          dtype = args.get('type', "struct")
>          ln = args.get('declaration_start_line', 0)
>  
> -        parameterlist = args.get('parameterlist', [])
> -        parameterdescs = args.get('parameterdescs', {})
> -        parameterdesc_start_lines = args.get('parameterdesc_start_lines', {})
> -
>          self.data += f"\n\n.. c:{dtype}:: {name}\n\n"
>  
>          self.print_lineno(ln)
> @@ -531,21 +520,21 @@ class RestFormat(OutputFormat):
>  
>          self.lineprefix = "  "
>          self.data += f"{self.lineprefix}**Members**\n\n"
> -        for parameter in parameterlist:
> +        for parameter in args.parameterlist:
>              if not parameter or parameter.startswith("#"):
>                  continue
>  
>              parameter_name = parameter.split("[", maxsplit=1)[0]
>  
> -            if parameterdescs.get(parameter_name) == KernelDoc.undescribed:
> +            if args.parameterdescs.get(parameter_name) == KernelDoc.undescribed:
>                  continue
>  
> -            self.print_lineno(parameterdesc_start_lines.get(parameter_name, 0))
> +            self.print_lineno(args.parameterdesc_start_lines.get(parameter_name, 0))
>  
>              self.data += f"{self.lineprefix}``{parameter}``\n"
>  
>              self.lineprefix = "    "
> -            self.output_highlight(parameterdescs[parameter_name])
> +            self.output_highlight(args.parameterdescs[parameter_name])
>              self.lineprefix = "  "
>  
>              self.data += "\n"
> @@ -643,9 +632,6 @@ class ManFormat(OutputFormat):
>      def out_function(self, fname, name, args):
>          """output function in man"""
>  
> -        parameterlist = args.get('parameterlist', [])
> -        parameterdescs = args.get('parameterdescs', {})
> -
>          self.data += f'.TH "{args["function"]}" 9 "{args["function"]}" "{self.man_date}" "Kernel Hacker\'s Manual" LINUX' + "\n"
>  
>          self.data += ".SH NAME\n"
> @@ -661,11 +647,11 @@ class ManFormat(OutputFormat):
>          parenth = "("
>          post = ","
>  
> -        for parameter in parameterlist:
> -            if count == len(parameterlist) - 1:
> +        for parameter in args.parameterlist:
> +            if count == len(args.parameterlist) - 1:
>                  post = ");"
>  
> -            dtype = args['parametertypes'].get(parameter, "")
> +            dtype = args.parametertypes.get(parameter, "")
>              if function_pointer.match(dtype):
>                  # Pointer-to-function
>                  self.data += f'".BI "{parenth}{function_pointer.group(1)}" " ") ({function_pointer.group(2)}){post}"' + "\n"
> @@ -676,14 +662,14 @@ class ManFormat(OutputFormat):
>              count += 1
>              parenth = ""
>  
> -        if parameterlist:
> +        if args.parameterlist:
>              self.data += ".SH ARGUMENTS\n"
>  
> -        for parameter in parameterlist:
> +        for parameter in args.parameterlist:
>              parameter_name = re.sub(r'\[.*', '', parameter)
>  
>              self.data += f'.IP "{parameter}" 12' + "\n"
> -            self.output_highlight(parameterdescs.get(parameter_name, ""))
> +            self.output_highlight(args.parameterdescs.get(parameter_name, ""))
>  
>          for section, text in args.sections.items():
>              self.data += f'.SH "{section.upper()}"' + "\n"
> @@ -692,7 +678,6 @@ class ManFormat(OutputFormat):
>      def out_enum(self, fname, name, args):
>  
>          name = args.get('enum', '')
> -        parameterlist = args.get('parameterlist', [])
>  
>          self.data += f'.TH "{self.modulename}" 9 "enum {args["enum"]}" "{self.man_date}" "API Manual" LINUX' + "\n"
>  
> @@ -703,9 +688,9 @@ class ManFormat(OutputFormat):
>          self.data += f"enum {args['enum']}" + " {\n"
>  
>          count = 0
> -        for parameter in parameterlist:
> +        for parameter in args.parameterlist:
>              self.data += f'.br\n.BI "    {parameter}"' + "\n"
> -            if count == len(parameterlist) - 1:
> +            if count == len(args.parameterlist) - 1:
>                  self.data += "\n};\n"
>              else:
>                  self.data += ", \n.br\n"
> @@ -714,10 +699,10 @@ class ManFormat(OutputFormat):
>  
>          self.data += ".SH Constants\n"
>  
> -        for parameter in parameterlist:
> +        for parameter in args.parameterlist:
>              parameter_name = KernRe(r'\[.*').sub('', parameter)
>              self.data += f'.IP "{parameter}" 12' + "\n"
> -            self.output_highlight(args['parameterdescs'].get(parameter_name, ""))
> +            self.output_highlight(args.parameterdescs.get(parameter_name, ""))
>  
>          for section, text in args.sections.items():
>              self.data += f'.SH "{section}"' + "\n"
> @@ -743,8 +728,6 @@ class ManFormat(OutputFormat):
>          struct_name = args.get('struct')
>          purpose = args.get('purpose')
>          definition = args.get('definition')
> -        parameterlist = args.get('parameterlist', [])
> -        parameterdescs = args.get('parameterdescs', {})
>  
>          self.data += f'.TH "{module}" 9 "{struct_type} {struct_name}" "{self.man_date}" "API Manual" LINUX' + "\n"
>  
> @@ -760,17 +743,17 @@ class ManFormat(OutputFormat):
>          self.data += f'.BI "{declaration}\n' + "};\n.br\n\n"
>  
>          self.data += ".SH Members\n"
> -        for parameter in parameterlist:
> +        for parameter in args.parameterlist:
>              if parameter.startswith("#"):
>                  continue
>  
>              parameter_name = re.sub(r"\[.*", "", parameter)
>  
> -            if parameterdescs.get(parameter_name) == KernelDoc.undescribed:
> +            if args.parameterdescs.get(parameter_name) == KernelDoc.undescribed:
>                  continue
>  
>              self.data += f'.IP "{parameter}" 12' + "\n"
> -            self.output_highlight(parameterdescs.get(parameter_name))
> +            self.output_highlight(args.parameterdescs.get(parameter_name))
>  
>          for section, text in args.sections.items():
>              self.data += f'.SH "{section}"' + "\n"
> diff --git a/scripts/lib/kdoc/kdoc_parser.py b/scripts/lib/kdoc/kdoc_parser.py
> index ffd49f9395ae..298abd260264 100644
> --- a/scripts/lib/kdoc/kdoc_parser.py
> +++ b/scripts/lib/kdoc/kdoc_parser.py
> @@ -278,7 +278,9 @@ class KernelDoc:
>              if section in sections and not sections[section].rstrip():
>                  del sections[section]
>          item.set_sections(sections, self.entry.section_start_lines)
> -
> +        item.set_params(self.entry.parameterlist, self.entry.parameterdescs,
> +                        self.entry.parametertypes,
> +                        self.entry.parameterdesc_start_lines)
>          self.entries.append(item)
>  
>          self.config.log.debug("Output: %s:%s = %s", dtype, name, pformat(args))
> @@ -790,10 +792,6 @@ class KernelDoc:
>          self.output_declaration(decl_type, declaration_name,
>                                  struct=declaration_name,
>                                  definition=declaration,
> -                                parameterlist=self.entry.parameterlist,
> -                                parameterdescs=self.entry.parameterdescs,
> -                                parametertypes=self.entry.parametertypes,
> -                                parameterdesc_start_lines=self.entry.parameterdesc_start_lines,
>                                  purpose=self.entry.declaration_purpose)
>  
>      def dump_enum(self, ln, proto):
> @@ -873,9 +871,6 @@ class KernelDoc:
>  
>          self.output_declaration('enum', declaration_name,
>                                  enum=declaration_name,
> -                                parameterlist=self.entry.parameterlist,
> -                                parameterdescs=self.entry.parameterdescs,
> -                                parameterdesc_start_lines=self.entry.parameterdesc_start_lines,
>                                  purpose=self.entry.declaration_purpose)
>  
>      def dump_declaration(self, ln, prototype):
> @@ -1039,10 +1034,6 @@ class KernelDoc:
>                                      function=declaration_name,
>                                      typedef=True,
>                                      functiontype=return_type,
> -                                    parameterlist=self.entry.parameterlist,
> -                                    parameterdescs=self.entry.parameterdescs,
> -                                    parametertypes=self.entry.parametertypes,
> -                                    parameterdesc_start_lines=self.entry.parameterdesc_start_lines,
>                                      purpose=self.entry.declaration_purpose,
>                                      func_macro=func_macro)
>          else:
> @@ -1050,10 +1041,6 @@ class KernelDoc:
>                                      function=declaration_name,
>                                      typedef=False,
>                                      functiontype=return_type,
> -                                    parameterlist=self.entry.parameterlist,
> -                                    parameterdescs=self.entry.parameterdescs,
> -                                    parametertypes=self.entry.parametertypes,
> -                                    parameterdesc_start_lines=self.entry.parameterdesc_start_lines,
>                                      purpose=self.entry.declaration_purpose,
>                                      func_macro=func_macro)
>  
> @@ -1093,10 +1080,6 @@ class KernelDoc:
>                                      function=declaration_name,
>                                      typedef=True,
>                                      functiontype=return_type,
> -                                    parameterlist=self.entry.parameterlist,
> -                                    parameterdescs=self.entry.parameterdescs,
> -                                    parametertypes=self.entry.parametertypes,
> -                                    parameterdesc_start_lines=self.entry.parameterdesc_start_lines,
>                                      purpose=self.entry.declaration_purpose)
>              return
>  



Thanks,
Mauro


* Re: [PATCH 08/12] docs: kdoc: Regularize the use of the declaration name
  2025-07-02 22:35 ` [PATCH 08/12] docs: kdoc: Regularize the use of the declaration name Jonathan Corbet
@ 2025-07-10  6:22   ` Mauro Carvalho Chehab
  0 siblings, 0 replies; 41+ messages in thread
From: Mauro Carvalho Chehab @ 2025-07-10  6:22 UTC (permalink / raw)
  To: Jonathan Corbet; +Cc: linux-doc, linux-kernel, Akira Yokosawa

On Wed,  2 Jul 2025 16:35:20 -0600
Jonathan Corbet <corbet@lwn.net> wrote:

> Each declaration type passes through the name in a unique field of the
> "args" blob - even though we have always just passed the name separately.
> Get rid of all the weird names and just use the common version.
> 
> Signed-off-by: Jonathan Corbet <corbet@lwn.net>

LGTM.
Reviewed-by: Mauro Carvalho Chehab <mchehab+huawei@kernel.org>
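
A rough sketch of the redundancy being removed, with simplified free
functions standing in for the out_enum() methods. The function names and
the sample enum name are made up for the example:

```python
def out_enum_before(name, args):
    # The name is passed in, then overwritten from a type-specific key.
    name = args.get('enum', '')
    return f"\n\n.. c:enum:: {name}\n\n"

def out_enum_after(name, args):
    # The name argument is used as-is; no per-type key is needed.
    return f"\n\n.. c:enum:: {name}\n\n"

before = out_enum_before("sample_state", {"enum": "sample_state"})
after = out_enum_after("sample_state", {})
print(before == after)
```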

> ---
>  scripts/lib/kdoc/kdoc_output.py | 39 +++++++++++++--------------------
>  scripts/lib/kdoc/kdoc_parser.py |  6 -----
>  2 files changed, 15 insertions(+), 30 deletions(-)
> 
> diff --git a/scripts/lib/kdoc/kdoc_output.py b/scripts/lib/kdoc/kdoc_output.py
> index d6f4d9e7173b..8a31b637ffd2 100644
> --- a/scripts/lib/kdoc/kdoc_output.py
> +++ b/scripts/lib/kdoc/kdoc_output.py
> @@ -367,11 +367,11 @@ class RestFormat(OutputFormat):
>  
>          func_macro = args.get('func_macro', False)
>          if func_macro:
> -            signature = args['function']
> +            signature = name
>          else:
>              if args.get('functiontype'):
>                  signature = args['functiontype'] + " "
> -            signature += args['function'] + " ("
> +            signature += name + " ("
>  
>          ln = args.get('declaration_start_line', 0)
>          count = 0
> @@ -391,7 +391,7 @@ class RestFormat(OutputFormat):
>  
>          self.print_lineno(ln)
>          if args.get('typedef') or not args.get('functiontype'):
> -            self.data += f".. c:macro:: {args['function']}\n\n"
> +            self.data += f".. c:macro:: {name}\n\n"
>  
>              if args.get('typedef'):
>                  self.data += "   **Typedef**: "
> @@ -445,7 +445,6 @@ class RestFormat(OutputFormat):
>      def out_enum(self, fname, name, args):
>  
>          oldprefix = self.lineprefix
> -        name = args.get('enum', '')
>          ln = args.get('declaration_start_line', 0)
>  
>          self.data += f"\n\n.. c:enum:: {name}\n\n"
> @@ -475,7 +474,6 @@ class RestFormat(OutputFormat):
>      def out_typedef(self, fname, name, args):
>  
>          oldprefix = self.lineprefix
> -        name = args.get('typedef', '')
>          ln = args.get('declaration_start_line', 0)
>  
>          self.data += f"\n\n.. c:type:: {name}\n\n"
> @@ -492,7 +490,6 @@ class RestFormat(OutputFormat):
>  
>      def out_struct(self, fname, name, args):
>  
> -        name = args.get('struct', "")
>          purpose = args.get('purpose', "")
>          declaration = args.get('definition', "")
>          dtype = args.get('type', "struct")
> @@ -632,16 +629,16 @@ class ManFormat(OutputFormat):
>      def out_function(self, fname, name, args):
>          """output function in man"""
>  
> -        self.data += f'.TH "{args["function"]}" 9 "{args["function"]}" "{self.man_date}" "Kernel Hacker\'s Manual" LINUX' + "\n"
> +        self.data += f'.TH "{name}" 9 "{name}" "{self.man_date}" "Kernel Hacker\'s Manual" LINUX' + "\n"
>  
>          self.data += ".SH NAME\n"
> -        self.data += f"{args['function']} \\- {args['purpose']}\n"
> +        self.data += f"{name} \\- {args['purpose']}\n"
>  
>          self.data += ".SH SYNOPSIS\n"
>          if args.get('functiontype', ''):
> -            self.data += f'.B "{args["functiontype"]}" {args["function"]}' + "\n"
> +            self.data += f'.B "{args["functiontype"]}" {name}' + "\n"
>          else:
> -            self.data += f'.B "{args["function"]}' + "\n"
> +            self.data += f'.B "{name}' + "\n"
>  
>          count = 0
>          parenth = "("
> @@ -676,16 +673,13 @@ class ManFormat(OutputFormat):
>              self.output_highlight(text)
>  
>      def out_enum(self, fname, name, args):
> -
> -        name = args.get('enum', '')
> -
> -        self.data += f'.TH "{self.modulename}" 9 "enum {args["enum"]}" "{self.man_date}" "API Manual" LINUX' + "\n"
> +        self.data += f'.TH "{self.modulename}" 9 "enum {name}" "{self.man_date}" "API Manual" LINUX' + "\n"
>  
>          self.data += ".SH NAME\n"
> -        self.data += f"enum {args['enum']} \\- {args['purpose']}\n"
> +        self.data += f"enum {name} \\- {args['purpose']}\n"
>  
>          self.data += ".SH SYNOPSIS\n"
> -        self.data += f"enum {args['enum']}" + " {\n"
> +        self.data += f"enum {name}" + " {\n"
>  
>          count = 0
>          for parameter in args.parameterlist:
> @@ -710,13 +704,12 @@ class ManFormat(OutputFormat):
>  
>      def out_typedef(self, fname, name, args):
>          module = self.modulename
> -        typedef = args.get('typedef')
>          purpose = args.get('purpose')
>  
> -        self.data += f'.TH "{module}" 9 "{typedef}" "{self.man_date}" "API Manual" LINUX' + "\n"
> +        self.data += f'.TH "{module}" 9 "{name}" "{self.man_date}" "API Manual" LINUX' + "\n"
>  
>          self.data += ".SH NAME\n"
> -        self.data += f"typedef {typedef} \\- {purpose}\n"
> +        self.data += f"typedef {name} \\- {purpose}\n"
>  
>          for section, text in args.sections.items():
>              self.data += f'.SH "{section}"' + "\n"
> @@ -724,22 +717,20 @@ class ManFormat(OutputFormat):
>  
>      def out_struct(self, fname, name, args):
>          module = self.modulename
> -        struct_type = args.get('type')
> -        struct_name = args.get('struct')
>          purpose = args.get('purpose')
>          definition = args.get('definition')
>  
> -        self.data += f'.TH "{module}" 9 "{struct_type} {struct_name}" "{self.man_date}" "API Manual" LINUX' + "\n"
> +        self.data += f'.TH "{module}" 9 "{args.type} {name}" "{self.man_date}" "API Manual" LINUX' + "\n"
>  
>          self.data += ".SH NAME\n"
> -        self.data += f"{struct_type} {struct_name} \\- {purpose}\n"
> +        self.data += f"{args.type} {name} \\- {purpose}\n"
>  
>          # Replace tabs with two spaces and handle newlines
>          declaration = definition.replace("\t", "  ")
>          declaration = KernRe(r"\n").sub('"\n.br\n.BI "', declaration)
>  
>          self.data += ".SH SYNOPSIS\n"
> -        self.data += f"{struct_type} {struct_name} " + "{" + "\n.br\n"
> +        self.data += f"{args.type} {name} " + "{" + "\n.br\n"
>          self.data += f'.BI "{declaration}\n' + "};\n.br\n\n"
>  
>          self.data += ".SH Members\n"
> diff --git a/scripts/lib/kdoc/kdoc_parser.py b/scripts/lib/kdoc/kdoc_parser.py
> index 298abd260264..6e35e508608b 100644
> --- a/scripts/lib/kdoc/kdoc_parser.py
> +++ b/scripts/lib/kdoc/kdoc_parser.py
> @@ -790,7 +790,6 @@ class KernelDoc:
>                  level += 1
>  
>          self.output_declaration(decl_type, declaration_name,
> -                                struct=declaration_name,
>                                  definition=declaration,
>                                  purpose=self.entry.declaration_purpose)
>  
> @@ -870,7 +869,6 @@ class KernelDoc:
>                                f"Excess enum value '%{k}' description in '{declaration_name}'")
>  
>          self.output_declaration('enum', declaration_name,
> -                                enum=declaration_name,
>                                  purpose=self.entry.declaration_purpose)
>  
>      def dump_declaration(self, ln, prototype):
> @@ -1031,14 +1029,12 @@ class KernelDoc:
>  
>          if 'typedef' in return_type:
>              self.output_declaration(decl_type, declaration_name,
> -                                    function=declaration_name,
>                                      typedef=True,
>                                      functiontype=return_type,
>                                      purpose=self.entry.declaration_purpose,
>                                      func_macro=func_macro)
>          else:
>              self.output_declaration(decl_type, declaration_name,
> -                                    function=declaration_name,
>                                      typedef=False,
>                                      functiontype=return_type,
>                                      purpose=self.entry.declaration_purpose,
> @@ -1077,7 +1073,6 @@ class KernelDoc:
>              self.create_parameter_list(ln, decl_type, args, ',', declaration_name)
>  
>              self.output_declaration(decl_type, declaration_name,
> -                                    function=declaration_name,
>                                      typedef=True,
>                                      functiontype=return_type,
>                                      purpose=self.entry.declaration_purpose)
> @@ -1099,7 +1094,6 @@ class KernelDoc:
>                  return
>  
>              self.output_declaration('typedef', declaration_name,
> -                                    typedef=declaration_name,
>                                      purpose=self.entry.declaration_purpose)
>              return
>  



Thanks,
Mauro


* Re: [PATCH 09/12] docs: kdoc: straighten up dump_declaration()
  2025-07-02 22:35 ` [PATCH 09/12] docs: kdoc: straighten up dump_declaration() Jonathan Corbet
@ 2025-07-10  6:25   ` Mauro Carvalho Chehab
  2025-07-10 13:27     ` Jonathan Corbet
  0 siblings, 1 reply; 41+ messages in thread
From: Mauro Carvalho Chehab @ 2025-07-10  6:25 UTC (permalink / raw)
  To: Jonathan Corbet; +Cc: linux-doc, linux-kernel, Akira Yokosawa

On Wed,  2 Jul 2025 16:35:21 -0600
Jonathan Corbet <corbet@lwn.net> wrote:

> Get rid of the excess "return" statements in dump_declaration(), along with
> a line of never-executed dead code.
> 
> Signed-off-by: Jonathan Corbet <corbet@lwn.net>
> ---
>  scripts/lib/kdoc/kdoc_parser.py | 15 +++++----------
>  1 file changed, 5 insertions(+), 10 deletions(-)
> 
> diff --git a/scripts/lib/kdoc/kdoc_parser.py b/scripts/lib/kdoc/kdoc_parser.py
> index 6e35e508608b..7191fa94e17a 100644
> --- a/scripts/lib/kdoc/kdoc_parser.py
> +++ b/scripts/lib/kdoc/kdoc_parser.py
> @@ -878,18 +878,13 @@ class KernelDoc:
>  
>          if self.entry.decl_type == "enum":
>              self.dump_enum(ln, prototype)
> -            return
> -
> -        if self.entry.decl_type == "typedef":
> +        elif self.entry.decl_type == "typedef":
>              self.dump_typedef(ln, prototype)
> -            return
> -
> -        if self.entry.decl_type in ["union", "struct"]:
> +        elif self.entry.decl_type in ["union", "struct"]:
>              self.dump_struct(ln, prototype)
> -            return
> -

The above LGTM.

> -        self.output_declaration(self.entry.decl_type, prototype,
> -                                entry=self.entry)
> +        else:
> +            # This would be a bug
> +            self.emit_message(ln, f'Unknown declaration type: {self.entry.decl_type}')

Hmm... Are you sure about that? If I'm not mistaken, the removed
output_declaration() fallback was used for other types of blocks, like
DOC: tags.
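
To make the question concrete, here is a stand-alone sketch of the
control-flow change. It is not the real parser: dispatch_old(),
dispatch_new() and the strings they return are hypothetical stand-ins for
the calls in the hunk above. Whether a "doc" entry can actually reach
dump_declaration() is exactly the thing that needs checking.

```python
def dispatch_old(decl_type):
    """Old shape: early returns, then a generic fallback for anything else."""
    if decl_type == "enum":
        return "dump_enum"
    if decl_type == "typedef":
        return "dump_typedef"
    if decl_type in ["union", "struct"]:
        return "dump_struct"
    # Any other type fell through to the generic output call.
    return "output_declaration"


def dispatch_new(decl_type):
    """New shape: elif chain; any other type is reported as a bug."""
    if decl_type == "enum":
        return "dump_enum"
    elif decl_type == "typedef":
        return "dump_typedef"
    elif decl_type in ["union", "struct"]:
        return "dump_struct"
    else:
        # Hypothetical stand-in for the "Unknown declaration type" message.
        return "emit_message"


for t in ["enum", "typedef", "struct", "union", "doc"]:
    print(t, dispatch_old(t), dispatch_new(t))
```

The two versions agree for enum, typedef, struct and union; they differ
only for a type outside that set, which is the case I am asking about.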

>  
>      def dump_function(self, ln, prototype):
>          """

Thanks,
Mauro


* Re: [PATCH 10/12] docs: kdoc: directly access the always-there KdocItem fields
  2025-07-02 22:35 ` [PATCH 10/12] docs: kdoc: directly access the always-there KdocItem fields Jonathan Corbet
@ 2025-07-10  6:27   ` Mauro Carvalho Chehab
  0 siblings, 0 replies; 41+ messages in thread
From: Mauro Carvalho Chehab @ 2025-07-10  6:27 UTC (permalink / raw)
  To: Jonathan Corbet; +Cc: linux-doc, linux-kernel, Akira Yokosawa

On Wed,  2 Jul 2025 16:35:22 -0600
Jonathan Corbet <corbet@lwn.net> wrote:

> They are part of the interface, so use them directly.  This allows the
> removal of the transitional __dict__ hack in KdocItem.
> 
> Signed-off-by: Jonathan Corbet <corbet@lwn.net>

LGTM.
Reviewed-by: Mauro Carvalho Chehab <mchehab+huawei@kernel.org>
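
For readers following along, a minimal sketch of the split this patch
relies on: always-present fields are plain attributes, and everything else
lives in other_stuff behind get(). Only get(), __getitem__() and
other_stuff appear in the diff below; the constructor signature and the
sample values here are assumptions made up for illustration.

```python
class KdocItem:
    """Illustrative stand-in, not the class from kdoc_item.py."""

    def __init__(self, name, type, start_line, **other_stuff):
        # Always-there fields: accessed directly as attributes.
        self.name = name
        self.type = type
        self.declaration_start_line = start_line
        self.warnings = []
        # Optional, declaration-specific data: accessed through get().
        self.other_stuff = other_stuff

    def get(self, key, default=None):
        # After this patch there is no fallback to self.__dict__.
        return self.other_stuff.get(key, default)

    def __getitem__(self, key):
        return self.get(key)


item = KdocItem("foo", "struct", 42, purpose="demo structure")
print(item.type, item.declaration_start_line, item["purpose"])
```

In this sketch item.get('type') returns None, which is why the callers in
kdoc_output.py have to switch to args.type at the same time.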

> ---
>  scripts/lib/kdoc/kdoc_item.py   |  5 +----
>  scripts/lib/kdoc/kdoc_output.py | 16 +++++++---------
>  2 files changed, 8 insertions(+), 13 deletions(-)
> 
> diff --git a/scripts/lib/kdoc/kdoc_item.py b/scripts/lib/kdoc/kdoc_item.py
> index 51e8669b9a6e..807290678984 100644
> --- a/scripts/lib/kdoc/kdoc_item.py
> +++ b/scripts/lib/kdoc/kdoc_item.py
> @@ -20,10 +20,7 @@ class KdocItem:
>          self.other_stuff = other_stuff
>  
>      def get(self, key, default = None):
> -        ret = self.other_stuff.get(key, default)
> -        if ret == default:
> -            return self.__dict__.get(key, default)
> -        return ret
> +        return self.other_stuff.get(key, default)
>  
>      def __getitem__(self, key):
>          return self.get(key)
> diff --git a/scripts/lib/kdoc/kdoc_output.py b/scripts/lib/kdoc/kdoc_output.py
> index 8a31b637ffd2..ea8914537ba0 100644
> --- a/scripts/lib/kdoc/kdoc_output.py
> +++ b/scripts/lib/kdoc/kdoc_output.py
> @@ -124,9 +124,7 @@ class OutputFormat:
>          Output warnings for identifiers that will be displayed.
>          """
>  
> -        warnings = args.get('warnings', [])
> -
> -        for log_msg in warnings:
> +        for log_msg in args.warnings:
>              self.config.warning(log_msg)
>  
>      def check_doc(self, name, args):
> @@ -184,7 +182,7 @@ class OutputFormat:
>  
>          self.data = ""
>  
> -        dtype = args.get('type', "")
> +        dtype = args.type
>  
>          if dtype == "doc":
>              self.out_doc(fname, name, args)
> @@ -373,7 +371,7 @@ class RestFormat(OutputFormat):
>                  signature = args['functiontype'] + " "
>              signature += name + " ("
>  
> -        ln = args.get('declaration_start_line', 0)
> +        ln = args.declaration_start_line
>          count = 0
>          for parameter in args.parameterlist:
>              if count != 0:
> @@ -445,7 +443,7 @@ class RestFormat(OutputFormat):
>      def out_enum(self, fname, name, args):
>  
>          oldprefix = self.lineprefix
> -        ln = args.get('declaration_start_line', 0)
> +        ln = args.declaration_start_line
>  
>          self.data += f"\n\n.. c:enum:: {name}\n\n"
>  
> @@ -474,7 +472,7 @@ class RestFormat(OutputFormat):
>      def out_typedef(self, fname, name, args):
>  
>          oldprefix = self.lineprefix
> -        ln = args.get('declaration_start_line', 0)
> +        ln = args.declaration_start_line
>  
>          self.data += f"\n\n.. c:type:: {name}\n\n"
>  
> @@ -492,8 +490,8 @@ class RestFormat(OutputFormat):
>  
>          purpose = args.get('purpose', "")
>          declaration = args.get('definition', "")
> -        dtype = args.get('type', "struct")
> -        ln = args.get('declaration_start_line', 0)
> +        dtype = args.type
> +        ln = args.declaration_start_line
>  
>          self.data += f"\n\n.. c:{dtype}:: {name}\n\n"
>  



Thanks,
Mauro


* Re: [PATCH 11/12] docs: kdoc: clean up check_sections()
  2025-07-02 22:35 ` [PATCH 11/12] docs: kdoc: clean up check_sections() Jonathan Corbet
@ 2025-07-10  6:29   ` Mauro Carvalho Chehab
  0 siblings, 0 replies; 41+ messages in thread
From: Mauro Carvalho Chehab @ 2025-07-10  6:29 UTC (permalink / raw)
  To: Jonathan Corbet; +Cc: linux-doc, linux-kernel, Akira Yokosawa

On Wed,  2 Jul 2025 16:35:23 -0600
Jonathan Corbet <corbet@lwn.net> wrote:

> entry.sectcheck is just a duplicate of our list of sections that is only
> passed to check_sections(); its main purpose seems to be to avoid checking
> the special named sections.  Rework check_sections() to not use that field
> (which is then deleted), to check for the known sections directly, and
> tighten up the logic in general.
> 
> Signed-off-by: Jonathan Corbet <corbet@lwn.net>

LGTM.
Reviewed-by: Mauro Carvalho Chehab <mchehab+huawei@kernel.org>
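
A stand-alone sketch of the reworked check, using plain re instead of
KernRe. The excess_sections() helper and its list-returning interface are
hypothetical; the real method emits warnings through self.emit_msg()
rather than returning them.

```python
import re

known_section_names = 'description|context|returns?|notes?|examples?'
known_sections = re.compile(known_section_names, flags=re.I)


def excess_sections(sections, parameterlist, decl_type, decl_name):
    """Return a warning for each section that is neither a parameter
    nor one of the known named sections."""
    if decl_type == 'function':
        dname = f"{decl_type} parameter"
    else:
        dname = f"{decl_type} member"

    msgs = []
    for section in sections:
        if section not in parameterlist and \
           not known_sections.search(section):
            msgs.append(f"Excess {dname} '{section}' description in '{decl_name}'")
    return msgs


print(excess_sections(["a", "bogus", "Return"], ["a"], "function", "foo"))
```

One property of the sketch worth keeping in mind: re.search() is
unanchored, so a name that merely contains one of the keywords (for
example "keynote") is also accepted. Whether that matters with KernRe in
the real code is not something this sketch can confirm.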

> ---
>  scripts/lib/kdoc/kdoc_parser.py | 31 +++++++++++--------------------
>  1 file changed, 11 insertions(+), 20 deletions(-)
> 
> diff --git a/scripts/lib/kdoc/kdoc_parser.py b/scripts/lib/kdoc/kdoc_parser.py
> index 7191fa94e17a..fdde14b045fe 100644
> --- a/scripts/lib/kdoc/kdoc_parser.py
> +++ b/scripts/lib/kdoc/kdoc_parser.py
> @@ -42,9 +42,11 @@ doc_decl = doc_com + KernRe(r'(\w+)', cache=False)
>  #         @{section-name}:
>  # while trying to not match literal block starts like "example::"
>  #
> +known_section_names = 'description|context|returns?|notes?|examples?'
> +known_sections = KernRe(known_section_names, flags = re.I)
>  doc_sect = doc_com + \
> -            KernRe(r'\s*(\@[.\w]+|\@\.\.\.|description|context|returns?|notes?|examples?)\s*:([^:].*)?$',
> -                flags=re.I, cache=False)
> +    KernRe(r'\s*(\@[.\w]+|\@\.\.\.|' + known_section_names + r')\s*:([^:].*)?$',
> +           flags=re.I, cache=False)
>  
>  doc_content = doc_com_body + KernRe(r'(.*)', cache=False)
>  doc_inline_start = KernRe(r'^\s*/\*\*\s*$', cache=False)
> @@ -115,7 +117,6 @@ class KernelEntry:
>          self.config = config
>  
>          self._contents = []
> -        self.sectcheck = ""
>          self.prototype = ""
>  
>          self.warnings = []
> @@ -187,7 +188,6 @@ class KernelEntry:
>              self.parameterdescs[name] = contents
>              self.parameterdesc_start_lines[name] = self.new_start_line
>  
> -            self.sectcheck += name + " "
>              self.new_start_line = 0
>  
>          else:
> @@ -478,29 +478,20 @@ class KernelDoc:
>                          self.push_parameter(ln, decl_type, param, dtype,
>                                              arg, declaration_name)
>  
> -    def check_sections(self, ln, decl_name, decl_type, sectcheck):
> +    def check_sections(self, ln, decl_name, decl_type):
>          """
>          Check for errors inside sections, emitting warnings if not found
>          parameters are described.
>          """
> -
> -        sects = sectcheck.split()
> -
> -        for sx in range(len(sects)):                  # pylint: disable=C0200
> -            err = True
> -            for param in self.entry.parameterlist:
> -                if param == sects[sx]:
> -                    err = False
> -                    break
> -
> -            if err:
> +        for section in self.entry.sections:
> +            if section not in self.entry.parameterlist and \
> +               not known_sections.search(section):
>                  if decl_type == 'function':
>                      dname = f"{decl_type} parameter"
>                  else:
>                      dname = f"{decl_type} member"
> -
>                  self.emit_msg(ln,
> -                              f"Excess {dname} '{sects[sx]}' description in '{decl_name}'")
> +                              f"Excess {dname} '{section}' description in '{decl_name}'")
>  
>      def check_return_section(self, ln, declaration_name, return_type):
>          """
> @@ -754,7 +745,7 @@ class KernelDoc:
>  
>          self.create_parameter_list(ln, decl_type, members, ';',
>                                     declaration_name)
> -        self.check_sections(ln, declaration_name, decl_type, self.entry.sectcheck)
> +        self.check_sections(ln, declaration_name, decl_type)
>  
>          # Adjust declaration for better display
>          declaration = KernRe(r'([\{;])').sub(r'\1\n', declaration)
> @@ -1018,7 +1009,7 @@ class KernelDoc:
>                            f"expecting prototype for {self.entry.identifier}(). Prototype was for {declaration_name}() instead")
>              return
>  
> -        self.check_sections(ln, declaration_name, "function", self.entry.sectcheck)
> +        self.check_sections(ln, declaration_name, "function")
>  
>          self.check_return_section(ln, declaration_name, return_type)
>  



Thanks,
Mauro


* Re: [PATCH 12/12] docs: kdoc: Improve the output text accumulation
  2025-07-02 22:35 ` [PATCH 12/12] docs: kdoc: Improve the output text accumulation Jonathan Corbet
@ 2025-07-10  6:41   ` Mauro Carvalho Chehab
  2025-07-10  7:13     ` Mauro Carvalho Chehab
  0 siblings, 1 reply; 41+ messages in thread
From: Mauro Carvalho Chehab @ 2025-07-10  6:41 UTC (permalink / raw)
  To: Jonathan Corbet; +Cc: linux-doc, linux-kernel, Akira Yokosawa

On Wed,  2 Jul 2025 16:35:24 -0600
Jonathan Corbet <corbet@lwn.net> wrote:

> Building strings with repeated concatenation is somewhat inefficient in
> Python; it is better to make a list and glom them all together at the end.
> Add a small set of methods to the OutputFormat superclass to manage the
> output string, and use them throughout.
> 
> Signed-off-by: Jonathan Corbet <corbet@lwn.net>

The patch looks good to me. Just a minor nit below.

> ---
>  scripts/lib/kdoc/kdoc_output.py | 185 +++++++++++++++++---------------
>  1 file changed, 98 insertions(+), 87 deletions(-)
> 
> diff --git a/scripts/lib/kdoc/kdoc_output.py b/scripts/lib/kdoc/kdoc_output.py
> index ea8914537ba0..d4aabdaa9c51 100644
> --- a/scripts/lib/kdoc/kdoc_output.py
> +++ b/scripts/lib/kdoc/kdoc_output.py
> @@ -73,7 +73,19 @@ class OutputFormat:
>          self.config = None
>          self.no_doc_sections = False
>  
> -        self.data = ""
> +    #
> +    # Accumulation and management of the output text.
> +    #
> +    def reset_output(self):
> +        self._output = []
> +
> +    def emit(self, text):
> +        """Add a string to out output text"""
> +        self._output.append(text)
> +
> +    def output(self):
> +        """Obtain the accumulated output text"""
> +        return ''.join(self._output)

I would prefer to use a more Pythonic name for this function:

	def __str__(self)

This way, all it takes to get the final string is to use str():

	out_str = str(out)
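
As a minimal sketch of the idea: the OutputAccumulator class name is made
up for illustration (in the patch these methods live in OutputFormat),
and the emitted strings are just sample man-page lines.

```python
class OutputAccumulator:
    """Collect output fragments in a list and join them once at the end."""

    def __init__(self):
        self.reset_output()

    def reset_output(self):
        self._output = []

    def emit(self, text):
        """Add a string to the output text"""
        self._output.append(text)

    def __str__(self):
        """Obtain the accumulated output text"""
        return ''.join(self._output)


out = OutputAccumulator()
out.emit(".SH NAME\n")
out.emit("foo \\- does things\n")
out_str = str(out)
print(out_str, end="")
```

Callers that currently do "return self.output()" would then do
"return str(self)" instead.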

With that:

Reviewed-by: Mauro Carvalho Chehab <mchehab+huawei@kernel.org>

>  
>      def set_config(self, config):
>          """
> @@ -180,32 +192,31 @@ class OutputFormat:
>          Handles a single entry from kernel-doc parser
>          """
>  
> -        self.data = ""
> -
> +        self.reset_output()
>          dtype = args.type
>  
>          if dtype == "doc":
>              self.out_doc(fname, name, args)
> -            return self.data
> +            return self.output()
>  
>          if not self.check_declaration(dtype, name, args):
> -            return self.data
> +            return self.output()
>  
>          if dtype == "function":
>              self.out_function(fname, name, args)
> -            return self.data
> +            return self.output()
>  
>          if dtype == "enum":
>              self.out_enum(fname, name, args)
> -            return self.data
> +            return self.output()
>  
>          if dtype == "typedef":
>              self.out_typedef(fname, name, args)
> -            return self.data
> +            return self.output()
>  
>          if dtype in ["struct", "union"]:
>              self.out_struct(fname, name, args)
> -            return self.data
> +            return self.output()
>  
>          # Warn if some type requires an output logic
>          self.config.log.warning("doesn't now how to output '%s' block",
> @@ -274,7 +285,7 @@ class RestFormat(OutputFormat):
>  
>          if self.enable_lineno and ln is not None:
>              ln += 1
> -            self.data += f".. LINENO {ln}\n"
> +            self.emit(f".. LINENO {ln}\n")
>  
>      def output_highlight(self, args):
>          """
> @@ -326,7 +337,7 @@ class RestFormat(OutputFormat):
>  
>          # Print the output with the line prefix
>          for line in output.strip("\n").split("\n"):
> -            self.data += self.lineprefix + line + "\n"
> +            self.emit(self.lineprefix + line + "\n")
>  
>      def out_section(self, args, out_docblock=False):
>          """
> @@ -343,15 +354,15 @@ class RestFormat(OutputFormat):
>  
>              if out_docblock:
>                  if not self.out_mode == self.OUTPUT_INCLUDE:
> -                    self.data += f".. _{section}:\n\n"
> -                    self.data += f'{self.lineprefix}**{section}**\n\n'
> +                    self.emit(f".. _{section}:\n\n")
> +                    self.emit(f'{self.lineprefix}**{section}**\n\n')
>              else:
> -                self.data += f'{self.lineprefix}**{section}**\n\n'
> +                self.emit(f'{self.lineprefix}**{section}**\n\n')
>  
>              self.print_lineno(args.section_start_lines.get(section, 0))
>              self.output_highlight(text)
> -            self.data += "\n"
> -        self.data += "\n"
> +            self.emit("\n")
> +        self.emit("\n")
>  
>      def out_doc(self, fname, name, args):
>          if not self.check_doc(name, args):
> @@ -389,41 +400,41 @@ class RestFormat(OutputFormat):
>  
>          self.print_lineno(ln)
>          if args.get('typedef') or not args.get('functiontype'):
> -            self.data += f".. c:macro:: {name}\n\n"
> +            self.emit(f".. c:macro:: {name}\n\n")
>  
>              if args.get('typedef'):
> -                self.data += "   **Typedef**: "
> +                self.emit("   **Typedef**: ")
>                  self.lineprefix = ""
>                  self.output_highlight(args.get('purpose', ""))
> -                self.data += "\n\n**Syntax**\n\n"
> -                self.data += f"  ``{signature}``\n\n"
> +                self.emit("\n\n**Syntax**\n\n")
> +                self.emit(f"  ``{signature}``\n\n")
>              else:
> -                self.data += f"``{signature}``\n\n"
> +                self.emit(f"``{signature}``\n\n")
>          else:
> -            self.data += f".. c:function:: {signature}\n\n"
> +            self.emit(f".. c:function:: {signature}\n\n")
>  
>          if not args.get('typedef'):
>              self.print_lineno(ln)
>              self.lineprefix = "   "
>              self.output_highlight(args.get('purpose', ""))
> -            self.data += "\n"
> +            self.emit("\n")
>  
>          # Put descriptive text into a container (HTML <div>) to help set
>          # function prototypes apart
>          self.lineprefix = "  "
>  
>          if args.parameterlist:
> -            self.data += ".. container:: kernelindent\n\n"
> -            self.data += f"{self.lineprefix}**Parameters**\n\n"
> +            self.emit(".. container:: kernelindent\n\n")
> +            self.emit(f"{self.lineprefix}**Parameters**\n\n")
>  
>          for parameter in args.parameterlist:
>              parameter_name = KernRe(r'\[.*').sub('', parameter)
>              dtype = args.parametertypes.get(parameter, "")
>  
>              if dtype:
> -                self.data += f"{self.lineprefix}``{dtype}``\n"
> +                self.emit(f"{self.lineprefix}``{dtype}``\n")
>              else:
> -                self.data += f"{self.lineprefix}``{parameter}``\n"
> +                self.emit(f"{self.lineprefix}``{parameter}``\n")
>  
>              self.print_lineno(args.parameterdesc_start_lines.get(parameter_name, 0))
>  
> @@ -432,9 +443,9 @@ class RestFormat(OutputFormat):
>                 args.parameterdescs[parameter_name] != KernelDoc.undescribed:
>  
>                  self.output_highlight(args.parameterdescs[parameter_name])
> -                self.data += "\n"
> +                self.emit("\n")
>              else:
> -                self.data += f"{self.lineprefix}*undescribed*\n\n"
> +                self.emit(f"{self.lineprefix}*undescribed*\n\n")
>              self.lineprefix = "  "
>  
>          self.out_section(args)
> @@ -445,26 +456,26 @@ class RestFormat(OutputFormat):
>          oldprefix = self.lineprefix
>          ln = args.declaration_start_line
>  
> -        self.data += f"\n\n.. c:enum:: {name}\n\n"
> +        self.emit(f"\n\n.. c:enum:: {name}\n\n")
>  
>          self.print_lineno(ln)
>          self.lineprefix = "  "
>          self.output_highlight(args.get('purpose', ''))
> -        self.data += "\n"
> +        self.emit("\n")
>  
> -        self.data += ".. container:: kernelindent\n\n"
> +        self.emit(".. container:: kernelindent\n\n")
>          outer = self.lineprefix + "  "
>          self.lineprefix = outer + "  "
> -        self.data += f"{outer}**Constants**\n\n"
> +        self.emit(f"{outer}**Constants**\n\n")
>  
>          for parameter in args.parameterlist:
> -            self.data += f"{outer}``{parameter}``\n"
> +            self.emit(f"{outer}``{parameter}``\n")
>  
>              if args.parameterdescs.get(parameter, '') != KernelDoc.undescribed:
>                  self.output_highlight(args.parameterdescs[parameter])
>              else:
> -                self.data += f"{self.lineprefix}*undescribed*\n\n"
> -            self.data += "\n"
> +                self.emit(f"{self.lineprefix}*undescribed*\n\n")
> +            self.emit("\n")
>  
>          self.lineprefix = oldprefix
>          self.out_section(args)
> @@ -474,14 +485,14 @@ class RestFormat(OutputFormat):
>          oldprefix = self.lineprefix
>          ln = args.declaration_start_line
>  
> -        self.data += f"\n\n.. c:type:: {name}\n\n"
> +        self.emit(f"\n\n.. c:type:: {name}\n\n")
>  
>          self.print_lineno(ln)
>          self.lineprefix = "   "
>  
>          self.output_highlight(args.get('purpose', ''))
>  
> -        self.data += "\n"
> +        self.emit("\n")
>  
>          self.lineprefix = oldprefix
>          self.out_section(args)
> @@ -493,7 +504,7 @@ class RestFormat(OutputFormat):
>          dtype = args.type
>          ln = args.declaration_start_line
>  
> -        self.data += f"\n\n.. c:{dtype}:: {name}\n\n"
> +        self.emit(f"\n\n.. c:{dtype}:: {name}\n\n")
>  
>          self.print_lineno(ln)
>  
> @@ -501,20 +512,20 @@ class RestFormat(OutputFormat):
>          self.lineprefix += "  "
>  
>          self.output_highlight(purpose)
> -        self.data += "\n"
> +        self.emit("\n")
>  
> -        self.data += ".. container:: kernelindent\n\n"
> -        self.data += f"{self.lineprefix}**Definition**::\n\n"
> +        self.emit(".. container:: kernelindent\n\n")
> +        self.emit(f"{self.lineprefix}**Definition**::\n\n")
>  
>          self.lineprefix = self.lineprefix + "  "
>  
>          declaration = declaration.replace("\t", self.lineprefix)
>  
> -        self.data += f"{self.lineprefix}{dtype} {name}" + ' {' + "\n"
> -        self.data += f"{declaration}{self.lineprefix}" + "};\n\n"
> +        self.emit(f"{self.lineprefix}{dtype} {name}" + ' {' + "\n")
> +        self.emit(f"{declaration}{self.lineprefix}" + "};\n\n")
>  
>          self.lineprefix = "  "
> -        self.data += f"{self.lineprefix}**Members**\n\n"
> +        self.emit(f"{self.lineprefix}**Members**\n\n")
>          for parameter in args.parameterlist:
>              if not parameter or parameter.startswith("#"):
>                  continue
> @@ -526,15 +537,15 @@ class RestFormat(OutputFormat):
>  
>              self.print_lineno(args.parameterdesc_start_lines.get(parameter_name, 0))
>  
> -            self.data += f"{self.lineprefix}``{parameter}``\n"
> +            self.emit(f"{self.lineprefix}``{parameter}``\n")
>  
>              self.lineprefix = "    "
>              self.output_highlight(args.parameterdescs[parameter_name])
>              self.lineprefix = "  "
>  
> -            self.data += "\n"
> +            self.emit("\n")
>  
> -        self.data += "\n"
> +        self.emit("\n")
>  
>          self.lineprefix = oldprefix
>          self.out_section(args)
> @@ -610,33 +621,33 @@ class ManFormat(OutputFormat):
>                  continue
>  
>              if line[0] == ".":
> -                self.data += "\\&" + line + "\n"
> +                self.emit("\\&" + line + "\n")
>              else:
> -                self.data += line + "\n"
> +                self.emit(line + "\n")
>  
>      def out_doc(self, fname, name, args):
>          if not self.check_doc(name, args):
>              return
>  
> -        self.data += f'.TH "{self.modulename}" 9 "{self.modulename}" "{self.man_date}" "API Manual" LINUX' + "\n"
> +        self.emit(f'.TH "{self.modulename}" 9 "{self.modulename}" "{self.man_date}" "API Manual" LINUX' + "\n")
>  
>          for section, text in args.sections.items():
> -            self.data += f'.SH "{section}"' + "\n"
> +            self.emit(f'.SH "{section}"' + "\n")
>              self.output_highlight(text)
>  
>      def out_function(self, fname, name, args):
>          """output function in man"""
>  
> -        self.data += f'.TH "{name}" 9 "{name}" "{self.man_date}" "Kernel Hacker\'s Manual" LINUX' + "\n"
> +        self.emit(f'.TH "{name}" 9 "{name}" "{self.man_date}" "Kernel Hacker\'s Manual" LINUX' + "\n")
>  
> -        self.data += ".SH NAME\n"
> -        self.data += f"{name} \\- {args['purpose']}\n"
> +        self.emit(".SH NAME\n")
> +        self.emit(f"{name} \\- {args['purpose']}\n")
>  
> -        self.data += ".SH SYNOPSIS\n"
> +        self.emit(".SH SYNOPSIS\n")
>          if args.get('functiontype', ''):
> -            self.data += f'.B "{args["functiontype"]}" {name}' + "\n"
> +            self.emit(f'.B "{args["functiontype"]}" {name}' + "\n")
>          else:
> -            self.data += f'.B "{name}' + "\n"
> +            self.emit(f'.B "{name}' + "\n")
>  
>          count = 0
>          parenth = "("
> @@ -649,68 +660,68 @@ class ManFormat(OutputFormat):
>              dtype = args.parametertypes.get(parameter, "")
>              if function_pointer.match(dtype):
>                  # Pointer-to-function
> -                self.data += f'".BI "{parenth}{function_pointer.group(1)}" " ") ({function_pointer.group(2)}){post}"' + "\n"
> +                self.emit(f'".BI "{parenth}{function_pointer.group(1)}" " ") ({function_pointer.group(2)}){post}"' + "\n")
>              else:
>                  dtype = KernRe(r'([^\*])$').sub(r'\1 ', dtype)
>  
> -                self.data += f'.BI "{parenth}{dtype}"  "{post}"' + "\n"
> +                self.emit(f'.BI "{parenth}{dtype}"  "{post}"' + "\n")
>              count += 1
>              parenth = ""
>  
>          if args.parameterlist:
> -            self.data += ".SH ARGUMENTS\n"
> +            self.emit(".SH ARGUMENTS\n")
>  
>          for parameter in args.parameterlist:
>              parameter_name = re.sub(r'\[.*', '', parameter)
>  
> -            self.data += f'.IP "{parameter}" 12' + "\n"
> +            self.emit(f'.IP "{parameter}" 12' + "\n")
>              self.output_highlight(args.parameterdescs.get(parameter_name, ""))
>  
>          for section, text in args.sections.items():
> -            self.data += f'.SH "{section.upper()}"' + "\n"
> +            self.emit(f'.SH "{section.upper()}"' + "\n")
>              self.output_highlight(text)
>  
>      def out_enum(self, fname, name, args):
> -        self.data += f'.TH "{self.modulename}" 9 "enum {name}" "{self.man_date}" "API Manual" LINUX' + "\n"
> +        self.emit(f'.TH "{self.modulename}" 9 "enum {name}" "{self.man_date}" "API Manual" LINUX' + "\n")
>  
> -        self.data += ".SH NAME\n"
> -        self.data += f"enum {name} \\- {args['purpose']}\n"
> +        self.emit(".SH NAME\n")
> +        self.emit(f"enum {name} \\- {args['purpose']}\n")
>  
> -        self.data += ".SH SYNOPSIS\n"
> -        self.data += f"enum {name}" + " {\n"
> +        self.emit(".SH SYNOPSIS\n")
> +        self.emit(f"enum {name}" + " {\n")
>  
>          count = 0
>          for parameter in args.parameterlist:
> -            self.data += f'.br\n.BI "    {parameter}"' + "\n"
> +            self.emit(f'.br\n.BI "    {parameter}"' + "\n")
>              if count == len(args.parameterlist) - 1:
> -                self.data += "\n};\n"
> +                self.emit("\n};\n")
>              else:
> -                self.data += ", \n.br\n"
> +                self.emit(", \n.br\n")
>  
>              count += 1
>  
> -        self.data += ".SH Constants\n"
> +        self.emit(".SH Constants\n")
>  
>          for parameter in args.parameterlist:
>              parameter_name = KernRe(r'\[.*').sub('', parameter)
> -            self.data += f'.IP "{parameter}" 12' + "\n"
> +            self.emit(f'.IP "{parameter}" 12' + "\n")
>              self.output_highlight(args.parameterdescs.get(parameter_name, ""))
>  
>          for section, text in args.sections.items():
> -            self.data += f'.SH "{section}"' + "\n"
> +            self.emit(f'.SH "{section}"' + "\n")
>              self.output_highlight(text)
>  
>      def out_typedef(self, fname, name, args):
>          module = self.modulename
>          purpose = args.get('purpose')
>  
> -        self.data += f'.TH "{module}" 9 "{name}" "{self.man_date}" "API Manual" LINUX' + "\n"
> +        self.emit(f'.TH "{module}" 9 "{name}" "{self.man_date}" "API Manual" LINUX' + "\n")
>  
> -        self.data += ".SH NAME\n"
> -        self.data += f"typedef {name} \\- {purpose}\n"
> +        self.emit(".SH NAME\n")
> +        self.emit(f"typedef {name} \\- {purpose}\n")
>  
>          for section, text in args.sections.items():
> -            self.data += f'.SH "{section}"' + "\n"
> +            self.emit(f'.SH "{section}"' + "\n")
>              self.output_highlight(text)
>  
>      def out_struct(self, fname, name, args):
> @@ -718,20 +729,20 @@ class ManFormat(OutputFormat):
>          purpose = args.get('purpose')
>          definition = args.get('definition')
>  
> -        self.data += f'.TH "{module}" 9 "{args.type} {name}" "{self.man_date}" "API Manual" LINUX' + "\n"
> +        self.emit(f'.TH "{module}" 9 "{args.type} {name}" "{self.man_date}" "API Manual" LINUX' + "\n")
>  
> -        self.data += ".SH NAME\n"
> -        self.data += f"{args.type} {name} \\- {purpose}\n"
> +        self.emit(".SH NAME\n")
> +        self.emit(f"{args.type} {name} \\- {purpose}\n")
>  
>          # Replace tabs with two spaces and handle newlines
>          declaration = definition.replace("\t", "  ")
>          declaration = KernRe(r"\n").sub('"\n.br\n.BI "', declaration)
>  
> -        self.data += ".SH SYNOPSIS\n"
> -        self.data += f"{args.type} {name} " + "{" + "\n.br\n"
> -        self.data += f'.BI "{declaration}\n' + "};\n.br\n\n"
> +        self.emit(".SH SYNOPSIS\n")
> +        self.emit(f"{args.type} {name} " + "{" + "\n.br\n")
> +        self.emit(f'.BI "{declaration}\n' + "};\n.br\n\n")
>  
> -        self.data += ".SH Members\n"
> +        self.emit(".SH Members\n")
>          for parameter in args.parameterlist:
>              if parameter.startswith("#"):
>                  continue
> @@ -741,9 +752,9 @@ class ManFormat(OutputFormat):
>              if args.parameterdescs.get(parameter_name) == KernelDoc.undescribed:
>                  continue
>  
> -            self.data += f'.IP "{parameter}" 12' + "\n"
> +            self.emit(f'.IP "{parameter}" 12' + "\n")
>              self.output_highlight(args.parameterdescs.get(parameter_name))
>  
>          for section, text in args.sections.items():
> -            self.data += f'.SH "{section}"' + "\n"
> +            self.emit(f'.SH "{section}"' + "\n")
>              self.output_highlight(text)



Thanks,
Mauro

* Re: [PATCH 12/12] docs: kdoc: Improve the output text accumulation
  2025-07-10  6:41   ` Mauro Carvalho Chehab
@ 2025-07-10  7:13     ` Mauro Carvalho Chehab
  2025-07-10  8:19       ` Mauro Carvalho Chehab
  0 siblings, 1 reply; 41+ messages in thread
From: Mauro Carvalho Chehab @ 2025-07-10  7:13 UTC (permalink / raw)
  To: Jonathan Corbet; +Cc: linux-doc, linux-kernel, Akira Yokosawa

Em Thu, 10 Jul 2025 08:41:19 +0200
Mauro Carvalho Chehab <mchehab+huawei@kernel.org> escreveu:

> Em Wed,  2 Jul 2025 16:35:24 -0600
> Jonathan Corbet <corbet@lwn.net> escreveu:
> 
> > Building strings with repeated concatenation is somewhat inefficient in
> > Python; it is better to make a list and glom them all together at the end.
> > Add a small set of methods to the OutputFormat superclass to manage the
> > output string, and use them throughout.
> > 
> > Signed-off-by: Jonathan Corbet <corbet@lwn.net>  
> 
> The patch looks good to me. Just a minor nit below.
> 
> > ---
> >  scripts/lib/kdoc/kdoc_output.py | 185 +++++++++++++++++---------------
> >  1 file changed, 98 insertions(+), 87 deletions(-)
> > 
> > diff --git a/scripts/lib/kdoc/kdoc_output.py b/scripts/lib/kdoc/kdoc_output.py
> > index ea8914537ba0..d4aabdaa9c51 100644
> > --- a/scripts/lib/kdoc/kdoc_output.py
> > +++ b/scripts/lib/kdoc/kdoc_output.py
> > @@ -73,7 +73,19 @@ class OutputFormat:
> >          self.config = None
> >          self.no_doc_sections = False
> >  
> > -        self.data = ""
> > +    #
> > +    # Accumulation and management of the output text.
> > +    #
> > +    def reset_output(self):
> > +        self._output = []
> > +
> > +    def emit(self, text):
> > +        """Add a string to out output text"""
> > +        self._output.append(text)
> > +
> > +    def output(self):
> > +        """Obtain the accumulated output text"""
> > +        return ''.join(self._output)  
> 
> I would prefer to use a more Pythonic name for this function:
> 
> 	def __str__(self)
> 
> This way, all it takes to get the final string is to use str():
> 
> 	out_str = str(out)
> 
> With that:
> 
> Reviewed-by: Mauro Carvalho Chehab <mchehab+huawei@kernel.org>


Hmm... actually, I would code it in a different way, using something like:

class OutputString:
    def __init__(self):
        """Initialize internal list"""
        self._output = []

    # Probably not needed - The code can simply do, instead:
    # a = OutputString() to create a new string.
    def reset(self):
        """Reset the output text"""
        self._output = []

    def __add__(self, text):
        """Add a string to our output text"""
        if not isinstance(text, str):
            raise TypeError("Can only append strings")
        self._output.append(text)
        return self

    def __str__(self):
        return ''.join(self._output)

    # and, if needed, add a getter/setter:

    @property
    def data(self):
        """Getter for the current output"""
        return ''.join(self._output)

    @data.setter
    def data(self, new_value):
        if isinstance(new_value, str):
            self._output = [new_value]
        elif isinstance(new_value, list):
            self._output = new_value
        else:
            raise TypeError("Value should be either list or string")

That would allow things like:

	out = OutputString()
	out = out + "Foo" + " " + "Bar"
	print(out)

	out = OutputString()
	out += "Foo"
	out += " "
	out += "Bar"
	return(str(out))

and won't require many changes to the output logic and, IMO, will
provide cleaner code.
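
One detail of this approach is worth spelling out: since __add__ appends
to the internal list and returns self, the left-hand operand is modified
in place, unlike what happens with a plain str. A minimal illustration,
where MutableOutput is a hypothetical stand-in for the class sketched
above and not code from the patch:

```python
class MutableOutput:
    """Hypothetical stand-in for the OutputString sketch."""

    def __init__(self):
        self._output = []

    def __add__(self, text):
        # Appends to self and returns self, so "+" mutates its left operand.
        if not isinstance(text, str):
            raise TypeError("Can only append strings")
        self._output.append(text)
        return self

    def __str__(self):
        return ''.join(self._output)


out = MutableOutput()
alias = out + "Foo"       # "alias" and "out" are the same object
out += " Bar"             # falls back to __add__, as no __iadd__ is defined

print(alias is out)       # True
print(str(alias))         # Foo Bar
```

So "out + x" and "out += x" behave the same way here, which is fine for
accumulating output but differs from the usual semantics of "+".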

Thanks,
Mauro

* Re: [PATCH 12/12] docs: kdoc: Improve the output text accumulation
  2025-07-10  7:13     ` Mauro Carvalho Chehab
@ 2025-07-10  8:19       ` Mauro Carvalho Chehab
  2025-07-10 10:10         ` Mauro Carvalho Chehab
  2025-07-10 23:30         ` Jonathan Corbet
  0 siblings, 2 replies; 41+ messages in thread
From: Mauro Carvalho Chehab @ 2025-07-10  8:19 UTC (permalink / raw)
  To: Jonathan Corbet; +Cc: linux-doc, linux-kernel, Akira Yokosawa

Em Thu, 10 Jul 2025 09:13:52 +0200
Mauro Carvalho Chehab <mchehab+huawei@kernel.org> escreveu:

> Em Thu, 10 Jul 2025 08:41:19 +0200
> Mauro Carvalho Chehab <mchehab+huawei@kernel.org> escreveu:
> 
> > Em Wed,  2 Jul 2025 16:35:24 -0600
> > Jonathan Corbet <corbet@lwn.net> escreveu:
> >   
> > > Building strings with repeated concatenation is somewhat inefficient in
> > > Python; it is better to make a list and glom them all together at the end.
> > > Add a small set of methods to the OutputFormat superclass to manage the
> > > output string, and use them throughout.
> > > 
> > > Signed-off-by: Jonathan Corbet <corbet@lwn.net>    
> > 
> > The patch looks good to me. Just a minor nit below.
> >   
> > > ---
> > >  scripts/lib/kdoc/kdoc_output.py | 185 +++++++++++++++++---------------
> > >  1 file changed, 98 insertions(+), 87 deletions(-)
> > > 
> > > diff --git a/scripts/lib/kdoc/kdoc_output.py b/scripts/lib/kdoc/kdoc_output.py
> > > index ea8914537ba0..d4aabdaa9c51 100644
> > > --- a/scripts/lib/kdoc/kdoc_output.py
> > > +++ b/scripts/lib/kdoc/kdoc_output.py
> > > @@ -73,7 +73,19 @@ class OutputFormat:
> > >          self.config = None
> > >          self.no_doc_sections = False
> > >  
> > > -        self.data = ""
> > > +    #
> > > +    # Accumulation and management of the output text.
> > > +    #
> > > +    def reset_output(self):
> > > +        self._output = []
> > > +
> > > +    def emit(self, text):
> > > +        """Add a string to out output text"""
> > > +        self._output.append(text)
> > > +
> > > +    def output(self):
> > > +        """Obtain the accumulated output text"""
> > > +        return ''.join(self._output)    
> > 
> > I would prefer to use a more Pythonic name for this function:
> > 
> > 	def __str__(self)
> > 
> > This way, all it takes to get the final string is to use str():
> > 
> > 	out_str = str(out)
> > 
> > With that:
> > 
> > Reviewed-by: Mauro Carvalho Chehab <mchehab+huawei@kernel.org>  
> 
> 
> Hmm... actually, I would code it on a different way, using something like:
> 
> class OutputString:
>     def __init__(self):
> 	"""Initialize internal list"""
>         self._output = []
>     
>     # Probably not needed - The code can simply do, instead:
>     # a = OutputString() to create a new string.
>     def reset(self):
>         """Reset the output text"""
>         self._output = []
>     
>     def __add__(self, text):
> 	"""Add a string to out output text"""
>         if not isinstance(text, str):
>             raise TypeError("Can only append strings")
>         self._output.append(text)
>         return self
> 
>     def __str__(self):
>         return ''.join(self._output)
>     
>     # and, if needed, add a getter/setter:
> 
>     @property
>     def data(self):
>         """Getter for the current output"""
>         return ''.join(self._output)
> 
>     @data.setter
>     def data(self, new_value):
>         if isinstance(new_value, str):
> 	    self._output = [new_value]
> 	elif isinstance(new_value, list):
>             self._output = new_value
>         else:
>             raise TypeError("Value should be either list or string")
> 
> That would allow things like:
> 
> 	out = OutputString()
> 	out = out + "Foo" + " " + "Bar"
> 	print(out)
> 
> 	out = OutputString()
> 	out += "Foo"
> 	out += " "
> 	out += "Bar"
> 	return(str(out))
> 
> and won't require much changes at the output logic, and IMO will
> provide a cleaner code.
> 
> Thanks,
> Mauro

Heh, in these times when an LLM can quickly code trivial things for us,
I actually decided to test 4 different variants:

- using string +=
- using list append
- using __add__
- using __iadd__

Unless the LLM-generated code did something wrong (I double checked, and
was unable to find any issues), the results on Python 3.13.5 are:

$ /tmp/bench.py
Benchmarking 1,000 ops × 1000 strings = 1,000,000 appends

Run    str+=        ExplicitList __add__      __iadd__    
------------------------------------------------------------
1      25.26        29.44        53.42        50.71       
2      29.34        29.35        53.45        50.61       
3      29.44        29.56        53.41        50.67       
4      29.28        29.23        53.26        50.64       
5      29.28        29.20        45.90        40.47       
6      23.53        23.62        42.74        40.61       
7      23.43        23.76        42.97        40.78       
8      23.51        23.59        42.67        40.61       
9      23.43        23.52        42.77        40.72       
10     23.53        23.54        42.78        40.67       
11     23.83        23.63        42.98        40.87       
12     23.49        23.45        42.67        40.53       
13     23.43        23.69        42.75        40.66       
14     23.47        23.49        42.70        40.56       
15     23.44        23.63        42.72        40.52       
16     23.51        23.56        42.65        40.66       
17     23.48        23.60        42.86        40.81       
18     23.67        23.53        42.73        40.59       
19     23.75        23.62        42.78        40.58       
20     23.68        23.55        42.77        40.54       
21     23.65        23.67        42.76        40.59       
22     23.73        23.49        42.78        40.61       
23     23.61        23.59        42.78        40.58       
24     23.66        23.51        42.73        40.55       
------------------------------------------------------------
Avg    24.60        24.78        44.67        42.30       

Summary:
ExplicitList : 0.74% slower than str+=
__add__      : 81.56% slower than str+=
__iadd__     : 71.93% slower than str+=

(running it a couple of times, it sometimes shows list
 addition a little bit better, but all within the +/- 1% range)
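
For reference, a relative slowdown can be computed from two averages as
(t / t_base - 1) * 100. A small sketch applying that to the rounded
averages in the table above; relative_slowdown is a hypothetical helper
name, and because the inputs are rounded, the last digit can differ
slightly from figures computed on the unrounded timings:

```python
def relative_slowdown(base_time, compare_time):
    """Percentage by which compare_time exceeds base_time."""
    return (compare_time / base_time - 1) * 100


# Rounded averages (ms) from the Python 3.13 table above
averages = {"ExplicitList": 24.78, "__add__": 44.67, "__iadd__": 42.30}
base = 24.60  # str+=

for name, avg in averages.items():
    print(f"{name:<12} : {relative_slowdown(base, avg):.1f}% slower than str+=")
```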

In practice, it means that my suggestion of using __add__ (or even
the __iadd__ variant) was not good, but it also showed that the
Python 3.13 implementation is actually very efficient with
str += operations.

With that, I would just drop this patch, as the performance is
almost identical, and using "emit()" instead of "+=" IMO makes
the code less clear.

-

Btw, with Python 3.9, "".join(list) is a lot worse than str += :

$ python3.9 /tmp/bench.py
Benchmarking 1,000 ops × 1000 strings = 1,000,000 appends

Run    str+=        ExplicitList __add__      __iadd__    
------------------------------------------------------------
1      28.27        87.24        96.03        88.81       
2      32.76        87.35        87.40        88.92       
3      32.69        85.98        73.01        70.87       
4      26.28        69.80        70.62        71.90       
5      27.21        70.54        71.04        72.00       
6      27.77        70.06        70.22        70.92       
7      27.03        69.75        70.30        70.89       
8      33.31        72.63        70.57        70.59       
9      26.33        70.15        70.27        70.97       
10     26.29        69.84        71.60        70.94       
11     26.59        69.60        70.16        71.26       
12     26.38        69.57        71.64        70.95       
13     26.41        69.89        70.11        70.85       
14     26.38        69.86        70.36        70.93       
15     26.43        69.57        70.18        70.90       
16     26.38        70.04        70.26        71.19       
17     26.40        70.02        80.50        71.01       
18     26.41        71.74        80.39        71.90       
19     28.06        69.60        71.95        70.88       
20     28.28        69.90        71.12        71.07       
21     26.34        69.74        72.42        71.02       
22     26.33        69.86        70.25        70.97       
23     26.40        69.78        71.64        71.10       
24     26.44        69.73        70.23        70.83       
------------------------------------------------------------
Avg    27.55        72.18        73.43        72.57       

Summary:
ExplicitList : 162.00% slower than str+=
__add__      : 166.54% slower than str+=
__iadd__     : 163.42% slower than str+=


Thanks,
Mauro

---

#!/usr/bin/env python3

import timeit

class ExplicitList:
    def __init__(self):
        self._output = []

    def emit(self, text):
        self._output.append(text)

    def output(self):
        return ''.join(self._output)

class OutputStringAdd:
    def __init__(self):
        self._output = []

    def __add__(self, text):
        self._output.append(text)
        return self

    def __str__(self):
        return ''.join(self._output)

class OutputStringIAdd:
    def __init__(self):
        self._output = []

    def __iadd__(self, text):
        self._output.append(text)
        return self

    def __str__(self):
        return ''.join(self._output)

def calculate_comparison(base_time, compare_time):
    """Returns tuple of (is_faster, percentage difference relative to base_time)"""
    if compare_time < base_time:
        return (True, (1 - compare_time/base_time)*100)
    else:
        # Subtract 1 so that equal times report 0% rather than 100%
        return (False, (compare_time/base_time - 1)*100)

def benchmark():
    N = 1000       # Operations
    STRINGS_PER_OP = 1000
    REPEATS = 24

    # Generate test data (1000 unique 10-character strings)
    test_strings = [f"string_{i:03d}" for i in range(STRINGS_PER_OP)]

    print(f"Benchmarking {N:,} ops × {STRINGS_PER_OP} strings = {N*STRINGS_PER_OP:,} appends\n")
    headers = ['Run', 'str+=', 'ExplicitList', '__add__', '__iadd__']
    print(f"{headers[0]:<6} {headers[1]:<12} {headers[2]:<12} {headers[3]:<12} {headers[4]:<12}")
    print("-" * 60)

    results = []

    for i in range(REPEATS):
        # Benchmark normal string +=
        t_str = timeit.timeit(
            'result = ""\nfor s in test_strings: result += s',
            globals={'test_strings': test_strings},
            number=N
        ) * 1000

        # Benchmark ExplicitList
        t_explicit = timeit.timeit(
            'obj = ExplicitList()\nfor s in test_strings: obj.emit(s)',
            globals={'test_strings': test_strings, 'ExplicitList': ExplicitList},
            number=N
        ) * 1000

        # Benchmark __add__ version
        t_add = timeit.timeit(
            'obj = OutputStringAdd()\nfor s in test_strings: obj += s',
            globals={'test_strings': test_strings, 'OutputStringAdd': OutputStringAdd},
            number=N
        ) * 1000

        # Benchmark __iadd__ version
        t_iadd = timeit.timeit(
            'obj = OutputStringIAdd()\nfor s in test_strings: obj += s',
            globals={'test_strings': test_strings, 'OutputStringIAdd': OutputStringIAdd},
            number=N
        ) * 1000

        results.append((t_str, t_explicit, t_add, t_iadd))
        print(f"{i+1:<6} {t_str:<12.2f} {t_explicit:<12.2f} {t_add:<12.2f} {t_iadd:<12.2f}")

    # Calculate averages
    avg_str = sum(r[0] for r in results) / REPEATS
    avg_explicit = sum(r[1] for r in results) / REPEATS
    avg_add = sum(r[2] for r in results) / REPEATS
    avg_iadd = sum(r[3] for r in results) / REPEATS

    print("-" * 60)
    print(f"{'Avg':<6} {avg_str:<12.2f} {avg_explicit:<12.2f} {avg_add:<12.2f} {avg_iadd:<12.2f}")

    print()
    print("Summary:")
    # Calculate and print comparisons
    for name, time in [("ExplicitList", avg_explicit),
                      ("__add__", avg_add),
                      ("__iadd__", avg_iadd)]:
        is_faster, percentage = calculate_comparison(avg_str, time)
        if is_faster:
            print(f"{name:<12} : {percentage:.2f}% faster than str+=")
        else:
            print(f"{name:<12} : {percentage:.2f}% slower than str+=")


if __name__ == "__main__":
    benchmark()



* Re: [PATCH 12/12] docs: kdoc: Improve the output text accumulation
  2025-07-10  8:19       ` Mauro Carvalho Chehab
@ 2025-07-10 10:10         ` Mauro Carvalho Chehab
  2025-07-10 10:31           ` Mauro Carvalho Chehab
  2025-07-10 23:30         ` Jonathan Corbet
  1 sibling, 1 reply; 41+ messages in thread
From: Mauro Carvalho Chehab @ 2025-07-10 10:10 UTC (permalink / raw)
  To: Jonathan Corbet; +Cc: linux-doc, linux-kernel, Akira Yokosawa

Em Thu, 10 Jul 2025 10:19:31 +0200
Mauro Carvalho Chehab <mchehab+huawei@kernel.org> escreveu:

> Em Thu, 10 Jul 2025 09:13:52 +0200
> Mauro Carvalho Chehab <mchehab+huawei@kernel.org> escreveu:
> 

> Heh, in these times when an LLM can quickly code trivial things for us,
> I actually decided to test 4 different variants:
> 
> - using string +=
> - using list append
> - using __add__
> - using __iadd__

Manually reorganized the LLM-generated code, in order to get more
precise results. Script enclosed at the end.

  $ for i in python3.9 python3.13 python3.13t; do echo "  $i:"; $i /tmp/bench.py 100000 10 1; $i /tmp/bench.py 1000 1000 1; done
  python3.9:
    10 strings in a loop with 100000 interactions, repeating 24 times
        str +=       : time: 25.21
        list join    : time: 72.65: 188.18% slower than str +=
        __add__      : time: 71.82: 184.88% slower than str +=
        __iadd__     : time: 67.84: 169.09% slower than str +=
    1000 strings in a loop with 1000 interactions, repeating 24 times
        str +=       : time: 24.29
        list join    : time: 58.76: 141.88% slower than str +=
        __add__      : time: 58.68: 141.54% slower than str +=
        __iadd__     : time: 55.48: 128.37% slower than str +=
  python3.13:
    10 strings in a loop with 100000 interactions, repeating 24 times
        str +=       : time: 28.01
        list join    : time: 32.46: 15.91% slower than str +=
        __add__      : time: 52.56: 87.66% slower than str +=
        __iadd__     : time: 58.69: 109.55% slower than str +=
    1000 strings in a loop with 1000 interactions, repeating 24 times
        str +=       : time: 22.03
        list join    : time: 23.38: 6.12% slower than str +=
        __add__      : time: 44.25: 100.86% slower than str +=
        __iadd__     : time: 40.70: 84.74% slower than str +=
  python3.13t:
    10 strings in a loop with 100000 interactions, repeating 24 times
        str +=       : time: 25.65
        list join    : time: 74.95: 192.18% slower than str +=
        __add__      : time: 83.04: 223.71% slower than str +=
        __iadd__     : time: 79.07: 208.23% slower than str +=
    1000 strings in a loop with 1000 interactions, repeating 24 times
        str +=       : time: 57.39
        list join    : time: 62.31: 8.58% slower than str +=
        __add__      : time: 70.65: 23.10% slower than str +=
        __iadd__     : time: 68.67: 19.65% slower than str +=

From the above:

- It is not worth applying patch 12/12, as it makes the code slower;
- Python 3.13t (the no-GIL version) had very bad results. It seems it
  still requires optimization;
- Python 3.9 is a lot worse (140% to 190% slower) when using list append;
- when there are not many concats, Python 3.13 is about 15% slower
  with lists than with string concatenation. It only approaches str
  concat when the number of concats is high.

With the above, str += is clearly faster than list append.

So, unless I did something wrong in this benchmark script, please
don't apply patch 12/12.

Regards,
Mauro

---

Benchmark code:

#!/usr/bin/env python3

import argparse
import time
import sys

def benchmark_str_concat(test_strings, n_ops):
    start = time.time()
    for _ in range(n_ops):
        result = ""
        for s in test_strings:
            result += s
    return (time.time() - start) * 1000

def benchmark_explicit_list(test_strings, n_ops):
    class ExplicitList:
        def __init__(self):
            self._output = []

        def emit(self, text):
            self._output.append(text)

        def output(self):
            return ''.join(self._output)

    start = time.time()
    for _ in range(n_ops):
        obj = ExplicitList()
        for s in test_strings:
            obj.emit(s)
    return (time.time() - start) * 1000

def benchmark_add_overload(test_strings, n_ops):
    class OutputStringAdd:
        def __init__(self):
            self._output = []

        def __add__(self, text):
            self._output.append(text)
            return self

        def __str__(self):
            return ''.join(self._output)

    start = time.time()
    for _ in range(n_ops):
        obj = OutputStringAdd()
        for s in test_strings:
            obj += s
    return (time.time() - start) * 1000

def benchmark_iadd_overload(test_strings, n_ops):
    class OutputStringIAdd:
        def __init__(self):
            self._output = []

        def __iadd__(self, text):
            self._output.append(text)
            return self

        def __str__(self):
            return ''.join(self._output)

    start = time.time()
    for _ in range(n_ops):
        obj = OutputStringIAdd()
        for s in test_strings:
            obj += s
    return (time.time() - start) * 1000

def calculate_comparison(base_time, compare_time):
    if compare_time < base_time:
        return (True, (1 - compare_time/base_time)*100)
    return (False, (compare_time/base_time - 1)*100)

def benchmark(num_reps, strings_per_run, repeats, detail):
    test_strings = [f"string_{i:03d}" for i in range(strings_per_run)]

    # Create benchmark execution order list
    benchmarks = [
        ("str +=", benchmark_str_concat),
        ("list join", benchmark_explicit_list),
        ("__add__", benchmark_add_overload),
        ("__iadd__", benchmark_iadd_overload)
    ]

    # Use all possible permutations of benchmark order to reduce any
    # noise due to CPU caches
    all_orders = [
        (0, 1, 2, 3), (0, 1, 3, 2), (0, 2, 1, 3), (0, 2, 3, 1),
        (0, 3, 1, 2), (0, 3, 2, 1), (1, 0, 2, 3), (1, 0, 3, 2),
        (1, 2, 0, 3), (1, 2, 3, 0), (1, 3, 0, 2), (1, 3, 2, 0),
        (2, 0, 1, 3), (2, 0, 3, 1), (2, 1, 0, 3), (2, 1, 3, 0),
        (2, 3, 0, 1), (2, 3, 1, 0), (3, 0, 1, 2), (3, 0, 2, 1),
        (3, 1, 0, 2), (3, 1, 2, 0), (3, 2, 0, 1), (3, 2, 1, 0)
    ]

    results = {}
    for name, _ in benchmarks:
        results[name] = 0

    # Warm-up phase to reduce caching issues
    for name, fn in benchmarks:
        fn(test_strings, 1)

    n_repeats = len(all_orders) * repeats
    print(f"    {strings_per_run} strings in a loop with {num_reps} interactions, repeating {n_repeats} times")

    # Actual benchmark starts here
    i = 0
    if detail:
        headers = ['Run'] + [name for name, _ in benchmarks]
        print()
        print(f"\t{headers[0]:<6} {headers[1]:<12} {headers[2]:<12} {headers[3]:<12} {headers[4]:<12}")
        print("\t" + "-" * 60)
    for _ in range(repeats):
        # Shuffle execution order each run
        for order in all_orders:
            run_results = {}
            for idx in order:
                name, func = benchmarks[idx]
                run_results[name] = func(test_strings, num_reps)
                results[name] += run_results[name]

            if detail:
                # Output results in consistent order
                print(f"\t{i+1:<6}", end=" ")
                for name, _ in benchmarks:
                    print(f"{run_results[name]:<12.2f}", end=" ")
                print()

            i += 1

    avg_results = {}
    for name, _ in benchmarks:
        avg_results[name] = results[name] / repeats / len(all_orders)

    if detail:
        print("\t" + "-" * 60)
        print(f"\t      ", end=" ")
        for name, _ in benchmarks:
            print(f"{avg_results[name]:<12.2f}", end=" ")
        print()
        print()

    ref = benchmarks.pop(0)

    print(f"\t{ref[0]:<12} : time: {avg_results[ref[0]]:3.2f}")
    for name, _ in benchmarks:
        is_faster, percentage = calculate_comparison(avg_results[ref[0]], avg_results[name])
        direction = "faster" if is_faster else "slower"
        print(f"\t{name:<12} : time: {avg_results[name]:3.2f}: {percentage:3.2f}% {direction} than {ref[0]}")



if __name__ == "__main__":
    parser = argparse.ArgumentParser()
    parser.add_argument('-d', '--detail', action='store_true',
                       help='Enable detailed output')

    args, remaining = parser.parse_known_args()

    # Then handle the positional arguments manually
    if len(remaining) != 3:
        print(f"Usage: {sys.argv[0]} [-d] <num_repetitions> <strings_per_op> <repeats>")
        sys.exit(1)

    # Use only the arguments left over after option parsing, so that
    # passing -d does not shift the positional values
    num_reps = int(remaining[0])
    strings_per_op = int(remaining[1])
    repeats = int(remaining[2])

    benchmark(num_reps, strings_per_op, repeats, args.detail)
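
As an alternative to the hand-written table of execution orders in the
script above, the same 24 orderings could be generated with
itertools.permutations. A minimal sketch, where the names list is only a
hypothetical stand-in for the (name, function) benchmark list:

```python
import itertools

# Hypothetical benchmark labels, standing in for the (name, function) list
names = ["str +=", "list join", "__add__", "__iadd__"]

# Every ordering of the indices 0..3, in lexicographic order
all_orders = list(itertools.permutations(range(len(names))))

print(len(all_orders))   # 24
print(all_orders[0])     # (0, 1, 2, 3)
print(all_orders[-1])    # (3, 2, 1, 0)
```

This keeps the list in sync automatically if a benchmark is added or
removed.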


* Re: [PATCH 12/12] docs: kdoc: Improve the output text accumulation
  2025-07-10 10:10         ` Mauro Carvalho Chehab
@ 2025-07-10 10:31           ` Mauro Carvalho Chehab
  2025-07-10 10:59             ` Mauro Carvalho Chehab
  0 siblings, 1 reply; 41+ messages in thread
From: Mauro Carvalho Chehab @ 2025-07-10 10:31 UTC (permalink / raw)
  To: Jonathan Corbet; +Cc: linux-doc, linux-kernel, Akira Yokosawa

Em Thu, 10 Jul 2025 12:10:33 +0200
Mauro Carvalho Chehab <mchehab+huawei@kernel.org> escreveu:

> From the above:
> 
> - It is not worth applying patch 12/12, as it makes the code slower;
> - Python 3.13t (the no-GIL version) had very bad results. It seems it
>   still requires optimization;
> - Python 3.9 is a lot worse (140% to 190% slower) when using list append;
> - when there are not many concats, Python 3.13 is about 15% slower
>   with lists than with string concatenation. It only approaches str
>   concat when the number of concats is high.
> 
> With the above, str += is clearly faster than list append.
> 
> So, unless I did something wrong in this benchmark script, please
> don't apply patch 12/12.

And I did: I forgot the final line in the list-based code that converts
the accumulated result to a string.

For explicit list:
	result = obj.output()

For implicit ones:
	result = str(obj)

Yet, the conclusion is similar. With Python 3.13:

    $ for i in python3.13; do for j in 1 10 100 1000; do $i /tmp/bench.py $((1000000/$j)) $j 1; done; done
    1 strings in a loop with 1000000 interactions, repeating 24 times
        str +=       : time: 41.42
        list join    : time: 127.33: 207.42% slower than str +=
    10 strings in a loop with 100000 interactions, repeating 24 times
        str +=       : time: 27.15
        list join    : time: 39.19: 44.36% slower than str +=
    100 strings in a loop with 10000 interactions, repeating 24 times
        str +=       : time: 24.84
        list join    : time: 30.70: 23.57% slower than str +=
    1000 strings in a loop with 1000 interactions, repeating 24 times
        str +=       : time: 21.84
        list join    : time: 27.85: 27.50% slower than str +=

Explicit list concat was between ~25% and ~200% slower than str concat.
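
The key point is that the timed region has to include the final join. A
minimal sketch of such a comparison using timeit, assuming plain
functions (concat_str and concat_list are hypothetical names) instead of
the wrapper classes from the script below, so absolute numbers will
differ from the ones above:

```python
import timeit


def concat_str(parts):
    """Accumulate with str += (no final conversion needed)."""
    result = ""
    for s in parts:
        result += s
    return result


def concat_list(parts):
    """Accumulate in a list; the final join is part of the work."""
    chunks = []
    for s in parts:
        chunks.append(s)
    return ''.join(chunks)


parts = [f"string_{i:03d}" for i in range(100)]

# Both approaches must produce the same text
assert concat_str(parts) == concat_list(parts)

for fn in (concat_str, concat_list):
    # Total seconds for 10000 calls, converted to milliseconds
    elapsed = timeit.timeit(lambda: fn(parts), number=10000) * 1000
    print(f"{fn.__name__:<12} : {elapsed:.2f} ms")
```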


Thanks,
Mauro

---

#!/usr/bin/env python3

import argparse
import time
import sys

def benchmark_str_concat(test_strings, n_ops):
    start = time.time()
    for _ in range(n_ops):
        result = ""
        for s in test_strings:
            result += s
    return (time.time() - start) * 1000

def benchmark_explicit_list(test_strings, n_ops):
    class ExplicitList:
        def __init__(self):
            self._output = []

        def emit(self, text):
            self._output.append(text)

        def output(self):
            return ''.join(self._output)

    start = time.time()
    for _ in range(n_ops):
        obj = ExplicitList()
        for s in test_strings:
            obj.emit(s)

        result = obj.output()

    return (time.time() - start) * 1000

def benchmark_add_overload(test_strings, n_ops):
    class OutputStringAdd:
        def __init__(self):
            self._output = []

        def __add__(self, text):
            self._output.append(text)
            return self

        def __str__(self):
            return ''.join(self._output)

    start = time.time()
    for _ in range(n_ops):
        obj = OutputStringAdd()
        for s in test_strings:
            obj += s

        result = str(obj)

    return (time.time() - start) * 1000

def benchmark_iadd_overload(test_strings, n_ops):
    class OutputStringIAdd:
        def __init__(self):
            self._output = []

        def __iadd__(self, text):
            self._output.append(text)
            return self

        def __str__(self):
            return ''.join(self._output)

    start = time.time()
    for _ in range(n_ops):
        obj = OutputStringIAdd()
        for s in test_strings:
            obj += s
        result = str(obj)

    return (time.time() - start) * 1000

def calculate_comparison(base_time, compare_time):
    if compare_time < base_time:
        return (True, (1 - compare_time/base_time)*100)
    return (False, (compare_time/base_time - 1)*100)

def benchmark(num_reps, strings_per_run, repeats, detail):
    test_strings = [f"string_{i:03d}" for i in range(strings_per_run)]

    # Create benchmark execution order list
    benchmarks = [
        ("str +=", benchmark_str_concat),
        ("list join", benchmark_explicit_list),
        ("__add__", benchmark_add_overload),
        ("__iadd__", benchmark_iadd_overload)
    ]

    # Use all possible permutations of benchmark order to reduce any
    # noise due to CPU caches
    all_orders = [
        (0, 1, 2, 3), (0, 1, 3, 2), (0, 2, 1, 3), (0, 2, 3, 1),
        (0, 3, 1, 2), (0, 3, 2, 1), (1, 0, 2, 3), (1, 0, 3, 2),
        (1, 2, 0, 3), (1, 2, 3, 0), (1, 3, 0, 2), (1, 3, 2, 0),
        (2, 0, 1, 3), (2, 0, 3, 1), (2, 1, 0, 3), (2, 1, 3, 0),
        (2, 3, 0, 1), (2, 3, 1, 0), (3, 0, 1, 2), (3, 0, 2, 1),
        (3, 1, 0, 2), (3, 1, 2, 0), (3, 2, 0, 1), (3, 2, 1, 0)
    ]

    results = {}
    for name, _ in benchmarks:
        results[name] = 0

    # Warm-up phase to reduce caching issues
    for name, fn in benchmarks:
        fn(test_strings, 1)

    n_repeats = len(all_orders) * repeats
    print(f"    {strings_per_run} strings in a loop with {num_reps} interactions, repeating {n_repeats} times")

    # Actual benchmark starts here
    i = 0
    if detail:
        headers = ['Run'] + [name for name, _ in benchmarks]
        print()
        print(f"\t{headers[0]:<6} {headers[1]:<12} {headers[2]:<12} {headers[3]:<12} {headers[4]:<12}")
        print("\t" + "-" * 60)
    for _ in range(repeats):
        # Shuffle execution order each run
        for order in all_orders:
            run_results = {}
            for idx in order:
                name, func = benchmarks[idx]
                run_results[name] = func(test_strings, num_reps)
                results[name] += run_results[name]

            if detail:
                # Output results in consistent order
                print(f"\t{i+1:<6}", end=" ")
                for name, _ in benchmarks:
                    print(f"{run_results[name]:<12.2f}", end=" ")
                print()

            i += 1

    avg_results = {}
    for name, _ in benchmarks:
        avg_results[name] = results[name] / repeats / len(all_orders)

    if detail:
        print("\t" + "-" * 60)
        print(f"\t      ", end=" ")
        for name, _ in benchmarks:
            print(f"{avg_results[name]:<12.2f}", end=" ")
        print()
        print()

    ref = benchmarks.pop(0)

    print(f"\t{ref[0]:<12} : time: {avg_results[ref[0]]:3.2f}")
    for name, _ in benchmarks:
        is_faster, percentage = calculate_comparison(avg_results[ref[0]], avg_results[name])
        direction = "faster" if is_faster else "slower"
        print(f"\t{name:<12} : time: {avg_results[name]:3.2f}: {percentage:3.2f}% {direction} than {ref[0]}")



if __name__ == "__main__":
    parser = argparse.ArgumentParser()
    parser.add_argument('-d', '--detail', action='store_true',
                       help='Enable detailed output')

    args, remaining = parser.parse_known_args()

    # Then handle the positional arguments manually
    if len(remaining) != 3:
        print(f"Usage: {sys.argv[0]} [-d] <num_repetitions> <strings_per_op> <repeats>")
        sys.exit(1)

    # Use the positionals left over by parse_known_args(); reading
    # sys.argv[1:4] directly would pick up "-d" when it is given first.
    num_reps = int(remaining[0])
    strings_per_op = int(remaining[1])
    repeats = int(remaining[2])

    benchmark(num_reps, strings_per_op, repeats, args.detail)




* Re: [PATCH 12/12] docs: kdoc: Improve the output text accumulation
  2025-07-10 10:31           ` Mauro Carvalho Chehab
@ 2025-07-10 10:59             ` Mauro Carvalho Chehab
  0 siblings, 0 replies; 41+ messages in thread
From: Mauro Carvalho Chehab @ 2025-07-10 10:59 UTC (permalink / raw)
  To: Jonathan Corbet; +Cc: linux-doc, linux-kernel, Akira Yokosawa

On Thu, 10 Jul 2025 12:31:55 +0200
Mauro Carvalho Chehab <mchehab+huawei@kernel.org> wrote:

> On Thu, 10 Jul 2025 12:10:33 +0200
> Mauro Carvalho Chehab <mchehab+huawei@kernel.org> wrote:
> 
> > From the above:
> > 
> > - It is not worth applying patch 12/12 as it makes the code slower;
> > - Python 3.13t (no-GIL version) had very bad results. It seems it
> >   still requires optimization;
> > - Python 3.9 is a lot worse (140% to 190%) when using list append;
> > - when there are not many concats, Python 3.13 is about 15% slower
> >   with lists than concat strings. It only approaches str concat
> >   when the number of concats is high.
> > 
> > With the above, clearly str += is faster than list append.
> > 
> > So, except if I did something wrong on this benchmark script, please
> > don't apply patch 12/12.  
> 
> And I did: I forgot the final line in the list-based code that turns the
> result into a string.
> 
> For explicit list:
> 	result = obj.output()
> 
> For implicit ones:
> 	result = str(obj)
> 
> Yet, the conclusion is similar. With Python 3.13:
> 
>     $ for i in python3.13; do for j in 1 10 100 1000; do $i /tmp/bench.py $((1000000/$j)) $j 1; done; done
>     1 strings in a loop with 1000000 interactions, repeating 24 times
>         str +=       : time: 41.42
>         list join    : time: 127.33: 207.42% slower than str +=
>     10 strings in a loop with 100000 interactions, repeating 24 times
>         str +=       : time: 27.15
>         list join    : time: 39.19: 44.36% slower than str +=
>     100 strings in a loop with 10000 interactions, repeating 24 times
>         str +=       : time: 24.84
>         list join    : time: 30.70: 23.57% slower than str +=
>     1000 strings in a loop with 1000 interactions, repeating 24 times
>         str +=       : time: 21.84
>         list join    : time: 27.85: 27.50% slower than str +=
> 
> Explicit list concat was between ~30% and ~200% worse than str concat.

Looking for an explanation: PEP 509 and PEP 393 brought Short String
Optimization and Inline Caching. This landed in Python 3.6, and Python 3.13
came with extra string optimizations.

On the other hand, lists do more memory allocations, have some logic to
handle list growth, and need an extra concat loop.

With that, contrary to popular belief, it seems that str concat is
nowadays faster.
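
For anyone who wants to re-check this claim without the full script, a
much smaller sketch follows. It only assumes the stdlib timeit module; the
function names (concat_str, concat_list, compare) are made up for the
example, and the numbers will differ per machine and Python version, so no
expected timings are given.

```python
import timeit


def concat_str(parts):
    """Accumulate with str +=, one piece at a time."""
    result = ""
    for part in parts:
        result += part
    return result


def concat_list(parts):
    """Accumulate in a list and join once at the end."""
    chunks = []
    for part in parts:
        chunks.append(part)
    return "".join(chunks)


def compare(n_strings=100, number=2000):
    """Return (str_time, list_time) in seconds for `number` runs each."""
    parts = [f"string_{i:03d}" for i in range(n_strings)]
    # Both strategies must build the same text, or the timing is meaningless.
    assert concat_str(parts) == concat_list(parts)
    t_str = timeit.timeit(lambda: concat_str(parts), number=number)
    t_list = timeit.timeit(lambda: concat_list(parts), number=number)
    return t_str, t_list


if __name__ == "__main__":
    t_str, t_list = compare()
    print(f"str +=    : {t_str:.4f}s")
    print(f"list join : {t_list:.4f}s")
```

Unlike the script above, this does not rotate the execution order, so it
is only good for a quick sanity check.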

Thanks,
Mauro

* Re: [PATCH 04/12] docs: kdoc: Centralize handling of the item section list
  2025-07-10  5:45   ` Mauro Carvalho Chehab
@ 2025-07-10 13:25     ` Jonathan Corbet
  0 siblings, 0 replies; 41+ messages in thread
From: Jonathan Corbet @ 2025-07-10 13:25 UTC (permalink / raw)
  To: Mauro Carvalho Chehab; +Cc: linux-doc, linux-kernel, Akira Yokosawa

Mauro Carvalho Chehab <mchehab+huawei@kernel.org> writes:

> On Wed,  2 Jul 2025 16:35:16 -0600
> Jonathan Corbet <corbet@lwn.net> wrote:
>
>> The section list always comes directly from the under-construction entry
>> and is used uniformly.  Formalize section handling in the KdocItem class,
>> and have output_declaration() load the sections directly from the entry,
>> eliminating a lot of duplicated, verbose parameters.
>> 
>> Signed-off-by: Jonathan Corbet <corbet@lwn.net>
>> ---
>>  scripts/lib/kdoc/kdoc_item.py   |  8 ++++++++
>>  scripts/lib/kdoc/kdoc_output.py | 36 ++++++++++++---------------------
>>  scripts/lib/kdoc/kdoc_parser.py | 20 +++---------------
>>  3 files changed, 24 insertions(+), 40 deletions(-)
>> 
>> diff --git a/scripts/lib/kdoc/kdoc_item.py b/scripts/lib/kdoc/kdoc_item.py
>> index add2cc772fec..c8329019a219 100644
>> --- a/scripts/lib/kdoc/kdoc_item.py
>> +++ b/scripts/lib/kdoc/kdoc_item.py
>> @@ -9,6 +9,7 @@ class KdocItem:
>>          self.name = name
>>          self.type = type
>>          self.declaration_start_line = start_line
>> +        self.sections = self.sections_start_lines = { }
>
> Nitpicks:
> - to make coding-style uniform, please use "{}" without spaces;
> - Please place one statement per line, just like we (usually) do in Kernel. 

Sure, fine.

>   In this specific case, I strongly suspect that what you coded is not
>   implementing the semantics you want. See:
>
> 	1. are you creating a single dict and placing the same dict on two
> 	   variables?
>   or:
> 	2. are you initializing two different vars with their own empty
> 	   dict?
>
> The subsequent code assumes (2), but a quick check with python3 command
> line:

As you note, the subsequent code does *not* actually assume that; I know
the way Python semantics work :)  But I can separate the lines and make
things explicit.
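
For readers following along, a minimal sketch of the aliasing behaviour
being discussed. The variable names are illustrative only and are not
taken from kdoc_item.py:

```python
# Chained assignment binds both names to the *same* dict object,
# so a mutation through one name is visible through the other.
shared_a = shared_b = {}
shared_a["Return"] = "0 on success"
print(shared_a is shared_b)   # True
print(shared_b)               # {'Return': '0 on success'}

# Separate statements create two independent dicts.
own_a = {}
own_b = {}
own_a["Return"] = "0 on success"
print(own_a is own_b)         # False
print(own_b)                  # {}

# Rebinding one name later (instead of mutating the dict) also ends
# the sharing: the other name keeps pointing at the original object.
first = second = {}
first = {"Context": "process"}
print(second)                 # {}
```

Whether the sharing matters therefore depends on whether later code
mutates the dict in place or rebinds the attribute.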

Thanks,

jon

* Re: [PATCH 09/12] docs: kdoc: straighten up dump_declaration()
  2025-07-10  6:25   ` Mauro Carvalho Chehab
@ 2025-07-10 13:27     ` Jonathan Corbet
  2025-07-10 22:13       ` Mauro Carvalho Chehab
  0 siblings, 1 reply; 41+ messages in thread
From: Jonathan Corbet @ 2025-07-10 13:27 UTC (permalink / raw)
  To: Mauro Carvalho Chehab; +Cc: linux-doc, linux-kernel, Akira Yokosawa

Mauro Carvalho Chehab <mchehab+huawei@kernel.org> writes:

> On Wed,  2 Jul 2025 16:35:21 -0600
> Jonathan Corbet <corbet@lwn.net> wrote:
>
>> Get rid of the excess "return" statements in dump_declaration(), along with
>> a line of never-executed dead code.
>> 
>> Signed-off-by: Jonathan Corbet <corbet@lwn.net>
>> ---
>>  scripts/lib/kdoc/kdoc_parser.py | 15 +++++----------
>>  1 file changed, 5 insertions(+), 10 deletions(-)
>> 
>> diff --git a/scripts/lib/kdoc/kdoc_parser.py b/scripts/lib/kdoc/kdoc_parser.py
>> index 6e35e508608b..7191fa94e17a 100644
>> --- a/scripts/lib/kdoc/kdoc_parser.py
>> +++ b/scripts/lib/kdoc/kdoc_parser.py
>> @@ -878,18 +878,13 @@ class KernelDoc:
>>  
>>          if self.entry.decl_type == "enum":
>>              self.dump_enum(ln, prototype)
>> -            return
>> -
>> -        if self.entry.decl_type == "typedef":
>> +        elif self.entry.decl_type == "typedef":
>>              self.dump_typedef(ln, prototype)
>> -            return
>> -
>> -        if self.entry.decl_type in ["union", "struct"]:
>> +        elif self.entry.decl_type in ["union", "struct"]:
>>              self.dump_struct(ln, prototype)
>> -            return
>> -
>
> The above LGTM.
>
>> -        self.output_declaration(self.entry.decl_type, prototype,
>> -                                entry=self.entry)
>> +        else:
>> +            # This would be a bug
>> +            self.emit_message(ln, f'Unknown declaration type: {self.entry.decl_type}')
>
> Hmm... Are you sure about that? If I'm not mistaken, this was used for
> other types of arguments, like DOC: tags.

DOC tags are handled in a different path entirely.  I did ensure that
the code in question was never executed ... but then left the message in
place just in case.
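
The resulting control flow can be sketched roughly as below. This is a
trimmed-down, hypothetical stand-in (the DeclarationDumper class and its
recording lists are invented for the example); only the branches visible
in the quoted hunk are modelled:

```python
class DeclarationDumper:
    """Hypothetical, trimmed-down stand-in for the KernelDoc parser class."""

    def __init__(self):
        self.handled = []    # (kind, prototype) pairs, for inspection
        self.messages = []   # (line, text) pairs from emit_message()

    def emit_message(self, ln, msg):
        self.messages.append((ln, msg))

    def dump_enum(self, ln, prototype):
        self.handled.append(("enum", prototype))

    def dump_typedef(self, ln, prototype):
        self.handled.append(("typedef", prototype))

    def dump_struct(self, ln, prototype):
        self.handled.append(("struct", prototype))

    def dump_declaration(self, ln, decl_type, prototype):
        # One if/elif chain, no early returns; anything that falls
        # through is reported rather than silently passed along.
        if decl_type == "enum":
            self.dump_enum(ln, prototype)
        elif decl_type == "typedef":
            self.dump_typedef(ln, prototype)
        elif decl_type in ["union", "struct"]:
            self.dump_struct(ln, prototype)
        else:
            # This would be a bug
            self.emit_message(ln, f"Unknown declaration type: {decl_type}")


dumper = DeclarationDumper()
dumper.dump_declaration(10, "union", "union foo { int a; };")
dumper.dump_declaration(20, "DOC", "")
print(dumper.handled)
print(dumper.messages)
```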

Thanks,

jon

* Re: [PATCH 09/12] docs: kdoc: straighten up dump_declaration()
  2025-07-10 13:27     ` Jonathan Corbet
@ 2025-07-10 22:13       ` Mauro Carvalho Chehab
  0 siblings, 0 replies; 41+ messages in thread
From: Mauro Carvalho Chehab @ 2025-07-10 22:13 UTC (permalink / raw)
  To: Jonathan Corbet; +Cc: linux-doc, linux-kernel, Akira Yokosawa

On Thu, 10 Jul 2025 07:27:07 -0600
Jonathan Corbet <corbet@lwn.net> wrote:

> Mauro Carvalho Chehab <mchehab+huawei@kernel.org> writes:
>
> > On Wed,  2 Jul 2025 16:35:21 -0600
> > Jonathan Corbet <corbet@lwn.net> wrote:
> >  
> >> Get rid of the excess "return" statements in dump_declaration(), along with
> >> a line of never-executed dead code.
> >> 
> >> Signed-off-by: Jonathan Corbet <corbet@lwn.net>
> >> ---
> >>  scripts/lib/kdoc/kdoc_parser.py | 15 +++++----------
> >>  1 file changed, 5 insertions(+), 10 deletions(-)
> >> 
> >> diff --git a/scripts/lib/kdoc/kdoc_parser.py b/scripts/lib/kdoc/kdoc_parser.py
> >> index 6e35e508608b..7191fa94e17a 100644
> >> --- a/scripts/lib/kdoc/kdoc_parser.py
> >> +++ b/scripts/lib/kdoc/kdoc_parser.py
> >> @@ -878,18 +878,13 @@ class KernelDoc:
> >>  
> >>          if self.entry.decl_type == "enum":
> >>              self.dump_enum(ln, prototype)
> >> -            return
> >> -
> >> -        if self.entry.decl_type == "typedef":
> >> +        elif self.entry.decl_type == "typedef":
> >>              self.dump_typedef(ln, prototype)
> >> -            return
> >> -
> >> -        if self.entry.decl_type in ["union", "struct"]:
> >> +        elif self.entry.decl_type in ["union", "struct"]:
> >>              self.dump_struct(ln, prototype)
> >> -            return
> >> -  
> >
> > The above LGTM.
> >  
> >> -        self.output_declaration(self.entry.decl_type, prototype,
> >> -                                entry=self.entry)
> >> +        else:
> >> +            # This would be a bug
> >> +            self.emit_message(ln, f'Unknown declaration type: {self.entry.decl_type}')  
> >
> > Hmm... Are you sure about that? If I'm not mistaken, this was used for
> > other types of arguments, like DOC: tags.  
> 
> DOC tags are handled in a different path entirely.  I did ensure that
> the code in question was never executed ... but then left the message in
> place just in case.

OK.


If the output changed neither for ReST nor for man, that's fine
with me. Besides being a port from Perl, I'm almost sure I hit this
code during the conversion, but if it is now dead code,
your approach is better ;-)

While I didn't test, I trust you. So feel free to add:

Reviewed-by: Mauro Carvalho Chehab <mchehab+huawei@kernel.org>

Thanks,
Mauro

* Re: [PATCH 12/12] docs: kdoc: Improve the output text accumulation
  2025-07-10  8:19       ` Mauro Carvalho Chehab
  2025-07-10 10:10         ` Mauro Carvalho Chehab
@ 2025-07-10 23:30         ` Jonathan Corbet
  2025-07-11  6:14           ` Mauro Carvalho Chehab
  1 sibling, 1 reply; 41+ messages in thread
From: Jonathan Corbet @ 2025-07-10 23:30 UTC (permalink / raw)
  To: Mauro Carvalho Chehab; +Cc: linux-doc, linux-kernel, Akira Yokosawa

Mauro Carvalho Chehab <mchehab+huawei@kernel.org> writes:

> With that, I would just drop this patch, as the performance is
> almost identical, and using "emit()" instead of "+=" IMO makes
> the code less clear.

I've dropped the patch - for now - but I really disagree with the latter
part of that sentence.  It is far better, IMO, to encapsulate the
construction of our output rather than spreading vast numbers of direct
string concatenations throughout the code.  So this one will likely be
back in a different form :)

Thanks,

jon

* Re: [PATCH 12/12] docs: kdoc: Improve the output text accumulation
  2025-07-10 23:30         ` Jonathan Corbet
@ 2025-07-11  6:14           ` Mauro Carvalho Chehab
  2025-07-11 12:49             ` Jonathan Corbet
  0 siblings, 1 reply; 41+ messages in thread
From: Mauro Carvalho Chehab @ 2025-07-11  6:14 UTC (permalink / raw)
  To: Jonathan Corbet; +Cc: linux-doc, linux-kernel, Akira Yokosawa

On Thu, 10 Jul 2025 17:30:20 -0600
Jonathan Corbet <corbet@lwn.net> wrote:

> Mauro Carvalho Chehab <mchehab+huawei@kernel.org> writes:
> 
> > With that, I would just drop this patch, as the performance is
> > almost identical, and using "emit()" instead of "+=" IMO makes
> > the code less clear.  
> 
> I've dropped the patch - for now - but I really disagree with the latter
> part of that sentence.  It is far better, IMO, to encapsulate the
> construction of our output rather than spreading vast numbers of direct
> string concatenations throughout the code.  So this one will likely be
> back in a different form :)

The main concern was the performance penalty: based on the latest test
results, Python currently handles list concat very poorly (30% to 200%
slower).

Yet, at least for me, with my C-trained brain, I find "+=" a lot easier
to parse than some_function().

Btw, IMHO Python is not particularly great with names for concat/accumulate
commands. For list, it is append(), for set it is add(). Yet, "+=" is almost
universal (from standard types, only sets don't accept it, using, 
instead, "|=", which kind of makes sense).
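
A quick illustration of that point, using only builtin types (the variable
names are arbitrary):

```python
# "+=" works for the builtin sequence types...
text = "abc"
text += "def"            # str

items = [1]
items += [2, 3]          # list (extends in place)

pair = (1,)
pair += (2,)             # tuple (rebinds to a new tuple)

data = b"ab"
data += b"cd"            # bytes

# ...while sets use "|=" for the same "accumulate" idea,
tags = {"a"}
tags |= {"b"}

# and reject "+=" with a TypeError.
try:
    tags += {"c"}
    set_plus_ok = True
except TypeError:
    set_plus_ok = False

print(text, items, pair, data, sorted(tags), set_plus_ok)
```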

Adding a function named emit() - at least for me - requires extra brain
processing time to remember that emit is actually a function that doesn't
produce any emission: it just stores data for a future output - which may
not even happen if one calls the script with "--none".

Thanks,
Mauro

* Re: [PATCH 12/12] docs: kdoc: Improve the output text accumulation
  2025-07-11  6:14           ` Mauro Carvalho Chehab
@ 2025-07-11 12:49             ` Jonathan Corbet
  2025-07-11 16:28               ` Mauro Carvalho Chehab
  0 siblings, 1 reply; 41+ messages in thread
From: Jonathan Corbet @ 2025-07-11 12:49 UTC (permalink / raw)
  To: Mauro Carvalho Chehab; +Cc: linux-doc, linux-kernel, Akira Yokosawa

Mauro Carvalho Chehab <mchehab+huawei@kernel.org> writes:

> On Thu, 10 Jul 2025 17:30:20 -0600
> Jonathan Corbet <corbet@lwn.net> wrote:
>
>> Mauro Carvalho Chehab <mchehab+huawei@kernel.org> writes:
>> 
>> > With that, I would just drop this patch, as the performance is
>> > almost identical, and using "emit()" instead of "+=" IMO makes
>> > the code less clear.  
>> 
>> I've dropped the patch - for now - but I really disagree with the latter
>> part of that sentence.  It is far better, IMO, to encapsulate the
>> construction of our output rather than spreading vast numbers of direct
>> string concatenations throughout the code.  So this one will likely be
>> back in a different form :)
>
> The main concern was the performance penalty: based on the latest test
> results, Python currently handles list concat very poorly (30% to 200%
> slower).

Yes, I understood that part

> Yet, at least for me, with my C-trained brain, I find "+=" a lot easier
> to parse than some_function().
>
> Btw, IMHO Python is not particularly great with names for concat/accumulate
> commands. For list, it is append(), for set it is add(). Yet, "+=" is almost
> universal (from standard types, only sets don't accept it, using, 
> instead, "|=", which kind of makes sense).
>
> Adding a function named emit() - at least for me - requires extra brain
> processing time to remember that emit is actually a function that doesn't
> produce any emission: it just stores data for a future output - which may
> not even happen if one calls the script with "--none".

OK, I'll ponder on a different name :)

Perhaps the new not_emit() could even be aware of --none and just drop
the data on the floor.

Thanks,

jon

* Re: [PATCH 12/12] docs: kdoc: Improve the output text accumulation
  2025-07-11 12:49             ` Jonathan Corbet
@ 2025-07-11 16:28               ` Mauro Carvalho Chehab
  2025-07-11 16:39                 ` Jonathan Corbet
  0 siblings, 1 reply; 41+ messages in thread
From: Mauro Carvalho Chehab @ 2025-07-11 16:28 UTC (permalink / raw)
  To: Jonathan Corbet; +Cc: linux-doc, linux-kernel, Akira Yokosawa

On Fri, 11 Jul 2025 06:49:26 -0600
Jonathan Corbet <corbet@lwn.net> wrote:

> Mauro Carvalho Chehab <mchehab+huawei@kernel.org> writes:
>
> > On Thu, 10 Jul 2025 17:30:20 -0600
> > Jonathan Corbet <corbet@lwn.net> wrote:
> >  
> >> Mauro Carvalho Chehab <mchehab+huawei@kernel.org> writes:
> >>   
> >> > With that, I would just drop this patch, as the performance is
> >> > almost identical, and using "emit()" instead of "+=" IMO makes
> >> > the code less clear.    
> >> 
> >> I've dropped the patch - for now - but I really disagree with the latter
> >> part of that sentence.  It is far better, IMO, to encapsulate the
> >> construction of our output rather than spreading vast numbers of direct
> >> string concatenations throughout the code.  So this one will likely be
> >> back in a different form :)  
> >
> > The main concern was the performance penalty: based on the latest test
> > results, Python currently handles list concat very poorly (30% to 200%
> > slower).
> 
> Yes, I understood that part
> 
> > Yet, at least for me, with my C-trained brain, I find "+=" a lot easier
> > to parse than some_function().
> >
> > Btw, IMHO Python is not particularly great with names for concat/accumulate
> > commands. For list, it is append(), for set it is add(). Yet, "+=" is almost
> > universal (from standard types, only sets don't accept it, using, 
> > instead, "|=", which kind of makes sense).
> >
> > Adding a function named emit() - at least for me - requires extra brain
> > processing time to remember that emit is actually a function that doesn't
> > produce any emission: it just stores data for a future output - which may
> > not even happen if one calls the script with "--none".
> 
> OK, I'll ponder on a different name :)

I'm fine with that.

> Perhaps the new not_emit() could even be aware of --none and just drop
> the data on the floor.

The code already does that in a much more optimized way. This
is actually one of the improvements over the Perl version: we
don't need to implement anything special for none.

When --none is passed, the code sets out_style = OutputFormat(),
which is pretty much an abstract class that doesn't do any output
at all, and from which the ManOutput and Restformat classes
inherit.

It only does two things:

- apply filters, in order to filter out warnings according to the
  --import/--internal/--function arguments;

- print warnings for symbols after filtering them, with:

    def out_warnings(self, args):
        """
        Output warnings for identifiers that will be displayed.
        """

        warnings = args.get('warnings', [])

        for log_msg in warnings:
            self.config.warning(log_msg)

So, there's no emit()/no_emit()/print()... there. All output
types do nothing:

    # Virtual methods to be overridden by inherited classes
    # At the base class, those do nothing.
    def out_doc(self, fname, name, args):
        """Outputs a DOC block"""

    def out_function(self, fname, name, args):
        """Outputs a function"""

    def out_enum(self, fname, name, args):
        """Outputs an enum"""

    def out_typedef(self, fname, name, args):
        """Outputs a typedef"""

    def out_struct(self, fname, name, args):
        """Outputs a struct"""
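
To make the pattern concrete, here is a self-contained sketch of the same
idea. It is not the kdoc_output.py code: the class names (BaseOutput,
TinyRestOutput), the render() driver and the 'prototype' key are
hypothetical, and a plain list stands in for self.config.warning():

```python
class BaseOutput:
    """Hypothetical stand-in for the base class: warns, but emits no text."""

    def __init__(self):
        self.data = ""
        self.warnings = []

    def out_warnings(self, args):
        # The real code calls self.config.warning(); a list stands in here.
        for log_msg in args.get("warnings", []):
            self.warnings.append(log_msg)

    def out_function(self, fname, name, args):
        """Outputs a function (does nothing in the base class)"""

    def render(self, fname, name, args):
        """Hypothetical driver: warnings first, then the format hook."""
        self.data = ""
        self.out_warnings(args)
        self.out_function(fname, name, args)
        return self.data


class TinyRestOutput(BaseOutput):
    """Hypothetical subclass that overrides the hook to produce text."""

    def out_function(self, fname, name, args):
        self.data += f".. c:function:: {args['prototype']}\n"


item = {"prototype": "int foo(void)", "warnings": ["foo.c:3: missing Return"]}

none_style = BaseOutput()                        # what "--none" would select
none_text = none_style.render("foo.c", "foo", item)

rest_style = TinyRestOutput()
rest_text = rest_style.render("foo.c", "foo", item)
```

With the base class selected, warnings are still collected while the
rendered text stays empty, which is the "--none" behaviour described
above.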

Regards,
Mauro

* Re: [PATCH 12/12] docs: kdoc: Improve the output text accumulation
  2025-07-11 16:28               ` Mauro Carvalho Chehab
@ 2025-07-11 16:39                 ` Jonathan Corbet
  0 siblings, 0 replies; 41+ messages in thread
From: Jonathan Corbet @ 2025-07-11 16:39 UTC (permalink / raw)
  To: Mauro Carvalho Chehab; +Cc: linux-doc, linux-kernel, Akira Yokosawa

Mauro Carvalho Chehab <mchehab+huawei@kernel.org> writes:

>> Perhaps the new not_emit() could even be aware of --none and just drop
>> the data on the floor.
>
> The code already does that in a much more optimized way. This
> is actually one of the improvements over the Perl version: we
> don't need to implement anything special for none.

So your comment on emit() being mis-named for the --none case didn't
actually apply... :)

jon

Thread overview: 41+ messages
2025-07-02 22:35 [PATCH 00/12] [PATCH 00/11] Thrash up the parser/output interface Jonathan Corbet
2025-07-02 22:35 ` [PATCH 01/12] docs: kdoc; Add a rudimentary class to represent output items Jonathan Corbet
2025-07-10  5:28   ` Mauro Carvalho Chehab
2025-07-02 22:35 ` [PATCH 02/12] docs: kdoc: simplify the output-item passing Jonathan Corbet
2025-07-10  5:29   ` Mauro Carvalho Chehab
2025-07-02 22:35 ` [PATCH 03/12] docs: kdoc: drop "sectionlist" Jonathan Corbet
2025-07-09 16:27   ` Mauro Carvalho Chehab
2025-07-02 22:35 ` [PATCH 04/12] docs: kdoc: Centralize handling of the item section list Jonathan Corbet
2025-07-10  5:45   ` Mauro Carvalho Chehab
2025-07-10 13:25     ` Jonathan Corbet
2025-07-02 22:35 ` [PATCH 05/12] docs: kdoc: remove the "struct_actual" machinery Jonathan Corbet
2025-07-10  6:11   ` Mauro Carvalho Chehab
2025-07-02 22:35 ` [PATCH 06/12] docs: kdoc: use self.entry.parameterlist directly in check_sections() Jonathan Corbet
2025-07-10  6:12   ` Mauro Carvalho Chehab
2025-07-02 22:35 ` [PATCH 07/12] docs: kdoc: Coalesce parameter-list handling Jonathan Corbet
2025-07-10  6:20   ` Mauro Carvalho Chehab
2025-07-02 22:35 ` [PATCH 08/12] docs: kdoc: Regularize the use of the declaration name Jonathan Corbet
2025-07-10  6:22   ` Mauro Carvalho Chehab
2025-07-02 22:35 ` [PATCH 09/12] docs: kdoc: straighten up dump_declaration() Jonathan Corbet
2025-07-10  6:25   ` Mauro Carvalho Chehab
2025-07-10 13:27     ` Jonathan Corbet
2025-07-10 22:13       ` Mauro Carvalho Chehab
2025-07-02 22:35 ` [PATCH 10/12] docs: kdoc: directly access the always-there KdocItem fields Jonathan Corbet
2025-07-10  6:27   ` Mauro Carvalho Chehab
2025-07-02 22:35 ` [PATCH 11/12] docs: kdoc: clean up check_sections() Jonathan Corbet
2025-07-10  6:29   ` Mauro Carvalho Chehab
2025-07-02 22:35 ` [PATCH 12/12] docs: kdoc: Improve the output text accumulation Jonathan Corbet
2025-07-10  6:41   ` Mauro Carvalho Chehab
2025-07-10  7:13     ` Mauro Carvalho Chehab
2025-07-10  8:19       ` Mauro Carvalho Chehab
2025-07-10 10:10         ` Mauro Carvalho Chehab
2025-07-10 10:31           ` Mauro Carvalho Chehab
2025-07-10 10:59             ` Mauro Carvalho Chehab
2025-07-10 23:30         ` Jonathan Corbet
2025-07-11  6:14           ` Mauro Carvalho Chehab
2025-07-11 12:49             ` Jonathan Corbet
2025-07-11 16:28               ` Mauro Carvalho Chehab
2025-07-11 16:39                 ` Jonathan Corbet
2025-07-03  2:07 ` [PATCH 00/12] [PATCH 00/11] Thrash up the parser/output interface Yanteng Si
2025-07-09 15:29 ` Jonathan Corbet
2025-07-09 16:21   ` Mauro Carvalho Chehab