[PATCH 3/9] docs: maintainers_include.py: split state machine on multiple funcs

From: Mauro Carvalho Chehab <mchehab+huawei@kernel.org>
To: Jonathan Corbet <corbet@lwn.net>,
	Linux Doc Mailing List <linux-doc@vger.kernel.org>,
	Mauro Carvalho Chehab <mchehab@kernel.org>
Cc: Mauro Carvalho Chehab <mchehab+huawei@kernel.org>,
	linux-kernel@vger.kernel.org, rust-for-linux@vger.kernel.org,
	Shuah Khan <skhan@linuxfoundation.org>
Subject: [PATCH 3/9] docs: maintainers_include.py: split state machine on multiple funcs
Date: Mon,  4 May 2026 17:51:12 +0200
Message-ID: <7cdfae61b68c7613663ddd528020f6b4a4ccf8ec.1777908711.git.mchehab+huawei@kernel.org>
In-Reply-To: <cover.1777908711.git.mchehab+huawei@kernel.org>

Instead of keeping all the code in one big __init__, split
MaintainersParser so that the state machine remains in __init__,
while the actual parsing of descriptions and subsystems moves
to separate methods.

To make the parser simpler, instead of storing the parsed results
in a list, append them directly to a string.

That gave a 15% performance increase(*) with Python 3.14 and
made the logic simpler.

(*) measured by creating a new directory under Documentation/,
    placing just maintainers.rst and an index file there, and
    building it via sphinx-build-wrapper.
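
The list-versus-string change can be tried in isolation with a
micro-benchmark such as the sketch below. It is not the measurement
described above: LINES is a made-up stand-in for MAINTAINERS, both
function names are hypothetical, and the result may differ from what
a full Sphinx build shows.

```python
import timeit

# Hypothetical input: stand-in lines, not the real MAINTAINERS file.
LINES = [f"F:\tdrivers/example/file{i}.c\n" for i in range(10_000)]


def build_with_list(lines):
    """Old approach: collect pieces in a list, join once at the end."""
    result = []
    for line in lines:
        result.append(line.rstrip())
    return "\n".join(result)


def build_with_string(lines):
    """New approach: append each piece directly to one string."""
    output = ""
    for line in lines:
        output += line.rstrip() + "\n"
    # Drop the trailing newline, as __init__ does with self.output.
    return output.rstrip()


if __name__ == "__main__":
    for func in (build_with_list, build_with_string):
        elapsed = timeit.timeit(lambda: func(LINES), number=20)
        print(f"{func.__name__}: {elapsed:.3f}s")
```

Both helpers return the same text, so any timing difference comes
only from how the output is accumulated.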

Signed-off-by: Mauro Carvalho Chehab <mchehab+huawei@kernel.org>
---
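For readers skimming the diff, the resulting control flow can be
sketched roughly as below. SketchParser and the "[subsystem]" marker
are hypothetical simplifications, not the code in this patch; only
the state flags, the dispatch order and the heading-separator
transitions mirror it.

```python
import re


class SketchParser:
    """Hypothetical, simplified stand-in for MaintainersParser."""

    def __init__(self, lines):
        self.output = ""
        self.descriptions = False
        self.maintainers = False
        self.subsystems = False

        prev = ""
        for line in lines:
            # Dispatch only; the per-section work lives in the helpers.
            if self.descriptions:
                self.parse_descriptions(line)
            elif self.maintainers and not self.subsystems:
                if re.search('^[A-Z0-9]', line):
                    self.subsystems = True
                    self.parse_subsystems(line)
                else:
                    self.output += line
            elif self.subsystems:
                self.parse_subsystems(line)
            else:
                self.output += line

            # Heading separators drive the state transitions.
            if line.startswith('----------'):
                if prev.startswith('Descriptions'):
                    self.descriptions = True
                if prev.startswith('Maintainers'):
                    self.maintainers = True
            prev = line

    def parse_descriptions(self, line):
        # Leaving the preformatted block ends the descriptions state.
        if line.startswith('Maintainers'):
            self.descriptions = False
            self.output += "\n" + line
            return
        self.output += "| " + line

    def parse_subsystems(self, line):
        # Placeholder rendering; the real method builds ReST fields.
        line = line.rstrip()
        if line:
            self.output += "[subsystem] " + line + "\n"


SAMPLE = [
    "Descriptions\n",
    "-------------\n",
    "\tM: *Mail* patches to\n",
    "Maintainers\n",
    "-----------\n",
    "\n",
    "EXAMPLE SUBSYSTEM\n",
    "M:\tSomeone <someone@example.org>\n",
]

parser = SketchParser(SAMPLE)
print(parser.output)
```

The loop in __init__ only decides which helper sees each line, so
each helper can be read without tracking the other section's state.
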
 Documentation/sphinx/maintainers_include.py | 299 +++++++++++---------
 1 file changed, 159 insertions(+), 140 deletions(-)

diff --git a/Documentation/sphinx/maintainers_include.py b/Documentation/sphinx/maintainers_include.py
index e679acf0633d..8867ecc0aad3 100755
--- a/Documentation/sphinx/maintainers_include.py
+++ b/Documentation/sphinx/maintainers_include.py
@@ -47,168 +47,187 @@ class MaintainersParser:
         self.profile_toc = set()
         self.profile_entries = {}
 
-        result = list()
-        result.append(".. _maintainers:")
-        result.append("")
+        self.output = ".. _maintainers:\n\n"
 
         # Poor man's state machine.
-        descriptions = False
-        maintainers = False
-        subsystems = False
+        self.descriptions = False
+        self.maintainers = False
+        self.subsystems = False
 
         # Field letter to field name mapping.
-        field_letter = None
-        fields = dict()
+        self.field_letter = None
+        self.fields = dict()
+
+        self.field_prev = ""
+        self.field_content = ""
+        self.subsystem_name = None
+
+        self.app_dir = app_dir
+        self.base_dir, self.doc_dir, self.sphinx_dir = app_dir.partition("Documentation")
+
+        self.re_doc = re.compile(r'(Documentation/([^\s\?\*]*)\.rst)')
 
         prev = None
-        field_prev = ""
-        field_content = ""
-        subsystem_name = None
-
-        base_dir, doc_dir, sphinx_dir = app_dir.partition("Documentation")
-
         for line in open(path):
-            # Have we reached the end of the preformatted Descriptions text?
-            if descriptions and line.startswith('Maintainers'):
-                descriptions = False
-                # Ensure a blank line following the last "|"-prefixed line.
-                result.append("")
-
-            # Start subsystem processing? This is to skip processing the text
-            # between the Maintainers heading and the first subsystem name.
-            if maintainers and not subsystems:
+            if self.descriptions:
+                self.parse_descriptions(line)
+            elif self.maintainers and not self.subsystems:
                 if re.search('^[A-Z0-9]', line):
-                    subsystems = True
-
-            # Drop needless input whitespace.
-            line = line.rstrip()
-
-            #
-            # Handle profile entries - either as files or as https refs
-            #
-            match = re.match(rf"P:\s*({doc_dir})(/\S+)\.rst", line)
-            if match:
-                name = "".join(match.groups())
-                entry = os.path.relpath(base_dir + name, app_dir)
-
-                full_name = os.path.join(base_dir, name)
-                path = os.path.relpath(full_name, app_dir)
-                #
-                # When SPHINXDIRS is used, it will try to reference files
-                # outside srctree, causing warnings. To avoid that, point
-                # to the latest official documentation
-                #
-                if path.startswith("../"):
-                    entry = KERNELDOC_URL + match.group(2) + ".html"
+                    self.subsystems = True
+                    self.parse_subsystems(line)
                 else:
-                    entry = "/" + entry
-
-                if "*" in entry:
-                    for e in glob(entry):
-                        self.profile_toc.add(e)
-                        self.profile_entries[subsystem_name] = e
-                else:
-                    self.profile_toc.add(entry)
-                    self.profile_entries[subsystem_name] = entry
-            else:
-                match = re.match(r"P:\s*(https?://.*)", line)
-                if match:
-                    entry = match.group(1).strip()
-                    self.profile_entries[subsystem_name] = entry
-
-            # Linkify all non-wildcard refs to ReST files in Documentation/.
-            pat = r'(Documentation/([^\s\?\*]*)\.rst)'
-            m = re.search(pat, line)
-            if m:
-                # maintainers.rst is in a subdirectory, so include "../".
-                line = re.sub(pat, ':doc:`%s <../%s>`' % (m.group(2), m.group(2)), line)
-
-            # Check state machine for output rendering behavior.
-            output = None
-            if descriptions:
-                # Escape the escapes in preformatted text.
-                output = "| %s" % (line.replace("\\", "\\\\")
-                                        .replace("**", "\\**"))
-                # Look for and record field letter to field name mappings:
-                #   R: Designated *reviewer*: FullName <address@domain>
-                m = re.search(r"\s(\S):\s", line)
-                if m:
-                    field_letter = m.group(1)
-                if field_letter and not field_letter in fields:
-                    m = re.search(r"\*([^\*]+)\*", line)
-                    if m:
-                        fields[field_letter] = m.group(1)
-            elif subsystems:
-                # Skip empty lines: subsystem parser adds them as needed.
-                if len(line) == 0:
-                    continue
-                # Subsystem fields are batched into "field_content"
-                if line[1] != ':':
-                    # Render a subsystem entry as:
-                    #   SUBSYSTEM NAME
-                    #   ~~~~~~~~~~~~~~
-
-                    # Flush pending field content.
-                    output = field_content + "\n\n"
-                    field_content = ""
-
-                    subsystem_name = line.title()
-
-                    # Collapse whitespace in subsystem name.
-                    heading = re.sub(r"\s+", " ", line)
-                    output = output + "%s\n%s" % (heading, "~" * len(heading))
-                    field_prev = ""
-                else:
-                    # Render a subsystem field as:
-                    #   :Field: entry
-                    #           entry...
-                    field, details = line.split(':', 1)
-                    details = details.strip()
-
-                    # Mark paths (and regexes) as literal text for improved
-                    # readability and to escape any escapes.
-                    if field in ['F', 'N', 'X', 'K']:
-                        # But only if not already marked :)
-                        if not ':doc:' in details:
-                            details = '``%s``' % (details)
-
-                    # Comma separate email field continuations.
-                    if field == field_prev and field_prev in ['M', 'R', 'L']:
-                        field_content = field_content + ","
-
-                    # Do not repeat field names, so that field entries
-                    # will be collapsed together.
-                    if field != field_prev:
-                        output = field_content + "\n"
-                        field_content = ":%s:" % (fields.get(field, field))
-                    field_content = field_content + "\n\t%s" % (details)
-                    field_prev = field
+                    self.output += line
+            elif self.subsystems:
+                self.parse_subsystems(line)
             else:
-                output = line
-
-            # Re-split on any added newlines in any above parsing.
-            if output != None:
-                for separated in output.split('\n'):
-                    result.append(separated)
+                self.output += line
 
             # Update the state machine when we find heading separators.
             if line.startswith('----------'):
                 if prev.startswith('Descriptions'):
-                    descriptions = True
+                    self.descriptions = True
                 if prev.startswith('Maintainers'):
-                    maintainers = True
+                    self.maintainers = True
 
             # Retain previous line for state machine transitions.
             prev = line
 
         # Flush pending field contents.
-        if field_content != "":
-            for separated in field_content.split('\n'):
-                result.append(separated)
+        if self.field_content:
+            self.output += self.field_content + "\n\n"
 
-        self.output = "\n".join(result)
+        self.output = self.output.rstrip()
+
+    def parse_descriptions(self, line):
+        """Handle contents of the descriptions section."""
+
+        # Have we reached the end of the preformatted Descriptions text?
+        if line.startswith('Maintainers'):
+            self.descriptions = False
+            self.output += "\n" + line
+            return
+
+        # Linkify all non-wildcard refs to ReST files in Documentation/.
+        m = self.re_doc.search(line)
+        if m:
+            # maintainers.rst is in a subdirectory, so include "../".
+            line = self.re_doc.sub(':doc:`%s <../%s>`' % (m.group(2), m.group(2)), line)
+
+        # Escape the escapes in preformatted text.
+        output = "| %s" % (line.replace("\\", "\\\\")
+                                .replace("**", "\\**"))
+
+        # Look for and record field letter to field name mappings:
+        #   R: Designated *reviewer*: FullName <address@domain>
+        m = re.search(r"\s(\S):\s", line)
+        if m:
+            self.field_letter = m.group(1)
+
+        if self.field_letter and self.field_letter not in self.fields:
+            m = re.search(r"\*([^\*]+)\*", line)
+            if m:
+                self.fields[self.field_letter] = m.group(1)
+
+        # Append parsed content to self.output
+        self.output += output
+
+    def parse_subsystems(self, line):
+        """Handle contents of the per-subsystem sections."""
+
+        # Drop needless input whitespace.
+        line = line.rstrip()
+
+        #
+        # Handle profile entries - either as files or as https refs
+        #
+        match = re.match(rf"P:\s*({self.doc_dir})(/\S+)\.rst", line)
+        if match:
+            name = "".join(match.groups())
+            entry = os.path.relpath(self.base_dir + name, self.app_dir)
+
+            full_name = os.path.join(self.base_dir, name)
+            path = os.path.relpath(full_name, self.app_dir)
+            #
+            # When SPHINXDIRS is used, it will try to reference files
+            # outside srctree, causing warnings. To avoid that, point
+            # to the latest official documentation
+            #
+            if path.startswith("../"):
+                entry = KERNELDOC_URL + match.group(2) + ".html"
+            else:
+                entry = "/" + entry
+
+            if "*" in entry:
+                for e in glob(entry):
+                    self.profile_toc.add(e)
+                    self.profile_entries[self.subsystem_name] = e
+            else:
+                self.profile_toc.add(entry)
+                self.profile_entries[self.subsystem_name] = entry
+        else:
+            match = re.match(r"P:\s*(https?://.*)", line)
+            if match:
+                entry = match.group(1).strip()
+                self.profile_entries[self.subsystem_name] = entry
+
+        # Linkify all non-wildcard refs to ReST files in Documentation/.
+        m = self.re_doc.search(line)
+        if m:
+            # maintainers.rst is in a subdirectory, so include "../".
+            line = self.re_doc.sub(':doc:`%s <../%s>`' % (m.group(2), m.group(2)), line)
+
+        # Check state machine for output rendering behavior.
+        output = None
+        if self.subsystems:
+            # Skip empty lines: subsystem parser adds them as needed.
+            if len(line) == 0:
+                return
+            # Subsystem fields are batched into "field_content"
+            if line[1] != ':':
+                # Render a subsystem entry as:
+                #   SUBSYSTEM NAME
+                #   ~~~~~~~~~~~~~~
+                # Flush pending field content.
+                output = self.field_content + "\n\n"
+                self.field_content = ""
+
+                self.subsystem_name = line.title()
+
+                # Collapse whitespace in subsystem name.
+                heading = re.sub(r"\s+", " ", line)
+                output = output + "%s\n%s" % (heading, "~" * len(heading))
+                self.field_prev = ""
+            else:
+                # Render a subsystem field as:
+                #   :Field: entry
+                #           entry...
+                field, details = line.split(':', 1)
+                details = details.strip()
+
+                # Mark paths (and regexes) as literal text for improved
+                # readability and to escape any escapes.
+                if field in ['F', 'N', 'X', 'K']:
+                    # But only if not already marked :)
+                    if not ':doc:' in details:
+                        details = '``%s``' % (details)
+
+                # Comma separate email field continuations.
+                if field == self.field_prev and self.field_prev in ['M', 'R', 'L']:
+                    self.field_content = self.field_content + ","
+
+                # Do not repeat field names, so that field entries
+                # will be collapsed together.
+                if field != self.field_prev:
+                    output = self.field_content + "\n"
+                    self.field_content = ":%s:" % (self.fields.get(field, field))
+                self.field_content = self.field_content + "\n\t%s" % (details)
+                self.field_prev = field
+        elif not self.descriptions:
+            output = line
+
+        if output is not None:
+            self.output += output + "\n"
 
-        # Create a TOC class
 
 class MaintainersInclude(Include):
     """MaintainersInclude (``maintainers-include``) directive"""
-- 
2.54.0
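
As an illustration of what the precompiled self.re_doc pattern
matches, here is a minimal standalone sketch. RE_DOC and linkify()
are hypothetical names, and the replacement uses regex
backreferences, which is a variation on the patch: the patch builds
the replacement string from the first match instead.

```python
import re

# Same pattern the patch stores in self.re_doc.
RE_DOC = re.compile(r'(Documentation/([^\s\?\*]*)\.rst)')


def linkify(line):
    """Turn Documentation/<path>.rst references into :doc: roles."""
    # \2 is the path without "Documentation/" and without ".rst".
    # maintainers.rst is in a subdirectory, hence the "../" prefix.
    return RE_DOC.sub(r':doc:`\2 <../\2>`', line)


print(linkify("P:\tDocumentation/process/maintainer-tip.rst"))
# Wildcard references do not match and are left untouched.
print(linkify("F:\tDocumentation/devicetree/bindings/*.rst"))
```

With backreferences, a line holding more than one reference keeps a
distinct target for each of them.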
