Re: [PATCH v2 00/28] kernel-doc: use a C lexical tokenizer for transforms


From: Mauro Carvalho Chehab <mchehab+huawei@kernel.org>
To: Jonathan Corbet <corbet@lwn.net>,
	Mauro Carvalho Chehab <mchehab@kernel.org>
Cc: Kees Cook <kees@kernel.org>,
	linux-doc@vger.kernel.org, linux-hardening@vger.kernel.org,
	linux-kernel@vger.kernel.org,
	"Gustavo A. R. Silva" <gustavoars@kernel.org>,
	Aleksandr Loktionov <aleksandr.loktionov@intel.com>,
	Randy Dunlap <rdunlap@infradead.org>,
	Shuah Khan <skhan@linuxfoundation.org>,
	Vincent Mailhol <mailhol@kernel.org>
Subject: Re: [PATCH v2 00/28] kernel-doc: use a C lexical tokenizer for transforms
Date: Fri, 13 Mar 2026 10:17:58 +0100
Message-ID: <20260313101758.1dc691bc@foz.lan>
In-Reply-To: <cover.1773326442.git.mchehab+huawei@kernel.org>

Hi Jon,

On Thu, 12 Mar 2026 15:54:20 +0100
Mauro Carvalho Chehab <mchehab+huawei@kernel.org> wrote:

> Also, I didn't notice any relevant change on the documentation build
> time. 

After more tests, I actually noticed an issue after this changeset:

https://lore.kernel.org/linux-doc/2b957decdb6cedab4268f71a166c25b7abdb9a61.1773326442.git.mchehab+huawei@kernel.org/

Basically, a broken kernel-doc like this:

	
	/**
	 * enum dmub_abm_ace_curve_type - ACE curve type.
	 */
	enum dmub_abm_ace_curve_type {
	        /**
	         * ACE curve as defined by the SW layer.
	         */
	        ABM_ACE_CURVE_TYPE__SW = 0,
	        /**
	         * ACE curve as defined by the SW to HW translation interface layer.
	         */
	        ABM_ACE_CURVE_TYPE__SW_IF = 1,
	};

where the inline markups lack "@symbol", doesn't parse well. Running
the current kernel-doc on it produces:

	.. c:enum:: dmub_abm_ace_curve_type

	  ACE curve type.

	.. container:: kernelindent

	    **Constants**

	    ``*/ ABM_ACE_CURVE_TYPE__SW = 0``
	      *undescribed*


	    `` */ ABM_ACE_CURVE_TYPE__SW_IF = 1``
	      *undescribed*

That happens because kernel-doc currently drops the "/**" line. The
patch above fixes that, but inline comments then confuse enum/struct
detection. To avoid that, comments need to be stripped earlier, at
dump_struct and dump_enum:

	https://lore.kernel.org/linux-doc/d112804ace83e0ad8496f687977596bb7f091560.1773390831.git.mchehab+huawei@kernel.org/T/#u

With that fix, the output is now:

	.. c:enum:: dmub_abm_ace_curve_type

	  ACE curve type.

	.. container:: kernelindent

	    **Constants**

	    ``ABM_ACE_CURVE_TYPE__SW``
	      *undescribed*


	    ``ABM_ACE_CURVE_TYPE__SW_IF``
	      *undescribed*

which is the expected result when there are no proper inline
kernel-doc markups.
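
A minimal sketch of why stripping comments first helps. This is not the
actual kernel-doc code: strip_c_comments() and enum_constants() are
hypothetical helpers, and the regex is a simplification that ignores
comment markers inside string literals.

```python
import re

# Simplified: matches /* ... */ (including /** ... */) and // ... comments.
# It does not handle comment markers that appear inside string literals.
RE_COMMENT = re.compile(r"/\*.*?\*/|//[^\n]*", re.DOTALL)


def strip_c_comments(source):
    """Replace each C comment with a single space."""
    return RE_COMMENT.sub(" ", source)


def enum_constants(body):
    """Return the constant names found between an enum's braces."""
    names = []
    for member in strip_c_comments(body).split(","):
        # Keep only the identifier, dropping any "= value" part.
        name = member.split("=")[0].strip()
        if name:
            names.append(name)
    return names


body = """
        /**
         * ACE curve as defined by the SW layer.
         */
        ABM_ACE_CURVE_TYPE__SW = 0,
        /**
         * ACE curve as defined by the SW to HW translation interface layer.
         */
        ABM_ACE_CURVE_TYPE__SW_IF = 1,
"""

print(enum_constants(body))
```

Since the comments are gone before the members are split, no stray
"*/" can end up in a constant name.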

Due to this issue, I ended up adding a 29/28 patch to this series.

> In that regard, right now, every time a CMatch replacement
> rule takes place, it does:
> 
>     for each transform:
>     - tokenize the source code;
>     - handle CMatch;
>     - convert tokens back to a string.
> 
> A possible optimization would be to do, instead:
> 
>     - tokenize the source code;
>     - for each transform handle CMatch;
>     - convert tokens back to a string.
> 
> For now, I opted not to do it, because:
> 
>     - too many changes at once;
>     - docs build time is taking ~3:30 minutes, which is
>       about the same time it was taking before the changes;
>     - there is a very dirty hack inside function_xforms:
>          (KernRe(r"_noprof"), ""). This is meant to change
>       function prototypes instead of function arguments.
> 
> So, if ok for you, I would prefer to merge this one first. We can later
> optimize kdoc_parser to avoid multiple token <-> string conversions.

I implemented that optimization and it worked fine, so I ended up
adding a 30/28 patch at the end. With that, running kernel-doc before
and after the entire series shows no significant performance change.
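
As a rough illustration of the difference between the two approaches,
here is a toy sketch. It is not the kdoc_parser code: the tokenizer is a
trivial regex, and drop_identifier() is a hypothetical stand-in for a
CMatch-style transform.

```python
import re

# Toy tokenizer: identifiers, numbers, whitespace runs or any single char.
RE_TOKEN = re.compile(r"[A-Za-z_]\w*|\d+|\s+|.", re.DOTALL)


def tokenize(source):
    return RE_TOKEN.findall(source)


def untokenize(tokens):
    return "".join(tokens)


def drop_identifier(name):
    """Build a token-level transform that removes every `name` token."""
    def transform(tokens):
        return [tok for tok in tokens if tok != name]
    return transform


def apply_per_transform(source, transforms):
    """One tokenize/untokenize round trip for every transform."""
    for transform in transforms:
        source = untokenize(transform(tokenize(source)))
    return source


def apply_tokenize_once(source, transforms):
    """Tokenize once, run all transforms on the token list, join once."""
    tokens = tokenize(source)
    for transform in transforms:
        tokens = transform(tokens)
    return untokenize(tokens)


transforms = [drop_identifier("__packed"), drop_identifier("__user")]
source = "struct foo { int __user *p; } __packed;"

# Both strategies give the same text for this input.
assert apply_per_transform(source, transforms) == \
    apply_tokenize_once(source, transforms)
print(apply_tokenize_once(source, transforms))
```

The two strategies are only guaranteed to agree when no transform relies
on tokens being re-merged by the intermediate string conversion.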

	# Current approach
	$ time ./scripts/kernel-doc . -man >original 2>&1

	real    0m37.344s
	user    0m36.447s
	sys     0m0.712s

	# Tokenizer running multiple times (patch 29)
	$ time ./scripts/kernel-doc . -man >before 2>&1

	real    1m32.427s
	user    1m25.377s
	sys     0m1.293s

	# After optimization (patch 30)
	$ time ./scripts/kernel-doc . -man >after 2>&1

	real    0m47.094s
	user    0m46.106s
	sys     0m0.751s

That is about 10 seconds slower than before when parsing everything,
which affects make mandocs, but the time spent in the kernel-doc parser
during make htmldocs is small: about 4 seconds(*):

	$  run_kdoc.py -none 2>/dev/null
	Checking what files are currently used on documentation...
	Running kernel-doc

	Elapsed time: 0:00:04.348008

(*) the slowest logic when building docs with Sphinx is inside its
    RST parser code.
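
For reference, the deltas between the three "time" runs above can be
computed with a small helper like this one (to_seconds() is a
hypothetical name; the values are the "real" figures copied from the
runs above):

```python
import re

RE_TIME = re.compile(r"(\d+)m([\d.]+)s")


def to_seconds(value):
    """Convert a shell `time` value such as '1m32.427s' into seconds."""
    minutes, seconds = RE_TIME.fullmatch(value).groups()
    return int(minutes) * 60 + float(seconds)


# "real" values copied from the three runs above.
runs = {
    "original": "0m37.344s",
    "patch 29": "1m32.427s",
    "patch 30": "0m47.094s",
}

baseline = to_seconds(runs["original"])
for name, value in runs.items():
    secs = to_seconds(value)
    # Absolute time, difference to the original run, and ratio.
    print(f"{name}: {secs:.3f}s, {secs - baseline:+.3f}s, "
          f"{secs / baseline:.2f}x")
```

With those figures, patch 30 is 9.75 seconds (about 1.26x) slower than
the original run, while patch 29 alone is about 2.5x slower.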

See the script below for how I measured the parsing time for the
existing ".. kernel-doc::" markups inside Documentation.


Thanks,
Mauro

---

This is the run_kdoc.py script I'm using here to pick the same files
as make htmldocs does:

#!/usr/bin/env python3

import os
import re
import subprocess
import sys

from datetime import datetime
from glob import glob

print("Checking what files are currently used on documentation...")

kdoc_files = set()
re_kernel_doc = re.compile(r"^\.\.\s+kernel-doc::\s*(\S+)")

for fname in glob(os.path.join(".", "**"), recursive=True):
    if os.path.isfile(fname) and fname.endswith(".rst"):
        with open(fname, "r", encoding="utf-8") as in_fp:
            data = in_fp.read()

        for line in data.split("\n"):
            match = re_kernel_doc.match(line)
            if match:
                if os.path.isfile(match.group(1)):
                    kdoc_files.add(match.group(1))

if not kdoc_files:
    sys.exit("Directory doesn't contain kernel-doc tags")

cmd = [ "./tools/docs/kernel-doc" ]
cmd += sys.argv[1:]
cmd += sorted(kdoc_files)

print("Running kernel-doc")

start_time = datetime.now()

try:
    subprocess.run(cmd, check=True)
except subprocess.CalledProcessError as e:
    print(f"kernel-doc failed: {repr(e)}")

elapsed = datetime.now() - start_time
print(f"\nElapsed time: {elapsed}")
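
A short sketch of what the regex in the script picks up. The RST
fragment and the file names in it are made up for illustration:

```python
import re

# Same pattern as in the script above.
re_kernel_doc = re.compile(r"^\.\.\s+kernel-doc::\s*(\S+)")

# Hypothetical RST fragment, for illustration only.
sample = """\
Some text.

.. kernel-doc:: include/linux/foo.h
   :internal:

   .. kernel-doc:: drivers/foo/bar.c
"""

found = []
for line in sample.split("\n"):
    match = re_kernel_doc.match(line)
    if match:
        found.append(match.group(1))

print(found)
```

Only the first directive is captured: the pattern is anchored at the
start of the line, so an indented ".. kernel-doc::" is skipped. Whether
such indented directives exist under Documentation was not checked here.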


