public inbox for linux-kernel@vger.kernel.org
 help / color / mirror / Atom feed
From: Mauro Carvalho Chehab <mchehab+huawei@kernel.org>
To: Jani Nikula <jani.nikula@linux.intel.com>
Cc: Jonathan Corbet <corbet@lwn.net>,
	Alexander Lobakin <aleksander.lobakin@intel.com>,
	Kees Cook <kees@kernel.org>,
	Mauro Carvalho Chehab <mchehab@kernel.org>,
	intel-wired-lan@lists.osuosl.org, linux-doc@vger.kernel.org,
	linux-hardening@vger.kernel.org, linux-kernel@vger.kernel.org,
	netdev@vger.kernel.org,
	"Gustavo A. R. Silva" <gustavoars@kernel.org>,
	Aleksandr Loktionov <aleksandr.loktionov@intel.com>,
	Randy Dunlap <rdunlap@infradead.org>,
	Shuah Khan <skhan@linuxfoundation.org>
Subject: Re: [Intel-wired-lan] [PATCH 00/38] docs: several improvements to kernel-doc
Date: Fri, 13 Mar 2026 11:48:45 +0100	[thread overview]
Message-ID: <20260313114845.53eb8611@localhost> (raw)
In-Reply-To: <352c3f9f8ffd2d031c86a476e532a8ea6ffcf1ed@intel.com>

On Wed, 04 Mar 2026 12:07:45 +0200
Jani Nikula <jani.nikula@linux.intel.com> wrote:

> On Mon, 23 Feb 2026, Jonathan Corbet <corbet@lwn.net> wrote:
> > Jani Nikula <jani.nikula@linux.intel.com> writes:
> >  
> >> There's always the question, if you're putting a lot of effort into
> >> making kernel-doc closer to an actual C parser, why not put all that
> >> effort into using and adapting to, you know, an actual C parser?  
> >
> > Not speaking to the current effort but ... in the past, when I have
> > contemplated this (using, say, tree-sitter), the real problem is that
> > those parsers simply strip out the comments.  Kerneldoc without comments
> > ... doesn't work very well.  If there were a parser without those
> > problems, and which could be made to do the right thing with all of our
> > weird macro usage, it would certainly be worth considering.  
> 
> I think e.g. libclang and its Python bindings can be made to work. The
> main problems with that are passing proper compiler options (because
> it'll need to include stuff to know about types etc. because it is a
> proper parser), preprocessing everything is going to take time, you need
> to invest a bunch into it to know how slow exactly compared to the
> current thing and whether it's prohitive, and it introduces an extra
> dependency.
> 
> So yeah, there are definitely tradeoffs there. But it's not like this
> constant patching of kernel-doc is exactly burden free either. 

On my tests with a simple C tokenizer:

	https://lore.kernel.org/linux-doc/cover.1773326442.git.mchehab+huawei@kernel.org/

The tokenizer is working fine and didn't make it much slow: it
increases the time to pass the entire Kernel tree from 37s to 47s
for man pages generation, but should not change much the time for
htmldocs, as right now only ~4 seconds is needed to read files
pointed by Documentation kernel-doc tags and parse them.

The code can still be cleaned up, as there are still some things
hardcoded on the various dump_* functions that could be better
implemented (*).

The advantage of the approach I'm using is that it allows to
gradually migrate to rely at the tokenized code, as it can be done
incrementally.

(*) for instance, __attribute__ and a couple of other macros are parsed
    twice at dump_struct() logic, on different places.

> I don't
> know, is it just me, but I'd like to think as a profession we'd be past
> writing ad hoc C parsers by now.

Probably not, but I don't think we need a C parser, as kernel-doc
just needs to understand data types (enum, struct, typedef, union,
vars) and function/macro prototypes.

For such purpose, a tokenizer sounds enough.

Now, there is the code that it is now inside:
	https://github.com/mchehab/linux/blob/tokenizer-v5/tools/lib/python/kdoc/xforms_lists.py

which contains a list of C/gcc/clang keywords that will
be ignored, like:

	__attribute__
	static
	extern
	inline

Together with a sanitized version of the kernel macros it needs
to handle or ignore:

	DECLARE_BITMAP
	DECLARE_HASHTABLE
 	__acquires
	__init
	__exit
	struct_group
	...


Once we finish cleaning up kdoc_parser.py to rely only
on it for prototype transformations, this will be the only file
that will require changes when more macros start affecting 
kernel-doc.

As this is complex, and may require manual adjustments, it
is probably better to not try to auto-generate xforms list
in runtime. A better approach is, IMO, to have a C pre-processor
code to help periodically update it, like using a target like:

	make kdoc-xforms

that would use either cpp or clang to generate a patch to
update xforms_list content after adding new macros that
affect docs generation.

-- 
Thanks,
Mauro

  parent reply	other threads:[~2026-03-13 10:48 UTC|newest]

Thread overview: 55+ messages / expand[flat|nested]  mbox.gz  Atom feed  top
2026-02-18 10:12 [PATCH 00/38] docs: several improvements to kernel-doc Mauro Carvalho Chehab
2026-02-18 10:12 ` [PATCH 01/38] docs: kdoc_re: add support for groups() Mauro Carvalho Chehab
2026-02-18 10:12 ` [PATCH 02/38] docs: kdoc_re: don't go past the end of a line Mauro Carvalho Chehab
2026-02-18 10:12 ` [PATCH 03/38] docs: kdoc_parser: move var transformers to the beginning Mauro Carvalho Chehab
2026-02-18 10:12 ` [PATCH 04/38] docs: kdoc_parser: don't mangle with function defines Mauro Carvalho Chehab
2026-02-18 10:12 ` [PATCH 05/38] docs: kdoc_parser: add functions support for NestedMatch Mauro Carvalho Chehab
2026-02-18 10:12 ` [PATCH 06/38] docs: kdoc_parser: use NestedMatch to handle __attribute__ on functions Mauro Carvalho Chehab
2026-02-18 10:12 ` [PATCH 07/38] docs: kdoc_parser: fix variable regexes to work with size_t Mauro Carvalho Chehab
2026-02-18 10:12 ` [PATCH 08/38] docs: kdoc_parser: fix the default_value logic for variables Mauro Carvalho Chehab
2026-02-18 10:12 ` [PATCH 09/38] docs: kdoc_parser: add some debug for variable parsing Mauro Carvalho Chehab
2026-02-18 10:12 ` [PATCH 10/38] docs: kdoc_parser: don't exclude defaults from prototype Mauro Carvalho Chehab
2026-02-18 10:12 ` [PATCH 11/38] docs: kdoc_parser: fix parser to support multi-word types Mauro Carvalho Chehab
2026-02-18 10:12 ` [PATCH 12/38] docs: kdoc_parser: ignore context analysis and lock attributes Mauro Carvalho Chehab
2026-02-18 10:12 ` [PATCH 13/38] docs: kdoc_parser: add support for LIST_HEAD Mauro Carvalho Chehab
2026-02-18 10:12 ` [PATCH 14/38] docs: kdoc_parser: handle struct member macro VIRTIO_DECLARE_FEATURES(name) Mauro Carvalho Chehab
2026-02-18 10:12 ` [PATCH 15/38] docs: kdoc_re: properly handle strings and escape chars on it Mauro Carvalho Chehab
2026-02-18 10:12 ` [PATCH 16/38] docs: kdoc_re: better show KernRe() at documentation Mauro Carvalho Chehab
2026-02-18 10:12 ` [PATCH 17/38] docs: kdoc_re: don't recompile NestedMatch regex every time Mauro Carvalho Chehab
2026-02-18 10:12 ` [PATCH 18/38] docs: kdoc_re: Change NestedMath args replacement to \0 Mauro Carvalho Chehab
2026-02-18 10:12 ` [PATCH 19/38] docs: kdoc_re: make NestedMatch use KernRe Mauro Carvalho Chehab
2026-02-18 10:12 ` [PATCH 20/38] docs: kdoc_re: add support on NestedMatch for argument replacement Mauro Carvalho Chehab
2026-02-18 10:12 ` [PATCH 21/38] docs: kdoc_parser: better handle struct_group macros Mauro Carvalho Chehab
2026-02-18 10:12 ` [PATCH 22/38] docs: kdoc_re: fix a parse bug on struct page_pool_params Mauro Carvalho Chehab
2026-02-18 10:12 ` [PATCH 23/38] docs: kdoc_re: add a helper class to declare C function matches Mauro Carvalho Chehab
2026-02-18 10:12 ` [PATCH 24/38] docs: kdoc_parser: use the new CFunction class Mauro Carvalho Chehab
2026-02-18 10:12 ` [PATCH 25/38] docs: kdoc_parser: minimize differences with struct_group_tagged Mauro Carvalho Chehab
2026-02-18 10:12 ` [PATCH 26/38] docs: kdoc_parser: move transform lists to a separate file Mauro Carvalho Chehab
2026-02-18 10:12 ` [PATCH 27/38] docs: kdoc_re: don't remove the trailing ";" with NestedMatch Mauro Carvalho Chehab
2026-02-18 10:12 ` [PATCH 28/38] docs: kdoc_re: prevent adding whitespaces on sub replacements Mauro Carvalho Chehab
2026-02-18 10:12 ` [PATCH 29/38] docs: xforms_lists.py: use CFuntion to handle all function macros Mauro Carvalho Chehab
2026-02-18 10:13 ` [PATCH 30/38] docs: kdoc_files: allows the caller to use a different xforms class Mauro Carvalho Chehab
2026-02-18 10:13 ` [PATCH 31/38] docs: kdoc_re: Fix NestedMatch.sub() which causes PDF builds to break Mauro Carvalho Chehab
2026-02-18 10:13 ` [PATCH 32/38] docs: kdoc_files: document KernelFiles() ABI Mauro Carvalho Chehab
2026-02-18 10:13 ` [PATCH 33/38] docs: kdoc_output: add optional args to ManOutput class Mauro Carvalho Chehab
2026-02-18 10:13 ` [PATCH 34/38] docs: sphinx-build-wrapper: better handle troff .TH markups Mauro Carvalho Chehab
2026-02-18 10:13 ` [PATCH 35/38] docs: kdoc_output: use a more standard order for .TH on man pages Mauro Carvalho Chehab
2026-02-18 10:13 ` [PATCH 36/38] docs: sphinx-build-wrapper: don't allow "/" on file names Mauro Carvalho Chehab
2026-02-18 10:13 ` [PATCH 37/38] docs: kdoc_output: describe the class init parameters Mauro Carvalho Chehab
2026-02-18 10:13 ` [PATCH 38/38] docs: kdoc_output: pick a better default for modulename Mauro Carvalho Chehab
2026-02-21  1:24 ` [PATCH 00/38] docs: several improvements to kernel-doc Randy Dunlap
2026-02-22  1:24   ` Randy Dunlap
2026-02-23 13:47 ` Jani Nikula
2026-02-23 15:02   ` Jonathan Corbet
2026-02-24 13:25     ` [Intel-wired-lan] " Mauro Carvalho Chehab
2026-03-04 10:07     ` Jani Nikula
2026-03-04 12:20       ` [Intel-wired-lan] " Mauro Carvalho Chehab
2026-03-04 22:34       ` Jonathan Corbet
2026-03-13 10:48       ` Mauro Carvalho Chehab [this message]
2026-03-03 14:53   ` Mauro Carvalho Chehab
2026-03-03 15:12     ` Loktionov, Aleksandr
2026-03-03 16:09       ` [Intel-wired-lan] " Mauro Carvalho Chehab
2026-03-04  9:51     ` Jani Nikula
2026-02-23 21:58 ` Jonathan Corbet
2026-03-02 15:54   ` [Intel-wired-lan] " Mauro Carvalho Chehab
2026-03-02 16:14     ` Jonathan Corbet

Reply instructions:

You may reply publicly to this message via plain-text email
using any one of the following methods:

* Save the following mbox file, import it into your mail client,
  and reply-to-all from there: mbox

  Avoid top-posting and favor interleaved quoting:
  https://en.wikipedia.org/wiki/Posting_style#Interleaved_style

* Reply using the --to, --cc, and --in-reply-to
  switches of git-send-email(1):

  git send-email \
    --in-reply-to=20260313114845.53eb8611@localhost \
    --to=mchehab+huawei@kernel.org \
    --cc=aleksander.lobakin@intel.com \
    --cc=aleksandr.loktionov@intel.com \
    --cc=corbet@lwn.net \
    --cc=gustavoars@kernel.org \
    --cc=intel-wired-lan@lists.osuosl.org \
    --cc=jani.nikula@linux.intel.com \
    --cc=kees@kernel.org \
    --cc=linux-doc@vger.kernel.org \
    --cc=linux-hardening@vger.kernel.org \
    --cc=linux-kernel@vger.kernel.org \
    --cc=mchehab@kernel.org \
    --cc=netdev@vger.kernel.org \
    --cc=rdunlap@infradead.org \
    --cc=skhan@linuxfoundation.org \
    /path/to/YOUR_REPLY

  https://kernel.org/pub/software/scm/git/docs/git-send-email.html

* If your mail client supports setting the In-Reply-To header
  via mailto: links, try the mailto: link
Be sure your reply has a Subject: header at the top and a blank line before the message body.
This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox