All of lore.kernel.org
 help / color / mirror / Atom feed
From: Mauro Carvalho Chehab <mchehab+huawei@kernel.org>
To: Jonathan Corbet <corbet@lwn.net>
Cc: linux-doc@vger.kernel.org, linux-kernel@vger.kernel.org,
	Akira Yokosawa <akiyks@gmail.com>
Subject: Re: [PATCH 04/12] docs: kdoc: move the prefix transforms out of dump_struct()
Date: Fri, 1 Aug 2025 07:35:30 +0200	[thread overview]
Message-ID: <20250801073530.661e2078@foz.lan> (raw)
In-Reply-To: <20250801072841.0246eeac@foz.lan>

Em Fri, 1 Aug 2025 07:28:41 +0200
Mauro Carvalho Chehab <mchehab+huawei@kernel.org> escreveu:

> Em Thu, 31 Jul 2025 18:13:18 -0600
> Jonathan Corbet <corbet@lwn.net> escreveu:
> 
> > dump_struct is one of the longest functions in the kdoc_parser class,
> > making it hard to read and reason about.  Move the definition of the prefix
> > transformations out of the function, join them with the definition of
> > "attribute" (which was defined at the top of the file but only used here),
> > and reformat the code slightly for shorter line widths.
> > 
> > Just code movement in the end.  
> 
> This patch itself LGTM:
> 
> Reviewed-by: Mauro Carvalho Chehab <mchehab+huawei@kernel.org>

In time, my R-B from patch 4 and above assumes that patch 3 is
dropped, as I'm not re-checking the regular expressions.

> 
> but see my notes below:
> 
> > +struct_prefixes = [
> > +    # Strip attributes
> > +    (struct_attribute, ' '),
> > +    (KernRe(r'\s*__aligned\s*\([^;]*\)', re.S), ' '),
> > +    (KernRe(r'\s*__counted_by\s*\([^;]*\)', re.S), ' '),
> > +    (KernRe(r'\s*__counted_by_(le|be)\s*\([^;]*\)', re.S), ' '),
> > +    (KernRe(r'\s*__packed\s*', re.S), ' '),
> > +    (KernRe(r'\s*CRYPTO_MINALIGN_ATTR', re.S), ' '),
> > +    (KernRe(r'\s*____cacheline_aligned_in_smp', re.S), ' '),
> > +    (KernRe(r'\s*____cacheline_aligned', re.S), ' '),
> > +    #
> > +    # Unwrap struct_group macros based on this definition:
> > +    # __struct_group(TAG, NAME, ATTRS, MEMBERS...)
> > +    # which has variants like: struct_group(NAME, MEMBERS...)
> > +    # Only MEMBERS arguments require documentation.
> > +    #
> > +    # Parsing them happens on two steps:
> > +    #
> > +    # 1. drop struct group arguments that aren't at MEMBERS,
> > +    #    storing them as STRUCT_GROUP(MEMBERS)
> > +    #
> > +    # 2. remove STRUCT_GROUP() ancillary macro.
> > +    #
> > +    # The original logic used to remove STRUCT_GROUP() using an
> > +    # advanced regex:
> > +    #
> > +    #   \bSTRUCT_GROUP(\(((?:(?>[^)(]+)|(?1))*)\))[^;]*;
> > +    #
> > +    # with two patterns that are incompatible with
> > +    # Python re module, as it has:
> > +    #
> > +    #   - a recursive pattern: (?1)
> > +    #   - an atomic grouping: (?>...)
> > +    #
> > +    # I tried a simpler version: but it didn't work either:
> > +    #   \bSTRUCT_GROUP\(([^\)]+)\)[^;]*;
> > +    #
> > +    # As it doesn't properly match the end parenthesis on some cases.
> > +    #
> > +    # So, a better solution was crafted: there's now a NestedMatch
> > +    # class that ensures that delimiters after a search are properly
> > +    # matched. So, the implementation to drop STRUCT_GROUP() will be
> > +    # handled in separate.
> > +    #
> > +    (KernRe(r'\bstruct_group\s*\(([^,]*,)', re.S), r'STRUCT_GROUP('),
> > +    (KernRe(r'\bstruct_group_attr\s*\(([^,]*,){2}', re.S), r'STRUCT_GROUP('),
> > +    (KernRe(r'\bstruct_group_tagged\s*\(([^,]*),([^,]*),', re.S), r'struct \1 \2; STRUCT_GROUP('),
> > +    (KernRe(r'\b__struct_group\s*\(([^,]*,){3}', re.S), r'STRUCT_GROUP('),
> > +    #
> > +    # Replace macros
> > +    #
> > +    # TODO: use NestedMatch for FOO($1, $2, ...) matches  
> 
> This comment is actually related to patch 03/12: regex cleanups:
> 
> If you want to simplify a lot the regular expressions here, the best
> is to take a look at the NestedMatch class and improve it. There are lots
> of regular expressions here that are very complex because they try
> to ensure that something like these:
> 
> 	1. function(<arg1>)
> 	2. function(<arg1>, <arg2>,<arg3>,...)
> 
> are properly parsed[1], but if we turn it into something that handle (2) as 
> well, we could use it like:
> 
> 	match = NestedMatch.search("function", string)
> 	# or, alternatively:
> 	# match = NestedMatch.search("function($1, $2, $3)", string)
> 
> 	if match:
> 		arg1 = match.group(1)
> 		arg2 = match.group(2)
> 		arg3 = match.group(3)
> 
> or even do more complex changes like:
> 
> 	NestedMatch.sub("foo($1, $2)", "new_name($2)", string)
> 
> A class implementing that will help to transform all sorts of functions
> and simplify the more complex regexes on kernel-doc. Doing that will
> very likely simplify a lot the struct_prefixes, replacing it by something
> a lot more easier to understand:
> 
> 	# Nice and simpler set of replacement rules
> 	struct_nested_matches = [
> 		("__aligned", ""),
> 		("__counted_by", ""),
> 		("__counted_by_(be|le)", ""),
> 	...
> 		# Picked those from stddef.h macro replacement rules
> 		("struct_group(NAME, MEMBERS...)", "__struct_group(, NAME, , MEMBERS)"),
> 		("struct_group(TAG, NAME, ATTRS, MEMBERS...)",
> 		 """	__struct_group(TAG, NAME, ATTRS, MEMBERS...)
> 		        union {
> 		                struct { MEMBERS } ATTRS;
> 		                struct __struct_group_tag(TAG) { MEMBERS } ATTRS NAME;
> 		        } ATTRS"""),
> 	...
> 	]
> 
> 	members = trim_private_members(members)
> 	for from, to in struct_nested_matches:
>               members = NestedMatch.sub(from, to, members)
> 		
> Granted, wiring this up takes some time and lots of testing - we should
> likely have some unit tests to catch issues there - but IMO it is
> worth the effort.
> 
> -
> 
> [1] NestedMatch() is currently limited to match function(<args>), as it was
>     written to replace really complex regular expressions with 
>     recursive patterns and atomic grouping, that were used only to
>     capture macro calls for: 
> 
> 	STRUCT_GROUP(...)
> 
>    I might have used instead "import regex", but I didn't want to add the
>    extra dependency of a non-standard Python library at the Kernel build.
> 
> Thanks,
> Mauro



Thanks,
Mauro

  reply	other threads:[~2025-08-01  5:35 UTC|newest]

Thread overview: 39+ messages / expand[flat|nested]  mbox.gz  Atom feed  top
2025-08-01  0:13 [PATCH 00/12] docs: kdoc: thrash up dump_struct() Jonathan Corbet
2025-08-01  0:13 ` [PATCH 01/12] docs: kdoc: consolidate the stripping of private struct/union members Jonathan Corbet
2025-08-01  5:29   ` Mauro Carvalho Chehab
2025-08-01  0:13 ` [PATCH 02/12] docs: kdoc: Move a regex line in dump_struct() Jonathan Corbet
2025-08-01  5:29   ` Mauro Carvalho Chehab
2025-08-01  0:13 ` [PATCH 03/12] docs: kdoc: backslashectomy in kdoc_parser Jonathan Corbet
2025-08-01  4:27   ` Mauro Carvalho Chehab
2025-08-01 14:21     ` Jonathan Corbet
2025-08-04 12:58       ` Mauro Carvalho Chehab
2025-08-04 16:00         ` Mauro Carvalho Chehab
2025-08-04 18:29           ` Jonathan Corbet
2025-08-01  0:13 ` [PATCH 04/12] docs: kdoc: move the prefix transforms out of dump_struct() Jonathan Corbet
2025-08-01  5:28   ` Mauro Carvalho Chehab
2025-08-01  5:35     ` Mauro Carvalho Chehab [this message]
2025-08-01  0:13 ` [PATCH 05/12] docs: kdoc: split top-level prototype parsing " Jonathan Corbet
2025-08-01  5:34   ` Mauro Carvalho Chehab
2025-08-01 14:10     ` Jonathan Corbet
2025-08-04 12:20       ` Mauro Carvalho Chehab
2025-08-01  0:13 ` [PATCH 06/12] docs: kdoc: split struct-member rewriting " Jonathan Corbet
2025-08-01  5:37   ` Mauro Carvalho Chehab
2025-08-01  0:13 ` [PATCH 07/12] docs: kdoc: rework the rewrite_struct_members() main loop Jonathan Corbet
2025-08-01  5:42   ` Mauro Carvalho Chehab
2025-08-01  0:13 ` [PATCH 08/12] docs: kdoc: remove an extraneous strip() call Jonathan Corbet
2025-08-01  5:45   ` Mauro Carvalho Chehab
2025-08-01  0:13 ` [PATCH 09/12] docs: kdoc: Some rewrite_struct_members() commenting Jonathan Corbet
2025-08-01  5:50   ` Mauro Carvalho Chehab
2025-08-01  0:13 ` [PATCH 10/12] docs: kdoc: further rewrite_struct_members() cleanup Jonathan Corbet
2025-08-01  6:07   ` Mauro Carvalho Chehab
2025-08-01 22:52     ` Jonathan Corbet
2025-08-04 13:15       ` Mauro Carvalho Chehab
2025-08-05 22:46         ` Jonathan Corbet
2025-08-06  9:05           ` Mauro Carvalho Chehab
2025-08-06 13:00             ` Jonathan Corbet
2025-08-06 21:27               ` Mauro Carvalho Chehab
2025-08-01  0:13 ` [PATCH 11/12] docs: kdoc: extract output formatting from dump_struct() Jonathan Corbet
2025-08-01  6:09   ` Mauro Carvalho Chehab
2025-08-01  0:13 ` [PATCH 12/12] docs: kdoc: a few final dump_struct() touches Jonathan Corbet
2025-08-01  6:10   ` Mauro Carvalho Chehab
2025-08-01  6:23 ` [PATCH 00/12] docs: kdoc: thrash up dump_struct() Mauro Carvalho Chehab

Reply instructions:

You may reply publicly to this message via plain-text email
using any one of the following methods:

* Save the following mbox file, import it into your mail client,
  and reply-to-all from there: mbox

  Avoid top-posting and favor interleaved quoting:
  https://en.wikipedia.org/wiki/Posting_style#Interleaved_style

* Reply using the --to, --cc, and --in-reply-to
  switches of git-send-email(1):

  git send-email \
    --in-reply-to=20250801073530.661e2078@foz.lan \
    --to=mchehab+huawei@kernel.org \
    --cc=akiyks@gmail.com \
    --cc=corbet@lwn.net \
    --cc=linux-doc@vger.kernel.org \
    --cc=linux-kernel@vger.kernel.org \
    /path/to/YOUR_REPLY

  https://kernel.org/pub/software/scm/git/docs/git-send-email.html

* If your mail client supports setting the In-Reply-To header
  via mailto: links, try the mailto: link
Be sure your reply has a Subject: header at the top and a blank line before the message body.
This is an external index of several public inboxes,
see mirroring instructions on how to clone and mirror
all data and code used by this external index.