From: Mauro Carvalho Chehab <mchehab+huawei@kernel.org>
To: Jonathan Corbet <corbet@lwn.net>
Cc: linux-doc@vger.kernel.org, linux-kernel@vger.kernel.org,
Akira Yokosawa <akiyks@gmail.com>
Subject: Re: [PATCH 03/12] docs: kdoc: backslashectomy in kdoc_parser
Date: Fri, 1 Aug 2025 06:27:10 +0200 [thread overview]
Message-ID: <20250801062710.552dac5a@foz.lan> (raw)
In-Reply-To: <20250801001326.924276-4-corbet@lwn.net>
Em Thu, 31 Jul 2025 18:13:17 -0600
Jonathan Corbet <corbet@lwn.net> escreveu:
> A lot of the regular expressions in this file have extraneous backslashes
This one is a bit scary... It could actually cause issues somewhere.
Also, IMHO, some expressions look worse on my eyes ;-)
> that may have been needed in Perl, but aren't helpful here. Take them out
> to reduce slightly the visual noise.
No idea if Perl actually requires, but, at least for me, I do prefer to
see all special characters properly escaped with a backslash. This way,
it is a lot clearer that what it is expecting is a string, instead of
using something that may affect regex processing.
This is specially important for my eyes when expecting for dots,
parenthesis and brackets.
> Signed-off-by: Jonathan Corbet <corbet@lwn.net>
> ---
> scripts/lib/kdoc/kdoc_parser.py | 40 ++++++++++++++++-----------------
> 1 file changed, 20 insertions(+), 20 deletions(-)
>
> diff --git a/scripts/lib/kdoc/kdoc_parser.py b/scripts/lib/kdoc/kdoc_parser.py
> index 9948ede739a5..e1efa65a3480 100644
> --- a/scripts/lib/kdoc/kdoc_parser.py
> +++ b/scripts/lib/kdoc/kdoc_parser.py
> @@ -46,7 +46,7 @@ doc_decl = doc_com + KernRe(r'(\w+)', cache=False)
> known_section_names = 'description|context|returns?|notes?|examples?'
> known_sections = KernRe(known_section_names, flags = re.I)
> doc_sect = doc_com + \
> - KernRe(r'\s*(\@[.\w]+|\@\.\.\.|' + known_section_names + r')\s*:([^:].*)?$',
> + KernRe(r'\s*(@[.\w]+|@\.\.\.|' + known_section_names + r')\s*:([^:].*)?$',
> flags=re.I, cache=False)
>
> doc_content = doc_com_body + KernRe(r'(.*)', cache=False)
> @@ -60,7 +60,7 @@ attribute = KernRe(r"__attribute__\s*\(\([a-z0-9,_\*\s\(\)]*\)\)",
> export_symbol = KernRe(r'^\s*EXPORT_SYMBOL(_GPL)?\s*\(\s*(\w+)\s*\)\s*', cache=False)
> export_symbol_ns = KernRe(r'^\s*EXPORT_SYMBOL_NS(_GPL)?\s*\(\s*(\w+)\s*,\s*"\S+"\)\s*', cache=False)
>
> -type_param = KernRe(r"\@(\w*((\.\w+)|(->\w+))*(\.\.\.)?)", cache=False)
> +type_param = KernRe(r"@(\w*((\.\w+)|(->\w+))*(\.\.\.)?)", cache=False)
>
> #
> # Tests for the beginning of a kerneldoc block in its various forms.
> @@ -331,7 +331,7 @@ class KernelDoc:
>
> self.entry.anon_struct_union = False
>
> - param = KernRe(r'[\[\)].*').sub('', param, count=1)
> + param = KernRe(r'[)[].*').sub('', param, count=1)
This one, for instance, IMHO looks a lot worse for my eyes to understand
that there is a "[" that it is not an operator, but instead a string.
The open close parenthesis also looks weird. My regex-trained eyes think
that this would be part of a capture group.
>
> if dtype == "" and param.endswith("..."):
> if KernRe(r'\w\.\.\.$').search(param):
> @@ -405,7 +405,7 @@ class KernelDoc:
>
> for arg in args.split(splitter):
> # Strip comments
> - arg = KernRe(r'\/\*.*\*\/').sub('', arg)
> + arg = KernRe(r'/\*.*\*/').sub('', arg)
A pattern like /..../ is a standard way to pass search group with Regex
on many languages and utils that accept regular expressions like the
sed command. Dropping the backslash here IMHO makes it confusing ;-)
>
> # Ignore argument attributes
> arg = KernRe(r'\sPOS0?\s').sub(' ', arg)
> @@ -428,14 +428,14 @@ class KernelDoc:
>
> arg = arg.replace('#', ',')
>
> - r = KernRe(r'[^\(]+\(\*?\s*([\w\[\]\.]*)\s*\)')
> + r = KernRe(r'[^(]+\(\*?\s*([\w[\].]*)\s*\)')
Here, [.] is also a lot more confusing for me than [\.]
Ok, both works the same way on all implementations I know, but, as a doc
means any character, I need to re-read this two or three times to understand
that here it is waiting for a dot character instead of any character.
----
Here, I became too tired of reading regular expressions... better
stop to avoid headaches ;-)
Seriously, IMHO this patch makes a lot worse to understand what brackets,
parenthesis and dots are strings, and which ones are part of the regex
syntax.
> if r.match(arg):
> param = r.group(1)
> else:
> self.emit_msg(ln, f"Invalid param: {arg}")
> param = arg
>
> - dtype = KernRe(r'([^\(]+\(\*?)\s*' + re.escape(param)).sub(r'\1', arg)
> + dtype = KernRe(r'([^(]+\(\*?)\s*' + re.escape(param)).sub(r'\1', arg)
> self.push_parameter(ln, decl_type, param, dtype,
> arg, declaration_name)
>
> @@ -443,14 +443,14 @@ class KernelDoc:
> # Array-of-pointers
>
> arg = arg.replace('#', ',')
> - r = KernRe(r'[^\(]+\(\s*\*\s*([\w\[\]\.]*?)\s*(\s*\[\s*[\w]+\s*\]\s*)*\)')
> + r = KernRe(r'[^(]+\(\s*\*\s*([\w[\].]*?)\s*(\s*\[\s*[\w]+\s*\]\s*)*\)')
> if r.match(arg):
> param = r.group(1)
> else:
> self.emit_msg(ln, f"Invalid param: {arg}")
> param = arg
>
> - dtype = KernRe(r'([^\(]+\(\*?)\s*' + re.escape(param)).sub(r'\1', arg)
> + dtype = KernRe(r'([^(]+\(\*?)\s*' + re.escape(param)).sub(r'\1', arg)
>
> self.push_parameter(ln, decl_type, param, dtype,
> arg, declaration_name)
> @@ -637,8 +637,8 @@ class KernelDoc:
> # it is better to also move those to the NestedMatch logic,
> # to ensure that parenthesis will be properly matched.
>
> - (KernRe(r'__ETHTOOL_DECLARE_LINK_MODE_MASK\s*\(([^\)]+)\)', re.S), r'DECLARE_BITMAP(\1, __ETHTOOL_LINK_MODE_MASK_NBITS)'),
> - (KernRe(r'DECLARE_PHY_INTERFACE_MASK\s*\(([^\)]+)\)', re.S), r'DECLARE_BITMAP(\1, PHY_INTERFACE_MODE_MAX)'),
> + (KernRe(r'__ETHTOOL_DECLARE_LINK_MODE_MASK\s*\(([^)]+)\)', re.S), r'DECLARE_BITMAP(\1, __ETHTOOL_LINK_MODE_MASK_NBITS)'),
> + (KernRe(r'DECLARE_PHY_INTERFACE_MASK\s*\(([^)]+)\)', re.S), r'DECLARE_BITMAP(\1, PHY_INTERFACE_MODE_MAX)'),
> (KernRe(r'DECLARE_BITMAP\s*\(' + args_pattern + r',\s*' + args_pattern + r'\)', re.S), r'unsigned long \1[BITS_TO_LONGS(\2)]'),
> (KernRe(r'DECLARE_HASHTABLE\s*\(' + args_pattern + r',\s*' + args_pattern + r'\)', re.S), r'unsigned long \1[1 << ((\2) - 1)]'),
> (KernRe(r'DECLARE_KFIFO\s*\(' + args_pattern + r',\s*' + args_pattern + r',\s*' + args_pattern + r'\)', re.S), r'\2 *\1'),
> @@ -700,7 +700,7 @@ class KernelDoc:
> s_id = s_id.strip()
>
> newmember += f"{maintype} {s_id}; "
> - s_id = KernRe(r'[:\[].*').sub('', s_id)
> + s_id = KernRe(r'[:[].*').sub('', s_id)
> s_id = KernRe(r'^\s*\**(\S+)\s*').sub(r'\1', s_id)
>
> for arg in content.split(';'):
> @@ -709,7 +709,7 @@ class KernelDoc:
> if not arg:
> continue
>
> - r = KernRe(r'^([^\(]+\(\*?\s*)([\w\.]*)(\s*\).*)')
> + r = KernRe(r'^([^(]+\(\*?\s*)([\w.]*)(\s*\).*)')
> if r.match(arg):
> # Pointer-to-function
> dtype = r.group(1)
> @@ -767,12 +767,12 @@ class KernelDoc:
> self.check_sections(ln, declaration_name, decl_type)
>
> # Adjust declaration for better display
> - declaration = KernRe(r'([\{;])').sub(r'\1\n', declaration)
> + declaration = KernRe(r'([{;])').sub(r'\1\n', declaration)
> declaration = KernRe(r'\}\s+;').sub('};', declaration)
>
> # Better handle inlined enums
> while True:
> - r = KernRe(r'(enum\s+\{[^\}]+),([^\n])')
> + r = KernRe(r'(enum\s+\{[^}]+),([^\n])')
> if not r.search(declaration):
> break
>
> @@ -969,8 +969,8 @@ class KernelDoc:
> # - pci_match_device, __copy_to_user (long return type)
>
> name = r'[a-zA-Z0-9_~:]+'
> - prototype_end1 = r'[^\(]*'
> - prototype_end2 = r'[^\{]*'
> + prototype_end1 = r'[^(]*'
> + prototype_end2 = r'[^{]*'
> prototype_end = fr'\(({prototype_end1}|{prototype_end2})\)'
>
> # Besides compiling, Perl qr{[\w\s]+} works as a non-capturing group.
> @@ -1044,7 +1044,7 @@ class KernelDoc:
> Stores a typedef inside self.entries array.
> """
>
> - typedef_type = r'((?:\s+[\w\*]+\b){0,7}\s+(?:\w+\b|\*+))\s*'
> + typedef_type = r'((?:\s+[\w*]+\b){0,7}\s+(?:\w+\b|\*+))\s*'
> typedef_ident = r'\*?\s*(\w\S+)\s*'
> typedef_args = r'\s*\((.*)\);'
>
> @@ -1265,7 +1265,7 @@ class KernelDoc:
> self.dump_section()
>
> # Look for doc_com + <text> + doc_end:
> - r = KernRe(r'\s*\*\s*[a-zA-Z_0-9:\.]+\*/')
> + r = KernRe(r'\s*\*\s*[a-zA-Z_0-9:.]+\*/')
> if r.match(line):
> self.emit_msg(ln, f"suspicious ending line: {line}")
>
> @@ -1476,14 +1476,14 @@ class KernelDoc:
> """Ancillary routine to process a function prototype"""
>
> # strip C99-style comments to end of line
> - line = KernRe(r"\/\/.*$", re.S).sub('', line)
> + line = KernRe(r"//.*$", re.S).sub('', line)
> #
> # Soak up the line's worth of prototype text, stopping at { or ; if present.
> #
> if KernRe(r'\s*#\s*define').match(line):
> self.entry.prototype = line
> elif not line.startswith('#'): # skip other preprocessor stuff
> - r = KernRe(r'([^\{]*)')
> + r = KernRe(r'([^{]*)')
> if r.match(line):
> self.entry.prototype += r.group(1) + " "
> #
Thanks,
Mauro
next prev parent reply other threads:[~2025-08-01 4:27 UTC|newest]
Thread overview: 39+ messages / expand[flat|nested] mbox.gz Atom feed top
2025-08-01 0:13 [PATCH 00/12] docs: kdoc: thrash up dump_struct() Jonathan Corbet
2025-08-01 0:13 ` [PATCH 01/12] docs: kdoc: consolidate the stripping of private struct/union members Jonathan Corbet
2025-08-01 5:29 ` Mauro Carvalho Chehab
2025-08-01 0:13 ` [PATCH 02/12] docs: kdoc: Move a regex line in dump_struct() Jonathan Corbet
2025-08-01 5:29 ` Mauro Carvalho Chehab
2025-08-01 0:13 ` [PATCH 03/12] docs: kdoc: backslashectomy in kdoc_parser Jonathan Corbet
2025-08-01 4:27 ` Mauro Carvalho Chehab [this message]
2025-08-01 14:21 ` Jonathan Corbet
2025-08-04 12:58 ` Mauro Carvalho Chehab
2025-08-04 16:00 ` Mauro Carvalho Chehab
2025-08-04 18:29 ` Jonathan Corbet
2025-08-01 0:13 ` [PATCH 04/12] docs: kdoc: move the prefix transforms out of dump_struct() Jonathan Corbet
2025-08-01 5:28 ` Mauro Carvalho Chehab
2025-08-01 5:35 ` Mauro Carvalho Chehab
2025-08-01 0:13 ` [PATCH 05/12] docs: kdoc: split top-level prototype parsing " Jonathan Corbet
2025-08-01 5:34 ` Mauro Carvalho Chehab
2025-08-01 14:10 ` Jonathan Corbet
2025-08-04 12:20 ` Mauro Carvalho Chehab
2025-08-01 0:13 ` [PATCH 06/12] docs: kdoc: split struct-member rewriting " Jonathan Corbet
2025-08-01 5:37 ` Mauro Carvalho Chehab
2025-08-01 0:13 ` [PATCH 07/12] docs: kdoc: rework the rewrite_struct_members() main loop Jonathan Corbet
2025-08-01 5:42 ` Mauro Carvalho Chehab
2025-08-01 0:13 ` [PATCH 08/12] docs: kdoc: remove an extraneous strip() call Jonathan Corbet
2025-08-01 5:45 ` Mauro Carvalho Chehab
2025-08-01 0:13 ` [PATCH 09/12] docs: kdoc: Some rewrite_struct_members() commenting Jonathan Corbet
2025-08-01 5:50 ` Mauro Carvalho Chehab
2025-08-01 0:13 ` [PATCH 10/12] docs: kdoc: further rewrite_struct_members() cleanup Jonathan Corbet
2025-08-01 6:07 ` Mauro Carvalho Chehab
2025-08-01 22:52 ` Jonathan Corbet
2025-08-04 13:15 ` Mauro Carvalho Chehab
2025-08-05 22:46 ` Jonathan Corbet
2025-08-06 9:05 ` Mauro Carvalho Chehab
2025-08-06 13:00 ` Jonathan Corbet
2025-08-06 21:27 ` Mauro Carvalho Chehab
2025-08-01 0:13 ` [PATCH 11/12] docs: kdoc: extract output formatting from dump_struct() Jonathan Corbet
2025-08-01 6:09 ` Mauro Carvalho Chehab
2025-08-01 0:13 ` [PATCH 12/12] docs: kdoc: a few final dump_struct() touches Jonathan Corbet
2025-08-01 6:10 ` Mauro Carvalho Chehab
2025-08-01 6:23 ` [PATCH 00/12] docs: kdoc: thrash up dump_struct() Mauro Carvalho Chehab
Reply instructions:
You may reply publicly to this message via plain-text email
using any one of the following methods:
* Save the following mbox file, import it into your mail client,
and reply-to-all from there: mbox
Avoid top-posting and favor interleaved quoting:
https://en.wikipedia.org/wiki/Posting_style#Interleaved_style
* Reply using the --to, --cc, and --in-reply-to
switches of git-send-email(1):
git send-email \
--in-reply-to=20250801062710.552dac5a@foz.lan \
--to=mchehab+huawei@kernel.org \
--cc=akiyks@gmail.com \
--cc=corbet@lwn.net \
--cc=linux-doc@vger.kernel.org \
--cc=linux-kernel@vger.kernel.org \
/path/to/YOUR_REPLY
https://kernel.org/pub/software/scm/git/docs/git-send-email.html
* If your mail client supports setting the In-Reply-To header
via mailto: links, try the mailto: link
Be sure your reply has a Subject: header at the top and a blank line
before the message body.
This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox;
as well as URLs for NNTP newsgroup(s).