Re: [PATCH 03/12] docs: kdoc: backslashectomy in kdoc_parser

linux-doc.vger.kernel.org archive mirror
 help / color / mirror / Atom feed

From: Mauro Carvalho Chehab <mchehab+huawei@kernel.org>
To: Jonathan Corbet <corbet@lwn.net>
Cc: linux-doc@vger.kernel.org, linux-kernel@vger.kernel.org,
	Akira Yokosawa <akiyks@gmail.com>
Subject: Re: [PATCH 03/12] docs: kdoc: backslashectomy in kdoc_parser
Date: Fri, 1 Aug 2025 06:27:10 +0200	[thread overview]
Message-ID: <20250801062710.552dac5a@foz.lan> (raw)
In-Reply-To: <20250801001326.924276-4-corbet@lwn.net>

Em Thu, 31 Jul 2025 18:13:17 -0600
Jonathan Corbet <corbet@lwn.net> escreveu:

> A lot of the regular expressions in this file have extraneous backslashes

This one is a bit scary... It could actually cause issues somewhere.
Also, IMHO, some expressions look worse on my eyes ;-)

> that may have been needed in Perl, but aren't helpful here.  Take them out
> to reduce slightly the visual noise.

No idea if Perl actually requires, but, at least for me, I do prefer to
see all special characters properly escaped with a backslash. This way,
it is a lot clearer that what it is expecting is a string, instead of
using something that may affect regex processing.

This is specially important for my eyes when expecting for dots,
parenthesis and brackets.


> Signed-off-by: Jonathan Corbet <corbet@lwn.net>
> ---
>  scripts/lib/kdoc/kdoc_parser.py | 40 ++++++++++++++++-----------------
>  1 file changed, 20 insertions(+), 20 deletions(-)
> 
> diff --git a/scripts/lib/kdoc/kdoc_parser.py b/scripts/lib/kdoc/kdoc_parser.py
> index 9948ede739a5..e1efa65a3480 100644
> --- a/scripts/lib/kdoc/kdoc_parser.py
> +++ b/scripts/lib/kdoc/kdoc_parser.py
> @@ -46,7 +46,7 @@ doc_decl = doc_com + KernRe(r'(\w+)', cache=False)
>  known_section_names = 'description|context|returns?|notes?|examples?'
>  known_sections = KernRe(known_section_names, flags = re.I)
>  doc_sect = doc_com + \
> -    KernRe(r'\s*(\@[.\w]+|\@\.\.\.|' + known_section_names + r')\s*:([^:].*)?$',
> +    KernRe(r'\s*(@[.\w]+|@\.\.\.|' + known_section_names + r')\s*:([^:].*)?$',
>             flags=re.I, cache=False)
>  
>  doc_content = doc_com_body + KernRe(r'(.*)', cache=False)
> @@ -60,7 +60,7 @@ attribute = KernRe(r"__attribute__\s*\(\([a-z0-9,_\*\s\(\)]*\)\)",
>  export_symbol = KernRe(r'^\s*EXPORT_SYMBOL(_GPL)?\s*\(\s*(\w+)\s*\)\s*', cache=False)
>  export_symbol_ns = KernRe(r'^\s*EXPORT_SYMBOL_NS(_GPL)?\s*\(\s*(\w+)\s*,\s*"\S+"\)\s*', cache=False)
>  
> -type_param = KernRe(r"\@(\w*((\.\w+)|(->\w+))*(\.\.\.)?)", cache=False)
> +type_param = KernRe(r"@(\w*((\.\w+)|(->\w+))*(\.\.\.)?)", cache=False)
>  
>  #
>  # Tests for the beginning of a kerneldoc block in its various forms.
> @@ -331,7 +331,7 @@ class KernelDoc:
>  
>          self.entry.anon_struct_union = False
>  
> -        param = KernRe(r'[\[\)].*').sub('', param, count=1)
> +        param = KernRe(r'[)[].*').sub('', param, count=1)

This one, for instance, IMHO looks a lot worse for my eyes to understand
that there is a "[" that it is not an operator, but instead a string.
The open close parenthesis also looks weird. My regex-trained eyes think
that this would be part of a capture group.

>  
>          if dtype == "" and param.endswith("..."):
>              if KernRe(r'\w\.\.\.$').search(param):
> @@ -405,7 +405,7 @@ class KernelDoc:
>  
>          for arg in args.split(splitter):
>              # Strip comments
> -            arg = KernRe(r'\/\*.*\*\/').sub('', arg)
> +            arg = KernRe(r'/\*.*\*/').sub('', arg)

A pattern like /..../ is a standard way to pass search group with Regex
on many languages and utils that accept regular expressions like the
sed command. Dropping the backslash here IMHO makes it confusing ;-)

>  
>              # Ignore argument attributes
>              arg = KernRe(r'\sPOS0?\s').sub(' ', arg)
> @@ -428,14 +428,14 @@ class KernelDoc:
>  
>                  arg = arg.replace('#', ',')
>  
> -                r = KernRe(r'[^\(]+\(\*?\s*([\w\[\]\.]*)\s*\)')
> +                r = KernRe(r'[^(]+\(\*?\s*([\w[\].]*)\s*\)')

Here, [.] is also a lot more confusing for me than [\.]

Ok, both works the same way on all implementations I know, but, as a doc
means any character, I need to re-read this two or three times to understand
that here it is waiting for a dot character instead of any character.


----

Here, I became too tired of reading regular expressions... better
stop to avoid headaches ;-)

Seriously, IMHO this patch makes a lot worse to understand what brackets,
parenthesis and dots are strings, and which ones are part of the regex
syntax. 


>                  if r.match(arg):
>                      param = r.group(1)
>                  else:
>                      self.emit_msg(ln, f"Invalid param: {arg}")
>                      param = arg
>  
> -                dtype = KernRe(r'([^\(]+\(\*?)\s*' + re.escape(param)).sub(r'\1', arg)
> +                dtype = KernRe(r'([^(]+\(\*?)\s*' + re.escape(param)).sub(r'\1', arg)
>                  self.push_parameter(ln, decl_type, param, dtype,
>                                      arg, declaration_name)
>  
> @@ -443,14 +443,14 @@ class KernelDoc:
>                  # Array-of-pointers
>  
>                  arg = arg.replace('#', ',')
> -                r = KernRe(r'[^\(]+\(\s*\*\s*([\w\[\]\.]*?)\s*(\s*\[\s*[\w]+\s*\]\s*)*\)')
> +                r = KernRe(r'[^(]+\(\s*\*\s*([\w[\].]*?)\s*(\s*\[\s*[\w]+\s*\]\s*)*\)')
>                  if r.match(arg):
>                      param = r.group(1)
>                  else:
>                      self.emit_msg(ln, f"Invalid param: {arg}")
>                      param = arg
>  
> -                dtype = KernRe(r'([^\(]+\(\*?)\s*' + re.escape(param)).sub(r'\1', arg)
> +                dtype = KernRe(r'([^(]+\(\*?)\s*' + re.escape(param)).sub(r'\1', arg)
>  
>                  self.push_parameter(ln, decl_type, param, dtype,
>                                      arg, declaration_name)
> @@ -637,8 +637,8 @@ class KernelDoc:
>              # it is better to also move those to the NestedMatch logic,
>              # to ensure that parenthesis will be properly matched.
>  
> -            (KernRe(r'__ETHTOOL_DECLARE_LINK_MODE_MASK\s*\(([^\)]+)\)', re.S), r'DECLARE_BITMAP(\1, __ETHTOOL_LINK_MODE_MASK_NBITS)'),
> -            (KernRe(r'DECLARE_PHY_INTERFACE_MASK\s*\(([^\)]+)\)', re.S), r'DECLARE_BITMAP(\1, PHY_INTERFACE_MODE_MAX)'),
> +            (KernRe(r'__ETHTOOL_DECLARE_LINK_MODE_MASK\s*\(([^)]+)\)', re.S), r'DECLARE_BITMAP(\1, __ETHTOOL_LINK_MODE_MASK_NBITS)'),
> +            (KernRe(r'DECLARE_PHY_INTERFACE_MASK\s*\(([^)]+)\)', re.S), r'DECLARE_BITMAP(\1, PHY_INTERFACE_MODE_MAX)'),
>              (KernRe(r'DECLARE_BITMAP\s*\(' + args_pattern + r',\s*' + args_pattern + r'\)', re.S), r'unsigned long \1[BITS_TO_LONGS(\2)]'),
>              (KernRe(r'DECLARE_HASHTABLE\s*\(' + args_pattern + r',\s*' + args_pattern + r'\)', re.S), r'unsigned long \1[1 << ((\2) - 1)]'),
>              (KernRe(r'DECLARE_KFIFO\s*\(' + args_pattern + r',\s*' + args_pattern + r',\s*' + args_pattern + r'\)', re.S), r'\2 *\1'),
> @@ -700,7 +700,7 @@ class KernelDoc:
>                      s_id = s_id.strip()
>  
>                      newmember += f"{maintype} {s_id}; "
> -                    s_id = KernRe(r'[:\[].*').sub('', s_id)
> +                    s_id = KernRe(r'[:[].*').sub('', s_id)
>                      s_id = KernRe(r'^\s*\**(\S+)\s*').sub(r'\1', s_id)
>  
>                      for arg in content.split(';'):
> @@ -709,7 +709,7 @@ class KernelDoc:
>                          if not arg:
>                              continue
>  
> -                        r = KernRe(r'^([^\(]+\(\*?\s*)([\w\.]*)(\s*\).*)')
> +                        r = KernRe(r'^([^(]+\(\*?\s*)([\w.]*)(\s*\).*)')
>                          if r.match(arg):
>                              # Pointer-to-function
>                              dtype = r.group(1)
> @@ -767,12 +767,12 @@ class KernelDoc:
>          self.check_sections(ln, declaration_name, decl_type)
>  
>          # Adjust declaration for better display
> -        declaration = KernRe(r'([\{;])').sub(r'\1\n', declaration)
> +        declaration = KernRe(r'([{;])').sub(r'\1\n', declaration)
>          declaration = KernRe(r'\}\s+;').sub('};', declaration)
>  
>          # Better handle inlined enums
>          while True:
> -            r = KernRe(r'(enum\s+\{[^\}]+),([^\n])')
> +            r = KernRe(r'(enum\s+\{[^}]+),([^\n])')
>              if not r.search(declaration):
>                  break
>  
> @@ -969,8 +969,8 @@ class KernelDoc:
>          # - pci_match_device, __copy_to_user (long return type)
>  
>          name = r'[a-zA-Z0-9_~:]+'
> -        prototype_end1 = r'[^\(]*'
> -        prototype_end2 = r'[^\{]*'
> +        prototype_end1 = r'[^(]*'
> +        prototype_end2 = r'[^{]*'
>          prototype_end = fr'\(({prototype_end1}|{prototype_end2})\)'
>  
>          # Besides compiling, Perl qr{[\w\s]+} works as a non-capturing group.
> @@ -1044,7 +1044,7 @@ class KernelDoc:
>          Stores a typedef inside self.entries array.
>          """
>  
> -        typedef_type = r'((?:\s+[\w\*]+\b){0,7}\s+(?:\w+\b|\*+))\s*'
> +        typedef_type = r'((?:\s+[\w*]+\b){0,7}\s+(?:\w+\b|\*+))\s*'
>          typedef_ident = r'\*?\s*(\w\S+)\s*'
>          typedef_args = r'\s*\((.*)\);'
>  
> @@ -1265,7 +1265,7 @@ class KernelDoc:
>              self.dump_section()
>  
>              # Look for doc_com + <text> + doc_end:
> -            r = KernRe(r'\s*\*\s*[a-zA-Z_0-9:\.]+\*/')
> +            r = KernRe(r'\s*\*\s*[a-zA-Z_0-9:.]+\*/')
>              if r.match(line):
>                  self.emit_msg(ln, f"suspicious ending line: {line}")
>  
> @@ -1476,14 +1476,14 @@ class KernelDoc:
>          """Ancillary routine to process a function prototype"""
>  
>          # strip C99-style comments to end of line
> -        line = KernRe(r"\/\/.*$", re.S).sub('', line)
> +        line = KernRe(r"//.*$", re.S).sub('', line)
>          #
>          # Soak up the line's worth of prototype text, stopping at { or ; if present.
>          #
>          if KernRe(r'\s*#\s*define').match(line):
>              self.entry.prototype = line
>          elif not line.startswith('#'):   # skip other preprocessor stuff
> -            r = KernRe(r'([^\{]*)')
> +            r = KernRe(r'([^{]*)')
>              if r.match(line):
>                  self.entry.prototype += r.group(1) + " "
>          #



Thanks,
Mauro

next prev parent reply	other threads:[~2025-08-01  4:27 UTC|newest]

Thread overview: 39+ messages / expand[flat|nested]  mbox.gz  Atom feed  top
2025-08-01  0:13 [PATCH 00/12] docs: kdoc: thrash up dump_struct() Jonathan Corbet
2025-08-01  0:13 ` [PATCH 01/12] docs: kdoc: consolidate the stripping of private struct/union members Jonathan Corbet
2025-08-01  5:29   ` Mauro Carvalho Chehab
2025-08-01  0:13 ` [PATCH 02/12] docs: kdoc: Move a regex line in dump_struct() Jonathan Corbet
2025-08-01  5:29   ` Mauro Carvalho Chehab
2025-08-01  0:13 ` [PATCH 03/12] docs: kdoc: backslashectomy in kdoc_parser Jonathan Corbet
2025-08-01  4:27   ` Mauro Carvalho Chehab [this message]
2025-08-01 14:21     ` Jonathan Corbet
2025-08-04 12:58       ` Mauro Carvalho Chehab
2025-08-04 16:00         ` Mauro Carvalho Chehab
2025-08-04 18:29           ` Jonathan Corbet
2025-08-01  0:13 ` [PATCH 04/12] docs: kdoc: move the prefix transforms out of dump_struct() Jonathan Corbet
2025-08-01  5:28   ` Mauro Carvalho Chehab
2025-08-01  5:35     ` Mauro Carvalho Chehab
2025-08-01  0:13 ` [PATCH 05/12] docs: kdoc: split top-level prototype parsing " Jonathan Corbet
2025-08-01  5:34   ` Mauro Carvalho Chehab
2025-08-01 14:10     ` Jonathan Corbet
2025-08-04 12:20       ` Mauro Carvalho Chehab
2025-08-01  0:13 ` [PATCH 06/12] docs: kdoc: split struct-member rewriting " Jonathan Corbet
2025-08-01  5:37   ` Mauro Carvalho Chehab
2025-08-01  0:13 ` [PATCH 07/12] docs: kdoc: rework the rewrite_struct_members() main loop Jonathan Corbet
2025-08-01  5:42   ` Mauro Carvalho Chehab
2025-08-01  0:13 ` [PATCH 08/12] docs: kdoc: remove an extraneous strip() call Jonathan Corbet
2025-08-01  5:45   ` Mauro Carvalho Chehab
2025-08-01  0:13 ` [PATCH 09/12] docs: kdoc: Some rewrite_struct_members() commenting Jonathan Corbet
2025-08-01  5:50   ` Mauro Carvalho Chehab
2025-08-01  0:13 ` [PATCH 10/12] docs: kdoc: further rewrite_struct_members() cleanup Jonathan Corbet
2025-08-01  6:07   ` Mauro Carvalho Chehab
2025-08-01 22:52     ` Jonathan Corbet
2025-08-04 13:15       ` Mauro Carvalho Chehab
2025-08-05 22:46         ` Jonathan Corbet
2025-08-06  9:05           ` Mauro Carvalho Chehab
2025-08-06 13:00             ` Jonathan Corbet
2025-08-06 21:27               ` Mauro Carvalho Chehab
2025-08-01  0:13 ` [PATCH 11/12] docs: kdoc: extract output formatting from dump_struct() Jonathan Corbet
2025-08-01  6:09   ` Mauro Carvalho Chehab
2025-08-01  0:13 ` [PATCH 12/12] docs: kdoc: a few final dump_struct() touches Jonathan Corbet
2025-08-01  6:10   ` Mauro Carvalho Chehab
2025-08-01  6:23 ` [PATCH 00/12] docs: kdoc: thrash up dump_struct() Mauro Carvalho Chehab

Reply instructions:

You may reply publicly to this message via plain-text email
using any one of the following methods:

* Save the following mbox file, import it into your mail client,
  and reply-to-all from there: mbox

  Avoid top-posting and favor interleaved quoting:
  https://en.wikipedia.org/wiki/Posting_style#Interleaved_style

* Reply using the --to, --cc, and --in-reply-to
  switches of git-send-email(1):

  git send-email \
    --in-reply-to=20250801062710.552dac5a@foz.lan \
    --to=mchehab+huawei@kernel.org \
    --cc=akiyks@gmail.com \
    --cc=corbet@lwn.net \
    --cc=linux-doc@vger.kernel.org \
    --cc=linux-kernel@vger.kernel.org \
    /path/to/YOUR_REPLY

  https://kernel.org/pub/software/scm/git/docs/git-send-email.html

* If your mail client supports setting the In-Reply-To header
  via mailto: links, try the mailto: link

Be sure your reply has a Subject: header at the top and a blank line before the message body.

This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox;
as well as URLs for NNTP newsgroup(s).