All of lore.kernel.org
 help / color / mirror / Atom feed
From: Mauro Carvalho Chehab <mchehab+huawei@kernel.org>
To: Jonathan Corbet <corbet@lwn.net>
Cc: linux-doc@vger.kernel.org, linux-kernel@vger.kernel.org,
	Akira Yokosawa <akiyks@gmail.com>
Subject: Re: [PATCH 03/12] docs: kdoc: backslashectomy in kdoc_parser
Date: Fri, 1 Aug 2025 06:27:10 +0200	[thread overview]
Message-ID: <20250801062710.552dac5a@foz.lan> (raw)
In-Reply-To: <20250801001326.924276-4-corbet@lwn.net>

Em Thu, 31 Jul 2025 18:13:17 -0600
Jonathan Corbet <corbet@lwn.net> escreveu:

> A lot of the regular expressions in this file have extraneous backslashes

This one is a bit scary... It could actually cause issues somewhere.
Also, IMHO, some expressions look worse on my eyes ;-)

> that may have been needed in Perl, but aren't helpful here.  Take them out
> to reduce slightly the visual noise.

No idea if Perl actually requires, but, at least for me, I do prefer to
see all special characters properly escaped with a backslash. This way,
it is a lot clearer that what it is expecting is a string, instead of
using something that may affect regex processing.

This is specially important for my eyes when expecting for dots,
parenthesis and brackets.


> Signed-off-by: Jonathan Corbet <corbet@lwn.net>
> ---
>  scripts/lib/kdoc/kdoc_parser.py | 40 ++++++++++++++++-----------------
>  1 file changed, 20 insertions(+), 20 deletions(-)
> 
> diff --git a/scripts/lib/kdoc/kdoc_parser.py b/scripts/lib/kdoc/kdoc_parser.py
> index 9948ede739a5..e1efa65a3480 100644
> --- a/scripts/lib/kdoc/kdoc_parser.py
> +++ b/scripts/lib/kdoc/kdoc_parser.py
> @@ -46,7 +46,7 @@ doc_decl = doc_com + KernRe(r'(\w+)', cache=False)
>  known_section_names = 'description|context|returns?|notes?|examples?'
>  known_sections = KernRe(known_section_names, flags = re.I)
>  doc_sect = doc_com + \
> -    KernRe(r'\s*(\@[.\w]+|\@\.\.\.|' + known_section_names + r')\s*:([^:].*)?$',
> +    KernRe(r'\s*(@[.\w]+|@\.\.\.|' + known_section_names + r')\s*:([^:].*)?$',
>             flags=re.I, cache=False)
>  
>  doc_content = doc_com_body + KernRe(r'(.*)', cache=False)
> @@ -60,7 +60,7 @@ attribute = KernRe(r"__attribute__\s*\(\([a-z0-9,_\*\s\(\)]*\)\)",
>  export_symbol = KernRe(r'^\s*EXPORT_SYMBOL(_GPL)?\s*\(\s*(\w+)\s*\)\s*', cache=False)
>  export_symbol_ns = KernRe(r'^\s*EXPORT_SYMBOL_NS(_GPL)?\s*\(\s*(\w+)\s*,\s*"\S+"\)\s*', cache=False)
>  
> -type_param = KernRe(r"\@(\w*((\.\w+)|(->\w+))*(\.\.\.)?)", cache=False)
> +type_param = KernRe(r"@(\w*((\.\w+)|(->\w+))*(\.\.\.)?)", cache=False)
>  
>  #
>  # Tests for the beginning of a kerneldoc block in its various forms.
> @@ -331,7 +331,7 @@ class KernelDoc:
>  
>          self.entry.anon_struct_union = False
>  
> -        param = KernRe(r'[\[\)].*').sub('', param, count=1)
> +        param = KernRe(r'[)[].*').sub('', param, count=1)

This one, for instance, IMHO looks a lot worse for my eyes to understand
that there is a "[" that it is not an operator, but instead a string.
The open close parenthesis also looks weird. My regex-trained eyes think
that this would be part of a capture group.

>  
>          if dtype == "" and param.endswith("..."):
>              if KernRe(r'\w\.\.\.$').search(param):
> @@ -405,7 +405,7 @@ class KernelDoc:
>  
>          for arg in args.split(splitter):
>              # Strip comments
> -            arg = KernRe(r'\/\*.*\*\/').sub('', arg)
> +            arg = KernRe(r'/\*.*\*/').sub('', arg)

A pattern like /..../ is a standard way to pass search group with Regex
on many languages and utils that accept regular expressions like the
sed command. Dropping the backslash here IMHO makes it confusing ;-)

>  
>              # Ignore argument attributes
>              arg = KernRe(r'\sPOS0?\s').sub(' ', arg)
> @@ -428,14 +428,14 @@ class KernelDoc:
>  
>                  arg = arg.replace('#', ',')
>  
> -                r = KernRe(r'[^\(]+\(\*?\s*([\w\[\]\.]*)\s*\)')
> +                r = KernRe(r'[^(]+\(\*?\s*([\w[\].]*)\s*\)')

Here, [.] is also a lot more confusing for me than [\.]

Ok, both works the same way on all implementations I know, but, as a doc
means any character, I need to re-read this two or three times to understand
that here it is waiting for a dot character instead of any character.


----

Here, I became too tired of reading regular expressions... better
stop to avoid headaches ;-)

Seriously, IMHO this patch makes a lot worse to understand what brackets,
parenthesis and dots are strings, and which ones are part of the regex
syntax. 


>                  if r.match(arg):
>                      param = r.group(1)
>                  else:
>                      self.emit_msg(ln, f"Invalid param: {arg}")
>                      param = arg
>  
> -                dtype = KernRe(r'([^\(]+\(\*?)\s*' + re.escape(param)).sub(r'\1', arg)
> +                dtype = KernRe(r'([^(]+\(\*?)\s*' + re.escape(param)).sub(r'\1', arg)
>                  self.push_parameter(ln, decl_type, param, dtype,
>                                      arg, declaration_name)
>  
> @@ -443,14 +443,14 @@ class KernelDoc:
>                  # Array-of-pointers
>  
>                  arg = arg.replace('#', ',')
> -                r = KernRe(r'[^\(]+\(\s*\*\s*([\w\[\]\.]*?)\s*(\s*\[\s*[\w]+\s*\]\s*)*\)')
> +                r = KernRe(r'[^(]+\(\s*\*\s*([\w[\].]*?)\s*(\s*\[\s*[\w]+\s*\]\s*)*\)')
>                  if r.match(arg):
>                      param = r.group(1)
>                  else:
>                      self.emit_msg(ln, f"Invalid param: {arg}")
>                      param = arg
>  
> -                dtype = KernRe(r'([^\(]+\(\*?)\s*' + re.escape(param)).sub(r'\1', arg)
> +                dtype = KernRe(r'([^(]+\(\*?)\s*' + re.escape(param)).sub(r'\1', arg)
>  
>                  self.push_parameter(ln, decl_type, param, dtype,
>                                      arg, declaration_name)
> @@ -637,8 +637,8 @@ class KernelDoc:
>              # it is better to also move those to the NestedMatch logic,
>              # to ensure that parenthesis will be properly matched.
>  
> -            (KernRe(r'__ETHTOOL_DECLARE_LINK_MODE_MASK\s*\(([^\)]+)\)', re.S), r'DECLARE_BITMAP(\1, __ETHTOOL_LINK_MODE_MASK_NBITS)'),
> -            (KernRe(r'DECLARE_PHY_INTERFACE_MASK\s*\(([^\)]+)\)', re.S), r'DECLARE_BITMAP(\1, PHY_INTERFACE_MODE_MAX)'),
> +            (KernRe(r'__ETHTOOL_DECLARE_LINK_MODE_MASK\s*\(([^)]+)\)', re.S), r'DECLARE_BITMAP(\1, __ETHTOOL_LINK_MODE_MASK_NBITS)'),
> +            (KernRe(r'DECLARE_PHY_INTERFACE_MASK\s*\(([^)]+)\)', re.S), r'DECLARE_BITMAP(\1, PHY_INTERFACE_MODE_MAX)'),
>              (KernRe(r'DECLARE_BITMAP\s*\(' + args_pattern + r',\s*' + args_pattern + r'\)', re.S), r'unsigned long \1[BITS_TO_LONGS(\2)]'),
>              (KernRe(r'DECLARE_HASHTABLE\s*\(' + args_pattern + r',\s*' + args_pattern + r'\)', re.S), r'unsigned long \1[1 << ((\2) - 1)]'),
>              (KernRe(r'DECLARE_KFIFO\s*\(' + args_pattern + r',\s*' + args_pattern + r',\s*' + args_pattern + r'\)', re.S), r'\2 *\1'),
> @@ -700,7 +700,7 @@ class KernelDoc:
>                      s_id = s_id.strip()
>  
>                      newmember += f"{maintype} {s_id}; "
> -                    s_id = KernRe(r'[:\[].*').sub('', s_id)
> +                    s_id = KernRe(r'[:[].*').sub('', s_id)
>                      s_id = KernRe(r'^\s*\**(\S+)\s*').sub(r'\1', s_id)
>  
>                      for arg in content.split(';'):
> @@ -709,7 +709,7 @@ class KernelDoc:
>                          if not arg:
>                              continue
>  
> -                        r = KernRe(r'^([^\(]+\(\*?\s*)([\w\.]*)(\s*\).*)')
> +                        r = KernRe(r'^([^(]+\(\*?\s*)([\w.]*)(\s*\).*)')
>                          if r.match(arg):
>                              # Pointer-to-function
>                              dtype = r.group(1)
> @@ -767,12 +767,12 @@ class KernelDoc:
>          self.check_sections(ln, declaration_name, decl_type)
>  
>          # Adjust declaration for better display
> -        declaration = KernRe(r'([\{;])').sub(r'\1\n', declaration)
> +        declaration = KernRe(r'([{;])').sub(r'\1\n', declaration)
>          declaration = KernRe(r'\}\s+;').sub('};', declaration)
>  
>          # Better handle inlined enums
>          while True:
> -            r = KernRe(r'(enum\s+\{[^\}]+),([^\n])')
> +            r = KernRe(r'(enum\s+\{[^}]+),([^\n])')
>              if not r.search(declaration):
>                  break
>  
> @@ -969,8 +969,8 @@ class KernelDoc:
>          # - pci_match_device, __copy_to_user (long return type)
>  
>          name = r'[a-zA-Z0-9_~:]+'
> -        prototype_end1 = r'[^\(]*'
> -        prototype_end2 = r'[^\{]*'
> +        prototype_end1 = r'[^(]*'
> +        prototype_end2 = r'[^{]*'
>          prototype_end = fr'\(({prototype_end1}|{prototype_end2})\)'
>  
>          # Besides compiling, Perl qr{[\w\s]+} works as a non-capturing group.
> @@ -1044,7 +1044,7 @@ class KernelDoc:
>          Stores a typedef inside self.entries array.
>          """
>  
> -        typedef_type = r'((?:\s+[\w\*]+\b){0,7}\s+(?:\w+\b|\*+))\s*'
> +        typedef_type = r'((?:\s+[\w*]+\b){0,7}\s+(?:\w+\b|\*+))\s*'
>          typedef_ident = r'\*?\s*(\w\S+)\s*'
>          typedef_args = r'\s*\((.*)\);'
>  
> @@ -1265,7 +1265,7 @@ class KernelDoc:
>              self.dump_section()
>  
>              # Look for doc_com + <text> + doc_end:
> -            r = KernRe(r'\s*\*\s*[a-zA-Z_0-9:\.]+\*/')
> +            r = KernRe(r'\s*\*\s*[a-zA-Z_0-9:.]+\*/')
>              if r.match(line):
>                  self.emit_msg(ln, f"suspicious ending line: {line}")
>  
> @@ -1476,14 +1476,14 @@ class KernelDoc:
>          """Ancillary routine to process a function prototype"""
>  
>          # strip C99-style comments to end of line
> -        line = KernRe(r"\/\/.*$", re.S).sub('', line)
> +        line = KernRe(r"//.*$", re.S).sub('', line)
>          #
>          # Soak up the line's worth of prototype text, stopping at { or ; if present.
>          #
>          if KernRe(r'\s*#\s*define').match(line):
>              self.entry.prototype = line
>          elif not line.startswith('#'):   # skip other preprocessor stuff
> -            r = KernRe(r'([^\{]*)')
> +            r = KernRe(r'([^{]*)')
>              if r.match(line):
>                  self.entry.prototype += r.group(1) + " "
>          #



Thanks,
Mauro

  reply	other threads:[~2025-08-01  4:27 UTC|newest]

Thread overview: 39+ messages / expand[flat|nested]  mbox.gz  Atom feed  top
2025-08-01  0:13 [PATCH 00/12] docs: kdoc: thrash up dump_struct() Jonathan Corbet
2025-08-01  0:13 ` [PATCH 01/12] docs: kdoc: consolidate the stripping of private struct/union members Jonathan Corbet
2025-08-01  5:29   ` Mauro Carvalho Chehab
2025-08-01  0:13 ` [PATCH 02/12] docs: kdoc: Move a regex line in dump_struct() Jonathan Corbet
2025-08-01  5:29   ` Mauro Carvalho Chehab
2025-08-01  0:13 ` [PATCH 03/12] docs: kdoc: backslashectomy in kdoc_parser Jonathan Corbet
2025-08-01  4:27   ` Mauro Carvalho Chehab [this message]
2025-08-01 14:21     ` Jonathan Corbet
2025-08-04 12:58       ` Mauro Carvalho Chehab
2025-08-04 16:00         ` Mauro Carvalho Chehab
2025-08-04 18:29           ` Jonathan Corbet
2025-08-01  0:13 ` [PATCH 04/12] docs: kdoc: move the prefix transforms out of dump_struct() Jonathan Corbet
2025-08-01  5:28   ` Mauro Carvalho Chehab
2025-08-01  5:35     ` Mauro Carvalho Chehab
2025-08-01  0:13 ` [PATCH 05/12] docs: kdoc: split top-level prototype parsing " Jonathan Corbet
2025-08-01  5:34   ` Mauro Carvalho Chehab
2025-08-01 14:10     ` Jonathan Corbet
2025-08-04 12:20       ` Mauro Carvalho Chehab
2025-08-01  0:13 ` [PATCH 06/12] docs: kdoc: split struct-member rewriting " Jonathan Corbet
2025-08-01  5:37   ` Mauro Carvalho Chehab
2025-08-01  0:13 ` [PATCH 07/12] docs: kdoc: rework the rewrite_struct_members() main loop Jonathan Corbet
2025-08-01  5:42   ` Mauro Carvalho Chehab
2025-08-01  0:13 ` [PATCH 08/12] docs: kdoc: remove an extraneous strip() call Jonathan Corbet
2025-08-01  5:45   ` Mauro Carvalho Chehab
2025-08-01  0:13 ` [PATCH 09/12] docs: kdoc: Some rewrite_struct_members() commenting Jonathan Corbet
2025-08-01  5:50   ` Mauro Carvalho Chehab
2025-08-01  0:13 ` [PATCH 10/12] docs: kdoc: further rewrite_struct_members() cleanup Jonathan Corbet
2025-08-01  6:07   ` Mauro Carvalho Chehab
2025-08-01 22:52     ` Jonathan Corbet
2025-08-04 13:15       ` Mauro Carvalho Chehab
2025-08-05 22:46         ` Jonathan Corbet
2025-08-06  9:05           ` Mauro Carvalho Chehab
2025-08-06 13:00             ` Jonathan Corbet
2025-08-06 21:27               ` Mauro Carvalho Chehab
2025-08-01  0:13 ` [PATCH 11/12] docs: kdoc: extract output formatting from dump_struct() Jonathan Corbet
2025-08-01  6:09   ` Mauro Carvalho Chehab
2025-08-01  0:13 ` [PATCH 12/12] docs: kdoc: a few final dump_struct() touches Jonathan Corbet
2025-08-01  6:10   ` Mauro Carvalho Chehab
2025-08-01  6:23 ` [PATCH 00/12] docs: kdoc: thrash up dump_struct() Mauro Carvalho Chehab

Reply instructions:

You may reply publicly to this message via plain-text email
using any one of the following methods:

* Save the following mbox file, import it into your mail client,
  and reply-to-all from there: mbox

  Avoid top-posting and favor interleaved quoting:
  https://en.wikipedia.org/wiki/Posting_style#Interleaved_style

* Reply using the --to, --cc, and --in-reply-to
  switches of git-send-email(1):

  git send-email \
    --in-reply-to=20250801062710.552dac5a@foz.lan \
    --to=mchehab+huawei@kernel.org \
    --cc=akiyks@gmail.com \
    --cc=corbet@lwn.net \
    --cc=linux-doc@vger.kernel.org \
    --cc=linux-kernel@vger.kernel.org \
    /path/to/YOUR_REPLY

  https://kernel.org/pub/software/scm/git/docs/git-send-email.html

* If your mail client supports setting the In-Reply-To header
  via mailto: links, try the mailto: link
Be sure your reply has a Subject: header at the top and a blank line before the message body.
This is an external index of several public inboxes,
see mirroring instructions on how to clone and mirror
all data and code used by this external index.