[PATCH v2 0/7] Further kernel-doc tweakery

* [PATCH v2 0/7] Further kernel-doc tweakery
@ 2025-07-03 18:43 Jonathan Corbet
  2025-07-03 18:43 ` [PATCH v2 1/7] docs: kdoc: don't reinvent string.strip() Jonathan Corbet
                   ` (7 more replies)
  0 siblings, 8 replies; 15+ messages in thread
From: Jonathan Corbet @ 2025-07-03 18:43 UTC (permalink / raw)
  To: linux-doc
  Cc: linux-kernel, Mauro Carvalho Chehab, Akira Yokosawa,
	Jonathan Corbet

This is a set of miscellaneous improvements, finishing my pass over the
first parsing pass and getting into the second ("dump_*") pass.

Changes from v1:
 - Apply tags
 - Rework the KernRe microoptimization to avoid exceptions
 - Fix the stupid white-space error in patch 7

Jonathan Corbet (7):
  docs: kdoc: don't reinvent string.strip()
  docs: kdoc: micro-optimize KernRe
  docs: kdoc: remove the brcount floor in process_proto_type()
  docs: kdoc: rework type prototype parsing
  docs: kdoc: some tweaks to process_proto_function()
  docs: kdoc: Remove a Python 2 comment
  docs: kdoc: pretty up dump_enum()

 Documentation/sphinx/kerneldoc.py |   2 -
 scripts/lib/kdoc/kdoc_parser.py   | 150 +++++++++++++++---------------
 scripts/lib/kdoc/kdoc_re.py       |   7 +-
 3 files changed, 79 insertions(+), 80 deletions(-)

-- 
2.49.0



* [PATCH v2 1/7] docs: kdoc: don't reinvent string.strip()
  2025-07-03 18:43 [PATCH v2 0/7] Further kernel-doc tweakery Jonathan Corbet
@ 2025-07-03 18:43 ` Jonathan Corbet
  2025-07-03 18:43 ` [PATCH v2 2/7] docs: kdoc: micro-optimize KernRe Jonathan Corbet
                   ` (6 subsequent siblings)
  7 siblings, 0 replies; 15+ messages in thread
From: Jonathan Corbet @ 2025-07-03 18:43 UTC (permalink / raw)
  To: linux-doc
  Cc: linux-kernel, Mauro Carvalho Chehab, Akira Yokosawa,
	Jonathan Corbet

process_proto_type() and process_proto_function() are reinventing the strip()
string method with a whole series of separate regexes; take all that out
and just use strip().

The previous implementation also (in process_proto_type()) removed C++
comments *after* the above dance, leaving trailing whitespace in that case;
now we do the stripping afterward.  This results in exactly one output
change: the removal of a spurious space in the definition of
BACKLIGHT_POWER_REDUCED - see
https://docs.kernel.org/gpu/backlight.html#c.backlight_properties.
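
As an illustration of why the ordering matters, here is a minimal sketch
using plain re rather than KernRe.  The sample line is made up, and the old
code's separate whitespace regexes are approximated with a single strip()
call, so treat this as a model of the behavior rather than the parser itself.

```python
import re

# Hypothetical input line; not taken from the kernel sources.
line = "  #define EXAMPLE_VALUE (1)   // trailing comment\n"

# Old order (approximated): trim whitespace first, then remove the //
# comment.  The spaces that preceded the comment are left behind.
old = re.sub(r"//.*$", "", line.strip(), flags=re.S)

# New order: remove the comment first, then strip() what remains.
new = re.sub(r"//.*$", "", line, flags=re.S).strip()

print(repr(old))
print(repr(new))
```

The only difference between the two results is the run of spaces that sat
between the declaration and the comment.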

I note that we are putting semicolons after #define lines that really
shouldn't be there - a task for another day.

Reviewed-by: Mauro Carvalho Chehab <mchehab+huawei@kernel.org>
Signed-off-by: Jonathan Corbet <corbet@lwn.net>
---
 scripts/lib/kdoc/kdoc_parser.py | 27 +++++----------------------
 1 file changed, 5 insertions(+), 22 deletions(-)

diff --git a/scripts/lib/kdoc/kdoc_parser.py b/scripts/lib/kdoc/kdoc_parser.py
index 93938155fce2..d9ff2d066160 100644
--- a/scripts/lib/kdoc/kdoc_parser.py
+++ b/scripts/lib/kdoc/kdoc_parser.py
@@ -1567,17 +1567,9 @@ class KernelDoc:
                 self.entry.prototype += r.group(1) + " "
 
         if '{' in line or ';' in line or KernRe(r'\s*#\s*define').match(line):
-            # strip comments
-            r = KernRe(r'/\*.*?\*/')
-            self.entry.prototype = r.sub('', self.entry.prototype)
-
-            # strip newlines/cr's
-            r = KernRe(r'[\r\n]+')
-            self.entry.prototype = r.sub(' ', self.entry.prototype)
-
-            # strip leading spaces
-            r = KernRe(r'^\s+')
-            self.entry.prototype = r.sub('', self.entry.prototype)
+            # strip comments and surrounding spaces
+            r = KernRe(r'/\*.*\*/')
+            self.entry.prototype = r.sub('', self.entry.prototype).strip()
 
             # Handle self.entry.prototypes for function pointers like:
             #       int (*pcs_config)(struct foo)
@@ -1600,17 +1592,8 @@ class KernelDoc:
     def process_proto_type(self, ln, line):
         """Ancillary routine to process a type"""
 
-        # Strip newlines/cr's.
-        line = KernRe(r'[\r\n]+', re.S).sub(' ', line)
-
-        # Strip leading spaces
-        line = KernRe(r'^\s+', re.S).sub('', line)
-
-        # Strip trailing spaces
-        line = KernRe(r'\s+$', re.S).sub('', line)
-
-        # Strip C99-style comments to the end of the line
-        line = KernRe(r"\/\/.*$", re.S).sub('', line)
+        # Strip C99-style comments and surrounding whitespace
+        line = KernRe(r"//.*$", re.S).sub('', line).strip()
 
         # To distinguish preprocessor directive from regular declaration later.
         if line.startswith('#'):
-- 
2.49.0



* [PATCH v2 2/7] docs: kdoc: micro-optimize KernRe
  2025-07-03 18:43 [PATCH v2 0/7] Further kernel-doc tweakery Jonathan Corbet
  2025-07-03 18:43 ` [PATCH v2 1/7] docs: kdoc: don't reinvent string.strip() Jonathan Corbet
@ 2025-07-03 18:43 ` Jonathan Corbet
  2025-07-03 22:31   ` Mauro Carvalho Chehab
  2025-07-03 18:43 ` [PATCH v2 3/7] docs: kdoc: remove the brcount floor in process_proto_type() Jonathan Corbet
                   ` (5 subsequent siblings)
  7 siblings, 1 reply; 15+ messages in thread
From: Jonathan Corbet @ 2025-07-03 18:43 UTC (permalink / raw)
  To: linux-doc
  Cc: linux-kernel, Mauro Carvalho Chehab, Akira Yokosawa,
	Jonathan Corbet

Rework _add_regex() to avoid doing the lookup twice for the (hopefully
common) cache-hit case.
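
A minimal sketch of the single-lookup pattern, outside of the KernRe class.
The cached_compile() helper is a hypothetical name used only for this
illustration, and it tests "is None" where the patch tests truthiness; the
two should be equivalent for compiled patterns.

```python
import re

re_cache = {}  # cache of compiled patterns, keyed by the pattern string


def cached_compile(string, flags=0, cache=True):
    """Hypothetical helper: return a compiled regex using one dict lookup."""
    regex = re_cache.get(string)          # None on a cache miss
    if regex is None:
        regex = re.compile(string, flags=flags)
        if cache:
            re_cache[string] = regex
    return regex


first = cached_compile(r"\s*#\s*define")
second = cached_compile(r"\s*#\s*define")
print(first is second)
```

On a hit, the dictionary is consulted once and re.compile() is never
reached.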

Signed-off-by: Jonathan Corbet <corbet@lwn.net>
---
 scripts/lib/kdoc/kdoc_re.py | 7 ++-----
 1 file changed, 2 insertions(+), 5 deletions(-)

diff --git a/scripts/lib/kdoc/kdoc_re.py b/scripts/lib/kdoc/kdoc_re.py
index e81695b273bf..612223e1e723 100644
--- a/scripts/lib/kdoc/kdoc_re.py
+++ b/scripts/lib/kdoc/kdoc_re.py
@@ -29,12 +29,9 @@ class KernRe:
         """
         Adds a new regex or re-use it from the cache.
         """
-
-        if string in re_cache:
-            self.regex = re_cache[string]
-        else:
+        self.regex = re_cache.get(string, None)
+        if not self.regex:
             self.regex = re.compile(string, flags=flags)
-
             if self.cache:
                 re_cache[string] = self.regex
 
-- 
2.49.0



* Re: [PATCH v2 2/7] docs: kdoc: micro-optimize KernRe
  2025-07-03 18:43 ` [PATCH v2 2/7] docs: kdoc: micro-optimize KernRe Jonathan Corbet
@ 2025-07-03 22:31   ` Mauro Carvalho Chehab
  2025-07-03 22:32     ` Mauro Carvalho Chehab
  2025-07-03 23:47     ` Jonathan Corbet
  0 siblings, 2 replies; 15+ messages in thread
From: Mauro Carvalho Chehab @ 2025-07-03 22:31 UTC (permalink / raw)
  To: Jonathan Corbet; +Cc: linux-doc, linux-kernel, Akira Yokosawa

Em Thu,  3 Jul 2025 12:43:58 -0600
Jonathan Corbet <corbet@lwn.net> escreveu:

> Rework _add_regex() to avoid doing the lookup twice for the (hopefully
> common) cache-hit case.
> 
> Signed-off-by: Jonathan Corbet <corbet@lwn.net>
> ---
>  scripts/lib/kdoc/kdoc_re.py | 7 ++-----
>  1 file changed, 2 insertions(+), 5 deletions(-)
> 
> diff --git a/scripts/lib/kdoc/kdoc_re.py b/scripts/lib/kdoc/kdoc_re.py
> index e81695b273bf..612223e1e723 100644
> --- a/scripts/lib/kdoc/kdoc_re.py
> +++ b/scripts/lib/kdoc/kdoc_re.py
> @@ -29,12 +29,9 @@ class KernRe:
>          """
>          Adds a new regex or re-use it from the cache.
>          """
> -
> -        if string in re_cache:
> -            self.regex = re_cache[string]
> -        else:
> +        self.regex = re_cache.get(string, None)

With get, None is default...

> +        if not self.regex:
>              self.regex = re.compile(string, flags=flags)

... yet, as you're using get, better to code it as:

	self.regex = re_cache.get(string, re.compile(string, flags=flags))

> -
>              if self.cache:
>                  re_cache[string] = self.regex
>  



Thanks,
Mauro


* Re: [PATCH v2 2/7] docs: kdoc: micro-optimize KernRe
  2025-07-03 22:31   ` Mauro Carvalho Chehab
@ 2025-07-03 22:32     ` Mauro Carvalho Chehab
  2025-07-03 23:47     ` Jonathan Corbet
  1 sibling, 0 replies; 15+ messages in thread
From: Mauro Carvalho Chehab @ 2025-07-03 22:32 UTC (permalink / raw)
  To: Jonathan Corbet; +Cc: linux-doc, linux-kernel, Akira Yokosawa

Em Fri, 4 Jul 2025 00:31:46 +0200
Mauro Carvalho Chehab <mchehab+huawei@kernel.org> escreveu:

> Em Thu,  3 Jul 2025 12:43:58 -0600
> Jonathan Corbet <corbet@lwn.net> escreveu:
> 
> > Rework _add_regex() to avoid doing the lookup twice for the (hopefully
> > common) cache-hit case.
> > 
> > Signed-off-by: Jonathan Corbet <corbet@lwn.net>
> > ---
> >  scripts/lib/kdoc/kdoc_re.py | 7 ++-----
> >  1 file changed, 2 insertions(+), 5 deletions(-)
> > 
> > diff --git a/scripts/lib/kdoc/kdoc_re.py b/scripts/lib/kdoc/kdoc_re.py
> > index e81695b273bf..612223e1e723 100644
> > --- a/scripts/lib/kdoc/kdoc_re.py
> > +++ b/scripts/lib/kdoc/kdoc_re.py
> > @@ -29,12 +29,9 @@ class KernRe:
> >          """
> >          Adds a new regex or re-use it from the cache.
> >          """
> > -
> > -        if string in re_cache:
> > -            self.regex = re_cache[string]
> > -        else:
> > +        self.regex = re_cache.get(string, None)
> 
> With get, None is default...
> 
> > +        if not self.regex:
> >              self.regex = re.compile(string, flags=flags)
> 
> ... yet, as you're using get, better to code it as:
> 
> 	self.regex = re_cache.get(string, re.compile(string, flags=flags))

Forgot to mention: with or without that:

Reviewed-by: Mauro Carvalho Chehab <mchehab+huawei@kernel.org>

> 
> > -
> >              if self.cache:
> >                  re_cache[string] = self.regex
> >  
> 
> 
> 
> Thanks,
> Mauro



Thanks,
Mauro


* Re: [PATCH v2 2/7] docs: kdoc: micro-optimize KernRe
  2025-07-03 22:31   ` Mauro Carvalho Chehab
  2025-07-03 22:32     ` Mauro Carvalho Chehab
@ 2025-07-03 23:47     ` Jonathan Corbet
  2025-07-04  6:01       ` Mauro Carvalho Chehab
  1 sibling, 1 reply; 15+ messages in thread
From: Jonathan Corbet @ 2025-07-03 23:47 UTC (permalink / raw)
  To: Mauro Carvalho Chehab; +Cc: linux-doc, linux-kernel, Akira Yokosawa

Mauro Carvalho Chehab <mchehab+huawei@kernel.org> writes:

> Em Thu,  3 Jul 2025 12:43:58 -0600
> Jonathan Corbet <corbet@lwn.net> escreveu:
>
>> Rework _add_regex() to avoid doing the lookup twice for the (hopefully
>> common) cache-hit case.
>> 
>> Signed-off-by: Jonathan Corbet <corbet@lwn.net>
>> ---
>>  scripts/lib/kdoc/kdoc_re.py | 7 ++-----
>>  1 file changed, 2 insertions(+), 5 deletions(-)
>> 
>> diff --git a/scripts/lib/kdoc/kdoc_re.py b/scripts/lib/kdoc/kdoc_re.py
>> index e81695b273bf..612223e1e723 100644
>> --- a/scripts/lib/kdoc/kdoc_re.py
>> +++ b/scripts/lib/kdoc/kdoc_re.py
>> @@ -29,12 +29,9 @@ class KernRe:
>>          """
>>          Adds a new regex or re-use it from the cache.
>>          """
>> -
>> -        if string in re_cache:
>> -            self.regex = re_cache[string]
>> -        else:
>> +        self.regex = re_cache.get(string, None)
>
> With get, None is default...
>
>> +        if not self.regex:
>>              self.regex = re.compile(string, flags=flags)
>
> ... yet, as you're using get, better to code it as:
>
> 	self.regex = re_cache.get(string, re.compile(string, flags=flags))

...but that will recompile the regex each time, defeating the purpose of
the cache, no?

Thanks,

jon


* Re: [PATCH v2 2/7] docs: kdoc: micro-optimize KernRe
  2025-07-03 23:47     ` Jonathan Corbet
@ 2025-07-04  6:01       ` Mauro Carvalho Chehab
  2025-07-04 14:59         ` Jonathan Corbet
  0 siblings, 1 reply; 15+ messages in thread
From: Mauro Carvalho Chehab @ 2025-07-04  6:01 UTC (permalink / raw)
  To: Jonathan Corbet; +Cc: linux-doc, linux-kernel, Akira Yokosawa

Em Thu, 03 Jul 2025 17:47:13 -0600
Jonathan Corbet <corbet@lwn.net> escreveu:

> Mauro Carvalho Chehab <mchehab+huawei@kernel.org> writes:
> 
> > Em Thu,  3 Jul 2025 12:43:58 -0600
> > Jonathan Corbet <corbet@lwn.net> escreveu:
> >  
> >> Rework _add_regex() to avoid doing the lookup twice for the (hopefully
> >> common) cache-hit case.
> >> 
> >> Signed-off-by: Jonathan Corbet <corbet@lwn.net>
> >> ---
> >>  scripts/lib/kdoc/kdoc_re.py | 7 ++-----
> >>  1 file changed, 2 insertions(+), 5 deletions(-)
> >> 
> >> diff --git a/scripts/lib/kdoc/kdoc_re.py b/scripts/lib/kdoc/kdoc_re.py
> >> index e81695b273bf..612223e1e723 100644
> >> --- a/scripts/lib/kdoc/kdoc_re.py
> >> +++ b/scripts/lib/kdoc/kdoc_re.py
> >> @@ -29,12 +29,9 @@ class KernRe:
> >>          """
> >>          Adds a new regex or re-use it from the cache.
> >>          """
> >> -
> >> -        if string in re_cache:
> >> -            self.regex = re_cache[string]
> >> -        else:
> >> +        self.regex = re_cache.get(string, None)  
> >
> > With get, None is default...
> >  
> >> +        if not self.regex:
> >>              self.regex = re.compile(string, flags=flags)  
> >
> > ... yet, as you're using get, better to code it as:
> >
> > 	self.regex = re_cache.get(string, re.compile(string, flags=flags))  
> 
> ...but that will recompile the regex each time, defeating the purpose of
> the cache, no?

No. It should do exactly like the previous code:

- if re_cache[string] exists, it returns it. 
- Otherwise, it returns re.compile(string, flags=flags).

https://www.w3schools.com/python/ref_dictionary_get.asp


Thanks,
Mauro


* Re: [PATCH v2 2/7] docs: kdoc: micro-optimize KernRe
  2025-07-04  6:01       ` Mauro Carvalho Chehab
@ 2025-07-04 14:59         ` Jonathan Corbet
  2025-07-08  8:13           ` Mauro Carvalho Chehab
  0 siblings, 1 reply; 15+ messages in thread
From: Jonathan Corbet @ 2025-07-04 14:59 UTC (permalink / raw)
  To: Mauro Carvalho Chehab; +Cc: linux-doc, linux-kernel, Akira Yokosawa

Mauro Carvalho Chehab <mchehab+huawei@kernel.org> writes:

> Em Thu, 03 Jul 2025 17:47:13 -0600
> Jonathan Corbet <corbet@lwn.net> escreveu:
>
>> Mauro Carvalho Chehab <mchehab+huawei@kernel.org> writes:
>> 
>> > Em Thu,  3 Jul 2025 12:43:58 -0600
>> > Jonathan Corbet <corbet@lwn.net> escreveu:
>> >  
>> >> Rework _add_regex() to avoid doing the lookup twice for the (hopefully
>> >> common) cache-hit case.
>> >> 
>> >> Signed-off-by: Jonathan Corbet <corbet@lwn.net>
>> >> ---
>> >>  scripts/lib/kdoc/kdoc_re.py | 7 ++-----
>> >>  1 file changed, 2 insertions(+), 5 deletions(-)
>> >> 
>> >> diff --git a/scripts/lib/kdoc/kdoc_re.py b/scripts/lib/kdoc/kdoc_re.py
>> >> index e81695b273bf..612223e1e723 100644
>> >> --- a/scripts/lib/kdoc/kdoc_re.py
>> >> +++ b/scripts/lib/kdoc/kdoc_re.py
>> >> @@ -29,12 +29,9 @@ class KernRe:
>> >>          """
>> >>          Adds a new regex or re-use it from the cache.
>> >>          """
>> >> -
>> >> -        if string in re_cache:
>> >> -            self.regex = re_cache[string]
>> >> -        else:
>> >> +        self.regex = re_cache.get(string, None)  
>> >
>> > With get, None is default...
>> >  
>> >> +        if not self.regex:
>> >>              self.regex = re.compile(string, flags=flags)  
>> >
>> > ... yet, as you're using get, better to code it as:
>> >
>> > 	self.regex = re_cache.get(string, re.compile(string, flags=flags))  
>> 
>> ...but that will recompile the regex each time, defeating the purpose of
>> the cache, no?
>
> No. It should do exactly like the previous code:
>
> - if re_cache[string] exists, it returns it. 
> - Otherwise, it returns re.compile(string, flags=flags).
>
> https://www.w3schools.com/python/ref_dictionary_get.asp

The re.compile() call is evaluated before the call to get() - just like
it would be in C.  This is easy enough to prove to yourself in the REPL
if you doubt me...
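
Something along these lines should show it; expensive() is a made-up
stand-in for re.compile() that just counts how often it runs:

```python
calls = 0


def expensive():
    """Stand-in for re.compile(); counts how many times it is called."""
    global calls
    calls += 1
    return "compiled"


cache = {"pattern": "cached"}

# The default argument is evaluated before get() runs, even on a hit.
eager = cache.get("pattern", expensive())
calls_after_eager = calls

# Looking up first and computing only on a miss avoids the extra call.
lazy = cache.get("pattern")
if lazy is None:
    lazy = expensive()
calls_after_lazy = calls

print(eager, calls_after_eager, lazy, calls_after_lazy)
```

Both lookups return the cached value, but only the first form pays for a
call to expensive() along the way.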

Thanks,

jon


* Re: [PATCH v2 2/7] docs: kdoc: micro-optimize KernRe
  2025-07-04 14:59         ` Jonathan Corbet
@ 2025-07-08  8:13           ` Mauro Carvalho Chehab
  0 siblings, 0 replies; 15+ messages in thread
From: Mauro Carvalho Chehab @ 2025-07-08  8:13 UTC (permalink / raw)
  To: Jonathan Corbet; +Cc: linux-doc, linux-kernel, Akira Yokosawa

Em Fri, 04 Jul 2025 08:59:45 -0600
Jonathan Corbet <corbet@lwn.net> escreveu:

> Mauro Carvalho Chehab <mchehab+huawei@kernel.org> writes:
> 
> > Em Thu, 03 Jul 2025 17:47:13 -0600
> > Jonathan Corbet <corbet@lwn.net> escreveu:
> >  
> >> Mauro Carvalho Chehab <mchehab+huawei@kernel.org> writes:
> >>   
> >> > Em Thu,  3 Jul 2025 12:43:58 -0600
> >> > Jonathan Corbet <corbet@lwn.net> escreveu:
> >> >    
> >> >> Rework _add_regex() to avoid doing the lookup twice for the (hopefully
> >> >> common) cache-hit case.
> >> >> 
> >> >> Signed-off-by: Jonathan Corbet <corbet@lwn.net>
> >> >> ---
> >> >>  scripts/lib/kdoc/kdoc_re.py | 7 ++-----
> >> >>  1 file changed, 2 insertions(+), 5 deletions(-)
> >> >> 
> >> >> diff --git a/scripts/lib/kdoc/kdoc_re.py b/scripts/lib/kdoc/kdoc_re.py
> >> >> index e81695b273bf..612223e1e723 100644
> >> >> --- a/scripts/lib/kdoc/kdoc_re.py
> >> >> +++ b/scripts/lib/kdoc/kdoc_re.py
> >> >> @@ -29,12 +29,9 @@ class KernRe:
> >> >>          """
> >> >>          Adds a new regex or re-use it from the cache.
> >> >>          """
> >> >> -
> >> >> -        if string in re_cache:
> >> >> -            self.regex = re_cache[string]
> >> >> -        else:
> >> >> +        self.regex = re_cache.get(string, None)    
> >> >
> >> > With get, None is default...
> >> >    
> >> >> +        if not self.regex:
> >> >>              self.regex = re.compile(string, flags=flags)    
> >> >
> >> > ... yet, as you're using get, better to code it as:
> >> >
> >> > 	self.regex = re_cache.get(string, re.compile(string, flags=flags))    
> >> 
> >> ...but that will recompile the regex each time, defeating the purpose of
> >> the cache, no?  
> >
> > No. It should do exactly like the previous code:
> >
> > - if re_cache[string] exists, it returns it. 
> > - Otherwise, it returns re.compile(string, flags=flags).
> >
> > https://www.w3schools.com/python/ref_dictionary_get.asp  
> 
> The re.compile() call is evaluated before the call to get() - just like
> it would be in C.  This is easy enough to prove to yourself in the REPL
> if you doubt me...

You're right!

Tested with the small code snippet:

	# test.py
	inner called
	Inner will be called: True
	inner called
	Inner should  not be called: False

I guess I expected too much from Python's optimizer ;-) My fault.

Your patch looks OK to me.

Reviewed-by: Mauro Carvalho Chehab <mchehab+huawei@kernel.org>

-

For reference, this was the test code:

#!/usr/bin/env python3

def inner():
   print("inner called")

   return True

c = {}

print(f"Inner will be called: {c.get('a', inner())}")

c = { "a": "False"}

print(f"Inner should  not be called: {c.get('a', inner())}")




Thanks,
Mauro


* [PATCH v2 3/7] docs: kdoc: remove the brcount floor in process_proto_type()
  2025-07-03 18:43 [PATCH v2 0/7] Further kernel-doc tweakery Jonathan Corbet
  2025-07-03 18:43 ` [PATCH v2 1/7] docs: kdoc: don't reinvent string.strip() Jonathan Corbet
  2025-07-03 18:43 ` [PATCH v2 2/7] docs: kdoc: micro-optimize KernRe Jonathan Corbet
@ 2025-07-03 18:43 ` Jonathan Corbet
  2025-07-03 18:44 ` [PATCH v2 4/7] docs: kdoc: rework type prototype parsing Jonathan Corbet
                   ` (4 subsequent siblings)
  7 siblings, 0 replies; 15+ messages in thread
From: Jonathan Corbet @ 2025-07-03 18:43 UTC (permalink / raw)
  To: linux-doc
  Cc: linux-kernel, Mauro Carvalho Chehab, Akira Yokosawa,
	Jonathan Corbet

Putting the floor under brcount does not change the output in any way, so
just remove it.

Change the termination test from ==0 to <=0 to prevent infinite loops in
case somebody does something truly wacko in the code.
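
A minimal sketch of the termination test alone, not of the parser loop.
The first_terminator() helper and the pre-split chunk list are hypothetical;
they only model how the brace count behaves once the floor is gone and the
input has one '}' too many.

```python
def first_terminator(chunks, exact_zero):
    """Hypothetical helper: index of the ';' that ends the declaration."""
    brcount = 0
    for i, chunk in enumerate(chunks):
        if chunk == '{':
            brcount += 1
        elif chunk == '}':
            brcount -= 1          # no floor: may go negative
        elif chunk == ';':
            done = (brcount == 0) if exact_zero else (brcount <= 0)
            if done:
                return i
    return None                   # terminating semicolon never recognized


# Made-up malformed input with one closing brace too many.
chunks = ['struct foo ', '{', ' int a', ';', '}', '}', ';']
print(first_terminator(chunks, exact_zero=True))
print(first_terminator(chunks, exact_zero=False))
```

With the ==0 test, the negative count means the final semicolon is never
accepted; with <=0 it is.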

Reviewed-by: Mauro Carvalho Chehab <mchehab+huawei@kernel.org>
Signed-off-by: Jonathan Corbet <corbet@lwn.net>
---
 scripts/lib/kdoc/kdoc_parser.py | 4 +---
 1 file changed, 1 insertion(+), 3 deletions(-)

diff --git a/scripts/lib/kdoc/kdoc_parser.py b/scripts/lib/kdoc/kdoc_parser.py
index d9ff2d066160..935f2a3c4b47 100644
--- a/scripts/lib/kdoc/kdoc_parser.py
+++ b/scripts/lib/kdoc/kdoc_parser.py
@@ -1609,9 +1609,7 @@ class KernelDoc:
                 self.entry.brcount += r.group(2).count('{')
                 self.entry.brcount -= r.group(2).count('}')
 
-                self.entry.brcount = max(self.entry.brcount, 0)
-
-                if r.group(2) == ';' and self.entry.brcount == 0:
+                if r.group(2) == ';' and self.entry.brcount <= 0:
                     self.dump_declaration(ln, self.entry.prototype)
                     self.reset_state(ln)
                     break
-- 
2.49.0



* [PATCH v2 4/7] docs: kdoc: rework type prototype parsing
  2025-07-03 18:43 [PATCH v2 0/7] Further kernel-doc tweakery Jonathan Corbet
                   ` (2 preceding siblings ...)
  2025-07-03 18:43 ` [PATCH v2 3/7] docs: kdoc: remove the brcount floor in process_proto_type() Jonathan Corbet
@ 2025-07-03 18:44 ` Jonathan Corbet
  2025-07-03 18:44 ` [PATCH v2 5/7] docs: kdoc: some tweaks to process_proto_function() Jonathan Corbet
                   ` (3 subsequent siblings)
  7 siblings, 0 replies; 15+ messages in thread
From: Jonathan Corbet @ 2025-07-03 18:44 UTC (permalink / raw)
  To: linux-doc
  Cc: linux-kernel, Mauro Carvalho Chehab, Akira Yokosawa,
	Jonathan Corbet

process_proto_type() is using a complex regex and a "while True" loop to
split a declaration into chunks and, in the end, count brackets.  Switch to
using a simpler regex to just do the split directly, and handle each chunk
as it comes.  The result is, IMO, easier to understand and reason about.

The old algorithm would occasionally elide the space between function
parameters; see struct rng_alg->generate(), for example.  The only output
difference is to not elide that space, which is more correct.
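
To show what the split produces, here is a minimal sketch using re.split()
from the standard library in place of KernRe; the declaration text is made
up for illustration, and the loop is a simplified model of the new logic.

```python
import re

# Hypothetical declaration text, for illustration only.
line = "struct foo { int a; char b; };"

# re.split() returns captured groups along with the pieces between
# matches, so each delimiter comes back as its own chunk.  Empty strings
# are filtered out here, much as the patch skips empty chunks.
chunks = [c for c in re.split(r'(.*?)([{};])', line) if c]
print(chunks)

# Accumulate chunks until a semicolon is seen outside of any braces.
prototype = ""
brcount = 0
for chunk in chunks:
    prototype += chunk
    if chunk == '{':
        brcount += 1
    elif chunk == '}':
        brcount -= 1
    elif chunk == ';' and brcount <= 0:
        break
print(prototype)
```

The semicolons inside the braces are appended but do not end the
declaration; only the final one, seen with the count back at zero, does.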

Reviewed-by: Mauro Carvalho Chehab <mchehab+huawei@kernel.org>
Signed-off-by: Jonathan Corbet <corbet@lwn.net>
---
 scripts/lib/kdoc/kdoc_parser.py | 43 +++++++++++++++++++--------------
 1 file changed, 25 insertions(+), 18 deletions(-)

diff --git a/scripts/lib/kdoc/kdoc_parser.py b/scripts/lib/kdoc/kdoc_parser.py
index 935f2a3c4b47..61da297df623 100644
--- a/scripts/lib/kdoc/kdoc_parser.py
+++ b/scripts/lib/kdoc/kdoc_parser.py
@@ -1594,30 +1594,37 @@ class KernelDoc:
 
         # Strip C99-style comments and surrounding whitespace
         line = KernRe(r"//.*$", re.S).sub('', line).strip()
+        if not line:
+            return # nothing to see here
 
         # To distinguish preprocessor directive from regular declaration later.
         if line.startswith('#'):
             line += ";"
-
-        r = KernRe(r'([^\{\};]*)([\{\};])(.*)')
-        while True:
-            if r.search(line):
-                if self.entry.prototype:
-                    self.entry.prototype += " "
-                self.entry.prototype += r.group(1) + r.group(2)
-
-                self.entry.brcount += r.group(2).count('{')
-                self.entry.brcount -= r.group(2).count('}')
-
-                if r.group(2) == ';' and self.entry.brcount <= 0:
+        #
+        # Split the declaration on any of { } or ;, and accumulate pieces
+        # until we hit a semicolon while not inside {brackets}
+        #
+        r = KernRe(r'(.*?)([{};])')
+        for chunk in r.split(line):
+            if chunk:  # Ignore empty matches
+                self.entry.prototype += chunk
+                #
+                # This cries out for a match statement ... someday after we can
+                # drop Python 3.9 ...
+                #
+                if chunk == '{':
+                    self.entry.brcount += 1
+                elif chunk == '}':
+                    self.entry.brcount -= 1
+                elif chunk == ';' and self.entry.brcount <= 0:
                     self.dump_declaration(ln, self.entry.prototype)
                     self.reset_state(ln)
-                    break
-
-                line = r.group(3)
-            else:
-                self.entry.prototype += line
-                break
+                    return
+        #
+        # We hit the end of the line while still in the declaration; put
+        # in a space to represent the newline.
+        #
+        self.entry.prototype += ' '
 
     def process_proto(self, ln, line):
         """STATE_PROTO: reading a function/whatever prototype."""
-- 
2.49.0



* [PATCH v2 5/7] docs: kdoc: some tweaks to process_proto_function()
  2025-07-03 18:43 [PATCH v2 0/7] Further kernel-doc tweakery Jonathan Corbet
                   ` (3 preceding siblings ...)
  2025-07-03 18:44 ` [PATCH v2 4/7] docs: kdoc: rework type prototype parsing Jonathan Corbet
@ 2025-07-03 18:44 ` Jonathan Corbet
  2025-07-03 18:44 ` [PATCH v2 6/7] docs: kdoc: Remove a Python 2 comment Jonathan Corbet
                   ` (2 subsequent siblings)
  7 siblings, 0 replies; 15+ messages in thread
From: Jonathan Corbet @ 2025-07-03 18:44 UTC (permalink / raw)
  To: linux-doc
  Cc: linux-kernel, Mauro Carvalho Chehab, Akira Yokosawa,
	Jonathan Corbet

Add a set of comments to process_proto_function and reorganize the logic
slightly; no functional change.

Reviewed-by: Mauro Carvalho Chehab <mchehab+huawei@kernel.org>
Signed-off-by: Jonathan Corbet <corbet@lwn.net>
---
 scripts/lib/kdoc/kdoc_parser.py | 43 ++++++++++++++++++---------------
 1 file changed, 24 insertions(+), 19 deletions(-)

diff --git a/scripts/lib/kdoc/kdoc_parser.py b/scripts/lib/kdoc/kdoc_parser.py
index 61da297df623..d5ef3ce87438 100644
--- a/scripts/lib/kdoc/kdoc_parser.py
+++ b/scripts/lib/kdoc/kdoc_parser.py
@@ -1553,39 +1553,44 @@ class KernelDoc:
         """Ancillary routine to process a function prototype"""
 
         # strip C99-style comments to end of line
-        r = KernRe(r"\/\/.*$", re.S)
-        line = r.sub('', line)
-
+        line = KernRe(r"\/\/.*$", re.S).sub('', line)
+        #
+        # Soak up the line's worth of prototype text, stopping at { or ; if present.
+        #
         if KernRe(r'\s*#\s*define').match(line):
             self.entry.prototype = line
-        elif line.startswith('#'):
-            # Strip other macros like #ifdef/#ifndef/#endif/...
-            pass
-        else:
+        elif not line.startswith('#'):   # skip other preprocessor stuff
             r = KernRe(r'([^\{]*)')
             if r.match(line):
                 self.entry.prototype += r.group(1) + " "
-
+        #
+        # If we now have the whole prototype, clean it up and declare victory.
+        #
         if '{' in line or ';' in line or KernRe(r'\s*#\s*define').match(line):
             # strip comments and surrounding spaces
-            r = KernRe(r'/\*.*\*/')
-            self.entry.prototype = r.sub('', self.entry.prototype).strip()
-
+            self.entry.prototype = KernRe(r'/\*.*\*/').sub('', self.entry.prototype).strip()
+            #
             # Handle self.entry.prototypes for function pointers like:
             #       int (*pcs_config)(struct foo)
-
+            # by turning it into
+            #	    int pcs_config(struct foo)
+            #
             r = KernRe(r'^(\S+\s+)\(\s*\*(\S+)\)')
             self.entry.prototype = r.sub(r'\1\2', self.entry.prototype)
-
+            #
+            # Handle special declaration syntaxes
+            #
             if 'SYSCALL_DEFINE' in self.entry.prototype:
                 self.entry.prototype = self.syscall_munge(ln,
                                                           self.entry.prototype)
-
-            r = KernRe(r'TRACE_EVENT|DEFINE_EVENT|DEFINE_SINGLE_EVENT')
-            if r.search(self.entry.prototype):
-                self.entry.prototype = self.tracepoint_munge(ln,
-                                                             self.entry.prototype)
-
+            else:
+                r = KernRe(r'TRACE_EVENT|DEFINE_EVENT|DEFINE_SINGLE_EVENT')
+                if r.search(self.entry.prototype):
+                    self.entry.prototype = self.tracepoint_munge(ln,
+                                                                 self.entry.prototype)
+            #
+            # ... and we're done
+            #
             self.dump_function(ln, self.entry.prototype)
             self.reset_state(ln)
 
-- 
2.49.0



* [PATCH v2 6/7] docs: kdoc: Remove a Python 2 comment
  2025-07-03 18:43 [PATCH v2 0/7] Further kernel-doc tweakery Jonathan Corbet
                   ` (4 preceding siblings ...)
  2025-07-03 18:44 ` [PATCH v2 5/7] docs: kdoc: some tweaks to process_proto_function() Jonathan Corbet
@ 2025-07-03 18:44 ` Jonathan Corbet
  2025-07-03 18:44 ` [PATCH v2 7/7] docs: kdoc: pretty up dump_enum() Jonathan Corbet
  2025-07-04  0:45 ` [PATCH v2 0/7] Further kernel-doc tweakery Akira Yokosawa
  7 siblings, 0 replies; 15+ messages in thread
From: Jonathan Corbet @ 2025-07-03 18:44 UTC (permalink / raw)
  To: linux-doc
  Cc: linux-kernel, Mauro Carvalho Chehab, Akira Yokosawa,
	Jonathan Corbet, Jani Nikula

We no longer support Python 2 in the docs build chain at all, so we
certainly do not need to admonish folks to keep this file working with it.

Cc: Jani Nikula <jani.nikula@intel.com>
Reviewed-by: Mauro Carvalho Chehab <mchehab+huawei@kernel.org>
Acked-by: Jani Nikula <jani.nikula@intel.com>
Signed-off-by: Jonathan Corbet <corbet@lwn.net>
---
 Documentation/sphinx/kerneldoc.py | 2 --
 1 file changed, 2 deletions(-)

diff --git a/Documentation/sphinx/kerneldoc.py b/Documentation/sphinx/kerneldoc.py
index 51a2793dc8e2..2586b4d4e494 100644
--- a/Documentation/sphinx/kerneldoc.py
+++ b/Documentation/sphinx/kerneldoc.py
@@ -25,8 +25,6 @@
 # Authors:
 #    Jani Nikula <jani.nikula@intel.com>
 #
-# Please make sure this works on both python2 and python3.
-#
 
 import codecs
 import os
-- 
2.49.0



* [PATCH v2 7/7] docs: kdoc: pretty up dump_enum()
  2025-07-03 18:43 [PATCH v2 0/7] Further kernel-doc tweakery Jonathan Corbet
                   ` (5 preceding siblings ...)
  2025-07-03 18:44 ` [PATCH v2 6/7] docs: kdoc: Remove a Python 2 comment Jonathan Corbet
@ 2025-07-03 18:44 ` Jonathan Corbet
  2025-07-04  0:45 ` [PATCH v2 0/7] Further kernel-doc tweakery Akira Yokosawa
  7 siblings, 0 replies; 15+ messages in thread
From: Jonathan Corbet @ 2025-07-03 18:44 UTC (permalink / raw)
  To: linux-doc
  Cc: linux-kernel, Mauro Carvalho Chehab, Akira Yokosawa,
	Jonathan Corbet

Add some comments to dump_enum to help the next person who has to figure
out what it is actually doing.

Reviewed-by: Mauro Carvalho Chehab <mchehab+huawei@kernel.org>
Signed-off-by: Jonathan Corbet <corbet@lwn.net>
---
 scripts/lib/kdoc/kdoc_parser.py | 39 +++++++++++++++++++++------------
 1 file changed, 25 insertions(+), 14 deletions(-)

diff --git a/scripts/lib/kdoc/kdoc_parser.py b/scripts/lib/kdoc/kdoc_parser.py
index d5ef3ce87438..831f061f61b8 100644
--- a/scripts/lib/kdoc/kdoc_parser.py
+++ b/scripts/lib/kdoc/kdoc_parser.py
@@ -860,39 +860,48 @@ class KernelDoc:
         # Strip #define macros inside enums
         proto = KernRe(r'#\s*((define|ifdef|if)\s+|endif)[^;]*;', flags=re.S).sub('', proto)
 
-        members = None
-        declaration_name = None
-
+        #
+        # Parse out the name and members of the enum.  Typedef form first.
+        #
         r = KernRe(r'typedef\s+enum\s*\{(.*)\}\s*(\w*)\s*;')
         if r.search(proto):
             declaration_name = r.group(2)
             members = r.group(1).rstrip()
+        #
+        # Failing that, look for a straight enum
+        #
         else:
             r = KernRe(r'enum\s+(\w*)\s*\{(.*)\}')
             if r.match(proto):
                 declaration_name = r.group(1)
                 members = r.group(2).rstrip()
-
-        if not members:
-            self.emit_msg(ln, f"{proto}: error: Cannot parse enum!")
-            return
-
+        #
+        # OK, this isn't going to work.
+        #
+            else:
+                self.emit_msg(ln, f"{proto}: error: Cannot parse enum!")
+                return
+        #
+        # Make sure we found what we were expecting.
+        #
         if self.entry.identifier != declaration_name:
             if self.entry.identifier == "":
                 self.emit_msg(ln,
                               f"{proto}: wrong kernel-doc identifier on prototype")
             else:
                 self.emit_msg(ln,
-                              f"expecting prototype for enum {self.entry.identifier}. Prototype was for enum {declaration_name} instead")
+                              f"expecting prototype for enum {self.entry.identifier}. "
+                              f"Prototype was for enum {declaration_name} instead")
             return
 
         if not declaration_name:
             declaration_name = "(anonymous)"
-
+        #
+        # Parse out the name of each enum member, and verify that we
+        # have a description for it.
+        #
         member_set = set()
-
-        members = KernRe(r'\([^;]*?[\)]').sub('', members)
-
+        members = KernRe(r'\([^;)]*\)').sub('', members)
         for arg in members.split(','):
             if not arg:
                 continue
@@ -903,7 +912,9 @@ class KernelDoc:
                 self.emit_msg(ln,
                               f"Enum value '{arg}' not described in enum '{declaration_name}'")
             member_set.add(arg)
-
+        #
+        # Ensure that every described member actually exists in the enum.
+        #
         for k in self.entry.parameterdescs:
             if k not in member_set:
                 self.emit_msg(ln,
-- 
2.49.0
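
For readers following along, the parsing flow that the new comments describe
can be sketched outside the kernel tree.  This is only an illustration under
stated assumptions: plain re stands in for the kernel's KernRe wrapper,
parse_enum() is a hypothetical helper rather than a function in
kdoc_parser.py, and the way member names are trimmed (dropping everything
from "=" onward) is a guess, since that part of dump_enum() lies outside the
hunks shown.  The three patterns themselves are copied from the patch.

```python
import re

# Patterns copied from the patch; plain `re` is an assumed stand-in for KernRe.
TYPEDEF_ENUM = re.compile(r'typedef\s+enum\s*\{(.*)\}\s*(\w*)\s*;')
PLAIN_ENUM = re.compile(r'enum\s+(\w*)\s*\{(.*)\}')
PAREN_GROUP = re.compile(r'\([^;)]*\)')


def parse_enum(proto):
    """Hypothetical helper: return (name, member_names), or None on failure.

    `proto` is assumed to be a single-line prototype, since `.` does not
    match newlines here.
    """
    # Typedef form first.
    m = TYPEDEF_ENUM.search(proto)
    if m:
        name, members = m.group(2), m.group(1).rstrip()
    else:
        # Failing that, look for a straight enum.
        m = PLAIN_ENUM.match(proto)
        if not m:
            return None
        name, members = m.group(1), m.group(2).rstrip()

    # Drop parenthesized expressions such as the "(3)" in "BIT(3)" so that
    # commas inside them cannot split a member in two.
    members = PAREN_GROUP.sub('', members)

    names = []
    for arg in members.split(','):
        # Assumption: keep only the identifier, discarding any "= value".
        arg = arg.split('=')[0].strip()
        if arg:
            names.append(arg)
    return name or "(anonymous)", names
```

Under these assumptions, parse_enum("enum foo { A, B = 2, C = BIT(3) };")
yields ("foo", ["A", "B", "C"]), and a non-enum prototype yields None.  The
old pattern r'\([^;]*?[\)]' and the new r'\([^;)]*\)' both stop at the first
closing parenthesis, so the simplification should not change which text is
stripped.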



* Re: [PATCH v2 0/7] Further kernel-doc tweakery
  2025-07-03 18:43 [PATCH v2 0/7] Further kernel-doc tweakery Jonathan Corbet
                   ` (6 preceding siblings ...)
  2025-07-03 18:44 ` [PATCH v2 7/7] docs: kdoc: pretty up dump_enum() Jonathan Corbet
@ 2025-07-04  0:45 ` Akira Yokosawa
  7 siblings, 0 replies; 15+ messages in thread
From: Akira Yokosawa @ 2025-07-04  0:45 UTC (permalink / raw)
  To: Jonathan Corbet, linux-doc; +Cc: linux-kernel, Mauro Carvalho Chehab

On Thu,  3 Jul 2025 12:43:56 -0600, Jonathan Corbet wrote:
> This is a set of miscellaneous improvements, finishing my pass over the
> first parsing pass and getting into the second ("dump_*") pass.
> 
> Changes from v1:
>  - Apply tags
>  - Rework the KernRe microoptimization to avoid exceptions
>  - Fix the stupid white-space error in patch 7

Tested-by: Akira Yokosawa <akiyks@gmail.com>

Thanks!

> 
> Jonathan Corbet (7):
>   docs: kdoc: don't reinvent string.strip()
>   docs: kdoc: micro-optimize KernRe
>   docs: kdoc: remove the brcount floor in process_proto_type()
>   docs: kdoc: rework type prototype parsing
>   docs: kdoc: some tweaks to process_proto_function()
>   docs: kdoc: Remove a Python 2 comment
>   docs: kdoc: pretty up dump_enum()
> 
>  Documentation/sphinx/kerneldoc.py |   2 -
>  scripts/lib/kdoc/kdoc_parser.py   | 150 +++++++++++++++---------------
>  scripts/lib/kdoc/kdoc_re.py       |   7 +-
>  3 files changed, 79 insertions(+), 80 deletions(-)
> 



Thread overview: 15+ messages
2025-07-03 18:43 [PATCH v2 0/7] Further kernel-doc tweakery Jonathan Corbet
2025-07-03 18:43 ` [PATCH v2 1/7] docs: kdoc: don't reinvent string.strip() Jonathan Corbet
2025-07-03 18:43 ` [PATCH v2 2/7] docs: kdoc: micro-optimize KernRe Jonathan Corbet
2025-07-03 22:31   ` Mauro Carvalho Chehab
2025-07-03 22:32     ` Mauro Carvalho Chehab
2025-07-03 23:47     ` Jonathan Corbet
2025-07-04  6:01       ` Mauro Carvalho Chehab
2025-07-04 14:59         ` Jonathan Corbet
2025-07-08  8:13           ` Mauro Carvalho Chehab
2025-07-03 18:43 ` [PATCH v2 3/7] docs: kdoc: remove the brcount floor in process_proto_type() Jonathan Corbet
2025-07-03 18:44 ` [PATCH v2 4/7] docs: kdoc: rework type prototype parsing Jonathan Corbet
2025-07-03 18:44 ` [PATCH v2 5/7] docs: kdoc: some tweaks to process_proto_function() Jonathan Corbet
2025-07-03 18:44 ` [PATCH v2 6/7] docs: kdoc: Remove a Python 2 comment Jonathan Corbet
2025-07-03 18:44 ` [PATCH v2 7/7] docs: kdoc: pretty up dump_enum() Jonathan Corbet
2025-07-04  0:45 ` [PATCH v2 0/7] Further kernel-doc tweakery Akira Yokosawa
