From: Mauro Carvalho Chehab <mchehab+huawei@kernel.org>
To: Jonathan Corbet <corbet@lwn.net>
Cc: linux-doc@vger.kernel.org, linux-kernel@vger.kernel.org,
Akira Yokosawa <akiyks@gmail.com>
Subject: Re: [PATCH 12/12] docs: kdoc: Improve the output text accumulation
Date: Thu, 10 Jul 2025 10:19:31 +0200 [thread overview]
Message-ID: <20250710101931.202953d1@foz.lan> (raw)
In-Reply-To: <20250710091352.4ae01211@foz.lan>
Em Thu, 10 Jul 2025 09:13:52 +0200
Mauro Carvalho Chehab <mchehab+huawei@kernel.org> escreveu:
> Em Thu, 10 Jul 2025 08:41:19 +0200
> Mauro Carvalho Chehab <mchehab+huawei@kernel.org> escreveu:
>
> > Em Wed, 2 Jul 2025 16:35:24 -0600
> > Jonathan Corbet <corbet@lwn.net> escreveu:
> >
> > > Building strings with repeated concatenation is somewhat inefficient in
> > > Python; it is better to make a list and glom them all together at the end.
> > > Add a small set of methods to the OutputFormat superclass to manage the
> > > output string, and use them throughout.
> > >
> > > Signed-off-by: Jonathan Corbet <corbet@lwn.net>
> >
> > The patch looks good to me. Just a minor nit below.
> >
> > > ---
> > > scripts/lib/kdoc/kdoc_output.py | 185 +++++++++++++++++---------------
> > > 1 file changed, 98 insertions(+), 87 deletions(-)
> > >
> > > diff --git a/scripts/lib/kdoc/kdoc_output.py b/scripts/lib/kdoc/kdoc_output.py
> > > index ea8914537ba0..d4aabdaa9c51 100644
> > > --- a/scripts/lib/kdoc/kdoc_output.py
> > > +++ b/scripts/lib/kdoc/kdoc_output.py
> > > @@ -73,7 +73,19 @@ class OutputFormat:
> > > self.config = None
> > > self.no_doc_sections = False
> > >
> > > - self.data = ""
> > > + #
> > > + # Accumulation and management of the output text.
> > > + #
> > > + def reset_output(self):
> > > + self._output = []
> > > +
> > > + def emit(self, text):
> > > + """Add a string to out output text"""
> > > + self._output.append(text)
> > > +
> > > + def output(self):
> > > + """Obtain the accumulated output text"""
> > > + return ''.join(self._output)
> >
> > I would prefer to use a more Pythonic name for this function:
> >
> > def __str__(self)
> >
> > This way, all it takes to get the final string is to use str():
> >
> > out_str = str(out)
> >
> > With that:
> >
> > Reviewed-by: Mauro Carvalho Chehab <mchehab+huawei@kernel.org>
>
>
> Hmm... actually, I would code it on a different way, using something like:
>
> class OutputString:
> def __init__(self):
> """Initialize internal list"""
> self._output = []
>
> # Probably not needed - The code can simply do, instead:
> # a = OutputString() to create a new string.
> def reset(self):
> """Reset the output text"""
> self._output = []
>
> def __add__(self, text):
> """Add a string to out output text"""
> if not isinstance(text, str):
> raise TypeError("Can only append strings")
> self._output.append(text)
> return self
>
> def __str__(self):
> return ''.join(self._output)
>
> # and, if needed, add a getter/setter:
>
> @property
> def data(self):
> """Getter for the current output"""
> return ''.join(self._output)
>
> @data.setter
> def data(self, new_value):
> if isinstance(new_value, str):
> self._output = [new_value]
> elif isinstance(new_value, list):
> self._output = new_value
> else:
> raise TypeError("Value should be either list or string")
>
> That would allow things like:
>
> out = OutputString()
> out = out + "Foo" + " " + "Bar"
> print(out)
>
> out = OutputString()
> out += "Foo"
> out += " "
> out += "Bar"
> return(str(out))
>
> and won't require much changes at the output logic, and IMO will
> provide a cleaner code.
>
> Thanks,
> Mauro
Heh, on those times where LLM can quickly code trivial things for us,
I actually decided to test 3 different variants:
- using string +=
- using list append
- using __add__
- using __iadd__
Except if the LLM-generated did something wrong (I double checked, and
was unable to find any issues), the results on Python 3.13.5 are:
$ /tmp/bench.py
Benchmarking 1,000 ops × 1000 strings = 1,000,000 appends
Run str+= ExplicitList __add__ __iadd__
------------------------------------------------------------
1 25.26 29.44 53.42 50.71
2 29.34 29.35 53.45 50.61
3 29.44 29.56 53.41 50.67
4 29.28 29.23 53.26 50.64
5 29.28 29.20 45.90 40.47
6 23.53 23.62 42.74 40.61
7 23.43 23.76 42.97 40.78
8 23.51 23.59 42.67 40.61
9 23.43 23.52 42.77 40.72
10 23.53 23.54 42.78 40.67
11 23.83 23.63 42.98 40.87
12 23.49 23.45 42.67 40.53
13 23.43 23.69 42.75 40.66
14 23.47 23.49 42.70 40.56
15 23.44 23.63 42.72 40.52
16 23.51 23.56 42.65 40.66
17 23.48 23.60 42.86 40.81
18 23.67 23.53 42.73 40.59
19 23.75 23.62 42.78 40.58
20 23.68 23.55 42.77 40.54
21 23.65 23.67 42.76 40.59
22 23.73 23.49 42.78 40.61
23 23.61 23.59 42.78 40.58
24 23.66 23.51 42.73 40.55
------------------------------------------------------------
Avg 24.60 24.78 44.67 42.30
Summary:
ExplicitList : 100.74% slower than str+=
__add__ : 181.56% slower than str+=
__iadd__ : 171.93% slower than str+=
(running it a couple of times sometimes it sometimes show list
addition a little bit better, bu all at the +/- 1% range)
In practice, it means that my suggestion of using __add__ (or even
using the __iadd__ variant) was not good, but it also showed
that Python 3.13 implementation is actually very efficient
with str += operations.
With that, I would just drop this patch, as the performance is
almost identical, and using "emit()" instead of "+=" IMO makes
the code less clear.
-
Btw, with Python 3.9, "".join(list) is a lot worse than str += :
$ python3.9 /tmp/bench.py
Benchmarking 1,000 ops × 1000 strings = 1,000,000 appends
Run str+= ExplicitList __add__ __iadd__
------------------------------------------------------------
1 28.27 87.24 96.03 88.81
2 32.76 87.35 87.40 88.92
3 32.69 85.98 73.01 70.87
4 26.28 69.80 70.62 71.90
5 27.21 70.54 71.04 72.00
6 27.77 70.06 70.22 70.92
7 27.03 69.75 70.30 70.89
8 33.31 72.63 70.57 70.59
9 26.33 70.15 70.27 70.97
10 26.29 69.84 71.60 70.94
11 26.59 69.60 70.16 71.26
12 26.38 69.57 71.64 70.95
13 26.41 69.89 70.11 70.85
14 26.38 69.86 70.36 70.93
15 26.43 69.57 70.18 70.90
16 26.38 70.04 70.26 71.19
17 26.40 70.02 80.50 71.01
18 26.41 71.74 80.39 71.90
19 28.06 69.60 71.95 70.88
20 28.28 69.90 71.12 71.07
21 26.34 69.74 72.42 71.02
22 26.33 69.86 70.25 70.97
23 26.40 69.78 71.64 71.10
24 26.44 69.73 70.23 70.83
------------------------------------------------------------
Avg 27.55 72.18 73.43 72.57
Summary:
ExplicitList : 262.00% slower than str+=
__add__ : 266.54% slower than str+=
__iadd__ : 263.42% slower than str+=
Thanks,
Mauro
---
#!/usr/bin/env python3
import timeit
class ExplicitList:
def __init__(self):
self._output = []
def emit(self, text):
self._output.append(text)
def output(self):
return ''.join(self._output)
class OutputStringAdd:
def __init__(self):
self._output = []
def __add__(self, text):
self._output.append(text)
return self
def __str__(self):
return ''.join(self._output)
class OutputStringIAdd:
def __init__(self):
self._output = []
def __iadd__(self, text):
self._output.append(text)
return self
def __str__(self):
return ''.join(self._output)
def calculate_comparison(base_time, compare_time):
"""Returns tuple of (is_faster, percentage)"""
if compare_time < base_time:
return (True, (1 - compare_time/base_time)*100)
else:
return (False, (compare_time/base_time)*100)
def benchmark():
N = 1000 # Operations
STRINGS_PER_OP = 1000
REPEATS = 24
# Generate test data (1000 unique 10-character strings)
test_strings = [f"string_{i:03d}" for i in range(STRINGS_PER_OP)]
print(f"Benchmarking {N:,} ops × {STRINGS_PER_OP} strings = {N*STRINGS_PER_OP:,} appends\n")
headers = ['Run', 'str+=', 'ExplicitList', '__add__', '__iadd__']
print(f"{headers[0]:<6} {headers[1]:<12} {headers[2]:<12} {headers[3]:<12} {headers[4]:<12}")
print("-" * 60)
results = []
for i in range(REPEATS):
# Benchmark normal string +=
t_str = timeit.timeit(
'result = ""\nfor s in test_strings: result += s',
globals={'test_strings': test_strings},
number=N
) * 1000
# Benchmark ExplicitList
t_explicit = timeit.timeit(
'obj = ExplicitList()\nfor s in test_strings: obj.emit(s)',
globals={'test_strings': test_strings, 'ExplicitList': ExplicitList},
number=N
) * 1000
# Benchmark __add__ version
t_add = timeit.timeit(
'obj = OutputStringAdd()\nfor s in test_strings: obj += s',
globals={'test_strings': test_strings, 'OutputStringAdd': OutputStringAdd},
number=N
) * 1000
# Benchmark __iadd__ version
t_iadd = timeit.timeit(
'obj = OutputStringIAdd()\nfor s in test_strings: obj += s',
globals={'test_strings': test_strings, 'OutputStringIAdd': OutputStringIAdd},
number=N
) * 1000
results.append((t_str, t_explicit, t_add, t_iadd))
print(f"{i+1:<6} {t_str:<12.2f} {t_explicit:<12.2f} {t_add:<12.2f} {t_iadd:<12.2f}")
# Calculate averages
avg_str = sum(r[0] for r in results) / REPEATS
avg_explicit = sum(r[1] for r in results) / REPEATS
avg_add = sum(r[2] for r in results) / REPEATS
avg_iadd = sum(r[3] for r in results) / REPEATS
print("-" * 60)
print(f"{'Avg':<6} {avg_str:<12.2f} {avg_explicit:<12.2f} {avg_add:<12.2f} {avg_iadd:<12.2f}")
print()
print("Summary:")
# Calculate and print comparisons
for name, time in [("ExplicitList", avg_explicit),
("__add__", avg_add),
("__iadd__", avg_iadd)]:
is_faster, percentage = calculate_comparison(avg_str, time)
if is_faster:
print(f"{name:<12} : {percentage:.2f}% faster than str+=")
else:
print(f"{name:<12} : {percentage:.2f}% slower than str+=")
if __name__ == "__main__":
benchmark()
next prev parent reply other threads:[~2025-07-10 8:19 UTC|newest]
Thread overview: 41+ messages / expand[flat|nested] mbox.gz Atom feed top
2025-07-02 22:35 [PATCH 00/12] [PATCH 00/11] Thrash up the parser/output interface Jonathan Corbet
2025-07-02 22:35 ` [PATCH 01/12] docs: kdoc; Add a rudimentary class to represent output items Jonathan Corbet
2025-07-10 5:28 ` Mauro Carvalho Chehab
2025-07-02 22:35 ` [PATCH 02/12] docs: kdoc: simplify the output-item passing Jonathan Corbet
2025-07-10 5:29 ` Mauro Carvalho Chehab
2025-07-02 22:35 ` [PATCH 03/12] docs: kdoc: drop "sectionlist" Jonathan Corbet
2025-07-09 16:27 ` Mauro Carvalho Chehab
2025-07-02 22:35 ` [PATCH 04/12] docs: kdoc: Centralize handling of the item section list Jonathan Corbet
2025-07-10 5:45 ` Mauro Carvalho Chehab
2025-07-10 13:25 ` Jonathan Corbet
2025-07-02 22:35 ` [PATCH 05/12] docs: kdoc: remove the "struct_actual" machinery Jonathan Corbet
2025-07-10 6:11 ` Mauro Carvalho Chehab
2025-07-02 22:35 ` [PATCH 06/12] docs: kdoc: use self.entry.parameterlist directly in check_sections() Jonathan Corbet
2025-07-10 6:12 ` Mauro Carvalho Chehab
2025-07-02 22:35 ` [PATCH 07/12] docs: kdoc: Coalesce parameter-list handling Jonathan Corbet
2025-07-10 6:20 ` Mauro Carvalho Chehab
2025-07-02 22:35 ` [PATCH 08/12] docs: kdoc: Regularize the use of the declaration name Jonathan Corbet
2025-07-10 6:22 ` Mauro Carvalho Chehab
2025-07-02 22:35 ` [PATCH 09/12] docs: kdoc: straighten up dump_declaration() Jonathan Corbet
2025-07-10 6:25 ` Mauro Carvalho Chehab
2025-07-10 13:27 ` Jonathan Corbet
2025-07-10 22:13 ` Mauro Carvalho Chehab
2025-07-02 22:35 ` [PATCH 10/12] docs: kdoc: directly access the always-there KdocItem fields Jonathan Corbet
2025-07-10 6:27 ` Mauro Carvalho Chehab
2025-07-02 22:35 ` [PATCH 11/12] docs: kdoc: clean up check_sections() Jonathan Corbet
2025-07-10 6:29 ` Mauro Carvalho Chehab
2025-07-02 22:35 ` [PATCH 12/12] docs: kdoc: Improve the output text accumulation Jonathan Corbet
2025-07-10 6:41 ` Mauro Carvalho Chehab
2025-07-10 7:13 ` Mauro Carvalho Chehab
2025-07-10 8:19 ` Mauro Carvalho Chehab [this message]
2025-07-10 10:10 ` Mauro Carvalho Chehab
2025-07-10 10:31 ` Mauro Carvalho Chehab
2025-07-10 10:59 ` Mauro Carvalho Chehab
2025-07-10 23:30 ` Jonathan Corbet
2025-07-11 6:14 ` Mauro Carvalho Chehab
2025-07-11 12:49 ` Jonathan Corbet
2025-07-11 16:28 ` Mauro Carvalho Chehab
2025-07-11 16:39 ` Jonathan Corbet
2025-07-03 2:07 ` [PATCH 00/12] [PATCH 00/11] Thrash up the parser/output interface Yanteng Si
2025-07-09 15:29 ` Jonathan Corbet
2025-07-09 16:21 ` Mauro Carvalho Chehab
Reply instructions:
You may reply publicly to this message via plain-text email
using any one of the following methods:
* Save the following mbox file, import it into your mail client,
and reply-to-all from there: mbox
Avoid top-posting and favor interleaved quoting:
https://en.wikipedia.org/wiki/Posting_style#Interleaved_style
* Reply using the --to, --cc, and --in-reply-to
switches of git-send-email(1):
git send-email \
--in-reply-to=20250710101931.202953d1@foz.lan \
--to=mchehab+huawei@kernel.org \
--cc=akiyks@gmail.com \
--cc=corbet@lwn.net \
--cc=linux-doc@vger.kernel.org \
--cc=linux-kernel@vger.kernel.org \
/path/to/YOUR_REPLY
https://kernel.org/pub/software/scm/git/docs/git-send-email.html
* If your mail client supports setting the In-Reply-To header
via mailto: links, try the mailto: link
Be sure your reply has a Subject: header at the top and a blank line
before the message body.
This is an external index of several public inboxes,
see mirroring instructions on how to clone and mirror
all data and code used by this external index.