Re: [PATCH 12/12] docs: kdoc: Improve the output text accumulation

All of lore.kernel.org
 help / color / mirror / Atom feed

From: Mauro Carvalho Chehab <mchehab+huawei@kernel.org>
To: Jonathan Corbet <corbet@lwn.net>
Cc: linux-doc@vger.kernel.org, linux-kernel@vger.kernel.org,
	Akira Yokosawa <akiyks@gmail.com>
Subject: Re: [PATCH 12/12] docs: kdoc: Improve the output text accumulation
Date: Thu, 10 Jul 2025 12:31:55 +0200	[thread overview]
Message-ID: <20250710123155.1045831d@foz.lan> (raw)
In-Reply-To: <20250710121033.42db5ef3@foz.lan>

Em Thu, 10 Jul 2025 12:10:33 +0200
Mauro Carvalho Chehab <mchehab+huawei@kernel.org> escreveu:

> From the above:
> 
> - It is not worth applying patch 12/12 as it makes the code slower;
> - Python 3.13t (no-GIL version) had very bad results. It seems it
>   still requires optimization;
> - Python 3.9 is a lot worse (140% to 190%) when using list append;
> - when there are not many concats, Python 3.13 is about 15% slower
>   with lists than concat strings. It only approaches str concat
>   when the number of concats is high.
> 
> With the above, clearly str += is faster than list append.
> 
> So, except if I did something wrong on this benchmark script, please
> don't apply patch 12/12.

And I did: I forgot the final line at the concat code to get the
result as strings.

For explicit list:
	result = obj.output()

For implicit ones:
	result = str(obj)

Yet, the conclusion is similar. With Python 3.13:

    $ for i in python3.13; do for j in 1 10 100 1000; do $i /tmp/bench.py $((1000000/$j)) $j 1; done; done
    1 strings in a loop with 1000000 interactions, repeating 24 times
        str +=       : time: 41.42
        list join    : time: 127.33: 207.42% slower than str +=
    10 strings in a loop with 100000 interactions, repeating 24 times
        str +=       : time: 27.15
        list join    : time: 39.19: 44.36% slower than str +=
    100 strings in a loop with 10000 interactions, repeating 24 times
        str +=       : time: 24.84
        list join    : time: 30.70: 23.57% slower than str +=
    1000 strings in a loop with 1000 interactions, repeating 24 times
        str +=       : time: 21.84
        list join    : time: 27.85: 27.50% slower than str +=

Explict list concat was between ~30% to ~200% worse than str concat.


Thanks,
Mauro

---

#!/usr/bin/env python3

import argparse
import time
import sys

def benchmark_str_concat(test_strings, n_ops):
    start = time.time()
    for _ in range(n_ops):
        result = ""
        for s in test_strings:
            result += s
    return (time.time() - start) * 1000

def benchmark_explicit_list(test_strings, n_ops):
    class ExplicitList:
        def __init__(self):
            self._output = []

        def emit(self, text):
            self._output.append(text)

        def output(self):
            return ''.join(self._output)

    start = time.time()
    for _ in range(n_ops):
        obj = ExplicitList()
        for s in test_strings:
            obj.emit(s)

        result = obj.output()

    return (time.time() - start) * 1000

def benchmark_add_overload(test_strings, n_ops):
    class OutputStringAdd:
        def __init__(self):
            self._output = []

        def __add__(self, text):
            self._output.append(text)
            return self

        def __str__(self):
            return ''.join(self._output)

    start = time.time()
    for _ in range(n_ops):
        obj = OutputStringAdd()
        for s in test_strings:
            obj += s

        result = str(obj)

    return (time.time() - start) * 1000

def benchmark_iadd_overload(test_strings, n_ops):
    class OutputStringIAdd:
        def __init__(self):
            self._output = []

        def __iadd__(self, text):
            self._output.append(text)
            return self

        def __str__(self):
            return ''.join(self._output)

    start = time.time()
    for _ in range(n_ops):
        obj = OutputStringIAdd()
        for s in test_strings:
            obj += s
        result = str(obj)

    return (time.time() - start) * 1000

def calculate_comparison(base_time, compare_time):
    if compare_time < base_time:
        return (True, (1 - compare_time/base_time)*100)
    return (False, (compare_time/base_time - 1)*100)

def benchmark(num_reps, strings_per_run, repeats, detail):
    test_strings = [f"string_{i:03d}" for i in range(strings_per_run)]

    # Create benchmark execution order list
    benchmarks = [
        ("str +=", benchmark_str_concat),
        ("list join", benchmark_explicit_list),
        ("__add__", benchmark_add_overload),
        ("__iadd__", benchmark_iadd_overload)
    ]

    # Use all possible permutations of benchmark order to reduce any
    # noise due to CPU caches
    all_orders = [
        (0, 1, 2, 3), (0, 1, 3, 2), (0, 2, 1, 3), (0, 2, 3, 1),
        (0, 3, 1, 2), (0, 3, 2, 1), (1, 0, 2, 3), (1, 0, 3, 2),
        (1, 2, 0, 3), (1, 2, 3, 0), (1, 3, 0, 2), (1, 3, 2, 0),
        (2, 0, 1, 3), (2, 0, 3, 1), (2, 1, 0, 3), (2, 1, 3, 0),
        (2, 3, 0, 1), (2, 3, 1, 0), (3, 0, 1, 2), (3, 0, 2, 1),
        (3, 1, 0, 2), (3, 1, 2, 0), (3, 2, 0, 1), (3, 2, 1, 0)
    ]

    results = {}
    for name, _ in benchmarks:
        results[name] = 0

    # Warm-up phase to reduce caching issues
    for name, fn in benchmarks:
        fn(test_strings, 1)

    n_repeats = len(all_orders) * repeats
    print(f"    {strings_per_run} strings in a loop with {num_reps} interactions, repeating {n_repeats} times")

    # Actual benchmark starts here
    i = 0
    if detail:
        headers = ['Run'] + [name for name, _ in benchmarks]
        print()
        print(f"\t{headers[0]:<6} {headers[1]:<12} {headers[2]:<12} {headers[3]:<12} {headers[4]:<12}")
        print("\t" + "-" * 60)
    for _ in range(repeats):
        # Shuffle execution order each run
        for order in all_orders:
            run_results = {}
            for idx in order:
                name, func = benchmarks[idx]
                run_results[name] = func(test_strings, num_reps)
                results[name] += run_results[name]

            if detail:
                # Output results in consistent order
                print(f"\t{i+1:<6}", end=" ")
                for name, _ in benchmarks:
                    print(f"{run_results[name]:<12.2f}", end=" ")
                print()

            i += 1

    avg_results = {}
    for name, _ in benchmarks:
        avg_results[name] = results[name] / repeats / len(all_orders)

    if detail:
        print("\t" + "-" * 60)
        print(f"\t      ", end=" ")
        for name, _ in benchmarks:
            print(f"{avg_results[name]:<12.2f}", end=" ")
        print()
        print()

    ref = benchmarks.pop(0)

    print(f"\t{ref[0]:<12} : time: {avg_results[ref[0]]:3.2f}")
    for name, _ in benchmarks:
        is_faster, percentage = calculate_comparison(avg_results[ref[0]], avg_results[name])
        direction = "faster" if is_faster else "slower"
        print(f"\t{name:<12} : time: {avg_results[name]:3.2f}: {percentage:3.2f}% {direction} than {ref[0]}")



if __name__ == "__main__":
    parser = argparse.ArgumentParser()
    parser.add_argument('-d', '--detail', action='store_true',
                       help='Enable detailed output')

    args, remaining = parser.parse_known_args()

    # Then handle the positional arguments manually
    if len(remaining) != 3:
        print(f"Usage: {sys.argv[0]} [-d] <num_repetitions> <strings_per_op> <repeats>")
        sys.exit(1)

    num_reps = int(remaining[0])
    strings_per_op = int(remaining[1])
    repeats = int(remaining[2])

    num_reps = int(sys.argv[1])
    strings_per_op = int(sys.argv[2])
    repeats = int(sys.argv[3])

    benchmark(num_reps, strings_per_op, repeats, args.detail)

next prev parent reply	other threads:[~2025-07-10 10:31 UTC|newest]

Thread overview: 41+ messages / expand[flat|nested]  mbox.gz  Atom feed  top
2025-07-02 22:35 [PATCH 00/12] [PATCH 00/11] Thrash up the parser/output interface Jonathan Corbet
2025-07-02 22:35 ` [PATCH 01/12] docs: kdoc; Add a rudimentary class to represent output items Jonathan Corbet
2025-07-10  5:28   ` Mauro Carvalho Chehab
2025-07-02 22:35 ` [PATCH 02/12] docs: kdoc: simplify the output-item passing Jonathan Corbet
2025-07-10  5:29   ` Mauro Carvalho Chehab
2025-07-02 22:35 ` [PATCH 03/12] docs: kdoc: drop "sectionlist" Jonathan Corbet
2025-07-09 16:27   ` Mauro Carvalho Chehab
2025-07-02 22:35 ` [PATCH 04/12] docs: kdoc: Centralize handling of the item section list Jonathan Corbet
2025-07-10  5:45   ` Mauro Carvalho Chehab
2025-07-10 13:25     ` Jonathan Corbet
2025-07-02 22:35 ` [PATCH 05/12] docs: kdoc: remove the "struct_actual" machinery Jonathan Corbet
2025-07-10  6:11   ` Mauro Carvalho Chehab
2025-07-02 22:35 ` [PATCH 06/12] docs: kdoc: use self.entry.parameterlist directly in check_sections() Jonathan Corbet
2025-07-10  6:12   ` Mauro Carvalho Chehab
2025-07-02 22:35 ` [PATCH 07/12] docs: kdoc: Coalesce parameter-list handling Jonathan Corbet
2025-07-10  6:20   ` Mauro Carvalho Chehab
2025-07-02 22:35 ` [PATCH 08/12] docs: kdoc: Regularize the use of the declaration name Jonathan Corbet
2025-07-10  6:22   ` Mauro Carvalho Chehab
2025-07-02 22:35 ` [PATCH 09/12] docs: kdoc: straighten up dump_declaration() Jonathan Corbet
2025-07-10  6:25   ` Mauro Carvalho Chehab
2025-07-10 13:27     ` Jonathan Corbet
2025-07-10 22:13       ` Mauro Carvalho Chehab
2025-07-02 22:35 ` [PATCH 10/12] docs: kdoc: directly access the always-there KdocItem fields Jonathan Corbet
2025-07-10  6:27   ` Mauro Carvalho Chehab
2025-07-02 22:35 ` [PATCH 11/12] docs: kdoc: clean up check_sections() Jonathan Corbet
2025-07-10  6:29   ` Mauro Carvalho Chehab
2025-07-02 22:35 ` [PATCH 12/12] docs: kdoc: Improve the output text accumulation Jonathan Corbet
2025-07-10  6:41   ` Mauro Carvalho Chehab
2025-07-10  7:13     ` Mauro Carvalho Chehab
2025-07-10  8:19       ` Mauro Carvalho Chehab
2025-07-10 10:10         ` Mauro Carvalho Chehab
2025-07-10 10:31           ` Mauro Carvalho Chehab [this message]
2025-07-10 10:59             ` Mauro Carvalho Chehab
2025-07-10 23:30         ` Jonathan Corbet
2025-07-11  6:14           ` Mauro Carvalho Chehab
2025-07-11 12:49             ` Jonathan Corbet
2025-07-11 16:28               ` Mauro Carvalho Chehab
2025-07-11 16:39                 ` Jonathan Corbet
2025-07-03  2:07 ` [PATCH 00/12] [PATCH 00/11] Thrash up the parser/output interface Yanteng Si
2025-07-09 15:29 ` Jonathan Corbet
2025-07-09 16:21   ` Mauro Carvalho Chehab

Reply instructions:

You may reply publicly to this message via plain-text email
using any one of the following methods:

* Save the following mbox file, import it into your mail client,
  and reply-to-all from there: mbox

  Avoid top-posting and favor interleaved quoting:
  https://en.wikipedia.org/wiki/Posting_style#Interleaved_style

* Reply using the --to, --cc, and --in-reply-to
  switches of git-send-email(1):

  git send-email \
    --in-reply-to=20250710123155.1045831d@foz.lan \
    --to=mchehab+huawei@kernel.org \
    --cc=akiyks@gmail.com \
    --cc=corbet@lwn.net \
    --cc=linux-doc@vger.kernel.org \
    --cc=linux-kernel@vger.kernel.org \
    /path/to/YOUR_REPLY

  https://kernel.org/pub/software/scm/git/docs/git-send-email.html

* If your mail client supports setting the In-Reply-To header
  via mailto: links, try the mailto: link

Be sure your reply has a Subject: header at the top and a blank line before the message body.

This is an external index of several public inboxes,
see mirroring instructions on how to clone and mirror
all data and code used by this external index.