From: David Laight <david.laight.linux@gmail.com>
To: Andy Shevchenko <andy.shevchenko@gmail.com>
Cc: Feng Jiang <jiangfeng@kylinos.cn>,
	Andy Shevchenko <andriy.shevchenko@intel.com>,
	pjw@kernel.org, palmer@dabbelt.com, aou@eecs.berkeley.edu,
	alex@ghiti.fr, akpm@linux-foundation.org, kees@kernel.org,
	andy@kernel.org, ebiggers@kernel.org, martin.petersen@oracle.com,
	ardb@kernel.org, charlie@rivosinc.com,
	conor.dooley@microchip.com, ajones@ventanamicro.com,
	linus.walleij@linaro.org, nathan@kernel.org,
	linux-riscv@lists.infradead.org, linux-kernel@vger.kernel.org,
	linux-hardening@vger.kernel.org
Subject: Re: [PATCH v3 0/8] riscv: optimize string functions and add kunit tests
Date: Wed, 21 Jan 2026 10:57:17 +0000
Message-ID: <20260121105717.04853c5d@pumpkin>
In-Reply-To: <CAHp75Ve_pyVm430FL=aEAw1Cnf-92T3Y23qh7pEOaMYMp9iyvw@mail.gmail.com>

On Wed, 21 Jan 2026 09:01:29 +0200
Andy Shevchenko <andy.shevchenko@gmail.com> wrote:

...
> I understand that. My point is if we move the generic implementation
> to use word-at-a-time technique the difference should not go 4x,
> right? Perhaps 1.5x or so. I believe this will be a very useful
> exercise.

I posted a version earlier.

After the initial setup (aligning the base address and loading
some constants) the loop on x86-64 is 7 instructions (should be similar
for other architectures).
I think it will execute in 4 clocks.
You then need to find the byte in the word, easy enough on LE with
a fast ffs() - but harder otherwise.
The real problem is the cost for short strings.
Like memcpy() you need a hint from the source of the 'expected' length
(as a compile-time constant) to compile-time select the algorithm.
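
For reference, a minimal user-space sketch of the word-at-a-time approach
described above. This is not the version posted earlier in the thread: the
name waat_strlen() and the two constants are made up for illustration, it
assumes a 64-bit word and the GCC/Clang __builtin_ctzll() builtin, and the
aligned word loads can read a few bytes past the terminator (always inside
the same aligned 8-byte word), which is outside strict ISO C and will upset
sanitizers.

```c
#include <stddef.h>
#include <stdint.h>
#include <string.h>

/* Illustrative constants: 0x01 and 0x80 replicated into every byte. */
#define WAAT_ONES  0x0101010101010101ULL
#define WAAT_HIGHS 0x8080808080808080ULL

/* Hypothetical name; a sketch, not the kernel's or the thread's code. */
static size_t waat_strlen(const char *start)
{
	const char *ptr = start;
	uint64_t word, mask;

	/* Setup: byte loop until ptr is 8-byte aligned. */
	while ((uintptr_t)ptr % sizeof(word)) {
		if (!*ptr)
			return ptr - start;
		ptr++;
	}

	/* Main loop: one aligned 8-byte load and one zero-byte test per pass. */
	for (;;) {
		memcpy(&word, ptr, sizeof(word));
		/* Non-zero iff some byte of 'word' is zero. */
		mask = (word - WAAT_ONES) & ~word & WAAT_HIGHS;
		if (mask)
			break;
		ptr += sizeof(word);
	}

#if defined(__BYTE_ORDER__) && __BYTE_ORDER__ == __ORDER_LITTLE_ENDIAN__
	/* LE: the lowest set bit of 'mask' marks the first zero byte. */
	return ptr - start + (__builtin_ctzll(mask) >> 3);
#else
	/* Otherwise fall back to scanning the final word byte by byte. */
	while (*ptr)
		ptr++;
	return ptr - start;
#endif
}
```

The alignment loop and the final "which byte" step are the fixed overhead
that hurts short strings; only the middle loop gets the 8-bytes-per-pass
benefit.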

OTOH:
	do {
		if (!ptr[0]) return ptr - start;
		ptr += 2;
	} while (ptr[-1]);
	return ptr - start - 1;
has two 'load+compare+branch' and one add per loop.
On x86 that might all overlap and give you a two-clock loop
that checks one byte every clock - faster than 'rep scasb'.
(You can get a two clock loop, but not a 1 clock loop.)
I think unrolling further will make little/no difference.

The break-even for the word-at-a-time version is probably at least 64
characters.
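
Spelled out as a complete function, the two-bytes-per-iteration loop and the
compile-time length hint could look roughly like the sketch below. The names
strlen_by2(), strlen_hinted() and STRLEN_LONG_HINT are hypothetical, the
threshold of 64 is only the estimate given above (not a measurement), libc
strlen() merely stands in for a word-at-a-time implementation, and
__builtin_constant_p() is a GCC/Clang builtin.

```c
#include <stddef.h>
#include <string.h>

/* Assumed break-even taken from the estimate above; not measured. */
#define STRLEN_LONG_HINT 64

/* Byte loop unrolled by two: two load+compare+branch and one add per pass. */
static inline size_t strlen_by2(const char *start)
{
	const char *ptr = start;

	do {
		if (!ptr[0])
			return ptr - start;	/* terminator at an even offset */
		ptr += 2;
	} while (ptr[-1]);

	return ptr - start - 1;		/* terminator at an odd offset */
}

/*
 * Hypothetical wrapper: 'expected' is the caller's guess at the typical
 * length. Only a compile-time constant guess >= STRLEN_LONG_HINT selects
 * the long-string path; anything else uses the byte loop.
 */
#define strlen_hinted(s, expected)					\
	(__builtin_constant_p(expected) && (expected) >= STRLEN_LONG_HINT \
		? strlen(s) /* stand-in for a word-at-a-time version */	\
		: strlen_by2(s))
```

A caller that knows it mostly sees short identifiers would write something
like strlen_hinted(name, 16); because the hint is a constant, the compiler
can drop the unused branch entirely.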

	David

Thread overview:
2026-01-20  6:58 [PATCH v3 0/8] riscv: optimize string functions and add kunit tests Feng Jiang
2026-01-20  6:58 ` [PATCH v3 1/8] lib/string_kunit: add correctness test for strlen Feng Jiang
2026-01-20  7:28   ` Andy Shevchenko
2026-01-20  6:58 ` [PATCH v3 2/8] lib/string_kunit: add correctness test for strnlen Feng Jiang
2026-01-20  7:29   ` Andy Shevchenko
2026-01-20  6:58 ` [PATCH v3 3/8] lib/string_kunit: add correctness test for strrchr() Feng Jiang
2026-01-20  7:30   ` Andy Shevchenko
2026-01-20  6:58 ` [PATCH v3 4/8] lib/string_kunit: add performance benchmarks for strlen Feng Jiang
2026-01-20  7:46   ` Andy Shevchenko
2026-01-21  5:45     ` Feng Jiang
2026-01-20  6:58 ` [PATCH v3 5/8] lib/string_kunit: extend benchmarks to strnlen and chr searches Feng Jiang
2026-01-20  7:48   ` Andy Shevchenko
2026-01-21  5:48     ` Feng Jiang
2026-01-20  6:58 ` [PATCH v3 6/8] riscv: lib: add strnlen implementation Feng Jiang
2026-01-20  7:31   ` Andy Shevchenko
2026-01-21  5:52     ` Feng Jiang
2026-01-21  7:24   ` Qingfang Deng
2026-01-23  1:28     ` Feng Jiang
2026-01-20  6:58 ` [PATCH v3 7/8] riscv: lib: add strchr implementation Feng Jiang
2026-01-20  7:31   ` Andy Shevchenko
2026-01-20  6:58 ` [PATCH v3 8/8] riscv: lib: add strrchr implementation Feng Jiang
2026-01-20  7:32   ` Andy Shevchenko
2026-01-20  7:36 ` [PATCH v3 0/8] riscv: optimize string functions and add kunit tests Andy Shevchenko
2026-01-21  6:44   ` Feng Jiang
2026-01-21  7:01     ` Andy Shevchenko
2026-01-21  8:12       ` Feng Jiang
2026-01-21 10:57       ` David Laight [this message]
2026-01-23  3:12         ` Feng Jiang
2026-01-23 10:16           ` David Laight
