From: David Laight <david.laight.linux@gmail.com>
To: Paul Walmsley <pjw@kernel.org>
Cc: Feng Jiang <jiangfeng@kylinos.cn>,
palmer@dabbelt.com, aou@eecs.berkeley.edu, alex@ghiti.fr,
samuel.holland@sifive.com, charlie@rivosinc.com,
conor.dooley@microchip.com, linux-riscv@lists.infradead.org,
linux-kernel@vger.kernel.org
Subject: Re: [PATCH] riscv: lib: optimize strlen loop efficiency
Date: Wed, 28 Jan 2026 18:59:04 +0000
Message-ID: <20260128185904.5ec5c24e@pumpkin>
In-Reply-To: <20260115184619.574f1b36@pumpkin>
On Thu, 15 Jan 2026 18:46:19 +0000
David Laight <david.laight.linux@gmail.com> wrote:
...
> While I suspect the per-byte cost is 'two bytes/clock' on x86-64
> the fixed cost may move the break-even point above the length of the
> average strlen() in the kernel.
> Of course, x86 probably falls back to 'rep scasb' at (maybe)
> (40 + 2n) clocks for 'n' bytes.
> A carefully written slightly unrolled asm loop might manage one
> byte per clock!
> I could spend weeks benchmarking different versions.
I've spent a quick half-hour...
On my zen-5 in userspace:
glibc's strlen() is showing the same fixed cost (50 clocks including overhead)
for sizes below (about) 100 bytes; for big buffers add 1 clock per ~50 bytes.
It must be using some simd instructions.
A simple:
len = 0; while (s[len]) len++; return len;
loop is about 1 byte/clock, overhead ~25 clocks (probably mostly the one 'rdpmc'
instruction).
(Needs a barrier() to stop gcc converting it to a libc call.)
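A minimal sketch of that byte loop with the barrier in place, assuming a
gcc/clang-style empty asm statement as the barrier (the name strlen_bytes
and the barrier() definition here are illustrative, not the code that was
actually timed):

```c
#include <stddef.h>

/* Compiler-only barrier (assumption: gcc/clang extended asm is available). */
#define barrier() __asm__ __volatile__("" ::: "memory")

/* Byte-at-a-time strlen; the name is hypothetical. */
size_t strlen_bytes(const char *s)
{
	size_t len = 0;

	while (s[len]) {
		/* Stops the compiler recognising the loop and calling strlen(). */
		barrier();
		len++;
	}
	return len;
}
```

Building with -fno-builtin (or -fno-tree-loop-distribute-patterns on gcc)
is another way to keep the loop as written.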
Unrolling the loop once:
	for (len = 0; s[len]; len += 2)
		if (!s[len + 1]) return len + 1;
	return len;
actually runs twice as fast - so 2 bytes/clock.
Unrolling 4 times doesn't help; it suddenly goes somewhat slower somewhere
between 128 and 256 bytes (to 1.5 bytes/clock).
The C 'longs' loop has an overhead of ~45 clocks and does 6 bytes/clock.
So the 'longs' loop is better for buffers longer than 64 bytes.
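For reference, a word-at-a-time loop of this general kind can be sketched
as below. This is only an illustration of the technique, not the loop that
was benchmarked or the one in the patch; the names strlen_longs, ONES and
HIGHS are made up here. It relies on aligned word reads never crossing a
page boundary, so reading a few bytes past the terminator cannot fault,
although strictly that is outside what ISO C guarantees.

```c
#include <stddef.h>
#include <stdint.h>
#include <string.h>

#define ONES  ((unsigned long)-1 / 0xff)	/* 0x0101...01 */
#define HIGHS (ONES * 0x80)			/* 0x8080...80 */

/* Word-at-a-time strlen sketch; the name is hypothetical. */
size_t strlen_longs(const char *s)
{
	const char *p = s;
	unsigned long w;

	/* Byte loop until p is aligned to sizeof(long). */
	while ((uintptr_t)p % sizeof(w)) {
		if (!*p)
			return p - s;
		p++;
	}

	/* Non-zero result iff some byte of w is zero. */
	for (;;) {
		memcpy(&w, p, sizeof(w));
		if ((w - ONES) & ~w & HIGHS)
			break;
		p += sizeof(w);
	}

	/* Locate the zero byte inside the final word. */
	while (*p)
		p++;
	return p - s;
}
```

The fixed cost comes from the alignment prologue and the final byte scan,
which is why a loop like this only wins on longer strings.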
The 'elephant in the room' is 'repne scasb'.
The fixed cost is some 150 clocks and the cost is 3 clocks/byte.
I don't think any of the Intel cpus I have will do a 'one clock loop'.
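The zen-5 figures above can be put into a rough linear model of
(fixed cost + per-byte cost) to see where the break-even points fall.
This is only a sketch: the struct and function names are made up, and the
~25 clock overhead for the unrolled loop is an assumption carried over
from the simple loop, since it was not measured separately.

```c
/* Rough cost model: clocks = fixed + n * clocks_per_byte. */
struct cost {
	const char *name;
	double fixed;		/* overhead in clocks */
	double clocks_per_byte;
};

enum { BYTES, UNROLLED2, LONGS, SCASB };

static const struct cost models[] = {
	[BYTES]     = { "byte loop",       25.0, 1.0 },
	[UNROLLED2] = { "unrolled once",   25.0, 0.5 },	/* overhead assumed */
	[LONGS]     = { "C 'longs' loop",  45.0, 1.0 / 6 },
	[SCASB]     = { "repne scasb",    150.0, 3.0 },
};

/* Estimated clocks for an n-byte string. */
double est_clocks(const struct cost *c, double n)
{
	return c->fixed + n * c->clocks_per_byte;
}

/* Length at which b (higher fixed cost, lower per-byte cost) catches a. */
double break_even(const struct cost *a, const struct cost *b)
{
	return (b->fixed - a->fixed) /
	       (a->clocks_per_byte - b->clocks_per_byte);
}
```

With these numbers break_even(&models[UNROLLED2], &models[LONGS]) comes out
at about 60 bytes, in line with the 64 byte figure, and 'repne scasb' never
catches up with any of the loops.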
I certainly failed to get one in the past when there was a data-dependency
between the iterations.
But I don't have anything modern (newest is an i7-7xxx) and I don't have
any old amd ones.
I need to get a zen-1 (or 1a) and one of the Intel systems that should be
cheap because they won't run win-11.
David