Re: [PATCH next] string: Optimise strlen()

From: David Laight <david.laight.linux@gmail.com>
To: Andrew Morton <akpm@linux-foundation.org>,
	Kees Cook <kees@kernel.org>, Andy Shevchenko <andy@kernel.org>,
	linux-kernel@vger.kernel.org, linux-hardening@vger.kernel.org,
	Linus Torvalds <torvalds@linux-foundation.org>
Subject: Re: [PATCH next] string: Optimise strlen()
Date: Sun, 19 Apr 2026 11:41:17 +0100
Message-ID: <20260419114117.7cf50b2b@pumpkin>
In-Reply-To: <20260327195737.89537-1-david.laight.linux@gmail.com>

On Fri, 27 Mar 2026 19:57:37 +0000
david.laight.linux@gmail.com wrote:

> From: David Laight <david.laight.linux@gmail.com>
> 
> Unrolling the loop once significantly improves performance on some CPUs.
> Userspace testing on a Zen-5 shows it runs at two bytes/clock rather than
> one byte/clock with only a marginal additional overhead.
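
For reference, 'unrolling the loop once' could look roughly like the sketch
below. This is an illustration only, not the patch itself; the function
names strlen_byte() and strlen_two_byte() are made up for the example.

```c
#include <stddef.h>

/* Plain byte loop: one compare and one loop branch per byte. */
size_t strlen_byte(const char *s)
{
	const char *p = s;

	while (*p)
		p++;
	return p - s;
}

/*
 * Loop unrolled once: two compares per iteration but only one
 * backwards branch per two bytes.  p[1] is only read after p[0] has
 * been seen to be non-zero, so nothing past the terminator is read.
 */
size_t strlen_two_byte(const char *s)
{
	const char *p = s;

	for (;;) {
		if (!p[0])
			return p - s;
		if (!p[1])
			return p + 1 - s;
		p += 2;
	}
}
```

Both functions return the same value for every string; only the shape of
the loop (and so its size and alignment in the object file) differs.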

I hate benchmarks.

I've finally got around to looking at this again (on x86-64).
I changed the order of the 'single byte' and 'two byte' tests and the
'two byte' loop slowed down massively - to pretty much the same speed
as the 'single byte' loop.
gcc had swapped the order of the two functions in the object file, which
changed the alignment of the loop top between odd and even multiples
of 16 (this alignment is disabled in the kernel to avoid bloat).
The loop in the 'two byte' code is 17 bytes. In the slow case the loop
top is at an odd multiple of 16, so the last byte of the loop is in a
different 32 byte code block - which is presumably what makes it slow.
Changing the two 'cmpb $0, mem' to (say) 'cmpb %cl, mem' would shrink
the loop to 15 bytes, so it wouldn't cross a 16 byte boundary.
(The 'single byte' loop doesn't cross a 16 byte boundary in the test program.)
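
The alignment arithmetic above can be checked with a small helper. This is
only a sketch: loop_crosses_block() is a hypothetical name, and the offsets
used are examples of 'even' and 'odd' multiples of 16, not addresses taken
from the actual object file.

```c
#include <stdbool.h>
#include <stddef.h>

/*
 * Return true if code occupying bytes [start, start + len) touches
 * more than one 'block' byte aligned block.
 * Assumes block != 0 and len >= 1.
 */
bool loop_crosses_block(size_t start, size_t len, size_t block)
{
	return start / block != (start + len - 1) / block;
}
```

A 17 byte loop starting at offset 0 (an even multiple of 16) stays inside
one 32 byte block, while the same loop starting at offset 16 (an odd
multiple) has its last byte at offset 32, in the next 32 byte block. A
15 byte loop starting on any multiple of 16 crosses neither a 16 nor a
32 byte boundary.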

The kernel build I just looked at has strlen() aligned to a 16 byte
boundary with the branch crossing the next 16 byte boundary.
So, if the kernel behaves the same way as my test program, strlen() will
run a lot slower on 50% of kernel builds.
(And other CPUs may have costs associated with the 16 byte boundary.)

Mostly this means that however hard you try you are guaranteed to
lose somewhere :-(

> 
> Using 'byte masking' is faster for longer strings - the break-even point
> is around 56 bytes on the same Zen-5 (there is much larger overhead, then
> it runs at 16 bytes in 3 clocks).
> But the majority of kernel calls won't be near that length.
> There will also be extra overhead for big-endian systems and those
> without a fast ffs().

I've had a further thought on that as well.

The 'byte masking' code is somewhat larger (112 bytes rather than 32 or 48).
While the extra overhead is ~20 clocks, that is less than the 'branch
mispredict' penalty that the byte loop suffers every time the length
changes.
So for randomly changing lengths I'm beginning to think the 'byte mask'
version is better.
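
A back-of-envelope model of that trade-off, using only the rough figures
quoted in this thread (2 bytes/clock for the unrolled byte loop, 16 bytes
in 3 clocks plus a fixed overhead for byte masking). The function name
break_even_len() and the 'mispredict' parameter are assumptions made for
the sketch; it is not a measurement and ignores everything else the CPU
does.

```c
#include <stddef.h>

/*
 * Smallest length n at which the byte mask version is no slower:
 *   byte loop:  n / 2 clocks, plus 'mispredict' clocks per call
 *   byte mask:  3 * n / 16 clocks, plus 'overhead' clocks per call
 * Solving  n/2 + mispredict >= overhead + 3n/16  gives
 *          5n >= 16 * (overhead - mispredict).
 */
size_t break_even_len(unsigned int overhead, unsigned int mispredict)
{
	if (mispredict >= overhead)
		return 0;
	/* Round up the division by 5. */
	return (16 * (size_t)(overhead - mispredict) + 4) / 5;
}
```

With overhead = 20 and mispredict = 0 the model gives 64 bytes, the same
ballpark as the measured ~56. Once the mispredict cost charged to the byte
loop is at least as large as the overhead, the break-even length drops to 0.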

I ran the code on a Haswell a while back; the break-even length there
was somewhat shorter (I'm remembering 32 bytes).

This all means the byte masking code may actually be sensible, provided
the architecture:
- is LE, or BE with a byte swapping memory read,
- has a fast ffsl(),
- is 64bit.
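
The three conditions in the list above show up directly in the core of a
'byte masking' search. The sketch below is the classic has-zero-byte bit
trick, not necessarily the code that was benchmarked; first_zero_byte() is
a hypothetical name and the GCC/Clang builtin __builtin_ctzll() stands in
for a fast ffsl().

```c
#include <stdint.h>

#define ONES  0x0101010101010101ull
#define HIGHS 0x8080808080808080ull

/*
 * Non-zero if any byte of w is zero.  The lowest set bit is bit 7 of
 * the first (least significant) zero byte; higher bits can be false
 * positives caused by borrows from the subtraction.
 */
static inline uint64_t zero_byte_mask(uint64_t w)
{
	return (w - ONES) & ~w & HIGHS;
}

/*
 * Index (0..7) of the first zero byte of w, counting from the least
 * significant byte, or 8 if there is none.  On LE that is the byte at
 * the lowest address, which is why BE needs a byte swapping read.
 */
unsigned int first_zero_byte(uint64_t w)
{
	uint64_t mask = zero_byte_mask(w);

	return mask ? (unsigned int)__builtin_ctzll(mask) / 8 : 8;
}
```

Only the lowest set bit of the mask is trustworthy, so the search has to
start from the low end of the word. A full strlen() would also need to
deal with the initial misalignment before reading whole 64bit words.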

	David
