From: david.laight.linux@gmail.com
To: Andrew Morton <akpm@linux-foundation.org>,
	Kees Cook <kees@kernel.org>, Andy Shevchenko <andy@kernel.org>,
	linux-kernel@vger.kernel.org, linux-hardening@vger.kernel.org,
	Linus Torvalds <torvalds@linux-foundation.org>
Cc: David Laight <david.laight.linux@gmail.com>
Subject: [PATCH next] string: Optimise strlen()
Date: Fri, 27 Mar 2026 19:57:37 +0000
Message-ID: <20260327195737.89537-1-david.laight.linux@gmail.com>

From: David Laight <david.laight.linux@gmail.com>

Unrolling the loop once significantly improves performance on some CPUs.
Userspace testing on a Zen-5 shows it runs at two bytes/clock rather than
one byte/clock with only a marginal additional overhead.

Using 'byte masking' is faster for longer strings - the break-even point
is around 56 bytes on the same Zen-5 (the fixed overhead is much larger,
but it then runs at 16 bytes every 3 clocks).
But the majority of kernel calls won't be near that length.
There will also be extra overhead for big-endian systems and those
without a fast ffs().

Signed-off-by: David Laight <david.laight.linux@gmail.com>
---

For reference, 'rep scasb' comes in at 150 clocks + 3 clocks per byte on Zen-5.

I've not tested any Intel CPU. I don't think they can run a
'1 clock loop', but the change might improve performance from
2 clocks/byte to 1 clock/byte.
I can test Intel up to i7-7xxx but don't have any older AMD CPUs
or any other architectures (apart from a pi-5).

Other architectures may well see an improvement, if only because of a
reduced number of taken branches.

I did notice that arm64 uses a very large asm block that is clearly
optimised for very long strings - I suspect the C version will be
faster in the kernel.
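
For anyone unfamiliar with the 'byte masking' approach mentioned in the
commit message, here is a rough userspace sketch. It is not part of the
patch and not the code that was benchmarked. The name strlen_bytemask()
is made up for illustration, and the sketch assumes a little-endian
64-bit target and a compiler that provides __builtin_ctzll() (GCC or
Clang). The aligned word loads can read past the terminating NUL within
the same 8-byte word, which is fine for a demonstration on suitably
sized buffers but is not strictly conforming C.

```c
#include <stddef.h>
#include <stdint.h>
#include <string.h>

#define ONES  0x0101010101010101ull
#define HIGHS 0x8080808080808080ull

/* Hypothetical helper, little-endian only. */
static size_t strlen_bytemask(const char *s)
{
	const char *p = s;
	uint64_t w, zero;

	/* Byte loop until p is 8-byte aligned, so a word load never crosses a page. */
	while ((uintptr_t)p % sizeof(w)) {
		if (!*p)
			return p - s;
		p++;
	}

	for (;; p += sizeof(w)) {
		memcpy(&w, p, sizeof(w));
		/*
		 * Sets 0x80 in every zero byte of w. Borrows can also set it
		 * in bytes above the first zero byte, so only the lowest set
		 * bit is trusted.
		 */
		zero = (w - ONES) & ~w & HIGHS;
		if (zero)
			return (p - s) + (__builtin_ctzll(zero) >> 3);
	}
}
```

The alignment loop and the final bit scan are the fixed overhead that
makes this lose on short strings. A big-endian version would need a
different way of locating the first zero byte.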

 lib/string.c | 10 ++++++----
 1 file changed, 6 insertions(+), 4 deletions(-)

diff --git a/lib/string.c b/lib/string.c
index b632c71df1a5..31de9aa86409 100644
--- a/lib/string.c
+++ b/lib/string.c
@@ -415,11 +415,13 @@ EXPORT_SYMBOL(strnchr);
 #ifndef __HAVE_ARCH_STRLEN
 size_t strlen(const char *s)
 {
-	const char *sc;
+	size_t len;
 
-	for (sc = s; *sc != '\0'; ++sc)
-		/* nothing */;
-	return sc - s;
+	for (len = 0; likely(s[len]); len += 2) {
+		if (!s[len + 1])
+			return len + 1;
+	}
+	return len;
 }
 EXPORT_SYMBOL(strlen);
 #endif
-- 
2.39.5
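
The unrolled loop can be sanity-checked in userspace with something
like the sketch below. This is only an illustration, not the harness
used for the numbers above. strlen_unrolled() and count_mismatches()
are hypothetical names, and likely() is redefined here with
__builtin_expect() (assumes GCC or Clang) because the kernel macro is
not available outside the kernel.

```c
#include <stddef.h>
#include <string.h>

/* Userspace stand-in for the kernel's likely() hint. */
#define likely(x) __builtin_expect(!!(x), 1)

/* Same loop as the patched strlen(), renamed to avoid clashing with libc. */
static size_t strlen_unrolled(const char *s)
{
	size_t len;

	for (len = 0; likely(s[len]); len += 2) {
		if (!s[len + 1])
			return len + 1;
	}
	return len;
}

/* Counts lengths in [0, 64) where the unrolled loop disagrees with libc strlen(). */
static int count_mismatches(void)
{
	char buf[64];
	int bad = 0;

	for (size_t n = 0; n < sizeof(buf); n++) {
		memset(buf, 'a', n);
		buf[n] = '\0';
		bad += strlen_unrolled(buf) != strlen(buf);
	}
	return bad;
}
```

The loop never reads past the terminating NUL: s[len + 1] is only
loaded after s[len] has been seen to be non-zero, so odd and even
lengths are both covered without any over-read.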


Thread overview: 8+ messages
2026-03-27 19:57 david.laight.linux [this message]
2026-03-27 20:37 ` [PATCH next] string: Optimise strlen() Linus Torvalds
2026-03-27 22:49   ` David Laight
2026-03-28  0:29     ` Linus Torvalds
2026-03-28 11:08       ` David Laight
2026-03-28 19:16         ` Linus Torvalds
2026-03-28 21:47           ` David Laight
2026-04-19 10:41 ` David Laight
