public inbox for linux-kernel@vger.kernel.org
From: David Laight <david.laight.linux@gmail.com>
To: Linus Torvalds <torvalds@linux-foundation.org>
Cc: Andrew Morton <akpm@linux-foundation.org>,
	Kees Cook <kees@kernel.org>, Andy Shevchenko <andy@kernel.org>,
	linux-kernel@vger.kernel.org, linux-hardening@vger.kernel.org
Subject: Re: [PATCH next] string: Optimise strlen()
Date: Fri, 27 Mar 2026 22:49:28 +0000
Message-ID: <20260327224928.7c4220cb@pumpkin>
In-Reply-To: <CAHk-=wgADjjXEN_8ANbU_E94uxbQHW-3v5p0-pwLOtTvL+0Wrw@mail.gmail.com>

On Fri, 27 Mar 2026 13:37:29 -0700
Linus Torvalds <torvalds@linux-foundation.org> wrote:

> On Fri, 27 Mar 2026 at 12:57, <david.laight.linux@gmail.com> wrote:
> >
> > Using 'byte masking' is faster for longer strings - the break-even point
> > is around 56 bytes on the same Zen-5 (there is much larger overhead, then
> > it runs at 16 bytes in 3 clocks).  
> 
> What byte masking approach did you actually use?

This is the code I was testing.
It does aligned accesses; I also measured it without the alignment
code and it made little or no difference.

The OPTIMIZER_HIDE_VAR() is needed to stop gcc from generating different
64-bit constants and to make it generate the constant in a sane way
(especially on architectures with only 16-bit immediates).

size_t strlen_longs(const char *s)
{
        /* Byte offset of s within its aligned word. */
        unsigned int off = (unsigned long)s % sizeof (long);
        const unsigned long *p = (const void *)(s - off);
        unsigned long ones = 0x01010101ul;
        unsigned long val;
        unsigned long mask;
        int first = 1;

        /* Hide the value so gcc builds 0x0101...01 by shift-and-or. */
        OPTIMIZER_HIDE_VAR(ones);
        ones |= ones << 16 << 16;       /* no-op when long is 32 bits */

        /* Force the 'off' bytes before s to be non-zero (little-endian). */
        mask = (~0ul >> 8) >> 8 * (sizeof (long) - 1 - off);

// I've just realised that might be better as:
//	mask = ones >> 1 + 8 * (sizeof (long) - 1 - off);
// which has the right properties and stops the compiler generating
// 0x00ffffffffffffff

        val = *p | mask;

        do {
                if (!first)
                        val = *++p;
                first = 0;
                /* Bit 7 set in each zero byte (false hits only above one). */
                mask = (val - ones) & ~val & (ones << 7);
        } while (!mask);

        /* The lowest set bit marks the first zero byte. */
        off = (__builtin_ffsl(mask) - 1)/8;

        return (const char *)p + off - s;
}
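For anyone wanting to poke at the logic outside the kernel, here is a
portable user-space sketch of the same loop (a sketch under stated
assumptions, not the patch itself). It assumes 64-bit words (uint64_t),
loads them explicitly in little-endian order so it behaves the same on
any host, and replaces the "read the whole aligned word" trick with a
caller-supplied buffer padded to a whole number of 8-byte words. It uses
the alternative mask from the comment above. The names load_le64() and
word_strlen() are hypothetical, and __builtin_ctzll() is a GCC/Clang
builtin.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

#define ONES  0x0101010101010101ull
#define HIGHS 0x8080808080808080ull

/* Load 8 bytes as a little-endian word, independent of host byte order. */
static uint64_t load_le64(const unsigned char *b)
{
        uint64_t w = 0;

        for (int i = 7; i >= 0; i--)
                w = (w << 8) | b[i];
        return w;
}

/*
 * 'base' plays the role of the aligned pointer and the string starts at
 * base + off (off < 8). The caller guarantees the buffer holds a NUL and
 * is padded to a whole number of 8-byte words, so no read goes past it.
 */
static size_t word_strlen(const unsigned char *base, unsigned int off)
{
        const unsigned char *p = base;
        /* 0x80 in each of the 'off' bytes before the string: never "zero". */
        uint64_t val = load_le64(p) | (ONES >> (1 + 8 * (7 - off)));
        uint64_t mask;

        for (;;) {
                /* Bit 7 set in every zero byte; false hits only above the first. */
                mask = (val - ONES) & ~val & HIGHS;
                if (mask)
                        break;
                p += 8;
                val = load_le64(p);
        }
        /* ctz / 8 is the byte index of the lowest (first) zero byte. */
        return (size_t)(p - base) + __builtin_ctzll(mask) / 8 - off;
}
```

It uses a plain for (;;) rather than the 'first' flag; that only matters
for code generation, not correctness. With a 16-byte zero-padded buffer
holding "hello, world", word_strlen(buf, 7) measures "world" and returns 5.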

That loop is the one that compiled best; ISTR it has a 'spare'
register move in it ('first' gets optimised out).

On many big-endian systems, doing a byte-swapping memory read may be best
(the masks above assume little-endian byte order).
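A small sketch of why the byte-swapping read matters: the borrow out of a
zero byte can only flag more significant bytes, and on a big-endian load
those are the bytes *earlier* in memory, so simply taking the most
significant flagged byte can be wrong. Swapping first puts memory order
into little-endian order, where the lowest flagged byte is always the real
first NUL. Assumptions: 64-bit words, an explicit big-endian load so the
example runs on any host, the GCC/Clang builtins __builtin_bswap64(),
__builtin_clzll() and __builtin_ctzll(), and hypothetical function names.

```c
#include <assert.h>
#include <stdint.h>

#define ONES  0x0101010101010101ull
#define HIGHS 0x8080808080808080ull

/* Load 8 bytes as a big-endian word: b[0] ends up in the top byte. */
static uint64_t load_be64(const unsigned char *b)
{
        uint64_t w = 0;

        for (int i = 0; i < 8; i++)
                w = (w << 8) | b[i];
        return w;
}

/*
 * Naive big-endian search: the first byte in memory is the most significant,
 * so take the highest flagged byte. Wrong when a 0x01 byte precedes a NUL,
 * because the borrow out of the NUL byte flags the 0x01 byte as well.
 * Returns 8 if no byte is flagged.
 */
static unsigned int first_zero_be_naive(uint64_t w)
{
        uint64_t mask = (w - ONES) & ~w & HIGHS;

        return mask ? __builtin_clzll(mask) / 8 : 8;
}

/*
 * Byte-swap first (standing in for a byte-reversing load), then use the
 * little-endian rule: the lowest flagged byte is the first NUL.
 * Returns 8 if no byte is flagged.
 */
static unsigned int first_zero_be_swapped(uint64_t w)
{
        uint64_t v = __builtin_bswap64(w);
        uint64_t mask = (v - ONES) & ~v & HIGHS;

        return mask ? __builtin_ctzll(mask) / 8 : 8;
}
```

For the bytes { 0x01, 0x00, 'a', ... } the naive version reports index 0,
while the swapped version correctly reports index 1.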

> We have 'lib/strnlen_user.c', which is actually the only strlen() in
> the kernel that I've really ever seen in profiles (it shows up for
> execve() with lots of arguments).
> 
> That has tons of extra overhead due to the whole user access setup,
> but the core loop should be pretty good with that has_zero() thing.

I've not measured strnlen(), but it wouldn't surprise me if argv[]
processing were faster with something like the strlen() in this
patch.
After all, arguments are usually relatively short.

If you were going to use the above, then both 'ones' and 'ones << 7'
would need to be calculated once and kept in registers.
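A sketch of "calculate once, keep in registers": both constants are built
once per call site and then reused for every word, similar in spirit to
the constants the kernel's word-at-a-time helpers carry for has_zero().
HIDE_VAR() is a hypothetical user-space stand-in for OPTIMIZER_HIDE_VAR()
(an empty GCC/Clang asm the compiler cannot see through), and
struct zero_consts, make_zero_consts(), has_zero_byte() and
count_words_with_nul() are illustrative names, not kernel APIs.

```c
#include <assert.h>
#include <limits.h>

/* Hypothetical stand-in for OPTIMIZER_HIDE_VAR(): hides the value from gcc. */
#define HIDE_VAR(var) __asm__ ("" : "+r" (var))

/* Hypothetical holder for the two constants. */
struct zero_consts {
        unsigned long ones;     /* 0x01 in every byte */
        unsigned long highs;    /* 0x80 in every byte */
};

static struct zero_consts make_zero_consts(void)
{
        unsigned long ones = 0x01010101ul;

        HIDE_VAR(ones);
        /* 'ones << 16 << 16' is 0 for a 32-bit long; '<< 32' would be UB. */
        ones |= ones << 16 << 16;
        return (struct zero_consts){ .ones = ones, .highs = ones << 7 };
}

/* Non-zero iff some byte of val is zero; reuses the precomputed constants. */
static unsigned long has_zero_byte(unsigned long val, const struct zero_consts *c)
{
        return (val - c->ones) & ~val & c->highs;
}

/* Count words containing a zero byte, building the constants only once. */
static unsigned int count_words_with_nul(const unsigned long *w, unsigned int n)
{
        const struct zero_consts c = make_zero_consts();
        unsigned int hits = 0;

        for (unsigned int i = 0; i < n; i++)
                if (has_zero_byte(w[i], &c))
                        hits++;
        return hits;
}
```

On both 32-bit and 64-bit builds, 'ones' comes out as ULONG_MAX / 0xff
(0x01 in every byte) and 'highs' as 0x80 in every byte.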

> I do agree that we shouldn't use 'rep scas'. It goes back to the
> *very* original linux kernel sources, though, and I've never seen it
> in profiles because very few things in the kernel actually use strings
> a lot.

True, and most are short.
strscpy() is next on the list...

And the arm64 strlen() has special code to optimise crossing page boundaries.
God knows how slow it is on your typical 10-character string.

	David

> 
>                Linus



Thread overview: 6+ messages
     [not found] <20260327195737.89537-1-david.laight.linux@gmail.com>
2026-03-27 20:37 ` [PATCH next] string: Optimise strlen() Linus Torvalds
2026-03-27 22:49   ` David Laight [this message]
2026-03-28  0:29     ` Linus Torvalds
2026-03-28 11:08       ` David Laight
2026-03-28 19:16         ` Linus Torvalds
2026-03-28 21:47           ` David Laight
