public inbox for linux-riscv@lists.infradead.org
From: David Laight <david.laight.linux@gmail.com>
To: Paul Walmsley <pjw@kernel.org>
Cc: Feng Jiang <jiangfeng@kylinos.cn>,
	palmer@dabbelt.com, aou@eecs.berkeley.edu, alex@ghiti.fr,
	samuel.holland@sifive.com, charlie@rivosinc.com,
	conor.dooley@microchip.com, linux-riscv@lists.infradead.org,
	linux-kernel@vger.kernel.org
Subject: Re: [PATCH] riscv: lib: optimize strlen loop efficiency
Date: Thu, 15 Jan 2026 18:46:19 +0000
Message-ID: <20260115184619.574f1b36@pumpkin>
In-Reply-To: <20260115111947.54929ed0@pumpkin>

On Thu, 15 Jan 2026 11:19:47 +0000
David Laight <david.laight.linux@gmail.com> wrote:

> For 64bit you can do a lot better (in C) by loading 64bit words and doing
> the correct 'shift and mask' sequence to detect a zero byte.
> It usually isn't worth it for 32bit.
> 
> It does need to handle a misaligned base, e.g. by masking the low bits
> off the base pointer and OR'ing non-zero values into the bytes of the
> first word that precede the string.
> 
> 	David
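The 'shift and mask' sequence mentioned in the quote is, in its most common form, the well-known "does this word contain a zero byte" bit trick, and the loop in my_strlen() below uses the same expression. The following is a minimal sketch for 64-bit words on a little-endian machine. It assumes GCC or Clang (for __builtin_ctzll), and the names ONES, HIGHS, zero_byte_mask(), first_zero_byte_le() and leading_fill_le() are illustrative only. The last helper shows one way to do the misaligned-base masking described above.

```c
#include <assert.h>
#include <stdint.h>

/* 0x01 and 0x80 replicated into every byte of a 64-bit word. */
#define ONES  UINT64_C(0x0101010101010101)
#define HIGHS UINT64_C(0x8080808080808080)

/*
 * Sets bit 7 of a byte in the result if that byte of v may be 0x00.
 * Subtracting 0x01 from a zero byte borrows and sets its top bit; "& ~v"
 * drops bytes whose top bit was already set.  With no borrow coming in
 * from below, only a real zero byte gets flagged, so the lowest flag is
 * always exact.  Higher flags can be false positives (a 0x01 byte just
 * above a zero byte receives the borrow).
 */
static inline uint64_t zero_byte_mask(uint64_t v)
{
	return (v - ONES) & ~v & HIGHS;
}

/* Little-endian: index (0..7) of the first zero byte; v must contain one. */
static inline unsigned int first_zero_byte_le(uint64_t v)
{
	uint64_t m = zero_byte_mask(v);

	assert(m != 0);		/* __builtin_ctzll(0) is undefined */
	return (unsigned int)__builtin_ctzll(m) / 8;
}

/*
 * Little-endian: 0xff in each of the low 'off' bytes, i.e. the bytes that
 * precede the string in its first aligned word.  ORing this into the first
 * load stops those bytes from looking like a terminator.  The two-step
 * shift avoids an undefined shift by 64 when off == 0.
 */
static inline uint64_t leading_fill_le(unsigned int off)
{
	assert(off < 8);
	return (~UINT64_C(0) >> 8) >> (8 * (7 - off));
}
```

For example, 0xFFFFFFFFFFFF0100 (byte 0 is zero, byte 1 is 0x01) yields a mask of 0x8080: byte 1 is a false positive, but the lowest flag still points at byte 0.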

The version below seems to work: https://www.godbolt.org/z/sME3Ts6vW
It actually looks OK for x86-32: the loop is 8 instructions plus the
branch, but the register dependency chain is only 4 instructions long.
So it may be better than byte compares for moderate to long strings,
especially if the CPU starts speculatively executing the next loop
iteration.

OPTIMIZER_HIDE_VAR() helps a lot on (e.g.) MIPS-64, and a bit elsewhere,
since most 64-bit CPUs can't load 64-bit immediates directly; hiding
'ones' makes the compiler derive the other constant from it with shifts.
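To illustrate the OPTIMIZER_HIDE_VAR() point, the sketch below derives both per-byte constants from a single 32-bit immediate, the same way my_strlen() does. It assumes GCC or Clang inline asm, and word_constants() is a hypothetical helper. Whether this actually saves instructions depends on the target and compiler, so comparing the generated code (e.g. on godbolt) is the real check.

```c
/* Same empty-asm barrier as the one in the code further down (GCC/Clang). */
#define OPTIMIZER_HIDE_VAR(var) __asm__ ("" : "=r" (var) : "0" (var))

/*
 * Hypothetical helper: build 0x0101...01 and 0x8080...80 for the native
 * unsigned long from one 32-bit immediate.  Hiding the value is intended
 * to stop the compiler from folding the shifts away and materialising a
 * separate full-width constant for each mask.
 * "<< 16 << 16" rather than "<< 32" keeps this valid when long is 32 bits
 * (shifting by the full type width is undefined); there it just ORs in 0.
 */
static void word_constants(unsigned long *ones, unsigned long *highs)
{
	unsigned long v = 0x01010101;

	OPTIMIZER_HIDE_VAR(v);
	v |= v << 16 << 16;
	*ones = v;
	*highs = v << 7;
}
```

A handy sanity check is that 0x0101...01 equals ~0ul / 0xff for any long width that is a multiple of 8 bits.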

I can't get gcc and clang to reliably generate a loop with the conditional
jump at the bottom, especially with an unconditional jump into the
loop (which would remove the '| mask' from the loop body).
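One shape for the loop described above handles the first (possibly misaligned) word before the loop, so the loop body no longer needs the '| mask'. This is a little-endian-only sketch, assuming GCC/Clang builtins and, like my_strlen(), whole-word reads that may extend past the NUL (never across a page, since they are aligned). It also assumes a build with -fno-strict-aliasing, as the kernel uses. strlen_peeled() is a hypothetical name, and nothing here forces the compiler to put the conditional branch at the bottom of the loop.

```c
#include <stddef.h>

#define OPTIMIZER_HIDE_VAR(var) __asm__ ("" : "=r" (var) : "0" (var))

/* Hypothetical little-endian strlen() with the first word peeled off. */
size_t strlen_peeled(const char *s)
{
	unsigned int off = (unsigned long)s % sizeof(long);
	const unsigned long *p = (const void *)(s - off);
	unsigned long ones = 0x01010101;
	unsigned long val, hit;

	OPTIMIZER_HIDE_VAR(ones);
	ones |= ones << 16 << 16;

	/* First word: force the 'off' bytes before s to be non-zero. */
	val = *p | ((~0ul >> 8) >> 8 * (sizeof(long) - 1 - off));
	hit = (val - ones) & ~val & (ones << 7);

	/* Remaining aligned words: no masking needed inside the loop. */
	while (!hit) {
		val = *++p;
		hit = (val - ones) & ~val & (ones << 7);
	}

	/* Lowest flagged byte of the word at p is the terminator. */
	return (const char *)p + __builtin_ctzl(hit) / 8 - s;
}
```

The trade-off is a little code duplication (the zero-byte test appears twice) in exchange for one fewer ALU operation per iteration.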

Also, KASAN (or one of its friends) won't like the code reading entire
words that contain the string, since those reads include bytes before
its start and after its terminating NUL.

And it does need ffs/clz instructions - or a different loop bottom.
(For BE one with clzl() returning 0 will work.) 
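As a guess at what a "different loop bottom" could look like (not the author's code), the single ffs/ctz can be replaced by a short branchy search over the final hit mask. This is a little-endian, 64-bit sketch; first_hit_byte_le() is a hypothetical name. On big-endian the search would start from the top instead, and the 0x01 false-positive correction in my_strlen() would still be needed.

```c
#include <assert.h>
#include <stdint.h>

/*
 * Hypothetical loop bottom without ffs/ctz: given the non-zero "hit" mask
 * from the zero-byte test (0x80 in each flagged byte), return the index of
 * the lowest flagged byte, i.e. the first zero byte on little-endian.
 * Three test-and-shift steps replace the single ffs/ctz instruction.
 */
static unsigned int first_hit_byte_le(uint64_t hit)
{
	unsigned int idx = 0;

	assert(hit != 0);
	if (!(hit & 0xffffffff)) {	/* nothing flagged in bytes 0-3 */
		hit >>= 32;
		idx += 4;
	}
	if (!(hit & 0xffff)) {		/* nothing in the next two bytes */
		hit >>= 16;
		idx += 2;
	}
	if (!(hit & 0xff))		/* nothing in the next byte */
		idx += 1;
	return idx;
}
```

In the little-endian path of my_strlen() this would stand in for the ffsl() line, e.g. off = first_hit_byte_le(mask);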

While I suspect the throughput is about two bytes per clock on x86-64,
the fixed cost may push the break-even point above the average length
of the strings the kernel passes to strlen().
Of course, x86 probably falls back to 'rep scasb' at maybe
40 + 2n clocks for n bytes.
A carefully written, slightly unrolled asm loop might manage one
byte per clock!
I could spend weeks benchmarking different versions.

	David

#define OPTIMIZER_HIDE_VAR(var)                                         \
        __asm__ ("" : "=r" (var) : "0" (var))

/* Set BE to test big-endian on little-endian.
 * For real BE either do a byteswapping read or use the BE code. */
#ifdef BE
#define SWP(x) __builtin_bswap64(x)
#define SHIFT <<
#else
#define SWP(x) (x)
#define SHIFT >>
#endif

unsigned long my_strlen(const char *s)
{
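        /* Round s down to a word boundary; off = bytes of that word before s. */
        unsigned int off = (unsigned long)s % sizeof (long);
        const unsigned long *p = (const void *)(s - off);
        unsigned long val;
        unsigned long mask;
        unsigned long ones = 0x01010101;
        /* Force the compiler to generate the related constants sanely. */
        OPTIMIZER_HIDE_VAR(ones);
        ones |= ones << 16 << 16;       /* 0x0101...01 (0x01010101 if long is 32-bit) */

        /* Non-zero filler for the 'off' bytes that precede s in the first word. */
        mask = ((~0ul SHIFT 8) SHIFT 8 * (sizeof (long) - 1 - off));
        do {
                /* mask is 0 after the first pass, so the OR only affects word 0. */
                val = SWP(*p++) | mask;
                /* Flag bit 7 of each byte that may be zero. */
                mask = (val - ones) & ~val & ones << 7;
        } while (!mask);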

#ifdef BE
        off = __builtin_clzl(mask);
        /* Correct for "...\x01" */
        val <<= off;
        for (off /= 8; val > (~0ul >> 8); off++)
                val <<= 8;
#else
        /* The lowest flagged byte is always the first real zero byte. */
        off = (__builtin_ffsl(mask) - 1)/8;
#endif

        return (const char *)(p - 1) + off - s;
}
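Before benchmarking different versions, each candidate needs to agree with the C library strlen() for every alignment. The following is a hypothetical harness sketch: check_strlen_impl() and bytewise_strlen() are illustrative names, and the byte loop is included only so the block runs on its own. It assumes C11 (_Alignas) and, for word-at-a-time candidates, a build with -fno-strict-aliasing; sanitizers will still report the whole-word reads, as noted above.

```c
#include <stddef.h>
#include <string.h>

/* Signature matches my_strlen() from the message. */
typedef unsigned long (*strlen_fn)(const char *);

/* Byte-at-a-time baseline, only here so the harness has something to run. */
static unsigned long bytewise_strlen(const char *s)
{
	const char *p = s;

	while (*p)
		p++;
	return p - s;
}

/*
 * Compare fn against strlen() for every start offset within a 16-byte
 * block and every length up to 64.  Bytes before the string are 0x00, so
 * a missing leading mask is caught; the string itself cycles through
 * 0x01, 0x80 and 0xff, which stress the word-at-a-time zero-byte test.
 * The buffer is aligned and padded so whole-word reads stay inside it.
 * Returns 1 if every case matches, 0 otherwise.
 */
static int check_strlen_impl(strlen_fn fn)
{
	static const unsigned char fill[] = { 0x01, 0x80, 0xff, 'a' };
	_Alignas(16) char buf[16 + 64 + 1 + 16];

	for (size_t off = 0; off < 16; off++) {
		for (size_t len = 0; len <= 64; len++) {
			memset(buf, 0, sizeof(buf));
			for (size_t i = 0; i < len; i++)
				buf[off + i] = (char)fill[i % sizeof(fill)];
			if (fn(buf + off) != strlen(buf + off))
				return 0;
		}
	}
	return 1;
}
```

With my_strlen() compiled into the same file, check_strlen_impl(my_strlen) would exercise every misalignment case described in the thread.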

_______________________________________________
linux-riscv mailing list
linux-riscv@lists.infradead.org
http://lists.infradead.org/mailman/listinfo/linux-riscv


Thread overview: 10+ messages
2025-12-18  3:26 [PATCH] riscv: lib: optimize strlen loop efficiency Feng Jiang
2026-01-15  2:03 ` Paul Walmsley
2026-01-15  3:23   ` Feng Jiang
2026-01-24  8:14     ` Paul Walmsley
2026-01-26  3:05       ` Feng Jiang
2026-01-15 11:19   ` David Laight
2026-01-15 18:46     ` David Laight [this message]
2026-01-26  2:52       ` Feng Jiang
2026-01-28 18:59       ` David Laight
2026-01-29  8:34         ` Feng Jiang
