[PATCH v2 00/14] riscv: optimize string functions and add kunit tests

public inbox for linux-kernel@vger.kernel.org

* [PATCH v2 00/14] riscv: optimize string functions and add kunit tests
@ 2026-01-13  8:27 Feng Jiang
  2026-01-13  8:27 ` [PATCH v2 01/14] lib/string: extract generic strlen() into __generic_strlen() Feng Jiang
                   ` (15 more replies)
  0 siblings, 16 replies; 36+ messages in thread
From: Feng Jiang @ 2026-01-13  8:27 UTC (permalink / raw)
  To: pjw, palmer, aou, alex, kees, andy, akpm, jiangfeng, ebiggers,
	martin.petersen, ardb, ajones, conor.dooley, samuel.holland,
	linus.walleij, nathan
  Cc: linux-riscv, linux-kernel, linux-hardening

This series introduces optimized assembly implementations for strnlen,
strchr, and strrchr on the RISC-V architecture. To verify them, the
series also extends the string_kunit test suite with functional
correctness tests and performance benchmarks.

The patchset is organized as follows:
- Refactoring (Patches 1-4): Extract generic C implementations for
  strlen, strnlen, strchr, and strrchr into exported __generic_* functions.
- Correctness Testing (Patches 5-7): Extend string_kunit with detailed
  functional tests for the target functions.
- Performance Benchmarking (Patches 8-11): Add a benchmarking framework
  to string_kunit to measure execution time across various string lengths.
- RISC-V Optimizations (Patches 12-14): Provide the optimized assembly
  implementations for the RISC-V architecture.
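An optimized routine that scans a machine word at a time usually has separate paths for an unaligned head, an aligned body and a tail, so functional tests are most useful when they vary the start alignment as well as the length and bound. The userspace sketch below shows such a differential check for strnlen(); it is not the string_kunit code. ref_strnlen() mirrors the generic C loop, memchr_strnlen() is a hypothetical stand-in for an optimized implementation, and MAX_LEN and MAX_OFFSET are illustrative limits.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

#define MAX_LEN    64 /* hypothetical: longest string exercised */
#define MAX_OFFSET 8  /* start offsets 0..7: every alignment within a 64-bit word */

/* Reference: byte-at-a-time loop, same shape as the generic C strnlen(). */
static size_t ref_strnlen(const char *s, size_t count)
{
	const char *sc;

	for (sc = s; count-- && *sc != '\0'; ++sc)
		/* nothing */;
	return sc - s;
}

/* Stand-in for an optimized version: let memchr() find the terminator. */
static size_t memchr_strnlen(const char *s, size_t count)
{
	const char *nul = memchr(s, '\0', count);

	return nul ? (size_t)(nul - s) : count;
}

/*
 * Compare fn with the reference for every start offset, string length and
 * bound count in 0..len + 2. Returns the number of mismatches (0 = pass).
 */
static unsigned int check_strnlen(size_t (*fn)(const char *, size_t))
{
	/* 16-byte aligned, so "off" really selects the alignment */
	static _Alignas(16) char buf[MAX_OFFSET + MAX_LEN + 2];
	unsigned int failures = 0;

	for (size_t off = 0; off < MAX_OFFSET; off++) {
		for (size_t len = 0; len <= MAX_LEN; len++) {
			char *s = buf + off;

			memset(buf, 'x', sizeof(buf));
			s[len] = '\0';
			for (size_t count = 0; count <= len + 2; count++) {
				/* memchr() may read up to count bytes: stay in buf */
				assert(off + count <= sizeof(buf));
				if (fn(s, count) != ref_strnlen(s, count))
					failures++;
			}
		}
	}
	return failures;
}
```

check_strnlen(memchr_strnlen) returns 0; a candidate with an off-by-one in its bound handling, or one that misbehaves at a particular alignment, would make it nonzero.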

Testing:
All patches have been verified with the KUnit framework on a QEMU
virt machine (riscv64); all string-related tests passed.

    $ ./tools/testing/kunit/kunit.py run --arch=riscv \
        --cross_compile=riscv64-linux-gnu- \
        --kunitconfig=my_string.kunitconfig \
        --raw_output
    [15:26:26] Configuring KUnit Kernel ...
    ...
        ok 1 string_test_memset16
        ok 2 string_test_memset32
        ok 3 string_test_memset64
        ok 4 string_test_strlen
        # string_test_strlen_bench: strlen performance (short, len: 8, iters: 100000):
        # string_test_strlen_bench:   arch-optimized: 148900 ns
        # string_test_strlen_bench:   generic C:      5551900 ns
        # string_test_strlen_bench:   speedup:        37.28x
        # string_test_strlen_bench: strlen performance (medium, len: 64, iters: 100000):
        # string_test_strlen_bench:   arch-optimized: 166000 ns
        # string_test_strlen_bench:   generic C:      16250200 ns
        # string_test_strlen_bench:   speedup:        97.89x
        # string_test_strlen_bench: strlen performance (long, len: 2048, iters: 10000):
        # string_test_strlen_bench:   arch-optimized: 14100 ns
        # string_test_strlen_bench:   generic C:      35605600 ns
        # string_test_strlen_bench:   speedup:        2525.21x
        ok 5 string_test_strlen_bench
        ok 6 string_test_strnlen
        # string_test_strnlen_bench: strnlen performance (short, len: 8, iters: 100000):
        # string_test_strnlen_bench:   arch-optimized: 147500 ns
        # string_test_strnlen_bench:   generic C:      6429800 ns
        # string_test_strnlen_bench:   speedup:        43.59x
        # string_test_strnlen_bench: strnlen performance (medium, len: 64, iters: 100000):
        # string_test_strnlen_bench:   arch-optimized: 197900 ns
        # string_test_strnlen_bench:   generic C:      22322500 ns
        # string_test_strnlen_bench:   speedup:        112.79x
        # string_test_strnlen_bench: strnlen performance (long, len: 2048, iters: 10000):
        # string_test_strnlen_bench:   arch-optimized: 14100 ns
        # string_test_strnlen_bench:   generic C:      56162600 ns
        # string_test_strnlen_bench:   speedup:        3983.16x
        ok 7 string_test_strnlen_bench
        ok 8 string_test_strchr
        # string_test_strchr_bench: strchr performance (short, len: 8, iters: 100000):
        # string_test_strchr_bench:   arch-optimized: 166800 ns
        # string_test_strchr_bench:   generic C:      6079400 ns
        # string_test_strchr_bench:   speedup:        36.44x
        # string_test_strchr_bench: strchr performance (medium, len: 64, iters: 100000):
        # string_test_strchr_bench:   arch-optimized: 151500 ns
        # string_test_strchr_bench:   generic C:      21130400 ns
        # string_test_strchr_bench:   speedup:        139.47x
        # string_test_strchr_bench: strchr performance (long, len: 2048, iters: 10000):
        # string_test_strchr_bench:   arch-optimized: 32800 ns
        # string_test_strchr_bench:   generic C:      50630400 ns
        # string_test_strchr_bench:   speedup:        1543.60x
        ok 9 string_test_strchr_bench
        ok 10 string_test_strnchr
        ok 11 string_test_strrchr
        # string_test_strrchr_bench: strrchr performance (short, len: 8, iters: 100000):
        # string_test_strrchr_bench:   arch-optimized: 166300 ns
        # string_test_strrchr_bench:   generic C:      6201400 ns
        # string_test_strrchr_bench:   speedup:        37.29x
        # string_test_strrchr_bench: strrchr performance (medium, len: 64, iters: 100000):
        # string_test_strrchr_bench:   arch-optimized: 207200 ns
        # string_test_strrchr_bench:   generic C:      23062700 ns
        # string_test_strrchr_bench:   speedup:        111.30x
        # string_test_strrchr_bench: strrchr performance (long, len: 2048, iters: 10000):
        # string_test_strrchr_bench:   arch-optimized: 14000 ns
        # string_test_strrchr_bench:   generic C:      51192900 ns
        # string_test_strrchr_bench:   speedup:        3656.63x
        ok 12 string_test_strrchr_bench
        ok 13 string_test_strspn
    ...
    # string: pass:28 fail:0 skip:0 total:28
    # Totals: pass:28 fail:0 skip:0 total:28
    ok 1 string
    reboot: Restarting system
    [15:28:10] Elapsed time: 103.449s total, 0.001s configuring, 101.878s building, 1.569s running
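Each benchmark result above is the total time for a fixed number of calls, and the speedup line is the ratio of the generic and arch-optimized totals. The following userspace sketch shows one way to take such a measurement; it is not the string_kunit code. bench_ns(), format_speedup() and byte_strlen() are hypothetical helpers, libc strlen() stands in for the arch-optimized version, and clock_gettime(CLOCK_MONOTONIC) stands in for a kernel time source. The speedup uses integer arithmetic only, as kernel code generally avoids floating point, so it truncates to two decimals: 5551900 ns over 148900 ns gives 37.28x, the same figure as the first strlen line above.

```c
#define _POSIX_C_SOURCE 200809L /* for clock_gettime() */
#include <assert.h>
#include <stdint.h>
#include <stdio.h>
#include <string.h>
#include <time.h>

typedef size_t (*strlen_fn)(const char *);

/* volatile sink: every result must be stored, so no call can be dropped */
static volatile size_t bench_sink;

static uint64_t now_ns(void)
{
	struct timespec ts;

	clock_gettime(CLOCK_MONOTONIC, &ts);
	return (uint64_t)ts.tv_sec * 1000000000u + (uint64_t)ts.tv_nsec;
}

/*
 * Time `iters` calls of fn(s). Calling through a volatile function pointer
 * and storing each result in a volatile sink keeps the compiler from
 * hoisting a pure call out of the loop or deleting it altogether.
 */
static uint64_t bench_ns(strlen_fn fn, const char *s, unsigned int iters)
{
	strlen_fn volatile call = fn;
	uint64_t start = now_ns();

	for (unsigned int i = 0; i < iters; i++)
		bench_sink = call(s);
	return now_ns() - start;
}

/* Integer-only "X.YYx" speedup of opt_ns over base_ns, truncated. */
static void format_speedup(char *out, size_t n, uint64_t base_ns,
			   uint64_t opt_ns)
{
	uint64_t x100;

	assert(opt_ns != 0);
	x100 = base_ns * 100 / opt_ns;
	snprintf(out, n, "%llu.%02llux", (unsigned long long)(x100 / 100),
		 (unsigned long long)(x100 % 100));
}

/* Byte-at-a-time baseline, same loop as the generic C strlen(). */
static size_t byte_strlen(const char *s)
{
	const char *sc;

	for (sc = s; *sc != '\0'; ++sc)
		/* nothing */;
	return sc - s;
}
```

For example, bench_ns(byte_strlen, s, 100000) and bench_ns(strlen, s, 100000) on the same 64-byte string give the two totals to pass to format_speedup(). Without the volatile pointer and sink, an optimizing compiler may evaluate a loop-invariant strlen() call once and reuse the result, which makes the per-call time look implausibly small.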

Changes:
v1: Initial submission.

v2:
- Refactored lib/string.c to export __generic_* functions, and added
  corresponding functional/performance tests for strnlen, strchr,
  and strrchr (suggested by Andy Shevchenko).
- Replaced magic numbers with named constants such as
  STRING_TEST_MAX_LEN (suggested by Andy Shevchenko).

---

Feng Jiang (14):
  lib/string: extract generic strlen() into __generic_strlen()
  lib/string: extract generic strnlen() into __generic_strnlen()
  lib/string: extract generic strchr() into __generic_strchr()
  lib/string: extract generic strrchr() into __generic_strrchr()
  lib/string_kunit: add correctness test for strlen
  lib/string_kunit: add correctness test for strnlen
  lib/string_kunit: add correctness test for strrchr()
  lib/string_kunit: add performance benchmark for strlen()
  lib/string_kunit: add performance benchmark for strnlen()
  lib/string_kunit: add performance benchmark for strchr()
  lib/string_kunit: add performance benchmark for strrchr()
  riscv: lib: add strnlen implementation
  riscv: lib: add strchr implementation
  riscv: lib: add strrchr implementation

 arch/riscv/include/asm/string.h |   9 +
 arch/riscv/lib/Makefile         |   3 +
 arch/riscv/lib/strchr.S         |  35 ++++
 arch/riscv/lib/strnlen.S        | 164 +++++++++++++++
 arch/riscv/lib/strrchr.S        |  37 ++++
 arch/riscv/purgatory/Makefile   |  11 +-
 include/linux/string.h          |   4 +
 lib/string.c                    |  53 +++--
 lib/tests/string_kunit.c        | 344 ++++++++++++++++++++++++++++++++
 9 files changed, 645 insertions(+), 15 deletions(-)
 create mode 100644 arch/riscv/lib/strchr.S
 create mode 100644 arch/riscv/lib/strnlen.S
 create mode 100644 arch/riscv/lib/strrchr.S

-- 
2.25.1


^ permalink raw reply	[flat|nested] 36+ messages in thread

* [PATCH v2 01/14] lib/string: extract generic strlen() into __generic_strlen()
  2026-01-13  8:27 [PATCH v2 00/14] riscv: optimize string functions and add kunit tests Feng Jiang
@ 2026-01-13  8:27 ` Feng Jiang
  2026-01-13  8:33   ` Andy Shevchenko
  2026-01-14  0:01   ` Eric Biggers
  2026-01-13  8:27 ` [PATCH v2 02/14] lib/string: extract generic strnlen() into __generic_strnlen() Feng Jiang
                   ` (14 subsequent siblings)
  15 siblings, 2 replies; 36+ messages in thread
From: Feng Jiang @ 2026-01-13  8:27 UTC (permalink / raw)
  To: pjw, palmer, aou, alex, kees, andy, akpm, jiangfeng, ebiggers,
	martin.petersen, ardb, ajones, conor.dooley, samuel.holland,
	linus.walleij, nathan
  Cc: linux-riscv, linux-kernel, linux-hardening

To support performance benchmarking in KUnit tests, extract the
generic C implementation of strlen() into a standalone function
__generic_strlen(). This allows tests to compare architecture-optimized
versions against the generic baseline without duplicating code.

Suggested-by: Andy Shevchenko <andy@kernel.org>
Signed-off-by: Feng Jiang <jiangfeng@kylinos.cn>
---
 include/linux/string.h |  1 +
 lib/string.c           | 10 ++++++++--
 2 files changed, 9 insertions(+), 2 deletions(-)

diff --git a/include/linux/string.h b/include/linux/string.h
index 1b564c36d721..961645633b4d 100644
--- a/include/linux/string.h
+++ b/include/linux/string.h
@@ -197,6 +197,7 @@ extern char * strstr(const char *, const char *);
 #ifndef __HAVE_ARCH_STRNSTR
 extern char * strnstr(const char *, const char *, size_t);
 #endif
+extern __kernel_size_t __generic_strlen(const char *);
 #ifndef __HAVE_ARCH_STRLEN
 extern __kernel_size_t strlen(const char *);
 #endif
diff --git a/lib/string.c b/lib/string.c
index b632c71df1a5..047ecb38e09b 100644
--- a/lib/string.c
+++ b/lib/string.c
@@ -412,8 +412,7 @@ char *strnchr(const char *s, size_t count, int c)
 EXPORT_SYMBOL(strnchr);
 #endif
 
-#ifndef __HAVE_ARCH_STRLEN
-size_t strlen(const char *s)
+size_t __generic_strlen(const char *s)
 {
 	const char *sc;
 
@@ -421,6 +420,13 @@ size_t strlen(const char *s)
 		/* nothing */;
 	return sc - s;
 }
+EXPORT_SYMBOL(__generic_strlen);
+
+#ifndef __HAVE_ARCH_STRLEN
+size_t strlen(const char *s)
+{
+	return __generic_strlen(s);
+}
 EXPORT_SYMBOL(strlen);
 #endif
 
-- 
2.25.1


* Re: [PATCH v2 01/14] lib/string: extract generic strlen() into __generic_strlen()
  2026-01-13  8:27 ` [PATCH v2 01/14] lib/string: extract generic strlen() into __generic_strlen() Feng Jiang
@ 2026-01-13  8:33   ` Andy Shevchenko
  2026-01-14  0:01   ` Eric Biggers
  1 sibling, 0 replies; 36+ messages in thread
From: Andy Shevchenko @ 2026-01-13  8:33 UTC (permalink / raw)
  To: Feng Jiang
  Cc: pjw, palmer, aou, alex, kees, andy, akpm, ebiggers,
	martin.petersen, ardb, ajones, conor.dooley, samuel.holland,
	linus.walleij, nathan, linux-riscv, linux-kernel, linux-hardening

On Tue, Jan 13, 2026 at 04:27:35PM +0800, Feng Jiang wrote:
> To support performance benchmarking in KUnit tests, extract the
> generic C implementation of strlen() into a standalone function
> __generic_strlen(). This allows tests to compare architecture-optimized
> versions against the generic baseline without duplicating code.

...

> +size_t strlen(const char *s)
> +{
> +	return __generic_strlen(s);
> +}
>  EXPORT_SYMBOL(strlen);

There is no point in keeping this as an exported function anymore, right? So it
can be moved to string.h as a static inline.
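A minimal sketch of that suggestion, assuming the generic body stays out of line in lib/string.c and the fallback becomes a header wrapper. fallback_strlen() is a hypothetical stand-in for strlen(), since a userspace program cannot redefine the C library's strlen(); in the kernel the wrapper would sit under #ifndef __HAVE_ARCH_STRLEN in include/linux/string.h.

```c
#include <stddef.h>

/* Out of line, as in lib/string.c after this patch. */
size_t __generic_strlen(const char *s)
{
	const char *sc;

	for (sc = s; *sc != '\0'; ++sc)
		/* nothing */;
	return sc - s;
}

/*
 * Header side: with no architecture override, the fallback is only an
 * inline call to the generic body, so no second function body or export
 * is needed. fallback_strlen is a hypothetical stand-in for strlen().
 */
#ifndef __HAVE_ARCH_STRLEN
static inline size_t fallback_strlen(const char *s)
{
	return __generic_strlen(s);
}
#endif
```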

-- 
With Best Regards,
Andy Shevchenko



* Re: [PATCH v2 01/14] lib/string: extract generic strlen() into __generic_strlen()
  2026-01-13  8:27 ` [PATCH v2 01/14] lib/string: extract generic strlen() into __generic_strlen() Feng Jiang
  2026-01-13  8:33   ` Andy Shevchenko
@ 2026-01-14  0:01   ` Eric Biggers
  2026-01-14  1:41     ` Feng Jiang
                       ` (2 more replies)
  1 sibling, 3 replies; 36+ messages in thread
From: Eric Biggers @ 2026-01-14  0:01 UTC (permalink / raw)
  To: Feng Jiang
  Cc: pjw, palmer, aou, alex, kees, andy, akpm, martin.petersen, ardb,
	ajones, conor.dooley, samuel.holland, linus.walleij, nathan,
	linux-riscv, linux-kernel, linux-hardening

On Tue, Jan 13, 2026 at 04:27:35PM +0800, Feng Jiang wrote:
> To support performance benchmarking in KUnit tests, extract the
> generic C implementation of strlen() into a standalone function
> __generic_strlen(). This allows tests to compare architecture-optimized
> versions against the generic baseline without duplicating code.
> 
> Suggested-by: Andy Shevchenko <andy@kernel.org>
> Signed-off-by: Feng Jiang <jiangfeng@kylinos.cn>
> ---
>  include/linux/string.h |  1 +
>  lib/string.c           | 10 ++++++++--
>  2 files changed, 9 insertions(+), 2 deletions(-)
> 
> diff --git a/include/linux/string.h b/include/linux/string.h
> index 1b564c36d721..961645633b4d 100644
> --- a/include/linux/string.h
> +++ b/include/linux/string.h
> @@ -197,6 +197,7 @@ extern char * strstr(const char *, const char *);
>  #ifndef __HAVE_ARCH_STRNSTR
>  extern char * strnstr(const char *, const char *, size_t);
>  #endif
> +extern __kernel_size_t __generic_strlen(const char *);
>  #ifndef __HAVE_ARCH_STRLEN
>  extern __kernel_size_t strlen(const char *);
>  #endif
> diff --git a/lib/string.c b/lib/string.c
> index b632c71df1a5..047ecb38e09b 100644
> --- a/lib/string.c
> +++ b/lib/string.c
> @@ -412,8 +412,7 @@ char *strnchr(const char *s, size_t count, int c)
>  EXPORT_SYMBOL(strnchr);
>  #endif
>  
> -#ifndef __HAVE_ARCH_STRLEN
> -size_t strlen(const char *s)
> +size_t __generic_strlen(const char *s)
>  {
>  	const char *sc;
>  
> @@ -421,6 +420,13 @@ size_t strlen(const char *s)
>  		/* nothing */;
>  	return sc - s;
>  }
> +EXPORT_SYMBOL(__generic_strlen);
> +
> +#ifndef __HAVE_ARCH_STRLEN
> +size_t strlen(const char *s)
> +{
> +	return __generic_strlen(s);
> +}
>  EXPORT_SYMBOL(strlen);

A similar problem exists with the architecture-optimized CRC and crypto
functions.  Historically, these subsystems exported both generic and
architecture-optimized functions.

We've actually been moving away from that design to simplify things.
For example, for CRC-32C there's now just the crc32c() function which
delegates to the "best" CRC-32C implementation, with no direct access to
the generic implementation of CRC-32C.

crc_kunit then just tests and benchmarks crc32c().  To check how the
performance of crc32c() changes when its implementation changes (whether
the change is the addition of an arch-optimized implementation or a
change in an existing arch-optimized implementation), the developer just
needs to run crc_kunit with two kernels, before and after.

I suggest just doing that.  In that case there would be no need to
export the generic implementations of these functions.

(Also note that *if* the generic functions are exported, they probably
should be exported only when the KUnit test is enabled.  There's no need
to include them in the kernel image when the test isn't enabled.)
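A userspace sketch of that idea: the generic helper is compiled only when a test configuration switch is set, so it never reaches an image built without the test. SKETCH_STRING_KUNIT_TEST and generic_strlen_for_test() are hypothetical names. In-tree, <kunit/visibility.h> provides VISIBLE_IF_KUNIT and EXPORT_SYMBOL_IF_KUNIT() for test-only visibility, keyed on CONFIG_KUNIT rather than on an individual test.

```c
#include <stddef.h>

/*
 * Hypothetical stand-in for a Kconfig option that enables the string
 * KUnit test; in the kernel this would be IS_ENABLED(CONFIG_...), not a
 * local #define. Build with -DSKETCH_STRING_KUNIT_TEST=0 to drop the helper.
 */
#ifndef SKETCH_STRING_KUNIT_TEST
#define SKETCH_STRING_KUNIT_TEST 1
#endif

#if SKETCH_STRING_KUNIT_TEST
/* Only compiled (and, in the kernel, exported) when the test is configured. */
size_t generic_strlen_for_test(const char *s)
{
	const char *sc;

	for (sc = s; *sc != '\0'; ++sc)
		/* nothing */;
	return sc - s;
}
#endif
```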

- Eric

* Re: [PATCH v2 01/14] lib/string: extract generic strlen() into __generic_strlen()
  2026-01-14  0:01   ` Eric Biggers
@ 2026-01-14  1:41     ` Feng Jiang
  2026-01-14  7:07     ` Andy Shevchenko
  2026-01-14 10:10     ` David Laight
  2 siblings, 0 replies; 36+ messages in thread
From: Feng Jiang @ 2026-01-14  1:41 UTC (permalink / raw)
  To: Eric Biggers, andy
  Cc: pjw, palmer, aou, alex, kees, andy, akpm, martin.petersen, ardb,
	ajones, conor.dooley, samuel.holland, linus.walleij, nathan,
	linux-riscv, linux-kernel, linux-hardening

On 2026/1/14 08:01, Eric Biggers wrote:
> On Tue, Jan 13, 2026 at 04:27:35PM +0800, Feng Jiang wrote:
>> To support performance benchmarking in KUnit tests, extract the
>> generic C implementation of strlen() into a standalone function
>> __generic_strlen(). This allows tests to compare architecture-optimized
>> versions against the generic baseline without duplicating code.
>>
>> Suggested-by: Andy Shevchenko <andy@kernel.org>
>> Signed-off-by: Feng Jiang <jiangfeng@kylinos.cn>
>> ---
>>  include/linux/string.h |  1 +
>>  lib/string.c           | 10 ++++++++--
>>  2 files changed, 9 insertions(+), 2 deletions(-)
>>
>> diff --git a/include/linux/string.h b/include/linux/string.h
>> index 1b564c36d721..961645633b4d 100644
>> --- a/include/linux/string.h
>> +++ b/include/linux/string.h
>> @@ -197,6 +197,7 @@ extern char * strstr(const char *, const char *);
>>  #ifndef __HAVE_ARCH_STRNSTR
>>  extern char * strnstr(const char *, const char *, size_t);
>>  #endif
>> +extern __kernel_size_t __generic_strlen(const char *);
>>  #ifndef __HAVE_ARCH_STRLEN
>>  extern __kernel_size_t strlen(const char *);
>>  #endif
>> diff --git a/lib/string.c b/lib/string.c
>> index b632c71df1a5..047ecb38e09b 100644
>> --- a/lib/string.c
>> +++ b/lib/string.c
>> @@ -412,8 +412,7 @@ char *strnchr(const char *s, size_t count, int c)
>>  EXPORT_SYMBOL(strnchr);
>>  #endif
>>  
>> -#ifndef __HAVE_ARCH_STRLEN
>> -size_t strlen(const char *s)
>> +size_t __generic_strlen(const char *s)
>>  {
>>  	const char *sc;
>>  
>> @@ -421,6 +420,13 @@ size_t strlen(const char *s)
>>  		/* nothing */;
>>  	return sc - s;
>>  }
>> +EXPORT_SYMBOL(__generic_strlen);
>> +
>> +#ifndef __HAVE_ARCH_STRLEN
>> +size_t strlen(const char *s)
>> +{
>> +	return __generic_strlen(s);
>> +}
>>  EXPORT_SYMBOL(strlen);
> 
> A similar problem exists with the architecture-optimized CRC and crypto
> functions.  Historically, these subsystems exported both generic and
> architecture-optimized functions.
> 
> We've actually been moving away from that design to simplify things.
> For example, for CRC-32C there's now just the crc32c() function which
> delegates to the "best" CRC-32C implementation, with no direct access to
> the generic implementation of CRC-32C.
> 
> crc_kunit then just tests and benchmarks crc32c().  To check how the
> performance of crc32c() changes when its implementation changes (whether
> the change is the addition of an arch-optimized implementation or a
> change in an existing arch-optimized implementation), the developer just
> needs to run crc_kunit with two kernels, before and after.
> 
> I suggest just doing that.  In that case there would be no need to
> export the generic implementations of these functions.
> 
> (Also note that *if* the generic functions are exported, they probably
> should be exported only when the KUnit test is enabled.  There's no need
> to include them in the kernel image when the test isn't enabled.)
> 
> - Eric

Hi Eric, Andy,

Thanks for the insights. I agree with Eric's point on keeping the internal
implementations encapsulated. It's a cleaner design for the long term.
In v3, I will drop the __generic_* exports and simplify the patchset
to benchmark only the standard functions.

To address Andy's concern regarding performance, I will provide a "Before
vs. After" comparison in the v3 cover letter. This should demonstrate
the speedup while keeping the core kernel code tidy. I'll also refine the
benchmark logic to ensure more realistic results as discussed.

This seems to be the most robust way to validate the optimizations without
adding unnecessary exports.

-- 
With Best Regards,
Feng Jiang


* Re: [PATCH v2 01/14] lib/string: extract generic strlen() into __generic_strlen()
  2026-01-14  0:01   ` Eric Biggers
  2026-01-14  1:41     ` Feng Jiang
@ 2026-01-14  7:07     ` Andy Shevchenko
  2026-01-14 10:10     ` David Laight
  2 siblings, 0 replies; 36+ messages in thread
From: Andy Shevchenko @ 2026-01-14  7:07 UTC (permalink / raw)
  To: Eric Biggers
  Cc: Feng Jiang, pjw, palmer, aou, alex, kees, andy, akpm,
	martin.petersen, ardb, ajones, conor.dooley, samuel.holland,
	linus.walleij, nathan, linux-riscv, linux-kernel, linux-hardening

On Tue, Jan 13, 2026 at 04:01:51PM -0800, Eric Biggers wrote:
> On Tue, Jan 13, 2026 at 04:27:35PM +0800, Feng Jiang wrote:
> > To support performance benchmarking in KUnit tests, extract the
> > generic C implementation of strlen() into a standalone function
> > __generic_strlen(). This allows tests to compare architecture-optimized
> > versions against the generic baseline without duplicating code.

...

> A similar problem exists with the architecture-optimized CRC and crypto
> functions.  Historically, these subsystems exported both generic and
> architecture-optimized functions.
> 
> We've actually been moving away from that design to simplify things.
> For example, for CRC-32C there's now just the crc32c() function which
> delegates to the "best" CRC-32C implementation, with no direct access to
> the generic implementation of CRC-32C.
> 
> crc_kunit then just tests and benchmarks crc32c().  To check how the
> performance of crc32c() changes when its implementation changes (whether
> the change is the addition of an arch-optimized implementation or a
> change in an existing arch-optimized implementation), the developer just
> needs to run crc_kunit with two kernels, before and after.
> 
> I suggest just doing that.  In that case there would be no need to
> export the generic implementations of these functions.

This would also work for me! Whatever you folks find best from the
readability and maintenance point of view.

> (Also note that *if* the generic functions are exported, they probably
> should be exported only when the KUnit test is enabled.  There's no need
> to include them in the kernel image when the test isn't enabled.)

True.

-- 
With Best Regards,
Andy Shevchenko



* Re: [PATCH v2 01/14] lib/string: extract generic strlen() into __generic_strlen()
  2026-01-14  0:01   ` Eric Biggers
  2026-01-14  1:41     ` Feng Jiang
  2026-01-14  7:07     ` Andy Shevchenko
@ 2026-01-14 10:10     ` David Laight
  2026-01-15  6:50       ` Feng Jiang
  2 siblings, 1 reply; 36+ messages in thread
From: David Laight @ 2026-01-14 10:10 UTC (permalink / raw)
  To: Eric Biggers
  Cc: Feng Jiang, pjw, palmer, aou, alex, kees, andy, akpm,
	martin.petersen, ardb, ajones, conor.dooley, samuel.holland,
	linus.walleij, nathan, linux-riscv, linux-kernel, linux-hardening

On Tue, 13 Jan 2026 16:01:51 -0800
Eric Biggers <ebiggers@kernel.org> wrote:

..
> A similar problem exists with the architecture-optimized CRC and crypto
> functions.  Historically, these subsystems exported both generic and
> architecture-optimized functions.
> 
> We've actually been moving away from that design to simplify things.
> For example, for CRC-32C there's now just the crc32c() function which
> delegates to the "best" CRC-32C implementation, with no direct access to
> the generic implementation of CRC-32C.
> 
> crc_kunit then just tests and benchmarks crc32c().  To check how the
> performance of crc32c() changes when its implementation changes (whether
> the change is the addition of an arch-optimized implementation or a
> change in an existing arch-optimized implementation), the developer just
> needs to run crc_kunit with two kernels, before and after.

For the mul_div tests I arranged that the test code could #include the
source for the generic implementation so it could run that as well as
the version compiled into the main kernel.

This involved wrapping the function in:
#if !defined(function) || defined(test_function)
type function(args)
...
}
#if !defined(function)
EXPORT_SYMBOL(function)
#endif
#endif

So the test code can use:
#define function generic_function
#define test_function
#include "function.c"

to get a private copy of the generic code.
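To make the pattern concrete, here is roughly what the compiler sees on the test side, with the #included source pasted inline so the sketch is a single file. The file name string_generic.c and the identifiers sketch_strlen, generic_sketch_strlen and test_sketch_strlen are hypothetical; the EXPORT_SYMBOL() line sits on the branch that is compiled out here, so the sketch builds in userspace.

```c
#include <stddef.h>

/* Test side: rename the generic function and ask for a private copy. */
#define sketch_strlen generic_sketch_strlen
#define test_sketch_strlen

/* ---- contents of the hypothetical string_generic.c, as #included ---- */
#if !defined(sketch_strlen) || defined(test_sketch_strlen)
size_t sketch_strlen(const char *s)
{
	const char *sc;

	for (sc = s; *sc != '\0'; ++sc)
		/* nothing */;
	return sc - s;
}
#if !defined(sketch_strlen)
/* Normal kernel build only: export the real symbol. */
EXPORT_SYMBOL(sketch_strlen);
#endif
#endif
/* ---- end of included source ---- */
```

Built normally (neither macro defined), the same source produces an exported sketch_strlen(); with the two #defines it yields a private generic_sketch_strlen() that a test can run alongside the version compiled into the kernel.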

	David

* Re: [PATCH v2 01/14] lib/string: extract generic strlen() into __generic_strlen()
  2026-01-14 10:10     ` David Laight
@ 2026-01-15  6:50       ` Feng Jiang
  2026-01-15  6:55         ` Andy Shevchenko
  0 siblings, 1 reply; 36+ messages in thread
From: Feng Jiang @ 2026-01-15  6:50 UTC (permalink / raw)
  To: David Laight, Eric Biggers
  Cc: pjw, palmer, aou, alex, kees, andy, akpm, martin.petersen, ardb,
	ajones, conor.dooley, samuel.holland, linus.walleij, nathan,
	linux-riscv, linux-kernel, linux-hardening

On 2026/1/14 18:10, David Laight wrote:
> On Tue, 13 Jan 2026 16:01:51 -0800
> Eric Biggers <ebiggers@kernel.org> wrote:
> 
> ..
>> A similar problem exists with the architecture-optimized CRC and crypto
>> functions.  Historically, these subsystems exported both generic and
>> architecture-optimized functions.
>>
>> We've actually been moving away from that design to simplify things.
>> For example, for CRC-32C there's now just the crc32c() function which
>> delegates to the "best" CRC-32C implementation, with no direct access to
>> the generic implementation of CRC-32C.
>>
>> crc_kunit then just tests and benchmarks crc32c().  To check how the
>> performance of crc32c() changes when its implementation changes (whether
>> the change is the addition of an arch-optimized implementation or a
>> change in an existing arch-optimized implementation), the developer just
>> needs to run crc_kunit with two kernels, before and after.
> 
> For the mul_div tests I arranged that the test code could #include the
> source for the generic implementation so it could run that as well as
> the version compiled into the main kernel.
> 
> This involved wrapping the function in:
> #if !defined(function) || defined(test_function)
> type function(args)
> ...
> }
> #if !defined(function)
> EXPORT_SYMBOL(function)
> #endif
> #endif
> 
> So the test code can use:
> #define function generic_function
> #define test_function
> #include "function.c"
> 
> to get a private copy of the generic code.

Thank you for the suggestion! That technique is very clever and interesting—
I've definitely learned something new and will keep it in mind for the future.

However, since lib/string.c is such a foundational and low-level library, I'm
hesitant to add macro wrappers or conditional blocks for KUnit. Given its
importance, I feel that increasing its complexity for side-by-side testing
isn't quite worth it. I'd prefer to keep the core code clean and follow Eric's
minimalist approach of benchmarking across different kernel configurations.

I really appreciate the guidance!

-- 
With Best Regards,
Feng Jiang


* Re: [PATCH v2 01/14] lib/string: extract generic strlen() into __generic_strlen()
  2026-01-15  6:50       ` Feng Jiang
@ 2026-01-15  6:55         ` Andy Shevchenko
  0 siblings, 0 replies; 36+ messages in thread
From: Andy Shevchenko @ 2026-01-15  6:55 UTC (permalink / raw)
  To: Feng Jiang
  Cc: David Laight, Eric Biggers, pjw, palmer, aou, alex, kees, andy,
	akpm, martin.petersen, ardb, ajones, conor.dooley, samuel.holland,
	linus.walleij, nathan, linux-riscv, linux-kernel, linux-hardening

On Thu, Jan 15, 2026 at 8:51 AM Feng Jiang <jiangfeng@kylinos.cn> wrote:
> On 2026/1/14 18:10, David Laight wrote:
> > On Tue, 13 Jan 2026 16:01:51 -0800
> > Eric Biggers <ebiggers@kernel.org> wrote:

...

> >> A similar problem exists with the architecture-optimized CRC and crypto
> >> functions.  Historically, these subsystems exported both generic and
> >> architecture-optimized functions.
> >>
> >> We've actually been moving away from that design to simplify things.
> >> For example, for CRC-32C there's now just the crc32c() function which
> >> delegates to the "best" CRC-32C implementation, with no direct access to
> >> the generic implementation of CRC-32C.
> >>
> >> crc_kunit then just tests and benchmarks crc32c().  To check how the
> >> performance of crc32c() changes when its implementation changes (whether
> >> the change is the addition of an arch-optimized implementation or a
> >> change in an existing arch-optimized implementation), the developer just
> >> needs to run crc_kunit with two kernels, before and after.
> >
> > For the mul_div tests I arranged that the test code could #include the
> > source for the generic implementation so it could run that as well as
> > the version compiled into the main kernel.
> >
> > This involved wrapping the function in:
> > #if !defined(function) || defined(test_function)
> > type function(args)
> > ...
> > }
> > #if !defined(function)
> > EXPORT_SYMBOL(function)
> > #endif
> > #endif
> >
> > So the test code can use:
> > #define function generic_function
> > #define test_function
> > #include "function.c"
> >
> > to get a private copy of the generic code.
>
> Thank you for the suggestion! That technique is very clever and interesting—
> I've definitely learned something new and will keep it in mind for the future.
>
> However, since lib/string.c is such a foundational and low-level library, I'm
> hesitant to add macro wrappers or conditional blocks for KUnit. Given its
> importance, I feel that increasing its complexity for side-by-side testing
> isn't quite worth it. I'd prefer to keep the core code clean and follow Eric's
> minimalist approach of benchmarking across different kernel configurations.
>
> I really appreciate the guidance!

I agree with you; for now we can stick with what Eric proposed and
consider other approaches in the future where appropriate.

-- 
With Best Regards,
Andy Shevchenko

* [PATCH v2 02/14] lib/string: extract generic strnlen() into __generic_strnlen()
  2026-01-13  8:27 [PATCH v2 00/14] riscv: optimize string functions and add kunit tests Feng Jiang
  2026-01-13  8:27 ` [PATCH v2 01/14] lib/string: extract generic strlen() into __generic_strlen() Feng Jiang
@ 2026-01-13  8:27 ` Feng Jiang
  2026-01-13  8:27 ` [PATCH v2 03/14] lib/string: extract generic strchr() into __generic_strchr() Feng Jiang
                   ` (13 subsequent siblings)
  15 siblings, 0 replies; 36+ messages in thread
From: Feng Jiang @ 2026-01-13  8:27 UTC (permalink / raw)
  To: pjw, palmer, aou, alex, kees, andy, akpm, jiangfeng, ebiggers,
	martin.petersen, ardb, ajones, conor.dooley, samuel.holland,
	linus.walleij, nathan
  Cc: linux-riscv, linux-kernel, linux-hardening

To support performance benchmarking in KUnit tests, extract the
generic C implementation of strnlen() into a standalone function
__generic_strnlen(). This allows tests to compare architecture-optimized
versions against the generic baseline without duplicating code.

Suggested-by: Andy Shevchenko <andy@kernel.org>
Signed-off-by: Feng Jiang <jiangfeng@kylinos.cn>
---
 include/linux/string.h |  1 +
 lib/string.c           | 10 ++++++++--
 2 files changed, 9 insertions(+), 2 deletions(-)

diff --git a/include/linux/string.h b/include/linux/string.h
index 961645633b4d..04b5fef7a4ec 100644
--- a/include/linux/string.h
+++ b/include/linux/string.h
@@ -201,6 +201,7 @@ extern __kernel_size_t __generic_strlen(const char *);
 #ifndef __HAVE_ARCH_STRLEN
 extern __kernel_size_t strlen(const char *);
 #endif
+extern __kernel_size_t __generic_strnlen(const char *, __kernel_size_t);
 #ifndef __HAVE_ARCH_STRNLEN
 extern __kernel_size_t strnlen(const char *,__kernel_size_t);
 #endif
diff --git a/lib/string.c b/lib/string.c
index 047ecb38e09b..9dfe7177e290 100644
--- a/lib/string.c
+++ b/lib/string.c
@@ -430,8 +430,7 @@ size_t strlen(const char *s)
 EXPORT_SYMBOL(strlen);
 #endif
 
-#ifndef __HAVE_ARCH_STRNLEN
-size_t strnlen(const char *s, size_t count)
+size_t __generic_strnlen(const char *s, size_t count)
 {
 	const char *sc;
 
@@ -439,6 +438,13 @@ size_t strnlen(const char *s, size_t count)
 		/* nothing */;
 	return sc - s;
 }
+EXPORT_SYMBOL(__generic_strnlen);
+
+#ifndef __HAVE_ARCH_STRNLEN
+size_t strnlen(const char *s, size_t count)
+{
+	return __generic_strnlen(s, count);
+}
 EXPORT_SYMBOL(strnlen);
 #endif
 
-- 
2.25.1


* [PATCH v2 03/14] lib/string: extract generic strchr() into __generic_strchr()
  2026-01-13  8:27 [PATCH v2 00/14] riscv: optimize string functions and add kunit tests Feng Jiang
  2026-01-13  8:27 ` [PATCH v2 01/14] lib/string: extract generic strlen() into __generic_strlen() Feng Jiang
  2026-01-13  8:27 ` [PATCH v2 02/14] lib/string: extract generic strnlen() into __generic_strnlen() Feng Jiang
@ 2026-01-13  8:27 ` Feng Jiang
  2026-01-13  8:27 ` [PATCH v2 04/14] lib/string: extract generic strrchr() into __generic_strrchr() Feng Jiang
                   ` (12 subsequent siblings)
  15 siblings, 0 replies; 36+ messages in thread
From: Feng Jiang @ 2026-01-13  8:27 UTC (permalink / raw)
  To: pjw, palmer, aou, alex, kees, andy, akpm, jiangfeng, ebiggers,
	martin.petersen, ardb, ajones, conor.dooley, samuel.holland,
	linus.walleij, nathan
  Cc: linux-riscv, linux-kernel, linux-hardening

To support performance benchmarking in KUnit tests, extract the
generic C implementation of strchr() into a standalone function
__generic_strchr(). This allows tests to compare architecture-optimized
versions against the generic baseline without duplicating code.

Suggested-by: Andy Shevchenko <andy@kernel.org>
Signed-off-by: Feng Jiang <jiangfeng@kylinos.cn>
---
 include/linux/string.h |  1 +
 lib/string.c           | 14 ++++++++++----
 2 files changed, 11 insertions(+), 4 deletions(-)

diff --git a/include/linux/string.h b/include/linux/string.h
index 04b5fef7a4ec..57f8bf543891 100644
--- a/include/linux/string.h
+++ b/include/linux/string.h
@@ -169,6 +169,7 @@ extern int strcasecmp(const char *s1, const char *s2);
 #ifndef __HAVE_ARCH_STRNCASECMP
 extern int strncasecmp(const char *s1, const char *s2, size_t n);
 #endif
+extern char *__generic_strchr(const char *, int);
 #ifndef __HAVE_ARCH_STRCHR
 extern char * strchr(const char *,int);
 #endif
diff --git a/lib/string.c b/lib/string.c
index 9dfe7177e290..8ad9b73ffe4e 100644
--- a/lib/string.c
+++ b/lib/string.c
@@ -317,6 +317,15 @@ int strncmp(const char *cs, const char *ct, size_t count)
 EXPORT_SYMBOL(strncmp);
 #endif
 
+char *__generic_strchr(const char *s, int c)
+{
+	for (; *s != (char)c; ++s)
+		if (*s == '\0')
+			return NULL;
+	return (char *)s;
+}
+EXPORT_SYMBOL(__generic_strchr);
+
 #ifndef __HAVE_ARCH_STRCHR
 /**
  * strchr - Find the first occurrence of a character in a string
@@ -328,10 +337,7 @@ EXPORT_SYMBOL(strncmp);
  */
 char *strchr(const char *s, int c)
 {
-	for (; *s != (char)c; ++s)
-		if (*s == '\0')
-			return NULL;
-	return (char *)s;
+	return __generic_strchr(s, c);
 }
 EXPORT_SYMBOL(strchr);
 #endif
-- 
2.25.1
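Two details of the extracted strchr() body are worth pinning down in tests. First, `c` is truncated to `char` before the comparison. Second, the match test runs before the NUL test, so searching for `'\0'` returns a pointer to the terminator rather than NULL. The userspace sketch below checks a copy of the loop against the C library's strchr() for every byte value. The names `generic_strchr()` and `strchr_matches_libc()` are hypothetical.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* Copy of the __generic_strchr() loop from the patch. */
static char *generic_strchr(const char *s, int c)
{
	for (; *s != (char)c; ++s)
		if (*s == '\0')
			return NULL;	/* hit the end without a match */
	return (char *)s;		/* also reached for c == '\0' */
}

/* Returns 1 if the copy agrees with the C library for all 256 byte values. */
static int strchr_matches_libc(const char *s)
{
	assert(s != NULL);
	for (int c = 0; c < 256; c++)
		if (generic_strchr(s, c) != strchr(s, c))
			return 0;
	return 1;
}
```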



* [PATCH v2 04/14] lib/string: extract generic strrchr() into __generic_strrchr()
  2026-01-13  8:27 [PATCH v2 00/14] riscv: optimize string functions and add kunit tests Feng Jiang
                   ` (2 preceding siblings ...)
  2026-01-13  8:27 ` [PATCH v2 03/14] lib/string: extract generic strchr() into __generic_strchr() Feng Jiang
@ 2026-01-13  8:27 ` Feng Jiang
  2026-01-13  8:27 ` [PATCH v2 05/14] lib/string_kunit: add correctness test for strlen Feng Jiang
                   ` (11 subsequent siblings)
  15 siblings, 0 replies; 36+ messages in thread
From: Feng Jiang @ 2026-01-13  8:27 UTC (permalink / raw)
  To: pjw, palmer, aou, alex, kees, andy, akpm, jiangfeng, ebiggers,
	martin.petersen, ardb, ajones, conor.dooley, samuel.holland,
	linus.walleij, nathan
  Cc: linux-riscv, linux-kernel, linux-hardening

To support performance benchmarking in KUnit tests, extract the
generic C implementation of strrchr() into a standalone function
__generic_strrchr(). This allows tests to compare architecture-optimized
versions against the generic baseline without duplicating code.

Suggested-by: Andy Shevchenko <andy@kernel.org>
Signed-off-by: Feng Jiang <jiangfeng@kylinos.cn>
---
 include/linux/string.h |  1 +
 lib/string.c           | 19 +++++++++++++------
 2 files changed, 14 insertions(+), 6 deletions(-)

diff --git a/include/linux/string.h b/include/linux/string.h
index 57f8bf543891..2ede3ff0865a 100644
--- a/include/linux/string.h
+++ b/include/linux/string.h
@@ -180,6 +180,7 @@ extern char * strnchrnul(const char *, size_t, int);
 #ifndef __HAVE_ARCH_STRNCHR
 extern char * strnchr(const char *, size_t, int);
 #endif
+extern char *__generic_strrchr(const char *, int);
 #ifndef __HAVE_ARCH_STRRCHR
 extern char * strrchr(const char *,int);
 #endif
diff --git a/lib/string.c b/lib/string.c
index 8ad9b73ffe4e..4405a042eee7 100644
--- a/lib/string.c
+++ b/lib/string.c
@@ -377,6 +377,18 @@ char *strnchrnul(const char *s, size_t count, int c)
 	return (char *)s;
 }
 
+char *__generic_strrchr(const char *s, int c)
+{
+	const char *last = NULL;
+
+	do {
+		if (*s == (char)c)
+			last = s;
+	} while (*s++);
+	return (char *)last;
+}
+EXPORT_SYMBOL(__generic_strrchr);
+
 #ifndef __HAVE_ARCH_STRRCHR
 /**
  * strrchr - Find the last occurrence of a character in a string
@@ -385,12 +397,7 @@ char *strnchrnul(const char *s, size_t count, int c)
  */
 char *strrchr(const char *s, int c)
 {
-	const char *last = NULL;
-	do {
-		if (*s == (char)c)
-			last = s;
-	} while (*s++);
-	return (char *)last;
+	return __generic_strrchr(s, c);
 }
 EXPORT_SYMBOL(strrchr);
 #endif
-- 
2.25.1
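The same applies to strrchr(). The `do { } while (*s++)` form compares the terminating NUL before it stops, so searching for `'\0'` yields a pointer to the terminator, as the C library does. The userspace copy of the loop below makes that easy to assert. `generic_strrchr()` and `strrchr_matches_libc()` are hypothetical names.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* Copy of the __generic_strrchr() loop from the patch. */
static char *generic_strrchr(const char *s, int c)
{
	const char *last = NULL;

	/* do/while so the terminating NUL is also compared against c */
	do {
		if (*s == (char)c)
			last = s;
	} while (*s++);
	return (char *)last;
}

/* Returns 1 if the copy agrees with the C library for all 256 byte values. */
static int strrchr_matches_libc(const char *s)
{
	assert(s != NULL);
	for (int c = 0; c < 256; c++)
		if (generic_strrchr(s, c) != strrchr(s, c))
			return 0;
	return 1;
}
```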



* [PATCH v2 05/14] lib/string_kunit: add correctness test for strlen
  2026-01-13  8:27 [PATCH v2 00/14] riscv: optimize string functions and add kunit tests Feng Jiang
                   ` (3 preceding siblings ...)
  2026-01-13  8:27 ` [PATCH v2 04/14] lib/string: extract generic strrchr() into __generic_strrchr() Feng Jiang
@ 2026-01-13  8:27 ` Feng Jiang
  2026-01-13  8:27 ` [PATCH v2 06/14] lib/string_kunit: add correctness test for strnlen Feng Jiang
                   ` (10 subsequent siblings)
  15 siblings, 0 replies; 36+ messages in thread
From: Feng Jiang @ 2026-01-13  8:27 UTC (permalink / raw)
  To: pjw, palmer, aou, alex, kees, andy, akpm, jiangfeng, ebiggers,
	martin.petersen, ardb, ajones, conor.dooley, samuel.holland,
	linus.walleij, nathan
  Cc: linux-riscv, linux-kernel, linux-hardening

Add a KUnit test for strlen() to verify correctness across
different string lengths and memory alignments.

Signed-off-by: Feng Jiang <jiangfeng@kylinos.cn>
---
 lib/tests/string_kunit.c | 26 ++++++++++++++++++++++++++
 1 file changed, 26 insertions(+)

diff --git a/lib/tests/string_kunit.c b/lib/tests/string_kunit.c
index f9a8e557ba77..88da97e50c8e 100644
--- a/lib/tests/string_kunit.c
+++ b/lib/tests/string_kunit.c
@@ -17,6 +17,9 @@
 #define STRCMP_TEST_EXPECT_LOWER(test, fn, ...) KUNIT_EXPECT_LT(test, fn(__VA_ARGS__), 0)
 #define STRCMP_TEST_EXPECT_GREATER(test, fn, ...) KUNIT_EXPECT_GT(test, fn(__VA_ARGS__), 0)
 
+#define STRING_TEST_MAX_LEN	128
+#define STRING_TEST_MAX_OFFSET	16
+
 static void string_test_memset16(struct kunit *test)
 {
 	unsigned i, j, k;
@@ -104,6 +107,28 @@ static void string_test_memset64(struct kunit *test)
 	}
 }
 
+static void string_test_strlen(struct kunit *test)
+{
+	char *s;
+	size_t len, offset;
+	const size_t buf_size = STRING_TEST_MAX_LEN + STRING_TEST_MAX_OFFSET + 1;
+
+	s = kunit_kzalloc(test, buf_size, GFP_KERNEL);
+	KUNIT_ASSERT_NOT_ERR_OR_NULL(test, s);
+
+	memset(s, 'A', buf_size);
+	s[buf_size - 1] = '\0';
+
+	for (offset = 0; offset < STRING_TEST_MAX_OFFSET; offset++) {
+		for (len = 0; len <= STRING_TEST_MAX_LEN; len++) {
+			s[offset + len] = '\0';
+			KUNIT_EXPECT_EQ_MSG(test, strlen(s + offset), len,
+				"offset:%zu len:%zu", offset, len);
+			s[offset + len] = 'A';
+		}
+	}
+}
+
 static void string_test_strchr(struct kunit *test)
 {
 	const char *test_string = "abcdefghijkl";
@@ -618,6 +643,7 @@ static struct kunit_case string_test_cases[] = {
 	KUNIT_CASE(string_test_memset16),
 	KUNIT_CASE(string_test_memset32),
 	KUNIT_CASE(string_test_memset64),
+	KUNIT_CASE(string_test_strlen),
 	KUNIT_CASE(string_test_strchr),
 	KUNIT_CASE(string_test_strnchr),
 	KUNIT_CASE(string_test_strspn),
-- 
2.25.1
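The sliding-terminator sweep can be tried in userspace before a candidate is wired into KUnit. In this sketch `malloc()` stands in for `kunit_kzalloc()`, a failure counter stands in for `KUNIT_EXPECT_EQ_MSG()`, and `sweep_strlen()` is a hypothetical name. Only the start address moves, so the number of distinct alignments exercised depends on how the base buffer itself is aligned.

```c
#include <assert.h>
#include <stdlib.h>
#include <string.h>

#define MAX_LEN    128	/* mirrors STRING_TEST_MAX_LEN */
#define MAX_OFFSET 16	/* mirrors STRING_TEST_MAX_OFFSET */

typedef size_t (*strlen_fn)(const char *);

/*
 * Moves the terminator through every length at every start offset and
 * counts the (offset, len) pairs where fn() disagrees; 0 means it passed.
 */
static unsigned int sweep_strlen(strlen_fn fn)
{
	const size_t buf_size = MAX_LEN + MAX_OFFSET + 1;
	unsigned int failures = 0;
	char *s = malloc(buf_size);

	assert(fn != NULL);
	if (!s)
		return 1;	/* report allocation failure as a failure */

	memset(s, 'A', buf_size);
	s[buf_size - 1] = '\0';

	for (size_t offset = 0; offset < MAX_OFFSET; offset++) {
		for (size_t len = 0; len <= MAX_LEN; len++) {
			s[offset + len] = '\0';	/* place the terminator */
			if (fn(s + offset) != len)
				failures++;
			s[offset + len] = 'A';	/* restore the filler */
		}
	}
	free(s);
	return failures;
}
```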



* [PATCH v2 06/14] lib/string_kunit: add correctness test for strnlen
  2026-01-13  8:27 [PATCH v2 00/14] riscv: optimize string functions and add kunit tests Feng Jiang
                   ` (4 preceding siblings ...)
  2026-01-13  8:27 ` [PATCH v2 05/14] lib/string_kunit: add correctness test for strlen Feng Jiang
@ 2026-01-13  8:27 ` Feng Jiang
  2026-01-13  8:41   ` Andy Shevchenko
  2026-01-13  8:27 ` [PATCH v2 07/14] lib/string_kunit: add correctness test for strrchr() Feng Jiang
                   ` (9 subsequent siblings)
  15 siblings, 1 reply; 36+ messages in thread
From: Feng Jiang @ 2026-01-13  8:27 UTC (permalink / raw)
  To: pjw, palmer, aou, alex, kees, andy, akpm, jiangfeng, ebiggers,
	martin.petersen, ardb, ajones, conor.dooley, samuel.holland,
	linus.walleij, nathan
  Cc: linux-riscv, linux-kernel, linux-hardening

Add a KUnit test for strnlen() to verify correctness across
different string lengths and memory alignments.

Signed-off-by: Feng Jiang <jiangfeng@kylinos.cn>
---
 lib/tests/string_kunit.c | 30 ++++++++++++++++++++++++++++++
 1 file changed, 30 insertions(+)

diff --git a/lib/tests/string_kunit.c b/lib/tests/string_kunit.c
index 88da97e50c8e..89fe52378442 100644
--- a/lib/tests/string_kunit.c
+++ b/lib/tests/string_kunit.c
@@ -129,6 +129,35 @@ static void string_test_strlen(struct kunit *test)
 	}
 }
 
+static void string_test_strnlen(struct kunit *test)
+{
+	char *s;
+	size_t len, offset;
+	const size_t buf_size = STRING_TEST_MAX_LEN + STRING_TEST_MAX_OFFSET + 1;
+
+	s = kunit_kzalloc(test, buf_size, GFP_KERNEL);
+	KUNIT_ASSERT_NOT_ERR_OR_NULL(test, s);
+
+	memset(s, 'A', buf_size);
+	s[buf_size - 1] = '\0';
+
+	for (offset = 0; offset < STRING_TEST_MAX_OFFSET; offset++) {
+		for (len = 0; len <= STRING_TEST_MAX_LEN; len++) {
+			s[offset + len] = '\0';
+
+			if (len > 0)
+				KUNIT_EXPECT_EQ(test, strnlen(s + offset, len - 1), len - 1);
+
+			KUNIT_EXPECT_EQ(test, strnlen(s + offset, len), len);
+
+			KUNIT_EXPECT_EQ(test, strnlen(s + offset, len + 1), len);
+			KUNIT_EXPECT_EQ(test, strnlen(s + offset, len + 10), len);
+
+			s[offset + len] = 'A';
+		}
+	}
+}
+
 static void string_test_strchr(struct kunit *test)
 {
 	const char *test_string = "abcdefghijkl";
@@ -644,6 +673,7 @@ static struct kunit_case string_test_cases[] = {
 	KUNIT_CASE(string_test_memset32),
 	KUNIT_CASE(string_test_memset64),
 	KUNIT_CASE(string_test_strlen),
+	KUNIT_CASE(string_test_strnlen),
 	KUNIT_CASE(string_test_strchr),
 	KUNIT_CASE(string_test_strnchr),
 	KUNIT_CASE(string_test_strspn),
-- 
2.25.1



* Re: [PATCH v2 06/14] lib/string_kunit: add correctness test for strnlen
  2026-01-13  8:27 ` [PATCH v2 06/14] lib/string_kunit: add correctness test for strnlen Feng Jiang
@ 2026-01-13  8:41   ` Andy Shevchenko
  0 siblings, 0 replies; 36+ messages in thread
From: Andy Shevchenko @ 2026-01-13  8:41 UTC (permalink / raw)
  To: Feng Jiang
  Cc: pjw, palmer, aou, alex, kees, andy, akpm, ebiggers,
	martin.petersen, ardb, ajones, conor.dooley, samuel.holland,
	linus.walleij, nathan, linux-riscv, linux-kernel, linux-hardening

On Tue, Jan 13, 2026 at 04:27:40PM +0800, Feng Jiang wrote:
> Add a KUnit test for strnlen() to verify correctness across
> different string lengths and memory alignments.

...

> +	for (offset = 0; offset < STRING_TEST_MAX_OFFSET; offset++) {
> +		for (len = 0; len <= STRING_TEST_MAX_LEN; len++) {
> +			s[offset + len] = '\0';

This should be one or two characters further, so that your checks also cover
non-NUL-terminated strings...

> +			if (len > 0)
> +				KUNIT_EXPECT_EQ(test, strnlen(s + offset, len - 1), len - 1);

...or make more tests, with len - 2 at least.

> +			KUNIT_EXPECT_EQ(test, strnlen(s + offset, len), len);

> +			KUNIT_EXPECT_EQ(test, strnlen(s + offset, len + 1), len);

I would also add + 2 test case.

> +			KUNIT_EXPECT_EQ(test, strnlen(s + offset, len + 10), len);

> +			s[offset + len] = 'A';
> +		}
> +	}
> +}

-- 
With Best Regards,
Andy Shevchenko
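Andy's suggestions amount to a table of bounds around each terminator. With `len - 2` and `len - 1`, the first `count` bytes contain no NUL. The remaining bounds are `len`, `len + 1`, `len + 2` and `len + 10`. The userspace C sketch below also goes one step beyond the review. It uses buffers of exactly `n` bytes with no NUL at all, where any read past the end is out of bounds and a tool such as AddressSanitizer can flag it. `candidate_strnlen()`, `expect_strnlen()` and `sweep_strnlen()` are hypothetical names.

```c
#include <assert.h>
#include <stdlib.h>
#include <string.h>

#define MAX_LEN    128	/* mirrors STRING_TEST_MAX_LEN */
#define MAX_OFFSET 16	/* mirrors STRING_TEST_MAX_OFFSET */

typedef size_t (*strnlen_fn)(const char *, size_t);

/* Hypothetical implementation under test; swap in the real one. */
static size_t candidate_strnlen(const char *s, size_t count)
{
	size_t n = 0;

	while (n < count && s[n] != '\0')
		n++;
	return n;
}

/* Expected strnlen() of a len-byte string with bound count. */
static size_t expect_strnlen(size_t len, size_t count)
{
	return len < count ? len : count;
}

/* Returns the number of mismatches; 0 means fn() passed every case. */
static unsigned int sweep_strnlen(strnlen_fn fn)
{
	/* Bounds relative to len: short of, at, and past the terminator. */
	static const long deltas[] = { -2, -1, 0, 1, 2, 10 };
	const size_t buf_size = MAX_LEN + MAX_OFFSET + 1;
	unsigned int failures = 0;
	char *s = malloc(buf_size);

	assert(fn != NULL);
	if (!s)
		return 1;
	memset(s, 'A', buf_size);
	s[buf_size - 1] = '\0';

	for (size_t offset = 0; offset < MAX_OFFSET; offset++) {
		for (size_t len = 0; len <= MAX_LEN; len++) {
			s[offset + len] = '\0';
			for (size_t d = 0; d < sizeof(deltas) / sizeof(deltas[0]); d++) {
				long count = (long)len + deltas[d];

				if (count < 0)
					continue;	/* no negative bounds */
				if (fn(s + offset, (size_t)count) !=
				    expect_strnlen(len, (size_t)count))
					failures++;
			}
			s[offset + len] = 'A';
		}
	}
	free(s);

	/* Exactly n bytes and no NUL: any read past u[n - 1] is out of bounds. */
	for (size_t n = 1; n <= MAX_LEN; n++) {
		char *u = malloc(n);

		if (!u)
			return failures + 1;
		memset(u, 'B', n);
		if (fn(u, n) != n)
			failures++;
		free(u);
	}
	return failures;
}
```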




* [PATCH v2 07/14] lib/string_kunit: add correctness test for strrchr()
  2026-01-13  8:27 [PATCH v2 00/14] riscv: optimize string functions and add kunit tests Feng Jiang
                   ` (5 preceding siblings ...)
  2026-01-13  8:27 ` [PATCH v2 06/14] lib/string_kunit: add correctness test for strnlen Feng Jiang
@ 2026-01-13  8:27 ` Feng Jiang
  2026-01-13  8:27 ` [PATCH v2 08/14] lib/string_kunit: add performance benchmark for strlen() Feng Jiang
                   ` (8 subsequent siblings)
  15 siblings, 0 replies; 36+ messages in thread
From: Feng Jiang @ 2026-01-13  8:27 UTC (permalink / raw)
  To: pjw, palmer, aou, alex, kees, andy, akpm, jiangfeng, ebiggers,
	martin.petersen, ardb, ajones, conor.dooley, samuel.holland,
	linus.walleij, nathan
  Cc: linux-riscv, linux-kernel, linux-hardening

Introduce a KUnit test to verify strrchr() across various memory
alignments and character positions. This ensures the implementation
correctly identifies the last occurrence of a character.

Signed-off-by: Feng Jiang <jiangfeng@kylinos.cn>
---
 lib/tests/string_kunit.c | 30 ++++++++++++++++++++++++++++++
 1 file changed, 30 insertions(+)

diff --git a/lib/tests/string_kunit.c b/lib/tests/string_kunit.c
index 89fe52378442..8eb095404b95 100644
--- a/lib/tests/string_kunit.c
+++ b/lib/tests/string_kunit.c
@@ -181,6 +181,35 @@ static void string_test_strchr(struct kunit *test)
 	KUNIT_ASSERT_NULL(test, result);
 }
 
+static void string_test_strrchr(struct kunit *test)
+{
+	char *buf;
+	size_t offset, len;
+	const size_t buf_size = STRING_TEST_MAX_LEN + STRING_TEST_MAX_OFFSET + 1;
+
+	buf = kunit_kzalloc(test, buf_size, GFP_KERNEL);
+	KUNIT_ASSERT_NOT_ERR_OR_NULL(test, buf);
+
+	memset(buf, 'A', buf_size);
+	buf[buf_size - 1] = '\0';
+
+	for (offset = 0; offset < STRING_TEST_MAX_OFFSET; offset++) {
+		for (len = 0; len <= STRING_TEST_MAX_LEN; len++) {
+			buf[offset + len] = '\0';
+
+			KUNIT_EXPECT_PTR_EQ(test, strrchr(buf + offset, 'Z'), NULL);
+
+			if (len > 0)
+				KUNIT_EXPECT_PTR_EQ(test, strrchr(buf + offset, 'A'),
+						buf + offset + len - 1);
+			else
+				KUNIT_EXPECT_PTR_EQ(test, strrchr(buf + offset, 'A'), NULL);
+
+			buf[offset + len] = 'A';
+		}
+	}
+}
+
 static void string_test_strnchr(struct kunit *test)
 {
 	const char *test_string = "abcdefghijkl";
@@ -676,6 +705,7 @@ static struct kunit_case string_test_cases[] = {
 	KUNIT_CASE(string_test_strnlen),
 	KUNIT_CASE(string_test_strchr),
 	KUNIT_CASE(string_test_strnchr),
+	KUNIT_CASE(string_test_strrchr),
 	KUNIT_CASE(string_test_strspn),
 	KUNIT_CASE(string_test_strcmp),
 	KUNIT_CASE(string_test_strcmp_long_strings),
-- 
2.25.1
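In string_test_strrchr() the buffer holds a single repeated character. Apart from the word containing the terminator, every word an implementation loads therefore matches in all bytes or in none. Word-at-a-time code also has to handle words that mix matching and non-matching bytes, and this layout never produces one. The sketch below shows a possible extra check in userspace C. It plants the same marker twice and expects the later position. The names are hypothetical, the C library's strrchr() stands in for the implementation under test, and `j` steps by 7 only to keep the run short.

```c
#include <assert.h>
#include <stdlib.h>
#include <string.h>

#define MAX_LEN    128	/* mirrors STRING_TEST_MAX_LEN */
#define MAX_OFFSET 16	/* mirrors STRING_TEST_MAX_OFFSET */

typedef char *(*strrchr_fn)(const char *, int);

/*
 * For each start offset, plants 'X' at positions i < j of an 'A'-filled
 * string of length MAX_LEN and expects fn() to report j, the later hit.
 * Returns the number of mismatches; 0 means fn() passed.
 */
static unsigned int sweep_strrchr_two_hits(strrchr_fn fn)
{
	const size_t buf_size = MAX_LEN + MAX_OFFSET + 1;
	unsigned int failures = 0;
	char *buf = malloc(buf_size);

	assert(fn != NULL);
	if (!buf)
		return 1;
	memset(buf, 'A', buf_size);
	buf[buf_size - 1] = '\0';

	for (size_t offset = 0; offset < MAX_OFFSET; offset++) {
		char *s = buf + offset;

		s[MAX_LEN] = '\0';
		for (size_t i = 0; i < MAX_LEN; i++) {
			for (size_t j = i + 1; j < MAX_LEN; j += 7) {
				s[i] = 'X';
				s[j] = 'X';
				if (fn(s, 'X') != s + j)
					failures++;
				s[i] = 'A';
				s[j] = 'A';
			}
		}
		s[MAX_LEN] = 'A';
	}
	free(buf);
	return failures;
}
```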



* [PATCH v2 08/14] lib/string_kunit: add performance benchmark for strlen()
  2026-01-13  8:27 [PATCH v2 00/14] riscv: optimize string functions and add kunit tests Feng Jiang
                   ` (6 preceding siblings ...)
  2026-01-13  8:27 ` [PATCH v2 07/14] lib/string_kunit: add correctness test for strrchr() Feng Jiang
@ 2026-01-13  8:27 ` Feng Jiang
  2026-01-13  8:46   ` Andy Shevchenko
  2026-01-18 11:11   ` kernel test robot
  2026-01-13  8:27 ` [PATCH v2 09/14] lib/string_kunit: add performance benchmark for strnlen() Feng Jiang
                   ` (7 subsequent siblings)
  15 siblings, 2 replies; 36+ messages in thread
From: Feng Jiang @ 2026-01-13  8:27 UTC (permalink / raw)
  To: pjw, palmer, aou, alex, kees, andy, akpm, jiangfeng, ebiggers,
	martin.petersen, ardb, ajones, conor.dooley, samuel.holland,
	linus.walleij, nathan
  Cc: linux-riscv, linux-kernel, linux-hardening

Introduce a benchmark to compare the architecture-optimized strlen()
implementation against the generic C version (__generic_strlen).

The benchmark uses a table-driven approach to evaluate performance
across different string lengths (short, medium, and long). It employs
ktime_get() for timing and get_random_bytes() followed by null-byte
filtering to generate test data that prevents early termination.

This helps in quantifying the performance gains of architecture-specific
optimizations on various platforms.

Suggested-by: Andy Shevchenko <andy@kernel.org>
Signed-off-by: Feng Jiang <jiangfeng@kylinos.cn>
---
 lib/tests/string_kunit.c | 117 +++++++++++++++++++++++++++++++++++++++
 1 file changed, 117 insertions(+)

diff --git a/lib/tests/string_kunit.c b/lib/tests/string_kunit.c
index 8eb095404b95..2266954ae5e0 100644
--- a/lib/tests/string_kunit.c
+++ b/lib/tests/string_kunit.c
@@ -20,6 +20,77 @@
 #define STRING_TEST_MAX_LEN	128
 #define STRING_TEST_MAX_OFFSET	16
 
+#if defined(__HAVE_ARCH_STRLEN)
+#define STRING_BENCH_ENABLED
+#endif
+
+#ifdef STRING_BENCH_ENABLED
+/* Configuration for string benchmark scenarios */
+struct string_bench_case {
+	const char *name;
+	size_t len;
+	unsigned int iterations;
+};
+
+static const struct string_bench_case bench_cases[] = {
+	{"short", 8, 100000},
+	{"medium", 64, 100000},
+	{"long", 2048, 10000},
+};
+
+/**
+ * get_max_bench_len() - Get the maximum length from benchmark cases
+ * @cases: array of test cases
+ * @count: number of cases
+ */
+static size_t get_max_bench_len(const struct string_bench_case *cases, size_t count)
+{
+	size_t i, max_len = 0;
+
+	for (i = 0; i < count; i++) {
+		if (cases[i].len > max_len)
+			max_len = cases[i].len;
+	}
+
+	return max_len;
+}
+
+/**
+ * get_random_nonzero_bytes() - Fill buffer with random non-null bytes
+ * @buf: buffer to fill
+ * @len: number of bytes to fill
+ */
+static void get_random_nonzero_bytes(void *buf, size_t len)
+{
+	u8 *s = (u8 *)buf;
+
+	get_random_bytes(buf, len);
+
+	/* Replace null bytes to avoid early string termination */
+	for (size_t i = 0; i < len; i++) {
+		if (s[i] == '\0')
+			s[i] = 0x01;
+	}
+}
+
+static void string_bench_report(struct kunit *test, const char *func,
+		const struct string_bench_case *bc,
+		u64 time_arch, u64 time_generic)
+{
+	u64 ratio_int, ratio_frac;
+
+	/* Calculate speedup ratio with 2 decimal places. */
+	ratio_int = div64_u64(time_generic, time_arch);
+	ratio_frac = div64_u64((time_generic % time_arch) * 100, time_arch);
+
+	kunit_info(test, "%s performance (%s, len: %zu, iters: %u):\n",
+		func, bc->name, bc->len, bc->iterations);
+	kunit_info(test, "  arch-optimized: %llu ns\n", time_arch);
+	kunit_info(test, "  generic C:      %llu ns\n", time_generic);
+	kunit_info(test, "  speedup:        %llu.%02llux\n", ratio_int, ratio_frac);
+}
+#endif /* STRING_BENCH_ENABLED */
+
 static void string_test_memset16(struct kunit *test)
 {
 	unsigned i, j, k;
@@ -129,6 +200,49 @@ static void string_test_strlen(struct kunit *test)
 	}
 }
 
+#ifdef __HAVE_ARCH_STRLEN
+static void string_test_strlen_bench(struct kunit *test)
+{
+	char *buf;
+	size_t buf_len, iters;
+	ktime_t start, end;
+	u64 time_arch, time_generic;
+
+	buf_len = get_max_bench_len(bench_cases, ARRAY_SIZE(bench_cases)) + 1;
+
+	buf = kunit_kzalloc(test, buf_len, GFP_KERNEL);
+	KUNIT_ASSERT_NOT_ERR_OR_NULL(test, buf);
+
+	for (size_t i = 0; i < ARRAY_SIZE(bench_cases); i++) {
+		get_random_nonzero_bytes(buf, bench_cases[i].len);
+		buf[bench_cases[i].len] = '\0';
+
+		iters = bench_cases[i].iterations;
+
+		/* 1. Benchmark the architecture-optimized version */
+		start = ktime_get();
+		for (unsigned int j = 0; j < iters; j++) {
+			OPTIMIZER_HIDE_VAR(buf);
+			(void)strlen(buf);
+		}
+		end = ktime_get();
+		time_arch = ktime_to_ns(ktime_sub(end, start));
+
+		/* 2. Benchmark the generic C version */
+		start = ktime_get();
+		for (unsigned int j = 0; j < iters; j++) {
+			OPTIMIZER_HIDE_VAR(buf);
+			(void)__generic_strlen(buf);
+		}
+		end = ktime_get();
+		time_generic = ktime_to_ns(ktime_sub(end, start));
+
+		string_bench_report(test, "strlen", &bench_cases[i],
+				time_arch, time_generic);
+	}
+}
+#endif
+
 static void string_test_strnlen(struct kunit *test)
 {
 	char *s;
@@ -702,6 +816,9 @@ static struct kunit_case string_test_cases[] = {
 	KUNIT_CASE(string_test_memset32),
 	KUNIT_CASE(string_test_memset64),
 	KUNIT_CASE(string_test_strlen),
+#ifdef __HAVE_ARCH_STRLEN
+	KUNIT_CASE(string_test_strlen_bench),
+#endif
 	KUNIT_CASE(string_test_strnlen),
 	KUNIT_CASE(string_test_strchr),
 	KUNIT_CASE(string_test_strnchr),
-- 
2.25.1
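string_bench_report() derives the speedup with integer division only, so the two decimal digits are truncated rather than rounded. The arithmetic can be checked in userspace against the timings reported later in this thread. For example, 5815800 ns generic over 4767500 ns arch gives 1.21x. The sketch below uses the hypothetical names `speedup_ratio()` and `print_speedup()`. It also adds a guard the patch lacks: if the clock is too coarse and `time_arch` comes out as 0, div64_u64() would divide by zero.

```c
#include <assert.h>
#include <inttypes.h>
#include <stdint.h>
#include <stdio.h>

/*
 * Integer-only speedup, as in string_bench_report(): the two decimal
 * digits are truncated, not rounded. Returns -1 instead of dividing by
 * zero when time_arch is 0 (e.g. a clock too coarse for the run).
 */
static int speedup_ratio(uint64_t time_arch, uint64_t time_generic,
			 uint64_t *ratio_int, uint64_t *ratio_frac)
{
	assert(ratio_int != NULL && ratio_frac != NULL);
	if (time_arch == 0)
		return -1;
	*ratio_int = time_generic / time_arch;
	*ratio_frac = (time_generic % time_arch) * 100 / time_arch;
	return 0;
}

static void print_speedup(uint64_t time_arch, uint64_t time_generic)
{
	uint64_t i, f;

	if (speedup_ratio(time_arch, time_generic, &i, &f))
		printf("  speedup:        n/a\n");
	else
		printf("  speedup:        %" PRIu64 ".%02" PRIu64 "x\n", i, f);
}
```

For the short case the exact ratio is 1.2199..., which the truncating division reports as 1.21x.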



* Re: [PATCH v2 08/14] lib/string_kunit: add performance benchmark for strlen()
  2026-01-13  8:27 ` [PATCH v2 08/14] lib/string_kunit: add performance benchmark for strlen() Feng Jiang
@ 2026-01-13  8:46   ` Andy Shevchenko
  2026-01-14  6:14     ` Feng Jiang
  2026-01-18 11:11   ` kernel test robot
  1 sibling, 1 reply; 36+ messages in thread
From: Andy Shevchenko @ 2026-01-13  8:46 UTC (permalink / raw)
  To: Feng Jiang
  Cc: pjw, palmer, aou, alex, kees, andy, akpm, ebiggers,
	martin.petersen, ardb, ajones, conor.dooley, samuel.holland,
	linus.walleij, nathan, linux-riscv, linux-kernel, linux-hardening

On Tue, Jan 13, 2026 at 04:27:42PM +0800, Feng Jiang wrote:
> Introduce a benchmark to compare the architecture-optimized strlen()
> implementation against the generic C version (__generic_strlen).
> 
> The benchmark uses a table-driven approach to evaluate performance
> across different string lengths (short, medium, and long). It employs
> ktime_get() for timing and get_random_bytes() followed by null-byte
> filtering to generate test data that prevents early termination.
> 
> This helps in quantifying the performance gains of architecture-specific
> optimizations on various platforms.

...

> +static void string_test_strlen_bench(struct kunit *test)
> +{
> +	char *buf;
> +	size_t buf_len, iters;
> +	ktime_t start, end;
> +	u64 time_arch, time_generic;
> +
> +	buf_len = get_max_bench_len(bench_cases, ARRAY_SIZE(bench_cases)) + 1;
> +
> +	buf = kunit_kzalloc(test, buf_len, GFP_KERNEL);
> +	KUNIT_ASSERT_NOT_ERR_OR_NULL(test, buf);
> +
> +	for (size_t i = 0; i < ARRAY_SIZE(bench_cases); i++) {
> +		get_random_nonzero_bytes(buf, bench_cases[i].len);
> +		buf[bench_cases[i].len] = '\0';
> +
> +		iters = bench_cases[i].iterations;
> +
> +		/* 1. Benchmark the architecture-optimized version */
> +		start = ktime_get();
> +		for (unsigned int j = 0; j < iters; j++) {
> +			OPTIMIZER_HIDE_VAR(buf);
> +			(void)strlen(buf);

First Q: Are you sure the compiler doesn't replace this with __builtin_strlen() ?

> +		}
> +		end = ktime_get();
> +		time_arch = ktime_to_ns(ktime_sub(end, start));
> +
> +		/* 2. Benchmark the generic C version */
> +		start = ktime_get();
> +		for (unsigned int j = 0; j < iters; j++) {
> +			OPTIMIZER_HIDE_VAR(buf);
> +			(void)__generic_strlen(buf);
> +		}

Are you sure the warmed up caches do not affect the benchmark? I think you need
to flush / make caches dirty or so on each iteration.

> +		end = ktime_get();
> +		time_generic = ktime_to_ns(ktime_sub(end, start));
> +
> +		string_bench_report(test, "strlen", &bench_cases[i],
> +				time_arch, time_generic);
> +	}
> +}


-- 
With Best Regards,
Andy Shevchenko
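On the cache question: the arch loop runs first, so its early iterations may pay for cold code and data while the generic loop starts fully warm. The userspace sketch below makes the state explicit in one of two ways. Either a warm-up call precedes a single timed loop, or a large scratch buffer is dirtied before each individually timed call. The 32 MiB eviction size, the 64-byte line size and the names `time_strlen()` and `evict_caches()` are assumptions. Whether the scratch writes actually evict `buf` depends on the cache hierarchy. Per-call timing also folds clock overhead into every sample, so cold numbers are only comparable with each other.

```c
#define _POSIX_C_SOURCE 199309L
#include <assert.h>
#include <stdint.h>
#include <stdlib.h>
#include <string.h>
#include <time.h>

#define EVICT_BYTES (32u << 20)	/* hypothetical: larger than the LLC */
#define LINE_BYTES  64u		/* hypothetical cache-line size */

typedef size_t (*strlen_fn)(const char *);

static volatile size_t sink;	/* keeps every result observable */

static uint64_t now_ns(void)
{
	struct timespec ts;

	clock_gettime(CLOCK_MONOTONIC, &ts);
	return (uint64_t)ts.tv_sec * 1000000000u + (uint64_t)ts.tv_nsec;
}

/* volatile so the compiler cannot discard the writes */
static void evict_caches(volatile unsigned char *scratch)
{
	for (size_t i = 0; i < EVICT_BYTES; i += LINE_BYTES)
		scratch[i]++;
}

/* Returns total nanoseconds for iters calls, or UINT64_MAX on error. */
static uint64_t time_strlen(strlen_fn fn, const char *buf,
			    unsigned int iters, int cold)
{
	uint64_t total = 0, t0;

	assert(fn != NULL && buf != NULL);
	if (cold) {
		unsigned char *scratch = calloc(EVICT_BYTES, 1);

		if (!scratch)
			return UINT64_MAX;
		for (unsigned int j = 0; j < iters; j++) {
			evict_caches(scratch);
			t0 = now_ns();
			sink = fn(buf);
			total += now_ns() - t0;
		}
		free(scratch);
		return total;
	}

	sink = fn(buf);		/* warm-up pass */
	t0 = now_ns();
	for (unsigned int j = 0; j < iters; j++)
		sink = fn(buf);
	return now_ns() - t0;
}
```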




* Re: [PATCH v2 08/14] lib/string_kunit: add performance benchmark for strlen()
  2026-01-13  8:46   ` Andy Shevchenko
@ 2026-01-14  6:14     ` Feng Jiang
  2026-01-14  7:04       ` Feng Jiang
  0 siblings, 1 reply; 36+ messages in thread
From: Feng Jiang @ 2026-01-14  6:14 UTC (permalink / raw)
  To: Andy Shevchenko
  Cc: pjw, palmer, aou, alex, kees, andy, akpm, ebiggers,
	martin.petersen, ardb, ajones, conor.dooley, samuel.holland,
	linus.walleij, nathan, linux-riscv, linux-kernel, linux-hardening

On 2026/1/13 16:46, Andy Shevchenko wrote:
> On Tue, Jan 13, 2026 at 04:27:42PM +0800, Feng Jiang wrote:
>> Introduce a benchmark to compare the architecture-optimized strlen()
>> implementation against the generic C version (__generic_strlen).
>>
>> The benchmark uses a table-driven approach to evaluate performance
>> across different string lengths (short, medium, and long). It employs
>> ktime_get() for timing and get_random_bytes() followed by null-byte
>> filtering to generate test data that prevents early termination.
>>
>> This helps in quantifying the performance gains of architecture-specific
>> optimizations on various platforms.
> 
> ...
> 
>> +static void string_test_strlen_bench(struct kunit *test)
>> +{
>> +	char *buf;
>> +	size_t buf_len, iters;
>> +	ktime_t start, end;
>> +	u64 time_arch, time_generic;
>> +
>> +	buf_len = get_max_bench_len(bench_cases, ARRAY_SIZE(bench_cases)) + 1;
>> +
>> +	buf = kunit_kzalloc(test, buf_len, GFP_KERNEL);
>> +	KUNIT_ASSERT_NOT_ERR_OR_NULL(test, buf);
>> +
>> +	for (size_t i = 0; i < ARRAY_SIZE(bench_cases); i++) {
>> +		get_random_nonzero_bytes(buf, bench_cases[i].len);
>> +		buf[bench_cases[i].len] = '\0';
>> +
>> +		iters = bench_cases[i].iterations;
>> +
>> +		/* 1. Benchmark the architecture-optimized version */
>> +		start = ktime_get();
>> +		for (unsigned int j = 0; j < iters; j++) {
>> +			OPTIMIZER_HIDE_VAR(buf);
>> +			(void)strlen(buf);
> 
> First Q: Are you sure the compiler doesn't replace this with __builtin_strlen() ?
> 
>> +		}
>> +		end = ktime_get();
>> +		time_arch = ktime_to_ns(ktime_sub(end, start));
>> +
>> +		/* 2. Benchmark the generic C version */
>> +		start = ktime_get();
>> +		for (unsigned int j = 0; j < iters; j++) {
>> +			OPTIMIZER_HIDE_VAR(buf);
>> +			(void)__generic_strlen(buf);
>> +		}
> 
> Are you sure the warmed up caches do not affect the benchmark? I think you need
> to flush / make caches dirty or so on each iteration.
> 
>> +		end = ktime_get();
>> +		time_generic = ktime_to_ns(ktime_sub(end, start));
>> +
>> +		string_bench_report(test, "strlen", &bench_cases[i],
>> +				time_arch, time_generic);
>> +	}
>> +}
> 
> 

Thank you for the catch. You are absolutely correct—the 2500x figure is heavily
distorted and does not reflect real-world performance.

I've found that by using a volatile function pointer to call the implementations
(instead of direct calls), the results returned to a realistic range. It appears
the previous benchmark logic allowed the compiler to over-optimize the test loop
in ways that skewed the data.

I will refactor the benchmark logic in v3, specifically referencing the crc32
KUnit implementation (e.g., using warm-up loops and adding preempt_disable()
to eliminate context-switch interference) to ensure the data is robust and accurate.

-- 
With Best Regards,
Feng Jiang
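The volatile-function-pointer variant can be sketched in userspace C as follows. `bench_ns()`, `bench_target` and `generic_strlen()` are hypothetical names. The target is re-read from a volatile object on every iteration, so the compiler cannot tell which function will be called. It therefore can neither inline the call nor substitute a builtin, and it cannot discard the call as side-effect free. The cost is one indirect call per iteration, which is included in both measurements.

```c
#define _POSIX_C_SOURCE 199309L
#include <assert.h>
#include <stdint.h>
#include <string.h>
#include <time.h>

typedef size_t (*strlen_fn)(const char *);

/* Byte loop standing in for __generic_strlen(). */
static size_t generic_strlen(const char *s)
{
	const char *sc;

	for (sc = s; *sc != '\0'; ++sc)
		/* nothing */;
	return sc - s;
}

/*
 * Re-read on every iteration: the compiler cannot prove which function is
 * called, so it cannot inline it, swap in a builtin, or delete the call.
 */
static strlen_fn volatile bench_target;

static uint64_t bench_ns(strlen_fn fn, const char *buf, unsigned int iters,
			 size_t *last_len)
{
	struct timespec a, b;
	size_t len = 0;

	assert(fn != NULL && last_len != NULL);
	bench_target = fn;
	clock_gettime(CLOCK_MONOTONIC, &a);
	for (unsigned int j = 0; j < iters; j++)
		len = bench_target(buf);
	clock_gettime(CLOCK_MONOTONIC, &b);
	*last_len = len;
	return (uint64_t)((int64_t)(b.tv_sec - a.tv_sec) * 1000000000 +
			  (b.tv_nsec - a.tv_nsec));
}
```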



* Re: [PATCH v2 08/14] lib/string_kunit: add performance benchmark for strlen()
  2026-01-14  6:14     ` Feng Jiang
@ 2026-01-14  7:04       ` Feng Jiang
  2026-01-14  7:21         ` Andy Shevchenko
  2026-01-14 10:21         ` David Laight
  0 siblings, 2 replies; 36+ messages in thread
From: Feng Jiang @ 2026-01-14  7:04 UTC (permalink / raw)
  To: Andy Shevchenko
  Cc: pjw, palmer, aou, alex, kees, andy, akpm, ebiggers,
	martin.petersen, ardb, ajones, conor.dooley, samuel.holland,
	linus.walleij, nathan, linux-riscv, linux-kernel, linux-hardening

On 2026/1/14 14:14, Feng Jiang wrote:
> On 2026/1/13 16:46, Andy Shevchenko wrote:
>> On Tue, Jan 13, 2026 at 04:27:42PM +0800, Feng Jiang wrote:
>>> Introduce a benchmark to compare the architecture-optimized strlen()
>>> implementation against the generic C version (__generic_strlen).
>>>
>>> The benchmark uses a table-driven approach to evaluate performance
>>> across different string lengths (short, medium, and long). It employs
>>> ktime_get() for timing and get_random_bytes() followed by null-byte
>>> filtering to generate test data that prevents early termination.
>>>
>>> This helps in quantifying the performance gains of architecture-specific
>>> optimizations on various platforms.
>>
>> ...
>>
>>> +static void string_test_strlen_bench(struct kunit *test)
>>> +{
>>> +	char *buf;
>>> +	size_t buf_len, iters;
>>> +	ktime_t start, end;
>>> +	u64 time_arch, time_generic;
>>> +
>>> +	buf_len = get_max_bench_len(bench_cases, ARRAY_SIZE(bench_cases)) + 1;
>>> +
>>> +	buf = kunit_kzalloc(test, buf_len, GFP_KERNEL);
>>> +	KUNIT_ASSERT_NOT_ERR_OR_NULL(test, buf);
>>> +
>>> +	for (size_t i = 0; i < ARRAY_SIZE(bench_cases); i++) {
>>> +		get_random_nonzero_bytes(buf, bench_cases[i].len);
>>> +		buf[bench_cases[i].len] = '\0';
>>> +
>>> +		iters = bench_cases[i].iterations;
>>> +
>>> +		/* 1. Benchmark the architecture-optimized version */
>>> +		start = ktime_get();
>>> +		for (unsigned int j = 0; j < iters; j++) {
>>> +			OPTIMIZER_HIDE_VAR(buf);
>>> +			(void)strlen(buf);
>>
>> First Q: Are you sure the compiler doesn't replace this with __builtin_strlen() ?
>>
>>> +		}
>>> +		end = ktime_get();
>>> +		time_arch = ktime_to_ns(ktime_sub(end, start));
>>> +
>>> +		/* 2. Benchmark the generic C version */
>>> +		start = ktime_get();
>>> +		for (unsigned int j = 0; j < iters; j++) {
>>> +			OPTIMIZER_HIDE_VAR(buf);
>>> +			(void)__generic_strlen(buf);
>>> +		}
>>
>> Are you sure the warmed up caches do not affect the benchmark? I think you need
>> to flush / make caches dirty or so on each iteration.
>>
>>> +		end = ktime_get();
>>> +		time_generic = ktime_to_ns(ktime_sub(end, start));
>>> +
>>> +		string_bench_report(test, "strlen", &bench_cases[i],
>>> +				time_arch, time_generic);
>>> +	}
>>> +}
>>
>>
> 
> Thank you for the catch. You are absolutely correct—the 2500x figure is heavily
> distorted and does not reflect real-world performance.
> 
> I've found that by using a volatile function pointer to call the implementations
> (instead of direct calls), the results returned to a realistic range. It appears
> the previous benchmark logic allowed the compiler to over-optimize the test loop
> in ways that skewed the data.
> 
> I will refactor the benchmark logic in v3, specifically referencing the crc32
> KUnit implementation (e.g., using warm-up loops and adding preempt_disable()
> to eliminate context-switch interference) to ensure the data is robust and accurate.
> 

Just a quick follow-up: I've also verified that using a volatile variable to store
the return value (as seen in crc_benchmark()) is equally effective at preventing
the optimization.

The core change is as follows:

    volatile size_t len;
    ...
    for (unsigned int j = 0; j < iters; j++) {
        OPTIMIZER_HIDE_VAR(buf);
        len = strlen(buf);
    }

Preliminary results with this change look much more reasonable:

    ok 4 string_test_strlen
    # string_test_strlen_bench: strlen performance (short, len: 8, iters: 100000):
    # string_test_strlen_bench:   arch-optimized: 4767500 ns
    # string_test_strlen_bench:   generic C:      5815800 ns
    # string_test_strlen_bench:   speedup:        1.21x
    # string_test_strlen_bench: strlen performance (medium, len: 64, iters: 100000):
    # string_test_strlen_bench:   arch-optimized: 6573600 ns
    # string_test_strlen_bench:   generic C:      16342500 ns
    # string_test_strlen_bench:   speedup:        2.48x
    # string_test_strlen_bench: strlen performance (long, len: 2048, iters: 10000):
    # string_test_strlen_bench:   arch-optimized: 7931000 ns
    # string_test_strlen_bench:   generic C:      35347300 ns
    # string_test_strlen_bench:   speedup:        4.45x
    ok 5 string_test_strlen_bench

I will adopt this pattern in v3, along with cache warm-up and preempt_disable(),
to stay consistent with existing kernel benchmarks and ensure robust measurements.

-- 
With Best Regards,
Feng Jiang
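A userspace rendering of that loop is sketched below. `HIDE_VAR()` is modelled on the kernel's GCC/Clang definition of OPTIMIZER_HIDE_VAR(), an empty asm that makes the pointer opaque. `bench_strlen_sink()` and `mono_ns()` are hypothetical names. Hiding `buf` alone does not keep the call alive: strlen() has no side effects, so once its result is discarded the compiler may drop the call. The volatile store forces every call to happen. As Andy notes in his reply, though, it does not by itself determine *which* strlen() runs, because the compiler may still expand it inline.

```c
#define _POSIX_C_SOURCE 199309L
#include <assert.h>
#include <stdint.h>
#include <string.h>
#include <time.h>

/* Modelled on the kernel's OPTIMIZER_HIDE_VAR() for GCC/Clang. */
#define HIDE_VAR(var) __asm__ ("" : "=r" (var) : "0" (var))

static uint64_t mono_ns(void)
{
	struct timespec ts;

	clock_gettime(CLOCK_MONOTONIC, &ts);
	return (uint64_t)ts.tv_sec * 1000000000u + (uint64_t)ts.tv_nsec;
}

/*
 * HIDE_VAR(buf) only makes the pointer opaque. strlen() has no side
 * effects, so with a discarded result the call (and the empty asm feeding
 * it) can still be removed. Storing to a volatile local keeps every call.
 */
static uint64_t bench_strlen_sink(const char *buf, unsigned int iters,
				  size_t *out)
{
	volatile size_t len;
	uint64_t t0;

	assert(buf != NULL && out != NULL);
	len = strlen(buf);		/* warm-up pass */
	t0 = mono_ns();
	for (unsigned int j = 0; j < iters; j++) {
		HIDE_VAR(buf);
		len = strlen(buf);
	}
	*out = len;
	return mono_ns() - t0;
}
```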



* Re: [PATCH v2 08/14] lib/string_kunit: add performance benchmark for strlen()
  2026-01-14  7:04       ` Feng Jiang
@ 2026-01-14  7:21         ` Andy Shevchenko
  2026-01-14  8:05           ` Feng Jiang
  2026-01-14 10:21         ` David Laight
  1 sibling, 1 reply; 36+ messages in thread
From: Andy Shevchenko @ 2026-01-14  7:21 UTC (permalink / raw)
  To: Feng Jiang
  Cc: pjw, palmer, aou, alex, kees, andy, akpm, ebiggers,
	martin.petersen, ardb, ajones, conor.dooley, samuel.holland,
	linus.walleij, nathan, linux-riscv, linux-kernel, linux-hardening

On Wed, Jan 14, 2026 at 03:04:58PM +0800, Feng Jiang wrote:
> On 2026/1/14 14:14, Feng Jiang wrote:
> > On 2026/1/13 16:46, Andy Shevchenko wrote:

...

> > Thank you for the catch. You are absolutely correct—the 2500x figure is heavily
> > distorted and does not reflect real-world performance.
> > 
> > I've found that by using a volatile function pointer to call the implementations
> > (instead of direct calls), the results returned to a realistic range. It appears
> > the previous benchmark logic allowed the compiler to over-optimize the test loop
> > in ways that skewed the data.
> > 
> > I will refactor the benchmark logic in v3, specifically referencing the crc32
> > KUnit implementation (e.g., using warm-up loops and adding preempt_disable()
> > to eliminate context-switch interference) to ensure the data is robust and accurate.
> > 
> 
> Just a quick follow-up: I've also verified that using a volatile variable to store
> the return value (as seen in crc_benchmark()) is equally effective at preventing
> the optimization.
> 
> The core change is as follows:
> 
>     volatile size_t len;
>     ...
>     for (unsigned int j = 0; j < iters; j++) {
>         OPTIMIZER_HIDE_VAR(buf);
>         len = strlen(buf);

But please, make sure this is the Linux kernel generic implementation (before)
and not __builtin_strlen() from GCC. (OTOH, it would be nice to benchmark that
one as well, although I think that __builtin_strlen() may in general be a slightly
better choice than the Linux kernel generic implementation.) I.o.w. be sure *what*
you test.

>     }

Or using WRITE_ONCE() :-) But that one will probably be confusing, as it usually
should be paired with READ_ONCE() somewhere else in the code. So, I agree with the
crc_benchmark() approach taken.

> Preliminary results with this change look much more reasonable:
> 
>     ok 4 string_test_strlen
>     # string_test_strlen_bench: strlen performance (short, len: 8, iters: 100000):
>     # string_test_strlen_bench:   arch-optimized: 4767500 ns
>     # string_test_strlen_bench:   generic C:      5815800 ns
>     # string_test_strlen_bench:   speedup:        1.21x
>     # string_test_strlen_bench: strlen performance (medium, len: 64, iters: 100000):
>     # string_test_strlen_bench:   arch-optimized: 6573600 ns
>     # string_test_strlen_bench:   generic C:      16342500 ns
>     # string_test_strlen_bench:   speedup:        2.48x
>     # string_test_strlen_bench: strlen performance (long, len: 2048, iters: 10000):
>     # string_test_strlen_bench:   arch-optimized: 7931000 ns
>     # string_test_strlen_bench:   generic C:      35347300 ns
>     # string_test_strlen_bench:   speedup:        4.45x
>     ok 5 string_test_strlen_bench
> 
> I will adopt this pattern in v3, along with cache warm-up and preempt_disable(),
> to stay consistent with existing kernel benchmarks and ensure robust measurements.

-- 
With Best Regards,
Andy Shevchenko




* Re: [PATCH v2 08/14] lib/string_kunit: add performance benchmark for strlen()
  2026-01-14  7:21         ` Andy Shevchenko
@ 2026-01-14  8:05           ` Feng Jiang
  0 siblings, 0 replies; 36+ messages in thread
From: Feng Jiang @ 2026-01-14  8:05 UTC (permalink / raw)
  To: Andy Shevchenko
  Cc: pjw, palmer, aou, alex, kees, andy, akpm, ebiggers,
	martin.petersen, ardb, ajones, conor.dooley, samuel.holland,
	linus.walleij, nathan, linux-riscv, linux-kernel, linux-hardening

On 2026/1/14 15:21, Andy Shevchenko wrote:
> On Wed, Jan 14, 2026 at 03:04:58PM +0800, Feng Jiang wrote:
>> On 2026/1/14 14:14, Feng Jiang wrote:
>>> On 2026/1/13 16:46, Andy Shevchenko wrote:
> 
> ...
> 
>>> Thank you for the catch. You are absolutely correct—the 2500x figure is heavily
>>> distorted and does not reflect real-world performance.
>>>
>>> I've found that by using a volatile function pointer to call the implementations
>>> (instead of direct calls), the results returned to a realistic range. It appears
>>> the previous benchmark logic allowed the compiler to over-optimize the test loop
>>> in ways that skewed the data.
>>>
>>> I will refactor the benchmark logic in v3, specifically referencing the crc32
>>> KUnit implementation (e.g., using warm-up loops and adding preempt_disable()
>>> to eliminate context-switch interference) to ensure the data is robust and accurate.
>>>
>>
>> Just a quick follow-up: I've also verified that using a volatile variable to store
>> the return value (as seen in crc_benchmark()) is equally effective at preventing
>> the optimization.
>>
>> The core change is as follows:
>>
>>     volatile size_t len;
>>     ...
>>     for (unsigned int j = 0; j < iters; j++) {
>>         OPTIMIZER_HIDE_VAR(buf);
>>         len = strlen(buf);
> 
> But please, make sure this is the Linux kernel generic implementation (before)
> and not __builtin_strlen() from GCC. (OTOH, it would be nice to benchmark that
> one as well, although I think that __builtin_strlen() may in general be a slightly
> better choice than the Linux kernel generic implementation.) I.o.w. be sure *what*
> you test.
> 

Thanks for the reminder. I actually verified this with objdump and gdb before
submitting the patch—the calls are indeed hitting the intended arch-specific
strlen symbols, not the compiler's __builtin_strlen(). I missed mentioning this
detail in my previous email.

I also just performed an additional test by explicitly calling the exported
arch-specific __pi_strlen() symbol, and the results remained consistent.

Results with riscv __pi_strlen():

    ok 4 string_test_strlen
    # string_test_strlen_bench: strlen performance (short, len: 8, iters: 100000):
    # string_test_strlen_bench:   arch-optimized: 4650500 ns
    # string_test_strlen_bench:   generic C:      5776000 ns
    # string_test_strlen_bench:   speedup:        1.24x
    # string_test_strlen_bench: strlen performance (medium, len: 64, iters: 100000):
    # string_test_strlen_bench:   arch-optimized: 6895000 ns
    # string_test_strlen_bench:   generic C:      16343400 ns
    # string_test_strlen_bench:   speedup:        2.37x
    # string_test_strlen_bench: strlen performance (long, len: 2048, iters: 10000):
    # string_test_strlen_bench:   arch-optimized: 8052800 ns
    # string_test_strlen_bench:   generic C:      35290700 ns
    # string_test_strlen_bench:   speedup:        4.38x
    ok 5 string_test_strlen_bench
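One way to pin down *what* is being measured is to call through a volatile function pointer, in the spirit of the earlier experiment in this thread. The compiler then cannot see which function runs, so it cannot substitute __builtin_strlen() or fold the call. The sketch below is a userspace illustration under that assumption. simple_strlen() is a hypothetical byte-at-a-time stand-in for a generic C implementation.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

typedef size_t (*strlen_fn)(const char *);

/* Hypothetical byte-at-a-time stand-in for a generic C strlen(). */
static size_t simple_strlen(const char *s)
{
	const char *p = s;

	while (*p)
		p++;
	return (size_t)(p - s);
}

/* The pointer is re-read on every call, so the compiler cannot know which
 * function it calls and cannot replace it with a builtin or a constant. */
static size_t call_via_pointer(strlen_fn fn, const char *s)
{
	strlen_fn volatile target = fn;

	return target(s);
}
```

Checking the disassembly (e.g. with objdump -d) is still the definitive way to confirm which symbol is actually called.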

>>     }
> 
> Or using WRITE_ONCE() :-) But that one will probably be confusing, as it usually
> should be paired with READ_ONCE() somewhere else in the code. So, I agree with the
> crc_benchmark() approach taken.
> 

Thanks for the guidance. I'll stick with the crc_benchmark() pattern to avoid any
potential confusion regarding concurrency that WRITE_ONCE() might imply.

I'm still learning the most idiomatic practices in the kernel, so I appreciate the tip.

>> Preliminary results with this change look much more reasonable:
>>
>>     ok 4 string_test_strlen
>>     # string_test_strlen_bench: strlen performance (short, len: 8, iters: 100000):
>>     # string_test_strlen_bench:   arch-optimized: 4767500 ns
>>     # string_test_strlen_bench:   generic C:      5815800 ns
>>     # string_test_strlen_bench:   speedup:        1.21x
>>     # string_test_strlen_bench: strlen performance (medium, len: 64, iters: 100000):
>>     # string_test_strlen_bench:   arch-optimized: 6573600 ns
>>     # string_test_strlen_bench:   generic C:      16342500 ns
>>     # string_test_strlen_bench:   speedup:        2.48x
>>     # string_test_strlen_bench: strlen performance (long, len: 2048, iters: 10000):
>>     # string_test_strlen_bench:   arch-optimized: 7931000 ns
>>     # string_test_strlen_bench:   generic C:      35347300 ns
>>     # string_test_strlen_bench:   speedup:        4.45x
>>     ok 5 string_test_strlen_bench
>>
>> I will adopt this pattern in v3, along with cache warm-up and preempt_disable(),
>> to stay consistent with existing kernel benchmarks and ensure robust measurements.
> 

-- 
With Best Regards,
Feng Jiang



* Re: [PATCH v2 08/14] lib/string_kunit: add performance benchmark for strlen()
  2026-01-14  7:04       ` Feng Jiang
  2026-01-14  7:21         ` Andy Shevchenko
@ 2026-01-14 10:21         ` David Laight
  2026-01-15  6:24           ` Feng Jiang
  1 sibling, 1 reply; 36+ messages in thread
From: David Laight @ 2026-01-14 10:21 UTC (permalink / raw)
  To: Feng Jiang
  Cc: Andy Shevchenko, pjw, palmer, aou, alex, kees, andy, akpm,
	ebiggers, martin.petersen, ardb, ajones, conor.dooley,
	samuel.holland, linus.walleij, nathan, linux-riscv, linux-kernel,
	linux-hardening

On Wed, 14 Jan 2026 15:04:58 +0800
Feng Jiang <jiangfeng@kylinos.cn> wrote:

> On 2026/1/14 14:14, Feng Jiang wrote:
> > On 2026/1/13 16:46, Andy Shevchenko wrote:  
> >> On Tue, Jan 13, 2026 at 04:27:42PM +0800, Feng Jiang wrote:  
> >>> Introduce a benchmark to compare the architecture-optimized strlen()
> >>> implementation against the generic C version (__generic_strlen).
> >>>
> >>> The benchmark uses a table-driven approach to evaluate performance
> >>> across different string lengths (short, medium, and long). It employs
> >>> ktime_get() for timing and get_random_bytes() followed by null-byte
> >>> filtering to generate test data that prevents early termination.
> >>>
> >>> This helps in quantifying the performance gains of architecture-specific
> >>> optimizations on various platforms.  
...
> Preliminary results with this change look much more reasonable:
> 
>     ok 4 string_test_strlen
>     # string_test_strlen_bench: strlen performance (short, len: 8, iters: 100000):
>     # string_test_strlen_bench:   arch-optimized: 4767500 ns
>     # string_test_strlen_bench:   generic C:      5815800 ns
>     # string_test_strlen_bench:   speedup:        1.21x
>     # string_test_strlen_bench: strlen performance (medium, len: 64, iters: 100000):
>     # string_test_strlen_bench:   arch-optimized: 6573600 ns
>     # string_test_strlen_bench:   generic C:      16342500 ns
>     # string_test_strlen_bench:   speedup:        2.48x
>     # string_test_strlen_bench: strlen performance (long, len: 2048, iters: 10000):
>     # string_test_strlen_bench:   arch-optimized: 7931000 ns
>     # string_test_strlen_bench:   generic C:      35347300 ns

That is far too long.
In 35ms you are including a lot of timer interrupts.
You are also just testing the 'hot cache' case.
The kernel runs 'cold cache' a lot of the time - especially for instructions.

To time short loops (or even single passes) you need a data dependency
between the 'start time' and the code being tested (easy enough, just add
(time & non_compile_time_zero) to a parameter), and between the result of
the code and the 'end time' - somewhat harder (doable in x86 if you use
the pmc cycle counter).
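Here is a userspace sketch of the start-side dependency described above, under stated assumptions. runtime_zero is a hypothetical volatile that is always 0 but unknown to the compiler. clock_gettime() stands in for whatever timer is used. Only the start side is shown. Making the end timestamp wait for the result is the harder part and is not attempted here.

```c
#define _POSIX_C_SOURCE 199309L
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>
#include <time.h>

/* Always 0 at run time, but the compiler cannot prove it. */
static volatile uint64_t runtime_zero;

static uint64_t now_ns(void)
{
	struct timespec ts;

	clock_gettime(CLOCK_MONOTONIC, &ts);
	return (uint64_t)ts.tv_sec * 1000000000u + (uint64_t)ts.tv_nsec;
}

/* Time a single strlen() call. The pointer passed in is computed from the
 * start timestamp (start & 0), so the CPU cannot issue the string loads
 * before that timestamp has actually been read. */
static uint64_t time_one_strlen(const char *buf, size_t *len_out)
{
	uint64_t start = now_ns();
	const char *p = buf + (start & runtime_zero);
	size_t len = strlen(p);
	uint64_t end = now_ns();

	*len_out = len;
	return end - start;
}
```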

	David


>     # string_test_strlen_bench:   speedup:        4.45x
>     ok 5 string_test_strlen_bench
> 
> I will adopt this pattern in v3, along with cache warm-up and preempt_disable(),
> to stay consistent with existing kernel benchmarks and ensure robust measurements.
> 



* Re: [PATCH v2 08/14] lib/string_kunit: add performance benchmark for strlen()
  2026-01-14 10:21         ` David Laight
@ 2026-01-15  6:24           ` Feng Jiang
  2026-01-15 10:40             ` David Laight
  0 siblings, 1 reply; 36+ messages in thread
From: Feng Jiang @ 2026-01-15  6:24 UTC (permalink / raw)
  To: David Laight
  Cc: Andy Shevchenko, pjw, palmer, aou, alex, kees, andy, akpm,
	ebiggers, martin.petersen, ardb, ajones, conor.dooley,
	samuel.holland, linus.walleij, nathan, linux-riscv, linux-kernel,
	linux-hardening

On 2026/1/14 18:21, David Laight wrote:
> On Wed, 14 Jan 2026 15:04:58 +0800
> Feng Jiang <jiangfeng@kylinos.cn> wrote:
> 
>> On 2026/1/14 14:14, Feng Jiang wrote:
>>> On 2026/1/13 16:46, Andy Shevchenko wrote:  
>>>> On Tue, Jan 13, 2026 at 04:27:42PM +0800, Feng Jiang wrote:  
>>>>> Introduce a benchmark to compare the architecture-optimized strlen()
>>>>> implementation against the generic C version (__generic_strlen).
>>>>>
>>>>> The benchmark uses a table-driven approach to evaluate performance
>>>>> across different string lengths (short, medium, and long). It employs
>>>>> ktime_get() for timing and get_random_bytes() followed by null-byte
>>>>> filtering to generate test data that prevents early termination.
>>>>>
>>>>> This helps in quantifying the performance gains of architecture-specific
>>>>> optimizations on various platforms.  
> ...
>> Preliminary results with this change look much more reasonable:
>>
>>     ok 4 string_test_strlen
>>     # string_test_strlen_bench: strlen performance (short, len: 8, iters: 100000):
>>     # string_test_strlen_bench:   arch-optimized: 4767500 ns
>>     # string_test_strlen_bench:   generic C:      5815800 ns
>>     # string_test_strlen_bench:   speedup:        1.21x
>>     # string_test_strlen_bench: strlen performance (medium, len: 64, iters: 100000):
>>     # string_test_strlen_bench:   arch-optimized: 6573600 ns
>>     # string_test_strlen_bench:   generic C:      16342500 ns
>>     # string_test_strlen_bench:   speedup:        2.48x
>>     # string_test_strlen_bench: strlen performance (long, len: 2048, iters: 10000):
>>     # string_test_strlen_bench:   arch-optimized: 7931000 ns
>>     # string_test_strlen_bench:   generic C:      35347300 ns
> 
> That is far too long.
> In 35ms you are including a lot of timer interrupts.
> You are also just testing the 'hot cache' case.
> The kernel runs 'cold cache' a lot of the time - especially for instructions.
> 
> To time short loops (or even single passes) you need a data dependency
> between the 'start time' and the code being tested (easy enough, just add
> (time & non_compile_time_zero) to a parameter), and between the result of
> the code and the 'end time' - somewhat harder (doable in x86 if you use
> the pmc cycle counter).

Hi David,

I appreciate the feedback! You're absolutely right that 35ms is quite long; it
was measured in a TCG environment, and on real hardware (ARM64 KVM), it's
actually an order of magnitude faster. I'll definitely tighten the iterations
in v3 to avoid potential noise.

For the more advanced suggestions like cold cache and data dependency, I can
see how they would make the benchmark much more rigorous. My plan is to follow
the pattern in crc_benchmark() to refine the logic, as I feel this approach is
simple, easy to maintain, and provides a good enough baseline for our needs.

While I understand that simulating a cold cache would be more precise, I'm
concerned it might introduce significant complexity at this stage. I hope the
current focus on hot-path throughput is a reasonable starting point for a
general KUnit test.

-- 
With Best Regards,
Feng Jiang



* Re: [PATCH v2 08/14] lib/string_kunit: add performance benchmark for strlen()
  2026-01-15  6:24           ` Feng Jiang
@ 2026-01-15 10:40             ` David Laight
  0 siblings, 0 replies; 36+ messages in thread
From: David Laight @ 2026-01-15 10:40 UTC (permalink / raw)
  To: Feng Jiang
  Cc: Andy Shevchenko, pjw, palmer, aou, alex, kees, andy, akpm,
	ebiggers, martin.petersen, ardb, ajones, conor.dooley,
	samuel.holland, linus.walleij, nathan, linux-riscv, linux-kernel,
	linux-hardening

On Thu, 15 Jan 2026 14:24:16 +0800
Feng Jiang <jiangfeng@kylinos.cn> wrote:

> On 2026/1/14 18:21, David Laight wrote:
> > On Wed, 14 Jan 2026 15:04:58 +0800
> > Feng Jiang <jiangfeng@kylinos.cn> wrote:
> >   
> >> On 2026/1/14 14:14, Feng Jiang wrote:  
> >>> On 2026/1/13 16:46, Andy Shevchenko wrote:    
> >>>> On Tue, Jan 13, 2026 at 04:27:42PM +0800, Feng Jiang wrote:    
> >>>>> Introduce a benchmark to compare the architecture-optimized strlen()
> >>>>> implementation against the generic C version (__generic_strlen).
> >>>>>
> >>>>> The benchmark uses a table-driven approach to evaluate performance
> >>>>> across different string lengths (short, medium, and long). It employs
> >>>>> ktime_get() for timing and get_random_bytes() followed by null-byte
> >>>>> filtering to generate test data that prevents early termination.
> >>>>>
> >>>>> This helps in quantifying the performance gains of architecture-specific
> >>>>> optimizations on various platforms.    
> > ...  
> >> Preliminary results with this change look much more reasonable:
> >>
> >>     ok 4 string_test_strlen
> >>     # string_test_strlen_bench: strlen performance (short, len: 8, iters: 100000):
> >>     # string_test_strlen_bench:   arch-optimized: 4767500 ns
> >>     # string_test_strlen_bench:   generic C:      5815800 ns
> >>     # string_test_strlen_bench:   speedup:        1.21x
> >>     # string_test_strlen_bench: strlen performance (medium, len: 64, iters: 100000):
> >>     # string_test_strlen_bench:   arch-optimized: 6573600 ns
> >>     # string_test_strlen_bench:   generic C:      16342500 ns
> >>     # string_test_strlen_bench:   speedup:        2.48x
> >>     # string_test_strlen_bench: strlen performance (long, len: 2048, iters: 10000):
> >>     # string_test_strlen_bench:   arch-optimized: 7931000 ns
> >>     # string_test_strlen_bench:   generic C:      35347300 ns  
> >>     # string_test_strlen_bench:   speedup:        4.45x
> > 
> > That is far too long.
> > In 35ms you are including a lot of timer interrupts.
> > You are also just testing the 'hot cache' case.
> > The kernel runs 'cold cache' a lot of the time - especially for instructions.
> > 
> > To time short loops (or even single passes) you need a data dependency
> > between the 'start time' and the code being tested (easy enough, just add
> > (time & non_compile_time_zero) to a parameter), and between the result of
> > the code and the 'end time' - somewhat harder (doable in x86 if you use
> > the pmc cycle counter).  
> 
> Hi David,
> 
> I appreciate the feedback! You're absolutely right that 35ms is quite long; it
> was measured in a TCG environment, and on real hardware (ARM64 KVM), it's
> actually an order of magnitude faster. I'll definitely tighten the iterations
> in v3 to avoid potential noise.

Doing time-based measurements on anything but real hardware is pointless.
(It is even problematic on some real hardware because the cpu clock speed
changes dynamically - which is why I've started using the x86 pmc to count
actual clock cycles.)

You only really need enough iterations to get enough 'ticks' from the
timer for the answer to make sense.
Other effects mean you can't really quote values to even 1% - so 100 ticks
from the timer is more than enough.
I'm not sure what the resolution of ktime_get_ns() is (it will be hardware
dependent).
You are better off running the test a few times and using the best value.
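The sketch below shows "run it a few times and keep the best value" as a userspace analogue. time_batch() and best_of() are hypothetical helpers. The volatile pointer plays the role that OPTIMIZER_HIDE_VAR() plays in the kernel test.

```c
#define _POSIX_C_SOURCE 199309L
#include <assert.h>
#include <stdint.h>
#include <string.h>
#include <time.h>

static uint64_t now_ns(void)
{
	struct timespec ts;

	clock_gettime(CLOCK_MONOTONIC, &ts);
	return (uint64_t)ts.tv_sec * 1000000000u + (uint64_t)ts.tv_nsec;
}

/* Time one batch of 'iters' strlen() calls. */
static uint64_t time_batch(const char *buf, unsigned int iters)
{
	const char *volatile p = buf;	/* re-read every iteration */
	volatile size_t sink = 0;	/* every result must be stored */
	uint64_t start = now_ns();

	for (unsigned int j = 0; j < iters; j++)
		sink = strlen(p);
	return now_ns() - start;
}

/* Repeat the batch 'runs' times and keep the fastest. Interrupts and
 * preemption only ever add time, so the minimum is the least disturbed run. */
static uint64_t best_of(const char *buf, unsigned int iters, unsigned int runs)
{
	uint64_t best = UINT64_MAX;

	assert(runs > 0);
	for (unsigned int r = 0; r < runs; r++) {
		uint64_t t = time_batch(buf, iters);

		if (t < best)
			best = t;
	}
	return best;
}
```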

Also you don't need to do a very long test to show a x4 improvement!

To see how good an algorithm really is, you need to work out the
'fixed cost' and 'cost per byte' in 'clocks' and 'clocks per byte'
(or 'bytes per clock'), although they can be 'noisy' for short lengths.
The latter tells you how near to 'optimal' the algorithm is and lets you
compare results between different cpus (e.g. Zen-5 vs i7-12xxx).
For instance, the x86-64 IP checksum code (nominally a 16-bit add with carry)
actually runs at more than 8 bytes/clock on most cpus (IIRC it manages 12
but not 16).
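The fixed-cost / cost-per-byte split can be worked out from two lengths. Per call, t(len) = fixed + per_byte * len, so two measurements give both unknowns. The sketch below uses a hypothetical helper, fit_cost(). It takes the arch-optimized medium and long results quoted earlier in the thread, divided by their iteration counts: 65.736 ns/call at 64 bytes and 793.1 ns/call at 2048 bytes. Those are TCG numbers in nanoseconds rather than clocks. They illustrate the arithmetic only and say nothing about real RISC-V hardware.

```c
#include <assert.h>

struct bench_point {
	double len;		/* string length in bytes */
	double ns_per_call;	/* total time / iterations */
};

/* Fit t(len) = fixed + per_byte * len exactly through two measurements. */
static void fit_cost(struct bench_point a, struct bench_point b,
		     double *fixed_ns, double *ns_per_byte)
{
	assert(b.len != a.len);
	*ns_per_byte = (b.ns_per_call - a.ns_per_call) / (b.len - a.len);
	*fixed_ns = a.ns_per_call - *ns_per_byte * a.len;
}
```

For the arch-optimized case, this gives roughly 0.367 ns/byte (about 2.7 bytes/ns) plus about 42 ns of fixed cost. The generic C numbers (163.425 and 3534.73 ns/call) give roughly 1.70 ns/byte plus about 55 ns. With only two points, the fit is exact by construction, so it says nothing about noise. More lengths and a least-squares fit would be needed for that.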

> 
> For the more advanced suggestions like cold cache and data dependency, I can
> see how they would make the benchmark much more rigorous. My plan is to follow
> the pattern in crc_benchmark() to refine the logic, as I feel this approach is
> simple, easy to maintain, and provides a good enough baseline for our needs.
> 
> While I understand that simulating a cold cache would be more precise, I'm
> concerned it might introduce significant complexity at this stage. I hope the
> current focus on hot-path throughput is a reasonable starting point for a
> general KUnit test.
> 

I've only done 'cold cache' testing in userspace - counting the actual
clocks for each call (the first value is cold cache).

That gives a massive difference for large functions like blake2s, where the
unrolled loop is somewhat faster, but in the cold-cache case it is only
worth it for buffers over (about) 8k (and that might be worse if the
cpu is running at full speed, which makes the memory effectively slower).
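Here is a userspace sketch of per-call timing, in the spirit of the approach described above. Each call is recorded separately instead of as one total, so the first sample (code and data not yet cached) can be compared with the later, warm ones. time_each_call() is a hypothetical helper. In practice, the first sample also picks up one-off costs such as page faults. It is only "cold" if nothing touched the code or buffer beforehand.

```c
#define _POSIX_C_SOURCE 199309L
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>
#include <time.h>

static uint64_t now_ns(void)
{
	struct timespec ts;

	clock_gettime(CLOCK_MONOTONIC, &ts);
	return (uint64_t)ts.tv_sec * 1000000000u + (uint64_t)ts.tv_nsec;
}

/* Store the duration of each of 'n' individual strlen() calls in samples[].
 * samples[0] is the first (coldest) call; the rest show the warm state. */
static void time_each_call(const char *buf, uint64_t *samples, size_t n)
{
	const char *volatile p = buf;	/* keep every call real */
	volatile size_t sink = 0;	/* every result must be stored */

	assert(samples != NULL || n == 0);
	for (size_t i = 0; i < n; i++) {
		uint64_t start = now_ns();

		sink = strlen(p);
		samples[i] = now_ns() - start;
	}
}
```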

The other issue with running a test multiple times is that the branch
predictor will correctly predict all the branches.
So something like memcpy() which might have different code for different
lengths will always pick the correct one.
Branch mis-prediction seems to cost about 20 clocks on my zen-5.
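One way to stop the branch predictor from learning a single length is to draw the length pseudo-randomly on each iteration. The sketch below, in userspace, then calls strlen() on a suffix of one buffer with that length. xorshift32() and strlen_mixed_lengths() are hypothetical helpers. The PRNG and the modulo sit inside the loop, so a real benchmark would precompute the lengths outside the timed region.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>

/* xorshift32: a tiny deterministic PRNG, good enough to shuffle lengths. */
static uint32_t xorshift32(uint32_t *state)
{
	uint32_t x = *state;

	x ^= x << 13;
	x ^= x >> 17;
	x ^= x << 5;
	*state = x;
	return x;
}

/* 'buf' must hold exactly 'max_len' non-NUL bytes followed by a NUL. Each
 * iteration measures a suffix of pseudo-random length in [0, max_len], so
 * length-dependent branches see no repeating pattern. Returns the sum of
 * the lengths so the results are used. */
static size_t strlen_mixed_lengths(const char *buf, size_t max_len,
				   unsigned int iters, uint32_t seed)
{
	uint32_t state = seed ? seed : 1;	/* xorshift state must be non-zero */
	size_t total = 0;

	assert(strlen(buf) == max_len);
	for (unsigned int j = 0; j < iters; j++) {
		size_t len = xorshift32(&state) % (max_len + 1);

		total += strlen(buf + (max_len - len));
	}
	return total;
}
```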

Anyway, some measurements are better than none.

	David


* Re: [PATCH v2 08/14] lib/string_kunit: add performance benchmark for strlen()
  2026-01-13  8:27 ` [PATCH v2 08/14] lib/string_kunit: add performance benchmark for strlen() Feng Jiang
  2026-01-13  8:46   ` Andy Shevchenko
@ 2026-01-18 11:11   ` kernel test robot
  1 sibling, 0 replies; 36+ messages in thread
From: kernel test robot @ 2026-01-18 11:11 UTC (permalink / raw)
  To: Feng Jiang, pjw, palmer, aou, alex, kees, andy, akpm, ebiggers,
	martin.petersen, ardb, ajones, conor.dooley, samuel.holland,
	linus.walleij, nathan
  Cc: oe-kbuild-all, linux-riscv, linux-kernel, linux-hardening

Hi Feng,

kernel test robot noticed the following build errors:

[auto build test ERROR on kees/for-next/hardening]
[also build test ERROR on linus/master v6.19-rc5 next-20260116]
[If your patch is applied to the wrong git tree, kindly drop us a note.
And when submitting patch, we suggest to use '--base' as documented in
https://git-scm.com/docs/git-format-patch#_base_tree_information]

url:    https://github.com/intel-lab-lkp/linux/commits/Feng-Jiang/lib-string-extract-generic-strlen-into-__generic_strlen/20260113-163741
base:   https://git.kernel.org/pub/scm/linux/kernel/git/kees/linux.git for-next/hardening
patch link:    https://lore.kernel.org/r/20260113082748.250916-9-jiangfeng%40kylinos.cn
patch subject: [PATCH v2 08/14] lib/string_kunit: add performance benchmark for strlen()
config: i386-randconfig-015-20251207 (https://download.01.org/0day-ci/archive/20260118/202601181845.EiSSqJu7-lkp@intel.com/config)
compiler: gcc-14 (Debian 14.2.0-19) 14.2.0
reproduce (this is a W=1 build): (https://download.01.org/0day-ci/archive/20260118/202601181845.EiSSqJu7-lkp@intel.com/reproduce)

If you fix the issue in a separate patch/commit (i.e. not just a new version of
the same patch/commit), kindly add following tags
| Reported-by: kernel test robot <lkp@intel.com>
| Closes: https://lore.kernel.org/oe-kbuild-all/202601181845.EiSSqJu7-lkp@intel.com/

All errors (new ones prefixed by >>, old ones prefixed by <<):

>> ERROR: modpost: "__umoddi3" [lib/tests/string_kunit.ko] undefined!
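The undefined __umoddi3 points to a 64-bit `%` on a 32-bit target. gcc lowers that operator to a libgcc helper call, and the kernel does not provide one. The string_bench_report() code from patch 08 is not in this excerpt, so where the `%` lives is an assumption. The printed speedups (1.21x, 2.48x, 4.45x) are consistent with truncating generic * 100 / arch and printing that value as whole.fraction. The userspace sketch below, a hypothetical format_speedup(), does that with plain operators. In the kernel, the same steps would use div64_u64() and div_u64_rem() from <linux/math64.h> instead of `/` and `%` on u64.

```c
#include <assert.h>
#include <stdint.h>
#include <stdio.h>

/* Format time_generic / time_arch as "N.NNx", truncated to two decimals.
 * Plain u64 '/' and '%' are fine in userspace; on 32-bit kernels they
 * become __udivdi3/__umoddi3 calls, hence the math64.h helpers there. */
static int format_speedup(char *out, size_t size,
			  uint64_t time_generic, uint64_t time_arch)
{
	uint64_t x100;

	assert(time_arch > 0);
	x100 = time_generic * 100 / time_arch;
	return snprintf(out, size, "%llu.%02llux",
			(unsigned long long)(x100 / 100),
			(unsigned long long)(x100 % 100));
}
```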

-- 
0-DAY CI Kernel Test Service
https://github.com/intel/lkp-tests/wiki


* [PATCH v2 09/14] lib/string_kunit: add performance benchmark for strnlen()
  2026-01-13  8:27 [PATCH v2 00/14] riscv: optimize string functions and add kunit tests Feng Jiang
                   ` (7 preceding siblings ...)
  2026-01-13  8:27 ` [PATCH v2 08/14] lib/string_kunit: add performance benchmark for strlen() Feng Jiang
@ 2026-01-13  8:27 ` Feng Jiang
  2026-01-13  8:27 ` [PATCH v2 10/14] lib/string_kunit: add performance benchmark for strchr() Feng Jiang
                   ` (6 subsequent siblings)
  15 siblings, 0 replies; 36+ messages in thread
From: Feng Jiang @ 2026-01-13  8:27 UTC (permalink / raw)
  To: pjw, palmer, aou, alex, kees, andy, akpm, jiangfeng, ebiggers,
	martin.petersen, ardb, ajones, conor.dooley, samuel.holland,
	linus.walleij, nathan
  Cc: linux-riscv, linux-kernel, linux-hardening

Introduce a benchmark to compare the architecture-optimized strnlen()
implementation against the generic C version (__generic_strnlen).

Suggested-by: Andy Shevchenko <andy@kernel.org>
Signed-off-by: Feng Jiang <jiangfeng@kylinos.cn>
---
 lib/tests/string_kunit.c | 48 +++++++++++++++++++++++++++++++++++++++-
 1 file changed, 47 insertions(+), 1 deletion(-)

diff --git a/lib/tests/string_kunit.c b/lib/tests/string_kunit.c
index 2266954ae5e0..6578b36213bc 100644
--- a/lib/tests/string_kunit.c
+++ b/lib/tests/string_kunit.c
@@ -20,7 +20,7 @@
 #define STRING_TEST_MAX_LEN	128
 #define STRING_TEST_MAX_OFFSET	16
 
-#if defined(__HAVE_ARCH_STRLEN)
+#if defined(__HAVE_ARCH_STRLEN) || defined(__HAVE_ARCH_STRNLEN)
 #define STRING_BENCH_ENABLED
 #endif
 
@@ -272,6 +272,49 @@ static void string_test_strnlen(struct kunit *test)
 	}
 }
 
+#ifdef __HAVE_ARCH_STRNLEN
+static void string_test_strnlen_bench(struct kunit *test)
+{
+	char *buf;
+	size_t buf_len, iters;
+	ktime_t start, end;
+	u64 time_arch, time_generic;
+
+	buf_len = get_max_bench_len(bench_cases, ARRAY_SIZE(bench_cases)) + 1;
+
+	buf = kunit_kzalloc(test, buf_len, GFP_KERNEL);
+	KUNIT_ASSERT_NOT_ERR_OR_NULL(test, buf);
+
+	for (size_t i = 0; i < ARRAY_SIZE(bench_cases); i++) {
+		get_random_nonzero_bytes(buf, bench_cases[i].len);
+		buf[bench_cases[i].len] = '\0';
+
+		iters = bench_cases[i].iterations;
+
+		/* 1. Benchmark the architecture-optimized version */
+		start = ktime_get();
+		for (unsigned int j = 0; j < iters; j++) {
+			OPTIMIZER_HIDE_VAR(buf);
+			(void)strnlen(buf, buf_len);
+		}
+		end = ktime_get();
+		time_arch = ktime_to_ns(ktime_sub(end, start));
+
+		/* 2. Benchmark the generic C version */
+		start = ktime_get();
+		for (unsigned int j = 0; j < iters; j++) {
+			OPTIMIZER_HIDE_VAR(buf);
+			(void)__generic_strnlen(buf, buf_len);
+		}
+		end = ktime_get();
+		time_generic = ktime_to_ns(ktime_sub(end, start));
+
+		string_bench_report(test, "strnlen", &bench_cases[i],
+				time_arch, time_generic);
+	}
+}
+#endif
+
 static void string_test_strchr(struct kunit *test)
 {
 	const char *test_string = "abcdefghijkl";
@@ -820,6 +863,9 @@ static struct kunit_case string_test_cases[] = {
 	KUNIT_CASE(string_test_strlen_bench),
 #endif
 	KUNIT_CASE(string_test_strnlen),
+#ifdef __HAVE_ARCH_STRNLEN
+	KUNIT_CASE(string_test_strnlen_bench),
+#endif
 	KUNIT_CASE(string_test_strchr),
 	KUNIT_CASE(string_test_strnchr),
 	KUNIT_CASE(string_test_strrchr),
-- 
2.25.1



* [PATCH v2 10/14] lib/string_kunit: add performance benchmark for strchr()
  2026-01-13  8:27 [PATCH v2 00/14] riscv: optimize string functions and add kunit tests Feng Jiang
                   ` (8 preceding siblings ...)
  2026-01-13  8:27 ` [PATCH v2 09/14] lib/string_kunit: add performance benchmark for strnlen() Feng Jiang
@ 2026-01-13  8:27 ` Feng Jiang
  2026-01-13  8:27 ` [PATCH v2 11/14] lib/string_kunit: add performance benchmark for strrchr() Feng Jiang
                   ` (5 subsequent siblings)
  15 siblings, 0 replies; 36+ messages in thread
From: Feng Jiang @ 2026-01-13  8:27 UTC (permalink / raw)
  To: pjw, palmer, aou, alex, kees, andy, akpm, jiangfeng, ebiggers,
	martin.petersen, ardb, ajones, conor.dooley, samuel.holland,
	linus.walleij, nathan
  Cc: linux-riscv, linux-kernel, linux-hardening

Introduce a benchmark to compare the architecture-optimized strchr()
implementation against the generic C version (__generic_strchr).

Suggested-by: Andy Shevchenko <andy@kernel.org>
Signed-off-by: Feng Jiang <jiangfeng@kylinos.cn>
---
 lib/tests/string_kunit.c | 49 +++++++++++++++++++++++++++++++++++++++-
 1 file changed, 48 insertions(+), 1 deletion(-)

diff --git a/lib/tests/string_kunit.c b/lib/tests/string_kunit.c
index 6578b36213bc..e8d527336030 100644
--- a/lib/tests/string_kunit.c
+++ b/lib/tests/string_kunit.c
@@ -20,7 +20,8 @@
 #define STRING_TEST_MAX_LEN	128
 #define STRING_TEST_MAX_OFFSET	16
 
-#if defined(__HAVE_ARCH_STRLEN) || defined(__HAVE_ARCH_STRNLEN)
+#if defined(__HAVE_ARCH_STRLEN) || defined(__HAVE_ARCH_STRNLEN) || \
+	defined(__HAVE_ARCH_STRCHR)
 #define STRING_BENCH_ENABLED
 #endif
 
@@ -367,6 +368,49 @@ static void string_test_strrchr(struct kunit *test)
 	}
 }
 
+#ifdef __HAVE_ARCH_STRCHR
+static void string_test_strchr_bench(struct kunit *test)
+{
+	char *buf;
+	size_t buf_len, iters;
+	ktime_t start, end;
+	u64 time_arch, time_generic;
+
+	buf_len = get_max_bench_len(bench_cases, ARRAY_SIZE(bench_cases)) + 1;
+
+	buf = kunit_kzalloc(test, buf_len, GFP_KERNEL);
+	KUNIT_ASSERT_NOT_ERR_OR_NULL(test, buf);
+
+	for (size_t i = 0; i < ARRAY_SIZE(bench_cases); i++) {
+		get_random_nonzero_bytes(buf, bench_cases[i].len);
+		buf[bench_cases[i].len] = '\0';
+
+		iters = bench_cases[i].iterations;
+
+		/* 1. Benchmark the architecture-optimized version */
+		start = ktime_get();
+		for (unsigned int j = 0; j < iters; j++) {
+			OPTIMIZER_HIDE_VAR(buf);
+			(void)strchr(buf, '\0');
+		}
+		end = ktime_get();
+		time_arch = ktime_to_ns(ktime_sub(end, start));
+
+		/* 2. Benchmark the generic C version */
+		start = ktime_get();
+		for (unsigned int j = 0; j < iters; j++) {
+			OPTIMIZER_HIDE_VAR(buf);
+			(void)__generic_strchr(buf, '\0');
+		}
+		end = ktime_get();
+		time_generic = ktime_to_ns(ktime_sub(end, start));
+
+		string_bench_report(test, "strchr", &bench_cases[i],
+				time_arch, time_generic);
+	}
+}
+#endif
+
 static void string_test_strnchr(struct kunit *test)
 {
 	const char *test_string = "abcdefghijkl";
@@ -867,6 +911,9 @@ static struct kunit_case string_test_cases[] = {
 	KUNIT_CASE(string_test_strnlen_bench),
 #endif
 	KUNIT_CASE(string_test_strchr),
+#ifdef __HAVE_ARCH_STRCHR
+	KUNIT_CASE(string_test_strchr_bench),
+#endif
 	KUNIT_CASE(string_test_strnchr),
 	KUNIT_CASE(string_test_strrchr),
 	KUNIT_CASE(string_test_strspn),
-- 
2.25.1



* [PATCH v2 11/14] lib/string_kunit: add performance benchmark for strrchr()
  2026-01-13  8:27 [PATCH v2 00/14] riscv: optimize string functions and add kunit tests Feng Jiang
                   ` (9 preceding siblings ...)
  2026-01-13  8:27 ` [PATCH v2 10/14] lib/string_kunit: add performance benchmark for strchr() Feng Jiang
@ 2026-01-13  8:27 ` Feng Jiang
  2026-01-13  8:27 ` [PATCH v2 12/14] riscv: lib: add strnlen implementation Feng Jiang
                   ` (4 subsequent siblings)
  15 siblings, 0 replies; 36+ messages in thread
From: Feng Jiang @ 2026-01-13  8:27 UTC (permalink / raw)
  To: pjw, palmer, aou, alex, kees, andy, akpm, jiangfeng, ebiggers,
	martin.petersen, ardb, ajones, conor.dooley, samuel.holland,
	linus.walleij, nathan
  Cc: linux-riscv, linux-kernel, linux-hardening

Introduce a benchmark to compare the architecture-optimized strrchr()
implementation against the generic C version (__generic_strrchr).

Suggested-by: Andy Shevchenko <andy@kernel.org>
Signed-off-by: Feng Jiang <jiangfeng@kylinos.cn>
---
 lib/tests/string_kunit.c | 50 +++++++++++++++++++++++++++++++++++++++-
 1 file changed, 49 insertions(+), 1 deletion(-)

diff --git a/lib/tests/string_kunit.c b/lib/tests/string_kunit.c
index e8d527336030..89b583cec2c8 100644
--- a/lib/tests/string_kunit.c
+++ b/lib/tests/string_kunit.c
@@ -21,7 +21,7 @@
 #define STRING_TEST_MAX_OFFSET	16
 
 #if defined(__HAVE_ARCH_STRLEN) || defined(__HAVE_ARCH_STRNLEN) || \
-	defined(__HAVE_ARCH_STRCHR)
+	defined(__HAVE_ARCH_STRCHR) || defined(__HAVE_ARCH_STRRCHR)
 #define STRING_BENCH_ENABLED
 #endif
 
@@ -444,6 +444,51 @@ static void string_test_strnchr(struct kunit *test)
 	KUNIT_ASSERT_NULL(test, result);
 }
 
+#ifdef __HAVE_ARCH_STRRCHR
+static void string_test_strrchr_bench(struct kunit *test)
+{
+	char *buf;
+	size_t buf_len, iters;
+	ktime_t start, end;
+	u64 time_arch, time_generic;
+
+	buf_len = get_max_bench_len(bench_cases, ARRAY_SIZE(bench_cases)) + 1;
+
+	buf = kunit_kzalloc(test, buf_len, GFP_KERNEL);
+	KUNIT_ASSERT_NOT_ERR_OR_NULL(test, buf);
+
+	for (size_t i = 0; i < ARRAY_SIZE(bench_cases); i++) {
+		get_random_nonzero_bytes(buf, bench_cases[i].len);
+		if (bench_cases[i].len > 0)
+			buf[bench_cases[i].len - 1] = 'A';
+		buf[bench_cases[i].len] = '\0';
+
+		iters = bench_cases[i].iterations;
+
+		/* 1. Benchmark the architecture-optimized version */
+		start = ktime_get();
+		for (unsigned int j = 0; j < iters; j++) {
+			OPTIMIZER_HIDE_VAR(buf);
+			(void)strrchr(buf, 'A');
+		}
+		end = ktime_get();
+		time_arch = ktime_to_ns(ktime_sub(end, start));
+
+		/* 2. Benchmark the generic C version */
+		start = ktime_get();
+		for (unsigned int j = 0; j < iters; j++) {
+			OPTIMIZER_HIDE_VAR(buf);
+			(void)__generic_strrchr(buf, 'A');
+		}
+		end = ktime_get();
+		time_generic = ktime_to_ns(ktime_sub(end, start));
+
+		string_bench_report(test, "strrchr", &bench_cases[i],
+				time_arch, time_generic);
+	}
+}
+#endif
+
 static void string_test_strspn(struct kunit *test)
 {
 	static const struct strspn_test {
@@ -916,6 +961,9 @@ static struct kunit_case string_test_cases[] = {
 #endif
 	KUNIT_CASE(string_test_strnchr),
 	KUNIT_CASE(string_test_strrchr),
+#ifdef __HAVE_ARCH_STRRCHR
+	KUNIT_CASE(string_test_strrchr_bench),
+#endif
 	KUNIT_CASE(string_test_strspn),
 	KUNIT_CASE(string_test_strcmp),
 	KUNIT_CASE(string_test_strcmp_long_strings),
-- 
2.25.1



* [PATCH v2 12/14] riscv: lib: add strnlen implementation
From: Feng Jiang @ 2026-01-13  8:27 UTC (permalink / raw)
  To: pjw, palmer, aou, alex, kees, andy, akpm, jiangfeng, ebiggers,
	martin.petersen, ardb, ajones, conor.dooley, samuel.holland,
	linus.walleij, nathan
  Cc: linux-riscv, linux-kernel, linux-hardening

Add a strnlen() implementation for RISC-V, derived from strlen.S, with
both a generic byte-by-byte version and a Zbb-optimized version.

Signed-off-by: Feng Jiang <jiangfeng@kylinos.cn>
---
 arch/riscv/include/asm/string.h |   3 +
 arch/riscv/lib/Makefile         |   1 +
 arch/riscv/lib/strnlen.S        | 164 ++++++++++++++++++++++++++++++++
 arch/riscv/purgatory/Makefile   |   5 +-
 4 files changed, 172 insertions(+), 1 deletion(-)
 create mode 100644 arch/riscv/lib/strnlen.S

diff --git a/arch/riscv/include/asm/string.h b/arch/riscv/include/asm/string.h
index 5ba77f60bf0b..16634d67c217 100644
--- a/arch/riscv/include/asm/string.h
+++ b/arch/riscv/include/asm/string.h
@@ -28,6 +28,9 @@ extern asmlinkage __kernel_size_t strlen(const char *);
 
 #define __HAVE_ARCH_STRNCMP
 extern asmlinkage int strncmp(const char *cs, const char *ct, size_t count);
+
+#define __HAVE_ARCH_STRNLEN
+extern asmlinkage __kernel_size_t strnlen(const char *, size_t);
 #endif
 
 /* For those files which don't want to check by kasan. */
diff --git a/arch/riscv/lib/Makefile b/arch/riscv/lib/Makefile
index bbc031124974..0969d8136df0 100644
--- a/arch/riscv/lib/Makefile
+++ b/arch/riscv/lib/Makefile
@@ -7,6 +7,7 @@ ifeq ($(CONFIG_KASAN_GENERIC)$(CONFIG_KASAN_SW_TAGS),)
 lib-y			+= strcmp.o
 lib-y			+= strlen.o
 lib-y			+= strncmp.o
+lib-y			+= strnlen.o
 endif
 lib-y			+= csum.o
 ifeq ($(CONFIG_MMU), y)
diff --git a/arch/riscv/lib/strnlen.S b/arch/riscv/lib/strnlen.S
new file mode 100644
index 000000000000..4af0df9442f1
--- /dev/null
+++ b/arch/riscv/lib/strnlen.S
@@ -0,0 +1,164 @@
+/* SPDX-License-Identifier: GPL-2.0-only */
+
+/*
+ * Based on arch/riscv/lib/strlen.S
+ *
+ * Copyright (C) Feng Jiang <jiangfeng@kylinos.cn>
+ */
+
+#include <linux/linkage.h>
+#include <asm/asm.h>
+#include <asm/alternative-macros.h>
+#include <asm/hwcap.h>
+
+/* size_t strnlen(const char *s, size_t count) */
+SYM_FUNC_START(strnlen)
+
+	__ALTERNATIVE_CFG("nop", "j strnlen_zbb", 0, RISCV_ISA_EXT_ZBB,
+		IS_ENABLED(CONFIG_RISCV_ISA_ZBB) && IS_ENABLED(CONFIG_TOOLCHAIN_HAS_ZBB))
+
+
+	/*
+	 * Returns
+	 *   a0 - String length
+	 *
+	 * Parameters
+	 *   a0 - String to measure
+	 *   a1 - Max length of string
+	 *
+	 * Clobbers
+	 *   t0, t1, t2
+	 */
+	addi	t1, a0, -1
+	add	t2, a0, a1
+1:
+	addi	t1, t1, 1
+	beq	t1, t2, 2f
+	lbu	t0, 0(t1)
+	bnez	t0, 1b
+2:
+	sub	a0, t1, a0
+	ret
+
+
+/*
+ * Variant of strnlen using the ZBB extension if available
+ */
+#if defined(CONFIG_RISCV_ISA_ZBB) && defined(CONFIG_TOOLCHAIN_HAS_ZBB)
+strnlen_zbb:
+
+#ifdef CONFIG_CPU_BIG_ENDIAN
+# define CZ	clz
+# define SHIFT	sll
+#else
+# define CZ	ctz
+# define SHIFT	srl
+#endif
+
+.option push
+.option arch,+zbb
+
+	/*
+	 * Returns
+	 *   a0 - String length
+	 *
+	 * Parameters
+	 *   a0 - String to measure
+	 *   a1 - Max length of string
+	 *
+	 * Clobbers
+	 *   t0, t1, t2, t3, t4
+	 */
+
+	/* If maxlen is 0, return 0. */
+	beqz	a1, 3f
+
+	/* Number of irrelevant bytes in the first word. */
+	andi	t2, a0, SZREG-1
+
+	/* Align pointer. */
+	andi	t0, a0, -SZREG
+
+	li	t3, SZREG
+	sub	t3, t3, t2
+	slli	t2, t2, 3
+
+	/* Aligned boundary. */
+	add	t4, a0, a1
+	andi	t4, t4, -SZREG
+
+	/* Get the first word.  */
+	REG_L	t1, 0(t0)
+
+	/*
+	 * Shift away the partial data we loaded to remove the irrelevant bytes
+	 * preceding the string with the effect of adding NUL bytes at the
+	 * end of the string's first word.
+	 */
+	SHIFT	t1, t1, t2
+
+	/* Convert non-NUL into 0xff and NUL into 0x00. */
+	orc.b	t1, t1
+
+	/* Convert non-NUL into 0x00 and NUL into 0xff. */
+	not	t1, t1
+
+	/*
+	 * Search for the first set bit (corresponding to a NUL byte in the
+	 * original chunk).
+	 */
+	CZ	t1, t1
+
+	/*
+	 * The first chunk is special: compare against the number
+	 * of valid bytes in this chunk.
+	 */
+	srli	a0, t1, 3
+
+	/* Limit the result by maxlen. */
+	bleu	a1, a0, 3f
+
+	bgtu	t3, a0, 2f
+
+	/* Prepare for the word comparison loop. */
+	addi	t2, t0, SZREG
+	li	t3, -1
+
+	/*
+	 * Our critical loop is 5 instructions and processes data in
+	 * 4 byte or 8 byte chunks.
+	 */
+	.p2align 3
+1:
+	REG_L	t1, SZREG(t0)
+	addi	t0, t0, SZREG
+	orc.b	t1, t1
+	bgeu	t0, t4, 4f
+	beq	t1, t3, 1b
+4:
+	not	t1, t1
+	CZ	t1, t1
+	srli	t1, t1, 3
+
+	/* Get number of processed bytes. */
+	sub	t2, t0, t2
+
+	/* Add number of characters in the first word.  */
+	add	a0, a0, t2
+
+	/* Add number of characters in the last word.  */
+	add	a0, a0, t1
+
+	/* Ensure the final result does not exceed maxlen. */
+	bgeu	a0, a1, 3f
+2:
+	ret
+3:
+	mv	a0, a1
+	ret
+
+.option pop
+#endif
+SYM_FUNC_END(strnlen)
+SYM_FUNC_ALIAS(__pi_strnlen, strnlen)
+EXPORT_SYMBOL(strnlen)
diff --git a/arch/riscv/purgatory/Makefile b/arch/riscv/purgatory/Makefile
index 530e497ca2f9..d7c0533108be 100644
--- a/arch/riscv/purgatory/Makefile
+++ b/arch/riscv/purgatory/Makefile
@@ -2,7 +2,7 @@
 
 purgatory-y := purgatory.o sha256.o entry.o string.o ctype.o memcpy.o memset.o
 ifeq ($(CONFIG_KASAN_GENERIC)$(CONFIG_KASAN_SW_TAGS),)
-purgatory-y += strcmp.o strlen.o strncmp.o
+purgatory-y += strcmp.o strlen.o strncmp.o strnlen.o
 endif
 
 targets += $(purgatory-y)
@@ -32,6 +32,9 @@ $(obj)/strncmp.o: $(srctree)/arch/riscv/lib/strncmp.S FORCE
 $(obj)/sha256.o: $(srctree)/lib/crypto/sha256.c FORCE
 	$(call if_changed_rule,cc_o_c)
 
+$(obj)/strnlen.o: $(srctree)/arch/riscv/lib/strnlen.S FORCE
+	$(call if_changed_rule,as_o_S)
+
 CFLAGS_sha256.o := -D__DISABLE_EXPORTS -D__NO_FORTIFY
 CFLAGS_string.o := -D__DISABLE_EXPORTS
 CFLAGS_ctype.o := -D__DISABLE_EXPORTS
-- 
2.25.1



* Re: [PATCH v2 12/14] riscv: lib: add strnlen implementation
From: Andy Shevchenko @ 2026-01-13  8:48 UTC (permalink / raw)
  To: Feng Jiang
  Cc: pjw, palmer, aou, alex, kees, andy, akpm, ebiggers,
	martin.petersen, ardb, ajones, conor.dooley, samuel.holland,
	linus.walleij, nathan, linux-riscv, linux-kernel, linux-hardening

On Tue, Jan 13, 2026 at 04:27:46PM +0800, Feng Jiang wrote:
> Add strnlen() implementation for RISC-V with both generic and
> Zbb-optimized versions, derived from strlen.S.

Now that you have a benchmark tool, what are the results? Please put them in
short form here in the commit message. (Yes, I know about the cover letter.)

-- 
With Best Regards,
Andy Shevchenko




* [PATCH v2 13/14] riscv: lib: add strchr implementation
From: Feng Jiang @ 2026-01-13  8:27 UTC (permalink / raw)
  To: pjw, palmer, aou, alex, kees, andy, akpm, jiangfeng, ebiggers,
	martin.petersen, ardb, ajones, conor.dooley, samuel.holland,
	linus.walleij, nathan
  Cc: linux-riscv, linux-kernel, linux-hardening

Add a basic assembly implementation of strchr() for RISC-V.
This provides a functional byte-by-byte version as a foundation
for future optimizations.
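
Mapping the assembly back to C can make review easier. The following is a sketch of what the strchr.S loop computes, with `bytewise_strchr()` as a hypothetical name and each comment naming the instruction it models.

```c
#include <assert.h>
#include <stddef.h>

/* C rendering of the byte-by-byte loop in strchr.S (a sketch). */
static char *bytewise_strchr(const char *s, int c)
{
	unsigned char ch = (unsigned char)c;	/* andi a1, a1, 0xff */

	for (;;) {
		unsigned char cur = (unsigned char)*s;	/* lbu t0, 0(a0) */

		if (cur == ch)			/* beq t0, a1, 2f */
			return (char *)s;
		if (cur == '\0')		/* bnez t0, 1b falls through */
			return NULL;		/* li a0, 0 */
		s++;				/* addi a0, a0, 1 */
	}
}
```

Because the match test runs before the NUL test, searching for '\0' returns a pointer to the terminator, as the C standard requires of strchr().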

Signed-off-by: Feng Jiang <jiangfeng@kylinos.cn>
---
 arch/riscv/include/asm/string.h |  3 +++
 arch/riscv/lib/Makefile         |  1 +
 arch/riscv/lib/strchr.S         | 35 +++++++++++++++++++++++++++++++++
 arch/riscv/purgatory/Makefile   |  5 ++++-
 4 files changed, 43 insertions(+), 1 deletion(-)
 create mode 100644 arch/riscv/lib/strchr.S

diff --git a/arch/riscv/include/asm/string.h b/arch/riscv/include/asm/string.h
index 16634d67c217..ca3ade82b124 100644
--- a/arch/riscv/include/asm/string.h
+++ b/arch/riscv/include/asm/string.h
@@ -31,6 +31,9 @@ extern asmlinkage int strncmp(const char *cs, const char *ct, size_t count);
 
 #define __HAVE_ARCH_STRNLEN
 extern asmlinkage __kernel_size_t strnlen(const char *, size_t);
+
+#define __HAVE_ARCH_STRCHR
+extern asmlinkage char *strchr(const char *, int);
 #endif
 
 /* For those files which don't want to check by kasan. */
diff --git a/arch/riscv/lib/Makefile b/arch/riscv/lib/Makefile
index 0969d8136df0..b7f804dce1c3 100644
--- a/arch/riscv/lib/Makefile
+++ b/arch/riscv/lib/Makefile
@@ -8,6 +8,7 @@ lib-y			+= strcmp.o
 lib-y			+= strlen.o
 lib-y			+= strncmp.o
 lib-y			+= strnlen.o
+lib-y			+= strchr.o
 endif
 lib-y			+= csum.o
 ifeq ($(CONFIG_MMU), y)
diff --git a/arch/riscv/lib/strchr.S b/arch/riscv/lib/strchr.S
new file mode 100644
index 000000000000..48c3a9da53e3
--- /dev/null
+++ b/arch/riscv/lib/strchr.S
@@ -0,0 +1,35 @@
+/* SPDX-License-Identifier: GPL-2.0-only */
+
+/*
+ * Copyright (C) 2025 Feng Jiang <jiangfeng@kylinos.cn>
+ */
+
+#include <linux/linkage.h>
+#include <asm/asm.h>
+
+/* char *strchr(const char *s, int c) */
+SYM_FUNC_START(strchr)
+	/*
+	 * Parameters
+	 *   a0 - The string to be searched
+	 *   a1 - The character to search for
+	 *
+	 * Returns
+	 *   a0 - Address of first occurrence of 'c' or 0
+	 *
+	 * Clobbers
+	 *   t0
+	 */
+	andi	a1, a1, 0xff
+1:
+	lbu	t0, 0(a0)
+	beq	t0, a1, 2f
+	addi	a0, a0, 1
+	bnez	t0, 1b
+	li	a0, 0
+2:
+	ret
+SYM_FUNC_END(strchr)
+
+SYM_FUNC_ALIAS_WEAK(__pi_strchr, strchr)
+EXPORT_SYMBOL(strchr)
diff --git a/arch/riscv/purgatory/Makefile b/arch/riscv/purgatory/Makefile
index d7c0533108be..e7b3d748c913 100644
--- a/arch/riscv/purgatory/Makefile
+++ b/arch/riscv/purgatory/Makefile
@@ -2,7 +2,7 @@
 
 purgatory-y := purgatory.o sha256.o entry.o string.o ctype.o memcpy.o memset.o
 ifeq ($(CONFIG_KASAN_GENERIC)$(CONFIG_KASAN_SW_TAGS),)
-purgatory-y += strcmp.o strlen.o strncmp.o strnlen.o
+purgatory-y += strcmp.o strlen.o strncmp.o strnlen.o strchr.o
 endif
 
 targets += $(purgatory-y)
@@ -35,6 +35,9 @@ $(obj)/sha256.o: $(srctree)/lib/crypto/sha256.c FORCE
 $(obj)/strnlen.o: $(srctree)/arch/riscv/lib/strnlen.S FORCE
 	$(call if_changed_rule,as_o_S)
 
+$(obj)/strchr.o: $(srctree)/arch/riscv/lib/strchr.S FORCE
+	$(call if_changed_rule,as_o_S)
+
 CFLAGS_sha256.o := -D__DISABLE_EXPORTS -D__NO_FORTIFY
 CFLAGS_string.o := -D__DISABLE_EXPORTS
 CFLAGS_ctype.o := -D__DISABLE_EXPORTS
-- 
2.25.1



* [PATCH v2 14/14] riscv: lib: add strrchr implementation
From: Feng Jiang @ 2026-01-13  8:27 UTC (permalink / raw)
  To: pjw, palmer, aou, alex, kees, andy, akpm, jiangfeng, ebiggers,
	martin.petersen, ardb, ajones, conor.dooley, samuel.holland,
	linus.walleij, nathan
  Cc: linux-riscv, linux-kernel, linux-hardening

Add a basic assembly implementation of strrchr() for RISC-V.
This provides a functional byte-by-byte version as a foundation
for future optimizations.
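
In the same spirit, here is a sketch of the strrchr.S loop in C. `bytewise_strrchr()` is a hypothetical name, and the comments pair each step with the instruction it models.

```c
#include <assert.h>
#include <stddef.h>

/* C rendering of the byte-by-byte loop in strrchr.S (a sketch). */
static char *bytewise_strrchr(const char *s, int c)
{
	unsigned char ch = (unsigned char)c;	/* andi a1, a1, 0xff */
	const char *last = NULL;		/* li a0, 0 */
	unsigned char cur;

	do {
		cur = (unsigned char)*s;	/* lbu t0, 0(t1) */
		if (cur == ch)			/* bne t0, a1, 2f not taken */
			last = s;		/* mv a0, t1 */
		s++;				/* addi t1, t1, 1 */
	} while (cur != '\0');			/* bnez t0, 1b */
	return (char *)last;
}
```

Every match overwrites `last`, so the loop always runs to the terminator regardless of where matches occur, and a search for '\0' yields the terminator's address.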

Signed-off-by: Feng Jiang <jiangfeng@kylinos.cn>
---
 arch/riscv/include/asm/string.h |  3 +++
 arch/riscv/lib/Makefile         |  1 +
 arch/riscv/lib/strrchr.S        | 37 +++++++++++++++++++++++++++++++++
 arch/riscv/purgatory/Makefile   |  5 ++++-
 4 files changed, 45 insertions(+), 1 deletion(-)
 create mode 100644 arch/riscv/lib/strrchr.S

diff --git a/arch/riscv/include/asm/string.h b/arch/riscv/include/asm/string.h
index ca3ade82b124..764ffe8f6479 100644
--- a/arch/riscv/include/asm/string.h
+++ b/arch/riscv/include/asm/string.h
@@ -34,6 +34,9 @@ extern asmlinkage __kernel_size_t strnlen(const char *, size_t);
 
 #define __HAVE_ARCH_STRCHR
 extern asmlinkage char *strchr(const char *, int);
+
+#define __HAVE_ARCH_STRRCHR
+extern asmlinkage char *strrchr(const char *, int);
 #endif
 
 /* For those files which don't want to check by kasan. */
diff --git a/arch/riscv/lib/Makefile b/arch/riscv/lib/Makefile
index b7f804dce1c3..735d0b665536 100644
--- a/arch/riscv/lib/Makefile
+++ b/arch/riscv/lib/Makefile
@@ -9,6 +9,7 @@ lib-y			+= strlen.o
 lib-y			+= strncmp.o
 lib-y			+= strnlen.o
 lib-y			+= strchr.o
+lib-y			+= strrchr.o
 endif
 lib-y			+= csum.o
 ifeq ($(CONFIG_MMU), y)
diff --git a/arch/riscv/lib/strrchr.S b/arch/riscv/lib/strrchr.S
new file mode 100644
index 000000000000..ac58b20ca21d
--- /dev/null
+++ b/arch/riscv/lib/strrchr.S
@@ -0,0 +1,37 @@
+/* SPDX-License-Identifier: GPL-2.0-only */
+
+/*
+ * Copyright (C) 2025 Feng Jiang <jiangfeng@kylinos.cn>
+ */
+
+#include <linux/linkage.h>
+#include <asm/asm.h>
+
+/* char *strrchr(const char *s, int c) */
+SYM_FUNC_START(strrchr)
+	/*
+	 * Parameters
+	 *	a0 - The string to be searched
+	 *	a1 - The character to search for
+	 *
+	 * Returns
+	 *	a0 - Address of last occurrence of 'c' or 0
+	 *
+	 * Clobbers
+	 *	t0, t1
+	 */
+	andi	a1, a1, 0xff
+	mv	t1, a0
+	li	a0, 0
+1:
+	lbu	t0, 0(t1)
+	bne	t0, a1, 2f
+	mv	a0, t1
+2:
+	addi	t1, t1, 1
+	bnez	t0, 1b
+	ret
+SYM_FUNC_END(strrchr)
+
+SYM_FUNC_ALIAS_WEAK(__pi_strrchr, strrchr)
+EXPORT_SYMBOL(strrchr)
diff --git a/arch/riscv/purgatory/Makefile b/arch/riscv/purgatory/Makefile
index e7b3d748c913..b0358a78f11a 100644
--- a/arch/riscv/purgatory/Makefile
+++ b/arch/riscv/purgatory/Makefile
@@ -2,7 +2,7 @@
 
 purgatory-y := purgatory.o sha256.o entry.o string.o ctype.o memcpy.o memset.o
 ifeq ($(CONFIG_KASAN_GENERIC)$(CONFIG_KASAN_SW_TAGS),)
-purgatory-y += strcmp.o strlen.o strncmp.o strnlen.o strchr.o
+purgatory-y += strcmp.o strlen.o strncmp.o strnlen.o strchr.o strrchr.o
 endif
 
 targets += $(purgatory-y)
@@ -38,6 +38,9 @@ $(obj)/strnlen.o: $(srctree)/arch/riscv/lib/strnlen.S FORCE
 $(obj)/strchr.o: $(srctree)/arch/riscv/lib/strchr.S FORCE
 	$(call if_changed_rule,as_o_S)
 
+$(obj)/strrchr.o: $(srctree)/arch/riscv/lib/strrchr.S FORCE
+	$(call if_changed_rule,as_o_S)
+
 CFLAGS_sha256.o := -D__DISABLE_EXPORTS -D__NO_FORTIFY
 CFLAGS_string.o := -D__DISABLE_EXPORTS
 CFLAGS_ctype.o := -D__DISABLE_EXPORTS
-- 
2.25.1



* Re: [PATCH v2 00/14] riscv: optimize string functions and add kunit tests
From: Andy Shevchenko @ 2026-01-13  8:52 UTC (permalink / raw)
  To: Feng Jiang
  Cc: pjw, palmer, aou, alex, kees, andy, akpm, ebiggers,
	martin.petersen, ardb, ajones, conor.dooley, samuel.holland,
	linus.walleij, nathan, linux-riscv, linux-kernel, linux-hardening

On Tue, Jan 13, 2026 at 04:27:34PM +0800, Feng Jiang wrote:
> This series introduces optimized assembly implementations for strnlen,
> strchr, and strrchr on the RISC-V architecture. To support a rigorous
> verification process, the series also significantly expands the
> string_kunit test suite with both functional correctness tests and
> performance benchmarks.
> 
> The patchset is organized as follows:
> - Refactoring (Patches 1-4): Extract generic C implementations for
>   strlen, strnlen, strchr, and strrchr into exported __generic_* functions.
> - Correctness Testing (Patches 5-7): Extend string_kunit with detailed
>   functional tests for the target functions.
> - Performance Benchmarking (Patches 8-11): Add a benchmarking framework
>   to string_kunit to measure execution time across various string lengths.
> - RISC-V Optimizations (Patches 12-14): Provide the optimized assembly
>   implementations for the RISC-V architecture.

...

>         # string_test_strlen_bench: strlen performance (long, len: 2048, iters: 10000):
>         # string_test_strlen_bench:   arch-optimized: 14100 ns
>         # string_test_strlen_bench:   generic C:      35605600 ns
>         # string_test_strlen_bench:   speedup:        2525.21x

Doesn't sound right. I think you measured cache performance and not your algo.
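
Turning the quoted figures into throughput gives a quick plausibility check on that remark. The sketch below is plain arithmetic; `bytes_per_ns()` and `ns_per_call()` are hypothetical helpers, and the only inputs are the numbers in the quoted log (len 2048, 10000 iterations, 14100 ns and 35605600 ns).

```c
#include <assert.h>
#include <stdint.h>

/* Average bytes examined per nanosecond, assuming each of the `iters`
 * calls really scans all `len` bytes of the string. */
static double bytes_per_ns(uint64_t len, uint64_t iters, uint64_t total_ns)
{
	return (double)(len * iters) / (double)total_ns;
}

/* Average wall-clock time of a single call. */
static double ns_per_call(uint64_t iters, uint64_t total_ns)
{
	return (double)total_ns / (double)iters;
}
```

For the arch-optimized run this works out to about 1.41 ns per call, or roughly 1452 bytes per nanosecond, which is far too fast for a loop that actually reads 2048 bytes on every call; the generic run comes to about 0.58 bytes per nanosecond. The ratio of the two, about 2525, matches the reported speedup, which suggests the anomaly lies in what the arch-side loop actually timed rather than in the speedup calculation.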

-- 
With Best Regards,
Andy Shevchenko




* Re: [PATCH v2 00/14] riscv: optimize string functions and add kunit tests
From: Joel Stanley @ 2026-01-15  4:43 UTC (permalink / raw)
  To: Feng Jiang
  Cc: pjw, palmer, aou, alex, kees, andy, akpm, ebiggers,
	martin.petersen, ardb, ajones, conor.dooley, samuel.holland,
	linus.walleij, nathan, linux-riscv, linux-kernel, linux-hardening,
	dfustini@oss.tenstorrent.com, Michael Ellerman

On Tue, 13 Jan 2026 at 18:58, Feng Jiang <jiangfeng@kylinos.cn> wrote:
>
> This series introduces optimized assembly implementations for strnlen,
> strchr, and strrchr on the RISC-V architecture. To support a rigorous
> verification process, the series also significantly expands the
> string_kunit test suite with both functional correctness tests and
> performance benchmarks.

I ran the kunit tests on Ascalon, a RVA23 CPU, in emulation. The arch
optimised version showed significant improvements over the plain
version.

I didn't have time to investigate if the numbers made sense. As Andy
noted, the 'long' benchmark had a much higher ratio improvement than
the short and medium.

Tested-by: Joel Stanley <joel@jms.id.au>

Cheers,

Joel


* Re: [PATCH v2 00/14] riscv: optimize string functions and add kunit tests
From: Feng Jiang @ 2026-01-19  9:24 UTC (permalink / raw)
  To: Joel Stanley
  Cc: pjw, palmer, aou, alex, kees, andy, akpm, ebiggers,
	martin.petersen, ardb, ajones, conor.dooley, samuel.holland,
	linus.walleij, nathan, linux-riscv, linux-kernel, linux-hardening,
	dfustini@oss.tenstorrent.com, Michael Ellerman

On 2026/1/15 12:43, Joel Stanley wrote:
> On Tue, 13 Jan 2026 at 18:58, Feng Jiang <jiangfeng@kylinos.cn> wrote:
>>
>> This series introduces optimized assembly implementations for strnlen,
>> strchr, and strrchr on the RISC-V architecture. To support a rigorous
>> verification process, the series also significantly expands the
>> string_kunit test suite with both functional correctness tests and
>> performance benchmarks.
> 
> I ran the kunit tests on Ascalon, a RVA23 CPU, in emulation. The arch
> optimised version showed significant improvements over the plain
> version.
> 
> I didn't have time to investigate if the numbers made sense. As Andy
> noted, the 'long' benchmark had a much higher ratio improvement than
> the short and medium.
> 
> Tested-by: Joel Stanley <joel@jms.id.au>
> 

Thank you very much for your time and the test results.

You were absolutely right to question the numbers. I've realized there were
some flaws in the previous benchmark logic that led to those inconsistent
ratios. I am sincerely sorry for the confusion this may have caused.

I am currently refining the implementation for v3 to ensure much more
accurate and reliable measurements. I'll send out the updated series
once it's ready.

Thanks again for helping me catch this!

-- 
With Best Regards,
Feng Jiang



Thread overview: 36+ messages
2026-01-13  8:27 [PATCH v2 00/14] riscv: optimize string functions and add kunit tests Feng Jiang
2026-01-13  8:27 ` [PATCH v2 01/14] lib/string: extract generic strlen() into __generic_strlen() Feng Jiang
2026-01-13  8:33   ` Andy Shevchenko
2026-01-14  0:01   ` Eric Biggers
2026-01-14  1:41     ` Feng Jiang
2026-01-14  7:07     ` Andy Shevchenko
2026-01-14 10:10     ` David Laight
2026-01-15  6:50       ` Feng Jiang
2026-01-15  6:55         ` Andy Shevchenko
2026-01-13  8:27 ` [PATCH v2 02/14] lib/string: extract generic strnlen() into __generic_strnlen() Feng Jiang
2026-01-13  8:27 ` [PATCH v2 03/14] lib/string: extract generic strchr() into __generic_strchr() Feng Jiang
2026-01-13  8:27 ` [PATCH v2 04/14] lib/string: extract generic strrchr() into __generic_strrchr() Feng Jiang
2026-01-13  8:27 ` [PATCH v2 05/14] lib/string_kunit: add correctness test for strlen Feng Jiang
2026-01-13  8:27 ` [PATCH v2 06/14] lib/string_kunit: add correctness test for strnlen Feng Jiang
2026-01-13  8:41   ` Andy Shevchenko
2026-01-13  8:27 ` [PATCH v2 07/14] lib/string_kunit: add correctness test for strrchr() Feng Jiang
2026-01-13  8:27 ` [PATCH v2 08/14] lib/string_kunit: add performance benchmark for strlen() Feng Jiang
2026-01-13  8:46   ` Andy Shevchenko
2026-01-14  6:14     ` Feng Jiang
2026-01-14  7:04       ` Feng Jiang
2026-01-14  7:21         ` Andy Shevchenko
2026-01-14  8:05           ` Feng Jiang
2026-01-14 10:21         ` David Laight
2026-01-15  6:24           ` Feng Jiang
2026-01-15 10:40             ` David Laight
2026-01-18 11:11   ` kernel test robot
2026-01-13  8:27 ` [PATCH v2 09/14] lib/string_kunit: add performance benchmark for strnlen() Feng Jiang
2026-01-13  8:27 ` [PATCH v2 10/14] lib/string_kunit: add performance benchmark for strchr() Feng Jiang
2026-01-13  8:27 ` [PATCH v2 11/14] lib/string_kunit: add performance benchmark for strrchr() Feng Jiang
2026-01-13  8:27 ` [PATCH v2 12/14] riscv: lib: add strnlen implementation Feng Jiang
2026-01-13  8:48   ` Andy Shevchenko
2026-01-13  8:27 ` [PATCH v2 13/14] riscv: lib: add strchr implementation Feng Jiang
2026-01-13  8:27 ` [PATCH v2 14/14] riscv: lib: add strrchr implementation Feng Jiang
2026-01-13  8:52 ` [PATCH v2 00/14] riscv: optimize string functions and add kunit tests Andy Shevchenko
2026-01-15  4:43 ` Joel Stanley
2026-01-19  9:24   ` Feng Jiang
