* [PATCH v3 0/8] riscv: optimize string functions and add kunit tests
@ 2026-01-20 6:58 Feng Jiang
2026-01-20 6:58 ` [PATCH v3 1/8] lib/string_kunit: add correctness test for strlen Feng Jiang
` (8 more replies)
0 siblings, 9 replies; 29+ messages in thread
From: Feng Jiang @ 2026-01-20 6:58 UTC (permalink / raw)
To: pjw, palmer, aou, alex, akpm, kees, andy, jiangfeng, ebiggers,
martin.petersen, ardb, charlie, conor.dooley, ajones,
linus.walleij, nathan
Cc: linux-riscv, linux-kernel, linux-hardening
This series provides optimized implementations of strnlen(), strchr(),
and strrchr() for the RISC-V architecture. The strnlen implementation
is derived from the existing optimized strlen. For strchr and strrchr,
the current versions use simple byte-by-byte assembly logic, which
will serve as a baseline for future Zbb-based optimizations.
The patch series is organized into three parts:
1. Correctness Testing: The first three patches add KUnit test cases
for strlen, strnlen, and strrchr to ensure the baseline and optimized
versions are functionally correct.
2. Benchmarking Tool: Patches 4 and 5 extend string_kunit to include
performance measurement capabilities, allowing for comparative
analysis within the KUnit environment.
3. Architectural Optimizations: The final three patches introduce the
RISC-V specific assembly implementations.
Following suggestions from Andy Shevchenko, performance benchmarks have
been added to string_kunit.c to provide quantifiable evidence of the
improvements. Andy provided many specific comments on the implementation
of the benchmark logic, which is also inspired by Eric Biggers'
crc_benchmark(). Performance was measured in a QEMU TCG (rv64) environment,
comparing the generic C implementation with the new RISC-V assembly versions.
Performance Summary (Improvement %):
---------------------------------------------------------------
Function | 16 B (Short) | 512 B (Mid) | 4096 B (Long)
---------------------------------------------------------------
strnlen | +64.0% | +346.2% | +410.7%
strchr | +4.0% | +6.4% | +1.5%
strrchr | +6.6% | +2.8% | +0.0%
---------------------------------------------------------------
The benchmarks can be reproduced by enabling CONFIG_STRING_KUNIT_BENCH
and running:
  ./tools/testing/kunit/kunit.py run --arch=riscv \
      --cross_compile=riscv64-linux-gnu- --kunitconfig=my_string.kunitconfig \
      --raw_output
The strnlen implementation leverages the Zbb 'orc.b' instruction and
word-at-a-time logic, showing significant gains as the string length
increases. For strchr and strrchr, the handwritten assembly reduces
fixed overhead by eliminating stack frame management. The gain is most
prominent on short strings (1-16B) where function call overhead dominates,
while the performance converges with the C implementation for longer
strings in the TCG environment.
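As a rough illustration of the 'orc.b' step described above, here is a portable C sketch. It is not the code in this series: orc_b_emul(), load_le64() and first_nul_in_word() are hypothetical helper names, the loops stand in for the single Zbb instruction and for ctz, and a little-endian byte order is assumed.

```c
#include <stddef.h>
#include <stdint.h>

/*
 * Software stand-in for Zbb orc.b: every non-zero byte becomes 0xff,
 * every zero byte stays 0x00.
 */
uint64_t orc_b_emul(uint64_t x)
{
	uint64_t out = 0;
	int i;

	for (i = 0; i < 8; i++) {
		if ((x >> (8 * i)) & 0xff)
			out |= (uint64_t)0xff << (8 * i);
	}
	return out;
}

/* Pack 8 bytes into a word with p[0] in the least significant byte. */
uint64_t load_le64(const unsigned char *p)
{
	uint64_t w = 0;
	int i;

	for (i = 0; i < 8; i++)
		w |= (uint64_t)p[i] << (8 * i);
	return w;
}

/* Index (0..7) of the first NUL byte in the word, or 8 if there is none. */
size_t first_nul_in_word(uint64_t w)
{
	uint64_t mask = ~orc_b_emul(w);	/* NUL -> 0xff, non-NUL -> 0x00 */
	size_t bit = 0;

	if (!mask)
		return 8;

	/* Count trailing zero bits, as ctz would. */
	while (!(mask & 1)) {
		mask >>= 1;
		bit++;
	}
	return bit >> 3;	/* bit index -> byte index */
}
```

A word with no NUL byte yields 8 here, which corresponds to the case where a word-at-a-time loop simply moves on to the next word.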
I would like to thank Andy Shevchenko for the suggestion to add benchmarks
and for his detailed feedback on the test framework, and Eric Biggers for
the benchmarking approach. Thanks also to Joel Stanley for testing support
and feedback, and to David Laight for his suggestions regarding performance
measurement.
Changes:
v3:
- Re-implement benchmark logic inspired by crc_benchmark().
- Add 'len - 2' test case to strnlen correctness tests.
- Incorporate detailed benchmark data into individual commit messages.
v2:
- Refactored lib/string.c to export __generic_* functions and added
corresponding functional/performance tests for strnlen, strchr,
and strrchr (Andy Shevchenko).
- Replaced magic numbers with STRING_TEST_MAX_LEN etc. (Andy Shevchenko).
v1: Initial submission.
---
Feng Jiang (8):
lib/string_kunit: add correctness test for strlen
lib/string_kunit: add correctness test for strnlen
lib/string_kunit: add correctness test for strrchr()
lib/string_kunit: add performance benchmarks for strlen
lib/string_kunit: extend benchmarks to strnlen and chr searches
riscv: lib: add strnlen implementation
riscv: lib: add strchr implementation
riscv: lib: add strrchr implementation
arch/riscv/include/asm/string.h | 9 ++
arch/riscv/lib/Makefile | 3 +
arch/riscv/lib/strchr.S | 35 +++++
arch/riscv/lib/strnlen.S | 164 ++++++++++++++++++++
arch/riscv/lib/strrchr.S | 37 +++++
arch/riscv/purgatory/Makefile | 11 +-
lib/Kconfig.debug | 11 ++
lib/tests/string_kunit.c | 258 ++++++++++++++++++++++++++++++++
8 files changed, 527 insertions(+), 1 deletion(-)
create mode 100644 arch/riscv/lib/strchr.S
create mode 100644 arch/riscv/lib/strnlen.S
create mode 100644 arch/riscv/lib/strrchr.S
--
2.25.1
_______________________________________________
linux-riscv mailing list
linux-riscv@lists.infradead.org
http://lists.infradead.org/mailman/listinfo/linux-riscv
* [PATCH v3 1/8] lib/string_kunit: add correctness test for strlen
From: Feng Jiang @ 2026-01-20 6:58 UTC (permalink / raw)
To: pjw, palmer, aou, alex, akpm, kees, andy, jiangfeng, ebiggers,
martin.petersen, ardb, charlie, conor.dooley, ajones,
linus.walleij, nathan
Cc: linux-riscv, linux-kernel, linux-hardening, Joel Stanley
Add a KUnit test for strlen() to verify correctness across
different string lengths and memory alignments.
Suggested-by: Andy Shevchenko <andy@kernel.org>
Tested-by: Joel Stanley <joel@jms.id.au>
Signed-off-by: Feng Jiang <jiangfeng@kylinos.cn>
---
lib/tests/string_kunit.c | 26 ++++++++++++++++++++++++++
1 file changed, 26 insertions(+)
diff --git a/lib/tests/string_kunit.c b/lib/tests/string_kunit.c
index f9a8e557ba77..88da97e50c8e 100644
--- a/lib/tests/string_kunit.c
+++ b/lib/tests/string_kunit.c
@@ -17,6 +17,9 @@
#define STRCMP_TEST_EXPECT_LOWER(test, fn, ...) KUNIT_EXPECT_LT(test, fn(__VA_ARGS__), 0)
#define STRCMP_TEST_EXPECT_GREATER(test, fn, ...) KUNIT_EXPECT_GT(test, fn(__VA_ARGS__), 0)
+#define STRING_TEST_MAX_LEN 128
+#define STRING_TEST_MAX_OFFSET 16
+
static void string_test_memset16(struct kunit *test)
{
unsigned i, j, k;
@@ -104,6 +107,28 @@ static void string_test_memset64(struct kunit *test)
}
}
+static void string_test_strlen(struct kunit *test)
+{
+ char *s;
+ size_t len, offset;
+ const size_t buf_size = STRING_TEST_MAX_LEN + STRING_TEST_MAX_OFFSET + 1;
+
+ s = kunit_kzalloc(test, buf_size, GFP_KERNEL);
+ KUNIT_ASSERT_NOT_ERR_OR_NULL(test, s);
+
+ memset(s, 'A', buf_size);
+ s[buf_size - 1] = '\0';
+
+ for (offset = 0; offset < STRING_TEST_MAX_OFFSET; offset++) {
+ for (len = 0; len <= STRING_TEST_MAX_LEN; len++) {
+ s[offset + len] = '\0';
+ KUNIT_EXPECT_EQ_MSG(test, strlen(s + offset), len,
+ "offset:%zu len:%zu", offset, len);
+ s[offset + len] = 'A';
+ }
+ }
+}
+
static void string_test_strchr(struct kunit *test)
{
const char *test_string = "abcdefghijkl";
@@ -618,6 +643,7 @@ static struct kunit_case string_test_cases[] = {
KUNIT_CASE(string_test_memset16),
KUNIT_CASE(string_test_memset32),
KUNIT_CASE(string_test_memset64),
+ KUNIT_CASE(string_test_strlen),
KUNIT_CASE(string_test_strchr),
KUNIT_CASE(string_test_strnchr),
KUNIT_CASE(string_test_strspn),
--
2.25.1
* [PATCH v3 2/8] lib/string_kunit: add correctness test for strnlen
From: Feng Jiang @ 2026-01-20 6:58 UTC (permalink / raw)
To: pjw, palmer, aou, alex, akpm, kees, andy, jiangfeng, ebiggers,
martin.petersen, ardb, charlie, conor.dooley, ajones,
linus.walleij, nathan
Cc: linux-riscv, linux-kernel, linux-hardening, Joel Stanley
Add a KUnit test for strnlen() to verify correctness across
different string lengths and memory alignments.
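For reference, the values the test expects follow from strnlen() returning min(strlen(s), maxlen). A minimal byte-at-a-time sketch of that contract is shown below; strnlen_ref() is a hypothetical name used only for illustration and is not part of this patch.

```c
#include <stddef.h>

/*
 * Byte-at-a-time reference: stop at the first NUL or after maxlen
 * bytes, whichever comes first.
 */
size_t strnlen_ref(const char *s, size_t maxlen)
{
	size_t n = 0;

	while (n < maxlen && s[n] != '\0')
		n++;
	return n;
}
```

Limits below the string length (len - 1, len - 2) are returned unchanged, while limits at or above it (len, len + 1, len + 2, len + 10) all return len, so the test probes both sides of the boundary.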
Suggested-by: Andy Shevchenko <andy@kernel.org>
Tested-by: Joel Stanley <joel@jms.id.au>
Signed-off-by: Feng Jiang <jiangfeng@kylinos.cn>
---
lib/tests/string_kunit.c | 33 +++++++++++++++++++++++++++++++++
1 file changed, 33 insertions(+)
diff --git a/lib/tests/string_kunit.c b/lib/tests/string_kunit.c
index 88da97e50c8e..546cf403317c 100644
--- a/lib/tests/string_kunit.c
+++ b/lib/tests/string_kunit.c
@@ -129,6 +129,38 @@ static void string_test_strlen(struct kunit *test)
}
}
+static void string_test_strnlen(struct kunit *test)
+{
+ char *s;
+ size_t len, offset;
+ const size_t buf_size = STRING_TEST_MAX_LEN + STRING_TEST_MAX_OFFSET + 1;
+
+ s = kunit_kzalloc(test, buf_size, GFP_KERNEL);
+ KUNIT_ASSERT_NOT_ERR_OR_NULL(test, s);
+
+ memset(s, 'A', buf_size);
+ s[buf_size - 1] = '\0';
+
+ for (offset = 0; offset < STRING_TEST_MAX_OFFSET; offset++) {
+ for (len = 0; len <= STRING_TEST_MAX_LEN; len++) {
+ s[offset + len] = '\0';
+
+ if (len > 0)
+ KUNIT_EXPECT_EQ(test, strnlen(s + offset, len - 1), len - 1);
+ if (len > 1)
+ KUNIT_EXPECT_EQ(test, strnlen(s + offset, len - 2), len - 2);
+
+ KUNIT_EXPECT_EQ(test, strnlen(s + offset, len), len);
+
+ KUNIT_EXPECT_EQ(test, strnlen(s + offset, len + 1), len);
+ KUNIT_EXPECT_EQ(test, strnlen(s + offset, len + 2), len);
+ KUNIT_EXPECT_EQ(test, strnlen(s + offset, len + 10), len);
+
+ s[offset + len] = 'A';
+ }
+ }
+}
+
static void string_test_strchr(struct kunit *test)
{
const char *test_string = "abcdefghijkl";
@@ -644,6 +676,7 @@ static struct kunit_case string_test_cases[] = {
KUNIT_CASE(string_test_memset32),
KUNIT_CASE(string_test_memset64),
KUNIT_CASE(string_test_strlen),
+ KUNIT_CASE(string_test_strnlen),
KUNIT_CASE(string_test_strchr),
KUNIT_CASE(string_test_strnchr),
KUNIT_CASE(string_test_strspn),
--
2.25.1
* [PATCH v3 3/8] lib/string_kunit: add correctness test for strrchr()
From: Feng Jiang @ 2026-01-20 6:58 UTC (permalink / raw)
To: pjw, palmer, aou, alex, akpm, kees, andy, jiangfeng, ebiggers,
martin.petersen, ardb, charlie, conor.dooley, ajones,
linus.walleij, nathan
Cc: linux-riscv, linux-kernel, linux-hardening, Joel Stanley
Introduce a KUnit test to verify strrchr() across various memory
alignments and character positions. This ensures the implementation
correctly identifies the last occurrence of a character.
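The behaviour being checked can be summarised with a small reference sketch. strrchr_ref() is a hypothetical name for illustration; it follows the standard C contract, in which the terminating NUL counts as part of the string.

```c
#include <stddef.h>

/*
 * Return a pointer to the last occurrence of c in s, or NULL if c
 * does not occur. The terminator itself is searchable.
 */
char *strrchr_ref(const char *s, int c)
{
	const char *last = NULL;

	do {
		if (*s == (char)c)
			last = s;
	} while (*s++);

	return (char *)last;
}
```

With a buffer of len 'A' bytes, the last match for 'A' is at index len - 1, an absent character such as 'Z' gives NULL, and an empty string gives NULL for 'A' as well, which are the three cases the test asserts.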
Suggested-by: Andy Shevchenko <andy@kernel.org>
Tested-by: Joel Stanley <joel@jms.id.au>
Signed-off-by: Feng Jiang <jiangfeng@kylinos.cn>
---
lib/tests/string_kunit.c | 30 ++++++++++++++++++++++++++++++
1 file changed, 30 insertions(+)
diff --git a/lib/tests/string_kunit.c b/lib/tests/string_kunit.c
index 546cf403317c..8f836847a80e 100644
--- a/lib/tests/string_kunit.c
+++ b/lib/tests/string_kunit.c
@@ -184,6 +184,35 @@ static void string_test_strchr(struct kunit *test)
KUNIT_ASSERT_NULL(test, result);
}
+static void string_test_strrchr(struct kunit *test)
+{
+ char *buf;
+ size_t offset, len;
+ const size_t buf_size = STRING_TEST_MAX_LEN + STRING_TEST_MAX_OFFSET + 1;
+
+ buf = kunit_kzalloc(test, buf_size, GFP_KERNEL);
+ KUNIT_ASSERT_NOT_ERR_OR_NULL(test, buf);
+
+ memset(buf, 'A', buf_size);
+ buf[buf_size - 1] = '\0';
+
+ for (offset = 0; offset < STRING_TEST_MAX_OFFSET; offset++) {
+ for (len = 0; len <= STRING_TEST_MAX_LEN; len++) {
+ buf[offset + len] = '\0';
+
+ KUNIT_EXPECT_PTR_EQ(test, strrchr(buf + offset, 'Z'), NULL);
+
+ if (len > 0)
+ KUNIT_EXPECT_PTR_EQ(test, strrchr(buf + offset, 'A'),
+ buf + offset + len - 1);
+ else
+ KUNIT_EXPECT_PTR_EQ(test, strrchr(buf + offset, 'A'), NULL);
+
+ buf[offset + len] = 'A';
+ }
+ }
+}
+
static void string_test_strnchr(struct kunit *test)
{
const char *test_string = "abcdefghijkl";
@@ -679,6 +708,7 @@ static struct kunit_case string_test_cases[] = {
KUNIT_CASE(string_test_strnlen),
KUNIT_CASE(string_test_strchr),
KUNIT_CASE(string_test_strnchr),
+ KUNIT_CASE(string_test_strrchr),
KUNIT_CASE(string_test_strspn),
KUNIT_CASE(string_test_strcmp),
KUNIT_CASE(string_test_strcmp_long_strings),
--
2.25.1
* [PATCH v3 4/8] lib/string_kunit: add performance benchmarks for strlen
From: Feng Jiang @ 2026-01-20 6:58 UTC (permalink / raw)
To: pjw, palmer, aou, alex, akpm, kees, andy, jiangfeng, ebiggers,
martin.petersen, ardb, charlie, conor.dooley, ajones,
linus.walleij, nathan
Cc: linux-riscv, linux-kernel, linux-hardening, Joel Stanley
Introduce a benchmarking framework to the string_kunit test suite to
measure the execution efficiency of string functions.
The implementation is inspired by crc_benchmark(), measuring throughput
(MB/s) and latency (ns/call) across a range of string lengths. It
includes a warm-up phase, disables preemption during measurement, and
uses a fixed seed for reproducible results.
This allows for comparing different implementations (e.g., generic C vs.
architecture-optimized assembly) within the KUnit environment.
Initially, provide benchmarks for strlen().
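The reported figures reduce to simple integer arithmetic. The sketch below restates it in plain C under assumed helper names (bench_iters(), bench_mbps(), bench_ns_per_call() are illustrative only): the iteration count is scaled so each length processes roughly the same number of bytes, and 1 byte/ns is treated as 1000 MB/s.

```c
#include <stdint.h>

/* Iterations for one length so total bytes stay near the workload. */
uint64_t bench_iters(uint64_t workload, uint64_t len)
{
	return workload / (len ? len : 1);
}

/* Throughput in MB/s: bytes per nanosecond scaled by 1000. */
uint64_t bench_mbps(uint64_t len, uint64_t iters, uint64_t total_ns)
{
	if (!total_ns)
		return 0;
	return len * iters * 1000 / total_ns;
}

/* Average latency in nanoseconds per call. */
uint64_t bench_ns_per_call(uint64_t iters, uint64_t total_ns)
{
	if (!iters)
		return 0;
	return total_ns / iters;
}
```

For example, with a workload of 1000000 bytes and len = 4096, bench_iters() gives 244 iterations; if those take 1000000 ns in total, the result is 999 MB/s at 4098 ns/call (integer division truncates both values).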
Suggested-by: Andy Shevchenko <andy@kernel.org>
Suggested-by: Eric Biggers <ebiggers@kernel.org>
Tested-by: Joel Stanley <joel@jms.id.au>
Signed-off-by: Feng Jiang <jiangfeng@kylinos.cn>
---
lib/Kconfig.debug | 11 +++
lib/tests/string_kunit.c | 151 +++++++++++++++++++++++++++++++++++++++
2 files changed, 162 insertions(+)
diff --git a/lib/Kconfig.debug b/lib/Kconfig.debug
index ba36939fda79..21b058ae815f 100644
--- a/lib/Kconfig.debug
+++ b/lib/Kconfig.debug
@@ -2475,6 +2475,17 @@ config STRING_HELPERS_KUNIT_TEST
depends on KUNIT
default KUNIT_ALL_TESTS
+config STRING_KUNIT_BENCH
+ bool "Benchmark string functions at runtime"
+ depends on STRING_KUNIT_TEST
+ help
+ Enable performance measurement for string functions.
+
+ This measures the execution efficiency of string functions
+ during the KUnit test run.
+
+ If unsure, say N.
+
config FFS_KUNIT_TEST
tristate "KUnit test ffs-family functions at runtime" if !KUNIT_ALL_TESTS
depends on KUNIT
diff --git a/lib/tests/string_kunit.c b/lib/tests/string_kunit.c
index 8f836847a80e..e20e924d1c67 100644
--- a/lib/tests/string_kunit.c
+++ b/lib/tests/string_kunit.c
@@ -6,7 +6,9 @@
#define pr_fmt(fmt) KBUILD_MODNAME ": " fmt
#include <kunit/test.h>
+#include <linux/math64.h>
#include <linux/module.h>
+#include <linux/prandom.h>
#include <linux/printk.h>
#include <linux/slab.h>
#include <linux/string.h>
@@ -20,6 +22,9 @@
#define STRING_TEST_MAX_LEN 128
#define STRING_TEST_MAX_OFFSET 16
+#define STRING_BENCH_SEED 888
+#define STRING_BENCH_WORKLOAD 1000000UL
+
static void string_test_memset16(struct kunit *test)
{
unsigned i, j, k;
@@ -700,6 +705,151 @@ static void string_test_strends(struct kunit *test)
KUNIT_EXPECT_TRUE(test, strends("", ""));
}
+/* Target string lengths for benchmarking */
+static const size_t bench_lens[] = {
+ 0, 1, 7, 8, 16, 31, 64, 127, 512, 1024, 3173, 4096
+};
+
+/**
+ * alloc_max_bench_buffer() - Allocate buffer for the max test case.
+ * @test: KUnit context for managed allocation.
+ * @lens: Array of lengths used in the benchmark cases.
+ * @count: Number of elements in the @lens array.
+ * @buf_len: [out] Pointer to store the actually allocated buffer
+ * size (including null).
+ *
+ * Return: Pointer to the allocated memory, or NULL on failure.
+ */
+static void *alloc_max_bench_buffer(struct kunit *test,
+ const size_t *lens, size_t count, size_t *buf_len)
+{
+ void *buf;
+ size_t i, max_len = 0;
+
+ for (i = 0; i < count; i++) {
+ if (max_len < lens[i])
+ max_len = lens[i];
+ }
+
+ /* Add space for NUL terminator */
+ max_len += 1;
+
+ buf = kunit_kzalloc(test, max_len, GFP_KERNEL);
+ if (buf && buf_len)
+ *buf_len = max_len;
+
+ return buf;
+}
+
+/**
+ * fill_random_string() - Fill buffer with random non-null bytes.
+ * @buf: Buffer to fill.
+ * @len: Number of bytes to fill.
+ */
+static void fill_random_string(char *buf, size_t len)
+{
+ size_t i;
+ struct rnd_state state;
+
+ if (!buf || !len)
+ return;
+
+ /* Use a fixed seed to ensure deterministic benchmark results */
+ prandom_seed_state(&state, STRING_BENCH_SEED);
+ prandom_bytes_state(&state, buf, len);
+
+ /* Replace null bytes to avoid early string termination */
+ for (i = 0; i < len; i++) {
+ if (buf[i] == '\0')
+ buf[i] = 0x01;
+ }
+
+ buf[len - 1] = '\0';
+}
+
+/**
+ * STRING_BENCH() - Benchmark string functions.
+ * @iters: Number of iterations to run.
+ * @func: Function to benchmark.
+ * @...: Variable arguments passed to @func.
+ *
+ * Disables preemption and measures the total time in nanoseconds to execute
+ * @func(@__VA_ARGS__) for @iters times, including a small warm-up phase.
+ *
+ * Context: Disables preemption during measurement.
+ * Return: Total execution time in nanoseconds (u64).
+ */
+#define STRING_BENCH(iters, func, ...) \
+({ \
+ u64 __bn_t; \
+ size_t __bn_i; \
+ size_t __bn_iters = (iters); \
+ size_t __bn_warm_iters = max_t(size_t, __bn_iters / 10, 50U); \
+ /* Volatile function pointer prevents dead code elimination */ \
+ typeof(func) (* volatile __func) = (func); \
+ \
+ for (__bn_i = 0; __bn_i < __bn_warm_iters; __bn_i++) \
+ (void)__func(__VA_ARGS__); \
+ \
+ preempt_disable(); \
+ __bn_t = ktime_get_ns(); \
+ for (__bn_i = 0; __bn_i < __bn_iters; __bn_i++) \
+ (void)__func(__VA_ARGS__); \
+ __bn_t = ktime_get_ns() - __bn_t; \
+ preempt_enable(); \
+ __bn_t; \
+})
+
+/**
+ * STRING_BENCH_BUF() - Benchmark harness for single-buffer functions.
+ * @test: KUnit context.
+ * @buf_name: Local char * variable name to be defined.
+ * @buf_size: Local size_t variable name to be defined.
+ * @func: Function to benchmark.
+ * @...: Extra arguments for @func.
+ *
+ * Prepares a randomized, null-terminated buffer and iterates through lengths
+ * in bench_lens, defining @buf_name and @buf_size in each loop.
+ */
+#define STRING_BENCH_BUF(test, buf_name, buf_size, func, ...) \
+do { \
+ char *buf_name, *_bn_buf; \
+ size_t buf_size, _bn_i, _bn_iters, _bn_size = 0; \
+ u64 _bn_t, _bn_mbps = 0, _bn_lat = 0; \
+ \
+ if (!IS_ENABLED(CONFIG_STRING_KUNIT_BENCH)) \
+ kunit_skip(test, "not enabled"); \
+ \
+ _bn_buf = alloc_max_bench_buffer(test, bench_lens, \
+ ARRAY_SIZE(bench_lens), &_bn_size); \
+ KUNIT_ASSERT_NOT_ERR_OR_NULL(test, _bn_buf); \
+ \
+ fill_random_string(_bn_buf, _bn_size); \
+ _bn_buf[_bn_size - 1] = '\0'; \
+ \
+ for (_bn_i = 0; _bn_i < ARRAY_SIZE(bench_lens); _bn_i++) { \
+ buf_size = bench_lens[_bn_i]; \
+ buf_name = _bn_buf + _bn_size - buf_size - 1; \
+ _bn_iters = STRING_BENCH_WORKLOAD / \
+ max_t(size_t, buf_size, 1U); \
+ \
+ _bn_t = STRING_BENCH(_bn_iters, func, ##__VA_ARGS__); \
+ \
+ if (_bn_t > 0) { \
+ _bn_mbps = (u64)(buf_size) * _bn_iters * 1000; \
+ _bn_mbps = div64_u64(_bn_mbps, _bn_t); \
+ _bn_lat = div64_u64(_bn_t, _bn_iters); \
+ } \
+ kunit_info(test, "len=%zu: %llu MB/s (%llu ns/call)\n", \
+ buf_size, _bn_mbps, _bn_lat); \
+ } \
+} while (0)
+
+static void string_bench_strlen(struct kunit *test)
+{
+ STRING_BENCH_BUF(test, buf, len, strlen, buf);
+}
+
static struct kunit_case string_test_cases[] = {
KUNIT_CASE(string_test_memset16),
KUNIT_CASE(string_test_memset32),
@@ -725,6 +875,7 @@ static struct kunit_case string_test_cases[] = {
KUNIT_CASE(string_test_strtomem),
KUNIT_CASE(string_test_memtostr),
KUNIT_CASE(string_test_strends),
+ KUNIT_CASE(string_bench_strlen),
{}
};
--
2.25.1
* [PATCH v3 5/8] lib/string_kunit: extend benchmarks to strnlen and chr searches
From: Feng Jiang @ 2026-01-20 6:58 UTC (permalink / raw)
To: pjw, palmer, aou, alex, akpm, kees, andy, jiangfeng, ebiggers,
martin.petersen, ardb, charlie, conor.dooley, ajones,
linus.walleij, nathan
Cc: linux-riscv, linux-kernel, linux-hardening, Joel Stanley
Extend the string benchmarking suite to include strnlen(), strchr(),
and strrchr().
For character search functions (strchr and strrchr), the benchmark
targets the null terminator. This ensures the entire string is scanned,
providing a consistent measure of full-length processing efficiency
comparable to strlen().
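To make the equivalence concrete: in standard C the terminator is part of the searched string, so both functions return a pointer exactly strlen(s) bytes past the start when asked for '\0'. A small userspace sketch, with hypothetical wrapper names:

```c
#include <string.h>

/* Distance travelled by strchr() when searching for the terminator. */
size_t scan_to_terminator(const char *s)
{
	return (size_t)(strchr(s, '\0') - s);
}

/* Same for strrchr(); the only '\0' in a C string is the last byte. */
size_t rscan_to_terminator(const char *s)
{
	return (size_t)(strrchr(s, '\0') - s);
}
```

Both wrappers return the same value as strlen() for any valid string, which is why the MB/s figures of the three benchmarks are directly comparable.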
Suggested-by: Andy Shevchenko <andy@kernel.org>
Suggested-by: Eric Biggers <ebiggers@kernel.org>
Tested-by: Joel Stanley <joel@jms.id.au>
Signed-off-by: Feng Jiang <jiangfeng@kylinos.cn>
---
lib/tests/string_kunit.c | 18 ++++++++++++++++++
1 file changed, 18 insertions(+)
diff --git a/lib/tests/string_kunit.c b/lib/tests/string_kunit.c
index e20e924d1c67..9d76777ad753 100644
--- a/lib/tests/string_kunit.c
+++ b/lib/tests/string_kunit.c
@@ -850,6 +850,21 @@ static void string_bench_strlen(struct kunit *test)
STRING_BENCH_BUF(test, buf, len, strlen, buf);
}
+static void string_bench_strnlen(struct kunit *test)
+{
+ STRING_BENCH_BUF(test, buf, len, strnlen, buf, len);
+}
+
+static void string_bench_strchr(struct kunit *test)
+{
+ STRING_BENCH_BUF(test, buf, len, strchr, buf, '\0');
+}
+
+static void string_bench_strrchr(struct kunit *test)
+{
+ STRING_BENCH_BUF(test, buf, len, strrchr, buf, '\0');
+}
+
static struct kunit_case string_test_cases[] = {
KUNIT_CASE(string_test_memset16),
KUNIT_CASE(string_test_memset32),
@@ -876,6 +891,9 @@ static struct kunit_case string_test_cases[] = {
KUNIT_CASE(string_test_memtostr),
KUNIT_CASE(string_test_strends),
KUNIT_CASE(string_bench_strlen),
+ KUNIT_CASE(string_bench_strnlen),
+ KUNIT_CASE(string_bench_strchr),
+ KUNIT_CASE(string_bench_strrchr),
{}
};
--
2.25.1
* [PATCH v3 6/8] riscv: lib: add strnlen implementation
From: Feng Jiang @ 2026-01-20 6:58 UTC (permalink / raw)
To: pjw, palmer, aou, alex, akpm, kees, andy, jiangfeng, ebiggers,
martin.petersen, ardb, charlie, conor.dooley, ajones,
linus.walleij, nathan
Cc: linux-riscv, linux-kernel, linux-hardening, Joel Stanley
Add an optimized strnlen() implementation for RISC-V. This version
includes a generic word-at-a-time optimization and a Zbb-powered
optimization using the 'orc.b' instruction, derived from the strlen
implementation.
Benchmark results (QEMU TCG, rv64):
Length | Original (MB/s) | Optimized (MB/s) | Improvement
-------|-----------------|------------------|------------
16 B | 189 | 310 | +64.0%
512 B | 344 | 1535 | +346.2%
4096 B | 363 | 1854 | +410.7%
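The Improvement column appears consistent with (optimized - original) / original applied to the MB/s figures; this reading is an assumption, not something stated in the patch. A quick way to re-derive it, using a hypothetical helper that works in tenths of a percent to stay in integer arithmetic:

```c
/*
 * Relative improvement in tenths of a percent, truncated.
 * A return value of 640 means +64.0%.
 */
long improvement_tenths(long orig_mbps, long opt_mbps)
{
	return (opt_mbps - orig_mbps) * 1000 / orig_mbps;
}
```

Feeding in the three rows above (189 -> 310, 344 -> 1535, 363 -> 1854) gives 640, 3462 and 4107, matching +64.0%, +346.2% and +410.7%.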
Suggested-by: Andy Shevchenko <andy@kernel.org>
Tested-by: Joel Stanley <joel@jms.id.au>
Signed-off-by: Feng Jiang <jiangfeng@kylinos.cn>
---
arch/riscv/include/asm/string.h | 3 +
arch/riscv/lib/Makefile | 1 +
arch/riscv/lib/strnlen.S | 164 ++++++++++++++++++++++++++++++++
arch/riscv/purgatory/Makefile | 5 +-
4 files changed, 172 insertions(+), 1 deletion(-)
create mode 100644 arch/riscv/lib/strnlen.S
diff --git a/arch/riscv/include/asm/string.h b/arch/riscv/include/asm/string.h
index 5ba77f60bf0b..16634d67c217 100644
--- a/arch/riscv/include/asm/string.h
+++ b/arch/riscv/include/asm/string.h
@@ -28,6 +28,9 @@ extern asmlinkage __kernel_size_t strlen(const char *);
#define __HAVE_ARCH_STRNCMP
extern asmlinkage int strncmp(const char *cs, const char *ct, size_t count);
+
+#define __HAVE_ARCH_STRNLEN
+extern asmlinkage __kernel_size_t strnlen(const char *, size_t);
#endif
/* For those files which don't want to check by kasan. */
diff --git a/arch/riscv/lib/Makefile b/arch/riscv/lib/Makefile
index bbc031124974..0969d8136df0 100644
--- a/arch/riscv/lib/Makefile
+++ b/arch/riscv/lib/Makefile
@@ -7,6 +7,7 @@ ifeq ($(CONFIG_KASAN_GENERIC)$(CONFIG_KASAN_SW_TAGS),)
lib-y += strcmp.o
lib-y += strlen.o
lib-y += strncmp.o
+lib-y += strnlen.o
endif
lib-y += csum.o
ifeq ($(CONFIG_MMU), y)
diff --git a/arch/riscv/lib/strnlen.S b/arch/riscv/lib/strnlen.S
new file mode 100644
index 000000000000..4af0df9442f1
--- /dev/null
+++ b/arch/riscv/lib/strnlen.S
@@ -0,0 +1,164 @@
+/* SPDX-License-Identifier: GPL-2.0-only */
+
+/*
+ * Based on arch/riscv/lib/strlen.S
+ *
+ * Copyright (C) Feng Jiang <jiangfeng@kylinos.cn>
+ */
+
+#include <linux/linkage.h>
+#include <asm/asm.h>
+#include <asm/alternative-macros.h>
+#include <asm/hwcap.h>
+
+/* size_t strnlen(const char *s, size_t count) */
+SYM_FUNC_START(strnlen)
+
+ __ALTERNATIVE_CFG("nop", "j strnlen_zbb", 0, RISCV_ISA_EXT_ZBB,
+ IS_ENABLED(CONFIG_RISCV_ISA_ZBB) && IS_ENABLED(CONFIG_TOOLCHAIN_HAS_ZBB))
+
+
+ /*
+ * Returns
+ * a0 - String length
+ *
+ * Parameters
+ * a0 - String to measure
+ * a1 - Max length of string
+ *
+ * Clobbers
+ * t0, t1, t2
+ */
+ addi t1, a0, -1
+ add t2, a0, a1
+1:
+ addi t1, t1, 1
+ beq t1, t2, 2f
+ lbu t0, 0(t1)
+ bnez t0, 1b
+2:
+ sub a0, t1, a0
+ ret
+
+
+/*
+ * Variant of strnlen using the ZBB extension if available
+ */
+#if defined(CONFIG_RISCV_ISA_ZBB) && defined(CONFIG_TOOLCHAIN_HAS_ZBB)
+strnlen_zbb:
+
+#ifdef CONFIG_CPU_BIG_ENDIAN
+# define CZ clz
+# define SHIFT sll
+#else
+# define CZ ctz
+# define SHIFT srl
+#endif
+
+.option push
+.option arch,+zbb
+
+ /*
+ * Returns
+ * a0 - String length
+ *
+ * Parameters
+ * a0 - String to measure
+ * a1 - Max length of string
+ *
+ * Clobbers
+ * t0, t1, t2, t3, t4
+ */
+
+ /* If maxlen is 0, return 0. */
+ beqz a1, 3f
+
+ /* Number of irrelevant bytes in the first word. */
+ andi t2, a0, SZREG-1
+
+ /* Align pointer. */
+ andi t0, a0, -SZREG
+
+ li t3, SZREG
+ sub t3, t3, t2
+ slli t2, t2, 3
+
+ /* Aligned boundary. */
+ add t4, a0, a1
+ andi t4, t4, -SZREG
+
+ /* Get the first word. */
+ REG_L t1, 0(t0)
+
+ /*
+ * Shift away the partial data we loaded to remove the irrelevant bytes
+ * preceding the string with the effect of adding NUL bytes at the
+ * end of the string's first word.
+ */
+ SHIFT t1, t1, t2
+
+ /* Convert non-NUL into 0xff and NUL into 0x00. */
+ orc.b t1, t1
+
+ /* Convert non-NUL into 0x00 and NUL into 0xff. */
+ not t1, t1
+
+ /*
+ * Search for the first set bit (corresponding to a NUL byte in the
+ * original chunk).
+ */
+ CZ t1, t1
+
+ /*
+ * The first chunk is special: compare against the number
+ * of valid bytes in this chunk.
+ */
+ srli a0, t1, 3
+
+ /* Limit the result by maxlen. */
+ bleu a1, a0, 3f
+
+ bgtu t3, a0, 2f
+
+ /* Prepare for the word comparison loop. */
+ addi t2, t0, SZREG
+ li t3, -1
+
+ /*
+ * Our critical loop is 5 instructions and processes data in
+ * 4-byte or 8-byte chunks.
+ */
+ .p2align 3
+1:
+ REG_L t1, SZREG(t0)
+ addi t0, t0, SZREG
+ orc.b t1, t1
+ bgeu t0, t4, 4f
+ beq t1, t3, 1b
+4:
+ not t1, t1
+ CZ t1, t1
+ srli t1, t1, 3
+
+ /* Get number of processed bytes. */
+ sub t2, t0, t2
+
+ /* Add number of characters in the first word. */
+ add a0, a0, t2
+
+ /* Add number of characters in the last word. */
+ add a0, a0, t1
+
+ /* Ensure the final result does not exceed maxlen. */
+ bgeu a0, a1, 3f
+2:
+ ret
+3:
+ mv a0, a1
+ ret
+
+.option pop
+#endif
+SYM_FUNC_END(strnlen)
+SYM_FUNC_ALIAS(__pi_strnlen, strnlen)
+EXPORT_SYMBOL(strnlen)
diff --git a/arch/riscv/purgatory/Makefile b/arch/riscv/purgatory/Makefile
index 530e497ca2f9..d7c0533108be 100644
--- a/arch/riscv/purgatory/Makefile
+++ b/arch/riscv/purgatory/Makefile
@@ -2,7 +2,7 @@
purgatory-y := purgatory.o sha256.o entry.o string.o ctype.o memcpy.o memset.o
ifeq ($(CONFIG_KASAN_GENERIC)$(CONFIG_KASAN_SW_TAGS),)
-purgatory-y += strcmp.o strlen.o strncmp.o
+purgatory-y += strcmp.o strlen.o strncmp.o strnlen.o
endif
targets += $(purgatory-y)
@@ -32,6 +32,9 @@ $(obj)/strncmp.o: $(srctree)/arch/riscv/lib/strncmp.S FORCE
$(obj)/sha256.o: $(srctree)/lib/crypto/sha256.c FORCE
$(call if_changed_rule,cc_o_c)
+$(obj)/strnlen.o: $(srctree)/arch/riscv/lib/strnlen.S FORCE
+ $(call if_changed_rule,as_o_S)
+
CFLAGS_sha256.o := -D__DISABLE_EXPORTS -D__NO_FORTIFY
CFLAGS_string.o := -D__DISABLE_EXPORTS
CFLAGS_ctype.o := -D__DISABLE_EXPORTS
--
2.25.1
* [PATCH v3 7/8] riscv: lib: add strchr implementation
From: Feng Jiang @ 2026-01-20 6:58 UTC (permalink / raw)
To: pjw, palmer, aou, alex, akpm, kees, andy, jiangfeng, ebiggers,
martin.petersen, ardb, charlie, conor.dooley, ajones,
linus.walleij, nathan
Cc: linux-riscv, linux-kernel, linux-hardening, Joel Stanley
Add an assembly implementation of strchr() for RISC-V.
By eliminating stack frame management (prologue/epilogue) and trimming
the function entry, the assembly version shows its largest relative gain
on the shortest strings, where the fixed overhead of the C function is
most prominent. As the string length increases, the byte-by-byte scan
dominates and performance converges with that of the C implementation.
Benchmark results (QEMU TCG, rv64):
Length | Original (MB/s) | Optimized (MB/s) | Improvement
-------|-----------------|------------------|------------
1 B | 21 | 23 | +9.5%
7 B | 118 | 126 | +6.7%
16 B | 200 | 208 | +4.0%
512 B | 375 | 399 | +6.4%
4096 B | 395 | 401 | +1.5%
Suggested-by: Andy Shevchenko <andy@kernel.org>
Tested-by: Joel Stanley <joel@jms.id.au>
Signed-off-by: Feng Jiang <jiangfeng@kylinos.cn>
---
arch/riscv/include/asm/string.h | 3 +++
arch/riscv/lib/Makefile | 1 +
arch/riscv/lib/strchr.S | 35 +++++++++++++++++++++++++++++++++
arch/riscv/purgatory/Makefile | 5 ++++-
4 files changed, 43 insertions(+), 1 deletion(-)
create mode 100644 arch/riscv/lib/strchr.S
diff --git a/arch/riscv/include/asm/string.h b/arch/riscv/include/asm/string.h
index 16634d67c217..ca3ade82b124 100644
--- a/arch/riscv/include/asm/string.h
+++ b/arch/riscv/include/asm/string.h
@@ -31,6 +31,9 @@ extern asmlinkage int strncmp(const char *cs, const char *ct, size_t count);
#define __HAVE_ARCH_STRNLEN
extern asmlinkage __kernel_size_t strnlen(const char *, size_t);
+
+#define __HAVE_ARCH_STRCHR
+extern asmlinkage char *strchr(const char *, int);
#endif
/* For those files which don't want to check by kasan. */
diff --git a/arch/riscv/lib/Makefile b/arch/riscv/lib/Makefile
index 0969d8136df0..b7f804dce1c3 100644
--- a/arch/riscv/lib/Makefile
+++ b/arch/riscv/lib/Makefile
@@ -8,6 +8,7 @@ lib-y += strcmp.o
lib-y += strlen.o
lib-y += strncmp.o
lib-y += strnlen.o
+lib-y += strchr.o
endif
lib-y += csum.o
ifeq ($(CONFIG_MMU), y)
diff --git a/arch/riscv/lib/strchr.S b/arch/riscv/lib/strchr.S
new file mode 100644
index 000000000000..48c3a9da53e3
--- /dev/null
+++ b/arch/riscv/lib/strchr.S
@@ -0,0 +1,35 @@
+/* SPDX-License-Identifier: GPL-2.0-only */
+
+/*
+ * Copyright (C) 2025 Feng Jiang <jiangfeng@kylinos.cn>
+ */
+
+#include <linux/linkage.h>
+#include <asm/asm.h>
+
+/* char *strchr(const char *s, int c) */
+SYM_FUNC_START(strchr)
+ /*
+ * Parameters
+ * a0 - The string to be searched
+ * a1 - The character to search for
+ *
+ * Returns
+ * a0 - Address of first occurrence of 'c' or 0
+ *
+ * Clobbers
+ * t0
+ */
+ andi a1, a1, 0xff
+1:
+ lbu t0, 0(a0)
+ beq t0, a1, 2f
+ addi a0, a0, 1
+ bnez t0, 1b
+ li a0, 0
+2:
+ ret
+SYM_FUNC_END(strchr)
+
+SYM_FUNC_ALIAS_WEAK(__pi_strchr, strchr)
+EXPORT_SYMBOL(strchr)
diff --git a/arch/riscv/purgatory/Makefile b/arch/riscv/purgatory/Makefile
index d7c0533108be..e7b3d748c913 100644
--- a/arch/riscv/purgatory/Makefile
+++ b/arch/riscv/purgatory/Makefile
@@ -2,7 +2,7 @@
purgatory-y := purgatory.o sha256.o entry.o string.o ctype.o memcpy.o memset.o
ifeq ($(CONFIG_KASAN_GENERIC)$(CONFIG_KASAN_SW_TAGS),)
-purgatory-y += strcmp.o strlen.o strncmp.o strnlen.o
+purgatory-y += strcmp.o strlen.o strncmp.o strnlen.o strchr.o
endif
targets += $(purgatory-y)
@@ -35,6 +35,9 @@ $(obj)/sha256.o: $(srctree)/lib/crypto/sha256.c FORCE
$(obj)/strnlen.o: $(srctree)/arch/riscv/lib/strnlen.S FORCE
$(call if_changed_rule,as_o_S)
+$(obj)/strchr.o: $(srctree)/arch/riscv/lib/strchr.S FORCE
+ $(call if_changed_rule,as_o_S)
+
CFLAGS_sha256.o := -D__DISABLE_EXPORTS -D__NO_FORTIFY
CFLAGS_string.o := -D__DISABLE_EXPORTS
CFLAGS_ctype.o := -D__DISABLE_EXPORTS
--
2.25.1
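For readers less familiar with RISC-V assembly, the loop in strchr.S above can be read as the following C model. This is a minimal sketch for illustration only: the names strchr_model() and strchr_model_selftest() are hypothetical and are not part of the patch, and the per-line comments show which instruction each statement is meant to mirror.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical C model of the byte loop in strchr.S (not part of the patch). */
static char *strchr_model(const char *s, int c)
{
	unsigned char ch = (unsigned char)c;	/* andi a1, a1, 0xff */
	unsigned char b;

	for (;;) {
		b = (unsigned char)*s;		/* lbu t0, 0(a0) */
		if (b == ch)			/* beq t0, a1, 2f */
			return (char *)s;
		s++;				/* addi a0, a0, 1 */
		if (b == 0)			/* bnez t0, 1b not taken */
			return NULL;		/* li a0, 0 */
	}
}

/* Edge cases: first byte, last byte, the NUL itself, no match, wide 'c'. */
static void strchr_model_selftest(void)
{
	const char *s = "riscv";

	assert(strchr_model(s, 'r') == s);
	assert(strchr_model(s, 'v') == s + 4);
	assert(strchr_model(s, '\0') == s + 5);
	assert(strchr_model(s, 'x') == NULL);
	/* Only the low 8 bits of 'c' take part in the comparison. */
	assert(strchr_model(s, 'r' + 256) == s);
}
```

Because the match test comes before the NUL test, searching for '\0' returns a pointer to the terminator rather than 0, which is the behaviour strchr() is expected to have.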
* [PATCH v3 8/8] riscv: lib: add strrchr implementation
2026-01-20 6:58 [PATCH v3 0/8] riscv: optimize string functions and add kunit tests Feng Jiang
` (6 preceding siblings ...)
2026-01-20 6:58 ` [PATCH v3 7/8] riscv: lib: add strchr implementation Feng Jiang
@ 2026-01-20 6:58 ` Feng Jiang
2026-01-20 7:32 ` Andy Shevchenko
2026-01-20 7:36 ` [PATCH v3 0/8] riscv: optimize string functions and add kunit tests Andy Shevchenko
8 siblings, 1 reply; 29+ messages in thread
From: Feng Jiang @ 2026-01-20 6:58 UTC (permalink / raw)
To: pjw, palmer, aou, alex, akpm, kees, andy, jiangfeng, ebiggers,
martin.petersen, ardb, charlie, conor.dooley, ajones,
linus.walleij, nathan
Cc: linux-riscv, linux-kernel, linux-hardening, Joel Stanley
Add an assembly implementation of strrchr() for RISC-V.
This implementation minimizes instruction count and avoids unnecessary
memory access to the stack. The performance benefits are most visible
on small workloads (1-16 bytes) where the architectural savings in
function overhead outweigh the execution time of the scan loop.
Benchmark results (QEMU TCG, rv64):
Length | Original (MB/s) | Optimized (MB/s) | Improvement
-------|-----------------|------------------|------------
1 B | 21 | 22 | +4.7%
7 B | 116 | 122 | +5.1%
16 B | 195 | 208 | +6.6%
512 B | 388 | 399 | +2.8%
4096 B | 411 | 411 | +0.0%
Suggested-by: Andy Shevchenko <andy@kernel.org>
Tested-by: Joel Stanley <joel@jms.id.au>
Signed-off-by: Feng Jiang <jiangfeng@kylinos.cn>
---
arch/riscv/include/asm/string.h | 3 +++
arch/riscv/lib/Makefile | 1 +
arch/riscv/lib/strrchr.S | 37 +++++++++++++++++++++++++++++++++
arch/riscv/purgatory/Makefile | 5 ++++-
4 files changed, 45 insertions(+), 1 deletion(-)
create mode 100644 arch/riscv/lib/strrchr.S
diff --git a/arch/riscv/include/asm/string.h b/arch/riscv/include/asm/string.h
index ca3ade82b124..764ffe8f6479 100644
--- a/arch/riscv/include/asm/string.h
+++ b/arch/riscv/include/asm/string.h
@@ -34,6 +34,9 @@ extern asmlinkage __kernel_size_t strnlen(const char *, size_t);
#define __HAVE_ARCH_STRCHR
extern asmlinkage char *strchr(const char *, int);
+
+#define __HAVE_ARCH_STRRCHR
+extern asmlinkage char *strrchr(const char *, int);
#endif
/* For those files which don't want to check by kasan. */
diff --git a/arch/riscv/lib/Makefile b/arch/riscv/lib/Makefile
index b7f804dce1c3..735d0b665536 100644
--- a/arch/riscv/lib/Makefile
+++ b/arch/riscv/lib/Makefile
@@ -9,6 +9,7 @@ lib-y += strlen.o
lib-y += strncmp.o
lib-y += strnlen.o
lib-y += strchr.o
+lib-y += strrchr.o
endif
lib-y += csum.o
ifeq ($(CONFIG_MMU), y)
diff --git a/arch/riscv/lib/strrchr.S b/arch/riscv/lib/strrchr.S
new file mode 100644
index 000000000000..ac58b20ca21d
--- /dev/null
+++ b/arch/riscv/lib/strrchr.S
@@ -0,0 +1,37 @@
+/* SPDX-License-Identifier: GPL-2.0-only */
+
+/*
+ * Copyright (C) 2025 Feng Jiang <jiangfeng@kylinos.cn>
+ */
+
+#include <linux/linkage.h>
+#include <asm/asm.h>
+
+/* char *strrchr(const char *s, int c) */
+SYM_FUNC_START(strrchr)
+ /*
+ * Parameters
+ * a0 - The string to be searched
+ * a1 - The character to search for
+ *
+ * Returns
+ * a0 - Address of last occurrence of 'c' or 0
+ *
+ * Clobbers
+ * t0, t1
+ */
+ andi a1, a1, 0xff
+ mv t1, a0
+ li a0, 0
+1:
+ lbu t0, 0(t1)
+ bne t0, a1, 2f
+ mv a0, t1
+2:
+ addi t1, t1, 1
+ bnez t0, 1b
+ ret
+SYM_FUNC_END(strrchr)
+
+SYM_FUNC_ALIAS_WEAK(__pi_strrchr, strrchr)
+EXPORT_SYMBOL(strrchr)
diff --git a/arch/riscv/purgatory/Makefile b/arch/riscv/purgatory/Makefile
index e7b3d748c913..b0358a78f11a 100644
--- a/arch/riscv/purgatory/Makefile
+++ b/arch/riscv/purgatory/Makefile
@@ -2,7 +2,7 @@
purgatory-y := purgatory.o sha256.o entry.o string.o ctype.o memcpy.o memset.o
ifeq ($(CONFIG_KASAN_GENERIC)$(CONFIG_KASAN_SW_TAGS),)
-purgatory-y += strcmp.o strlen.o strncmp.o strnlen.o strchr.o
+purgatory-y += strcmp.o strlen.o strncmp.o strnlen.o strchr.o strrchr.o
endif
targets += $(purgatory-y)
@@ -38,6 +38,9 @@ $(obj)/strnlen.o: $(srctree)/arch/riscv/lib/strnlen.S FORCE
$(obj)/strchr.o: $(srctree)/arch/riscv/lib/strchr.S FORCE
$(call if_changed_rule,as_o_S)
+$(obj)/strrchr.o: $(srctree)/arch/riscv/lib/strrchr.S FORCE
+ $(call if_changed_rule,as_o_S)
+
CFLAGS_sha256.o := -D__DISABLE_EXPORTS -D__NO_FORTIFY
CFLAGS_string.o := -D__DISABLE_EXPORTS
CFLAGS_ctype.o := -D__DISABLE_EXPORTS
--
2.25.1
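The loop in strrchr.S above can likewise be read as the C model below. This is a sketch for illustration only: strrchr_model() and strrchr_model_selftest() are hypothetical names, not part of the patch, and the comments map each statement to the instruction it is meant to mirror.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical C model of the byte loop in strrchr.S (not part of the patch). */
static char *strrchr_model(const char *s, int c)
{
	unsigned char ch = (unsigned char)c;	/* andi a1, a1, 0xff */
	const char *last = NULL;		/* li a0, 0 */
	unsigned char b;

	do {
		b = (unsigned char)*s;		/* lbu t0, 0(t1) */
		if (b == ch)			/* bne t0, a1, 2f not taken */
			last = s;		/* mv a0, t1 */
		s++;				/* addi t1, t1, 1 */
	} while (b != 0);			/* bnez t0, 1b */

	return (char *)last;
}

/* Edge cases: repeated match, first byte, the NUL itself, no match. */
static void strrchr_model_selftest(void)
{
	const char *s = "a/b/c";

	assert(strrchr_model(s, '/') == s + 3);
	assert(strrchr_model(s, 'a') == s);
	assert(strrchr_model(s, '\0') == s + 5);
	assert(strrchr_model(s, 'x') == NULL);
}
```

Unlike strchr(), the scan always runs to the terminator, since a later match can replace an earlier one; this is consistent with the benchmark gains shrinking as the string grows.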
* Re: [PATCH v3 1/8] lib/string_kunit: add correctness test for strlen
2026-01-20 6:58 ` [PATCH v3 1/8] lib/string_kunit: add correctness test for strlen Feng Jiang
@ 2026-01-20 7:28 ` Andy Shevchenko
0 siblings, 0 replies; 29+ messages in thread
From: Andy Shevchenko @ 2026-01-20 7:28 UTC (permalink / raw)
To: Feng Jiang
Cc: pjw, palmer, aou, alex, akpm, kees, andy, ebiggers,
martin.petersen, ardb, charlie, conor.dooley, ajones,
linus.walleij, nathan, linux-riscv, linux-kernel, linux-hardening,
Joel Stanley
On Tue, Jan 20, 2026 at 02:58:45PM +0800, Feng Jiang wrote:
> Add a KUnit test for strlen() to verify correctness across
> different string lengths and memory alignments.
Acked-by: Andy Shevchenko <andy@kernel.org>
--
With Best Regards,
Andy Shevchenko
* Re: [PATCH v3 2/8] lib/string_kunit: add correctness test for strnlen
2026-01-20 6:58 ` [PATCH v3 2/8] lib/string_kunit: add correctness test for strnlen Feng Jiang
@ 2026-01-20 7:29 ` Andy Shevchenko
0 siblings, 0 replies; 29+ messages in thread
From: Andy Shevchenko @ 2026-01-20 7:29 UTC (permalink / raw)
To: Feng Jiang
Cc: pjw, palmer, aou, alex, akpm, kees, andy, ebiggers,
martin.petersen, ardb, charlie, conor.dooley, ajones,
linus.walleij, nathan, linux-riscv, linux-kernel, linux-hardening,
Joel Stanley
On Tue, Jan 20, 2026 at 02:58:46PM +0800, Feng Jiang wrote:
> Add a KUnit test for strnlen() to verify correctness across
> different string lengths and memory alignments.
Acked-by: Andy Shevchenko <andy@kernel.org>
--
With Best Regards,
Andy Shevchenko
* Re: [PATCH v3 3/8] lib/string_kunit: add correctness test for strrchr()
2026-01-20 6:58 ` [PATCH v3 3/8] lib/string_kunit: add correctness test for strrchr() Feng Jiang
@ 2026-01-20 7:30 ` Andy Shevchenko
0 siblings, 0 replies; 29+ messages in thread
From: Andy Shevchenko @ 2026-01-20 7:30 UTC (permalink / raw)
To: Feng Jiang
Cc: pjw, palmer, aou, alex, akpm, kees, andy, ebiggers,
martin.petersen, ardb, charlie, conor.dooley, ajones,
linus.walleij, nathan, linux-riscv, linux-kernel, linux-hardening,
Joel Stanley
On Tue, Jan 20, 2026 at 02:58:47PM +0800, Feng Jiang wrote:
> Introduce a KUnit test to verify strrchr() across various memory
> alignments and character positions. This ensures the implementation
> correctly identifies the last occurrence of a character.
Acked-by: Andy Shevchenko <andy@kernel.org>
--
With Best Regards,
Andy Shevchenko
* Re: [PATCH v3 6/8] riscv: lib: add strnlen implementation
2026-01-20 6:58 ` [PATCH v3 6/8] riscv: lib: add strnlen implementation Feng Jiang
@ 2026-01-20 7:31 ` Andy Shevchenko
2026-01-21 5:52 ` Feng Jiang
2026-01-21 7:24 ` Qingfang Deng
1 sibling, 1 reply; 29+ messages in thread
From: Andy Shevchenko @ 2026-01-20 7:31 UTC (permalink / raw)
To: Feng Jiang
Cc: pjw, palmer, aou, alex, akpm, kees, andy, ebiggers,
martin.petersen, ardb, charlie, conor.dooley, ajones,
linus.walleij, nathan, linux-riscv, linux-kernel, linux-hardening,
Joel Stanley
On Tue, Jan 20, 2026 at 02:58:50PM +0800, Feng Jiang wrote:
> Add an optimized strnlen() implementation for RISC-V. This version
> includes a generic word-at-a-time optimization and a Zbb-powered
> optimization using the 'orc.b' instruction, derived from the strlen
> implementation.
>
> Benchmark results (QEMU TCG, rv64):
> Length | Original (MB/s) | Optimized (MB/s) | Improvement
> -------|-----------------|------------------|------------
> 16 B | 189 | 310 | +64.0%
> 512 B | 344 | 1535 | +346.2%
> 4096 B | 363 | 1854 | +410.7%
> Suggested-by: Andy Shevchenko <andy@kernel.org>
Wrong tag, I have zero knowledge about RISC V.
--
With Best Regards,
Andy Shevchenko
* Re: [PATCH v3 7/8] riscv: lib: add strchr implementation
2026-01-20 6:58 ` [PATCH v3 7/8] riscv: lib: add strchr implementation Feng Jiang
@ 2026-01-20 7:31 ` Andy Shevchenko
0 siblings, 0 replies; 29+ messages in thread
From: Andy Shevchenko @ 2026-01-20 7:31 UTC (permalink / raw)
To: Feng Jiang
Cc: pjw, palmer, aou, alex, akpm, kees, andy, ebiggers,
martin.petersen, ardb, charlie, conor.dooley, ajones,
linus.walleij, nathan, linux-riscv, linux-kernel, linux-hardening,
Joel Stanley
On Tue, Jan 20, 2026 at 02:58:51PM +0800, Feng Jiang wrote:
> Add an assembly implementation of strchr() for RISC-V.
>
> By eliminating stack frame management (prologue/epilogue) and optimizing
> the function entries, the assembly version provides significant relative
> gains for short strings where the fixed overhead of the C function is
> most prominent. As the string length increases, the performance converges
> with the byte-oriented scan logic.
>
> Benchmark results (QEMU TCG, rv64):
> Length | Original (MB/s) | Optimized (MB/s) | Improvement
> -------|-----------------|------------------|------------
> 1 B | 21 | 23 | +9.5%
> 7 B | 118 | 126 | +6.7%
> 16 B | 200 | 208 | +4.0%
> 512 B | 375 | 399 | +6.4%
> 4096 B | 395 | 401 | +1.5%
> Suggested-by: Andy Shevchenko <andy@kernel.org>
Wrong tag.
--
With Best Regards,
Andy Shevchenko
* Re: [PATCH v3 8/8] riscv: lib: add strrchr implementation
2026-01-20 6:58 ` [PATCH v3 8/8] riscv: lib: add strrchr implementation Feng Jiang
@ 2026-01-20 7:32 ` Andy Shevchenko
0 siblings, 0 replies; 29+ messages in thread
From: Andy Shevchenko @ 2026-01-20 7:32 UTC (permalink / raw)
To: Feng Jiang
Cc: pjw, palmer, aou, alex, akpm, kees, andy, ebiggers,
martin.petersen, ardb, charlie, conor.dooley, ajones,
linus.walleij, nathan, linux-riscv, linux-kernel, linux-hardening,
Joel Stanley
On Tue, Jan 20, 2026 at 02:58:52PM +0800, Feng Jiang wrote:
> Add an assembly implementation of strrchr() for RISC-V.
>
> This implementation minimizes instruction count and avoids unnecessary
> memory access to the stack. The performance benefits are most visible
> on small workloads (1-16 bytes) where the architectural savings in
> function overhead outweigh the execution time of the scan loop.
>
> Benchmark results (QEMU TCG, rv64):
> Length | Original (MB/s) | Optimized (MB/s) | Improvement
> -------|-----------------|------------------|------------
> 1 B | 21 | 22 | +4.7%
> 7 B | 116 | 122 | +5.1%
> 16 B | 195 | 208 | +6.6%
> 512 B | 388 | 399 | +2.8%
> 4096 B | 411 | 411 | +0.0%
> Suggested-by: Andy Shevchenko <andy@kernel.org>
Wrong tag.
--
With Best Regards,
Andy Shevchenko
* Re: [PATCH v3 0/8] riscv: optimize string functions and add kunit tests
2026-01-20 6:58 [PATCH v3 0/8] riscv: optimize string functions and add kunit tests Feng Jiang
` (7 preceding siblings ...)
2026-01-20 6:58 ` [PATCH v3 8/8] riscv: lib: add strrchr implementation Feng Jiang
@ 2026-01-20 7:36 ` Andy Shevchenko
2026-01-21 6:44 ` Feng Jiang
8 siblings, 1 reply; 29+ messages in thread
From: Andy Shevchenko @ 2026-01-20 7:36 UTC (permalink / raw)
To: Feng Jiang
Cc: pjw, palmer, aou, alex, akpm, kees, andy, ebiggers,
martin.petersen, ardb, charlie, conor.dooley, ajones,
linus.walleij, nathan, linux-riscv, linux-kernel, linux-hardening
On Tue, Jan 20, 2026 at 02:58:44PM +0800, Feng Jiang wrote:
> This series provides optimized implementations of strnlen(), strchr(),
> and strrchr() for the RISC-V architecture. The strnlen implementation
> is derived from the existing optimized strlen. For strchr and strrchr,
strchr() and strrchr()
> the current versions use simple byte-by-byte assembly logic, which
> will serve as a baseline for future Zbb-based optimizations.
>
> The patch series is organized into three parts:
> 1. Correctness Testing: The first three patches add KUnit test cases
> for strlen, strnlen, and strrchr to ensure the baseline and optimized
strlen(), strnlen(), and strrchr()
> versions are functionally correct.
> 2. Benchmarking Tool: Patches 4 and 5 extend string_kunit to include
> performance measurement capabilities, allowing for comparative
> analysis within the KUnit environment.
> 3. Architectural Optimizations: The final three patches introduce the
> RISC-V specific assembly implementations.
>
> Following suggestions from Andy Shevchenko, performance benchmarks have
> been added to string_kunit.c to provide quantifiable evidence of the
> improvements. Andy provided many specific comments on the implementation
> of the benchmark logic, which is also inspired by Eric Biggers'
> crc_benchmark(). Performance was measured in a QEMU TCG (rv64) environment,
> comparing the generic C implementation with the new RISC-V assembly versions.
>
> Performance Summary (Improvement %):
> ---------------------------------------------------------------
> Function | 16 B (Short) | 512 B (Mid) | 4096 B (Long)
> ---------------------------------------------------------------
> strnlen | +64.0% | +346.2% | +410.7%
This is still suspicious.
> strchr | +4.0% | +6.4% | +1.5%
> strrchr | +6.6% | +2.8% | +0.0%
> ---------------------------------------------------------------
> The benchmarks can be reproduced by enabling CONFIG_STRING_KUNIT_BENCH
> and running: ./tools/testing/kunit/kunit.py run --arch=riscv \
> --cross_compile=riscv64-linux-gnu- --kunitconfig=my_string.kunitconfig \
> --raw_output
>
> The strnlen implementation leverages the Zbb 'orc.b' instruction and
strnlen()
> word-at-a-time logic, showing significant gains as the string length
> increases.
Hmm... Have you tried to optimise the generic implementation to use
word-at-a-time logic and compare?
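For context, word-at-a-time logic in portable C might look like the sketch below. It is an illustration under stated assumptions, not the kernel's generic strnlen() and not code from this series: strnlen_wordwise(), has_zero_byte() and the self-test are hypothetical names, a 64-bit word is assumed, and the sketch relies on aligned 8-byte loads being allowed to touch bytes after the NUL within the same word (true for the aligned buffer used here, but something strict ISO C does not promise for arbitrary objects).

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>

#define ONES	((uint64_t)0x0101010101010101ULL)
#define HIGHS	((uint64_t)0x8080808080808080ULL)

/* Nonzero if any of the eight bytes in v is zero. */
static int has_zero_byte(uint64_t v)
{
	return ((v - ONES) & ~v & HIGHS) != 0;
}

/* Hypothetical word-at-a-time strnlen(); assumes 64-bit words. */
static size_t strnlen_wordwise(const char *s, size_t count)
{
	const char *p = s;
	uint64_t v;

	/* Byte steps until p is 8-byte aligned. */
	while (count && ((uintptr_t)p & 7)) {
		if (*p == '\0')
			return p - s;
		p++;
		count--;
	}

	/* Word steps while a whole word still fits within the limit. */
	while (count >= sizeof(v)) {
		memcpy(&v, p, sizeof(v));
		if (has_zero_byte(v))
			break;
		p += sizeof(v);
		count -= sizeof(v);
	}

	/* Byte steps for the tail, or to locate the NUL inside the last word. */
	while (count && *p != '\0') {
		p++;
		count--;
	}
	return p - s;
}

/* Compare against strlen() for every starting alignment. */
static void strnlen_wordwise_selftest(void)
{
	_Alignas(8) char buf[64] = "word-at-a-time strnlen";
	size_t off;

	for (off = 0; off < 8; off++) {
		assert(strnlen_wordwise(buf + off, 64 - off) == strlen(buf + off));
		assert(strnlen_wordwise(buf + off, 3) == 3);
		assert(strnlen_wordwise(buf + off, 0) == 0);
	}
}
```

The sketch locates the NUL with a byte loop after the word test fires, which keeps it independent of endianness at the cost of up to seven extra byte steps.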
> For strchr and strrchr, the handwritten assembly reduces
strchr() and strrchr()
> fixed overhead by eliminating stack frame management. The gain is most
> prominent on short strings (1-16B) where function call overhead dominates,
> while the performance converges with the C implementation for longer
> strings in the TCG environment.
--
With Best Regards,
Andy Shevchenko
* Re: [PATCH v3 4/8] lib/string_kunit: add performance benchmarks for strlen
2026-01-20 6:58 ` [PATCH v3 4/8] lib/string_kunit: add performance benchmarks for strlen Feng Jiang
@ 2026-01-20 7:46 ` Andy Shevchenko
2026-01-21 5:45 ` Feng Jiang
0 siblings, 1 reply; 29+ messages in thread
From: Andy Shevchenko @ 2026-01-20 7:46 UTC (permalink / raw)
To: Feng Jiang
Cc: pjw, palmer, aou, alex, akpm, kees, andy, ebiggers,
martin.petersen, ardb, charlie, conor.dooley, ajones,
linus.walleij, nathan, linux-riscv, linux-kernel, linux-hardening,
Joel Stanley
On Tue, Jan 20, 2026 at 02:58:48PM +0800, Feng Jiang wrote:
> Introduce a benchmarking framework to the string_kunit test suite to
> measure the execution efficiency of string functions.
>
> The implementation is inspired by crc_benchmark(), measuring throughput
> (MB/s) and latency (ns/call) across a range of string lengths. It
> includes a warm-up phase, disables preemption during measurement, and
> uses a fixed seed for reproducible results.
>
> This allows for comparing different implementations (e.g., generic C vs.
> architecture-optimized assembly) within the KUnit environment.
>
> Initially, provide benchmarks for strlen().
...
> +#define STRING_BENCH_SEED 888
> +#define STRING_BENCH_WORKLOAD 1000000UL
Can also be (1 * MEGA) from units.h.
...
> +static const size_t bench_lens[] = {
> + 0, 1, 7, 8, 16, 31, 64, 127, 512, 1024, 3173, 4096
Leave trailing comma.
> +};
...
> +static void *alloc_max_bench_buffer(struct kunit *test,
> + const size_t *lens, size_t count, size_t *buf_len)
> +{
> + void *buf;
> + size_t i, max_len = 0;
> +
> + for (i = 0; i < count; i++) {
> + if (max_len < lens[i])
> + max_len = lens[i];
> + }
> +
> + /* Add space for NUL terminator */
> + max_len += 1;
> + buf = kunit_kzalloc(test, max_len, GFP_KERNEL);
> + if (buf && buf_len)
> + *buf_len = max_len;
> +
> + return buf;
if (!buf)
return NULL;
*buf_len ...
return buf;
> +}
...
> +static void fill_random_string(char *buf, size_t len)
> +{
> + size_t i;
> + struct rnd_state state;
Reversed xmas tree ordering?
> + if (!buf || !len)
> + return;
> +
> + /* Use a fixed seed to ensure deterministic benchmark results */
> + prandom_seed_state(&state, 888);
> + prandom_bytes_state(&state, buf, len);
> +
> + /* Replace null bytes to avoid early string termination */
> + for (i = 0; i < len; i++) {
> + if (buf[i] == '\0')
> + buf[i] = 0x01;
> + }
> +
> + buf[len - 1] = '\0';
> +}
...
> +#define STRING_BENCH(iters, func, ...) \
Is this same / similar code to crc_benchmark()? Perhaps we need to have KUnit
provided macro / environment to perform such tests... Have you talked to KUnit
people about all this?
> +({ \
> + u64 __bn_t; \
> + size_t __bn_i; \
> + size_t __bn_iters = (iters); \
> + size_t __bn_warm_iters = max_t(size_t, __bn_iters / 10, 50U); \
Try to avoid max_t() as much as possible. Wouldn't max() suffice?
> + /* Volatile function pointer prevents dead code elimination */ \
> + typeof(func) (* volatile __func) = (func); \
> + \
> + for (__bn_i = 0; __bn_i < __bn_warm_iters; __bn_i++) \
> + (void)__func(__VA_ARGS__); \
> + \
> + preempt_disable(); \
> + __bn_t = ktime_get_ns(); \
> + for (__bn_i = 0; __bn_i < __bn_iters; __bn_i++) \
> + (void)__func(__VA_ARGS__); \
> + __bn_t = ktime_get_ns() - __bn_t; \
> + preempt_enable(); \
> + __bn_t; \
> +})
> +
> +/**
> + * STRING_BENCH_BUF() - Benchmark harness for single-buffer functions.
> + * @test: KUnit context.
> + * @buf_name: Local char * variable name to be defined.
> + * @buf_size: Local size_t variable name to be defined.
> + * @func: Function to benchmark.
> + * @...: Extra arguments for @func.
> + *
> + * Prepares a randomized, null-terminated buffer and iterates through lengths
> + * in bench_lens, defining @buf_name and @buf_size in each loop.
> + */
> +#define STRING_BENCH_BUF(test, buf_name, buf_size, func, ...) \
> +do { \
> + char *buf_name, *_bn_buf; \
> + size_t buf_size, _bn_i, _bn_iters, _bn_size = 0; \
> + u64 _bn_t, _bn_mbps = 0, _bn_lat = 0; \
> + \
> + if (!IS_ENABLED(CONFIG_STRING_KUNIT_BENCH)) \
> + kunit_skip(test, "not enabled"); \
> + \
> + _bn_buf = alloc_max_bench_buffer(test, bench_lens, \
> + ARRAY_SIZE(bench_lens), &_bn_size); \
> + KUNIT_ASSERT_NOT_ERR_OR_NULL(test, _bn_buf); \
> + \
> + fill_random_string(_bn_buf, _bn_size); \
> + _bn_buf[_bn_size - 1] = '\0'; \
You have already this there in the function, no?
> + for (_bn_i = 0; _bn_i < ARRAY_SIZE(bench_lens); _bn_i++) { \
> + buf_size = bench_lens[_bn_i]; \
> + buf_name = _bn_buf + _bn_size - buf_size - 1; \
> + _bn_iters = STRING_BENCH_WORKLOAD / \
> + max_t(size_t, buf_size, 1U); \
max()
> + _bn_t = STRING_BENCH(_bn_iters, func, ##__VA_ARGS__); \
> + \
> + if (_bn_t > 0) { \
> + _bn_mbps = (u64)(buf_size) * _bn_iters * 1000; \
> + _bn_mbps = div64_u64(_bn_mbps, _bn_t); \
> + _bn_lat = div64_u64(_bn_t, _bn_iters); \
> + } \
> + kunit_info(test, "len=%zu: %llu MB/s (%llu ns/call)\n", \
> + buf_size, _bn_mbps, _bn_lat); \
> + } \
> +} while (0)
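The arithmetic in the quoted macro can be checked in isolation with the sketch below. The helper names bench_iters(), bench_mbps() and bench_ns_per_call() are hypothetical and exist only for this illustration; the formulas follow the quoted code, where bytes * 1000 / ns gives MB/s in units of 10^6 bytes per second.

```c
#include <assert.h>
#include <stdint.h>

/* Iterations per length: fixed byte workload divided by max(len, 1). */
static uint64_t bench_iters(uint64_t workload, uint64_t len)
{
	return workload / (len ? len : 1);
}

/* Throughput in MB/s: bytes per ns scaled by 1000, integer division. */
static uint64_t bench_mbps(uint64_t len, uint64_t iters, uint64_t ns)
{
	return ns ? (len * iters * 1000) / ns : 0;
}

/* Latency in ns per call, integer division. */
static uint64_t bench_ns_per_call(uint64_t iters, uint64_t ns)
{
	return iters ? ns / iters : 0;
}
```

As a worked example with made-up numbers (not measured results): a workload of 1000000 bytes at len=4096 gives 244 iterations; if those took 2500000 ns in total, the formulas yield 399 MB/s and 10245 ns/call after integer truncation.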
--
With Best Regards,
Andy Shevchenko
* Re: [PATCH v3 5/8] lib/string_kunit: extend benchmarks to strnlen and chr searches
2026-01-20 6:58 ` [PATCH v3 5/8] lib/string_kunit: extend benchmarks to strnlen and chr searches Feng Jiang
@ 2026-01-20 7:48 ` Andy Shevchenko
2026-01-21 5:48 ` Feng Jiang
0 siblings, 1 reply; 29+ messages in thread
From: Andy Shevchenko @ 2026-01-20 7:48 UTC (permalink / raw)
To: Feng Jiang
Cc: pjw, palmer, aou, alex, akpm, kees, andy, ebiggers,
martin.petersen, ardb, charlie, conor.dooley, ajones,
linus.walleij, nathan, linux-riscv, linux-kernel, linux-hardening,
Joel Stanley
On Tue, Jan 20, 2026 at 02:58:49PM +0800, Feng Jiang wrote:
> Extend the string benchmarking suite to include strnlen(), strchr(),
> and strrchr().
>
> For character search functions (strchr and strrchr), the benchmark
strchr() and strrchr()
> targets the null terminator. This ensures the entire string is scanned,
NUL character
(Also check terminology everywhere: NULL — is for NULL pointers, NUL is for
'\0' characters.)
> providing a consistent measure of full-length processing efficiency
> comparable to strlen().
Acked-by: Andy Shevchenko <andy@kernel.org>
--
With Best Regards,
Andy Shevchenko
* Re: [PATCH v3 4/8] lib/string_kunit: add performance benchmarks for strlen
2026-01-20 7:46 ` Andy Shevchenko
@ 2026-01-21 5:45 ` Feng Jiang
0 siblings, 0 replies; 29+ messages in thread
From: Feng Jiang @ 2026-01-21 5:45 UTC (permalink / raw)
To: Andy Shevchenko
Cc: pjw, palmer, aou, alex, akpm, kees, andy, ebiggers,
martin.petersen, ardb, charlie, conor.dooley, ajones,
linus.walleij, nathan, linux-riscv, linux-kernel, linux-hardening,
Joel Stanley
On 2026/1/20 15:46, Andy Shevchenko wrote:
> On Tue, Jan 20, 2026 at 02:58:48PM +0800, Feng Jiang wrote:
>> Introduce a benchmarking framework to the string_kunit test suite to
>> measure the execution efficiency of string functions.
>>
>> The implementation is inspired by crc_benchmark(), measuring throughput
>> (MB/s) and latency (ns/call) across a range of string lengths. It
>> includes a warm-up phase, disables preemption during measurement, and
>> uses a fixed seed for reproducible results.
>>
>> This allows for comparing different implementations (e.g., generic C vs.
>> architecture-optimized assembly) within the KUnit environment.
>>
>> Initially, provide benchmarks for strlen().
>
> ...
>
>> +#define STRING_BENCH_SEED 888
>> +#define STRING_BENCH_WORKLOAD 1000000UL
>
> Can also be (1 * MEGA) from units.h.
Fixed.
> ...
>
>> +static const size_t bench_lens[] = {
>> + 0, 1, 7, 8, 16, 31, 64, 127, 512, 1024, 3173, 4096
>
> Leave trailing comma.
Fixed.
> ...
>
>> +static void *alloc_max_bench_buffer(struct kunit *test,
>> + const size_t *lens, size_t count, size_t *buf_len)
>> +{
>> + void *buf;
>> + size_t i, max_len = 0;
>> +
>> + for (i = 0; i < count; i++) {
>> + if (max_len < lens[i])
>> + max_len = lens[i];
>> + }
>> +
>> + /* Add space for NUL terminator */
>> + max_len += 1;
>
>> + buf = kunit_kzalloc(test, max_len, GFP_KERNEL);
>> + if (buf && buf_len)
>> + *buf_len = max_len;
>> +
>> + return buf;
>
> if (!buf)
> return NULL;
>
> *buf_len ...
> return buf;
>
Fixed.
> ...
>
>> +static void fill_random_string(char *buf, size_t len)
>> +{
>> + size_t i;
>> + struct rnd_state state;
>
> Reversed xmas tree ordering?
Fixed.
>> + if (!buf || !len)
>> + return;
>> +
>> + /* Use a fixed seed to ensure deterministic benchmark results */
>> + prandom_seed_state(&state, 888);
>> + prandom_bytes_state(&state, buf, len);
>> +
>> + /* Replace null bytes to avoid early string termination */
>> + for (i = 0; i < len; i++) {
>> + if (buf[i] == '\0')
>> + buf[i] = 0x01;
>> + }
>> +
>> + buf[len - 1] = '\0';
>> +}
>
> ...
>
>> +#define STRING_BENCH(iters, func, ...) \
>
> Is this same / similar code to crc_benchmark()? Perhaps we need to have KUnit
> provided macro / environment to perform such tests... Have you talked to KUnit
> people about all this?
>
I haven't reached out to the KUnit maintainers yet. This implementation is currently
a lightweight adaptation specifically for string benchmarks. However, I agree that
a generic KUnit benchmarking harness would be beneficial for the kernel. For now,
I'll refine this version based on your feedback.
>> +({ \
>> + u64 __bn_t; \
>> + size_t __bn_i; \
>> + size_t __bn_iters = (iters); \
>> + size_t __bn_warm_iters = max_t(size_t, __bn_iters / 10, 50U); \
>
> Try to avoid max_t() as much as possible. Wouldn't max() suffice?
>
Will do.
>> + /* Volatile function pointer prevents dead code elimination */ \
>> + typeof(func) (* volatile __func) = (func); \
>> + \
>> + for (__bn_i = 0; __bn_i < __bn_warm_iters; __bn_i++) \
>> + (void)__func(__VA_ARGS__); \
>> + \
>> + preempt_disable(); \
>> + __bn_t = ktime_get_ns(); \
>> + for (__bn_i = 0; __bn_i < __bn_iters; __bn_i++) \
>> + (void)__func(__VA_ARGS__); \
>> + __bn_t = ktime_get_ns() - __bn_t; \
>> + preempt_enable(); \
>> + __bn_t; \
>> +})
>> +
>> +/**
>> + * STRING_BENCH_BUF() - Benchmark harness for single-buffer functions.
>> + * @test: KUnit context.
>> + * @buf_name: Local char * variable name to be defined.
>> + * @buf_size: Local size_t variable name to be defined.
>> + * @func: Function to benchmark.
>> + * @...: Extra arguments for @func.
>> + *
>> + * Prepares a randomized, null-terminated buffer and iterates through lengths
>> + * in bench_lens, defining @buf_name and @buf_size in each loop.
>> + */
>> +#define STRING_BENCH_BUF(test, buf_name, buf_size, func, ...) \
>> +do { \
>> + char *buf_name, *_bn_buf; \
>> + size_t buf_size, _bn_i, _bn_iters, _bn_size = 0; \
>> + u64 _bn_t, _bn_mbps = 0, _bn_lat = 0; \
>> + \
>> + if (!IS_ENABLED(CONFIG_STRING_KUNIT_BENCH)) \
>> + kunit_skip(test, "not enabled"); \
>> + \
>> + _bn_buf = alloc_max_bench_buffer(test, bench_lens, \
>> + ARRAY_SIZE(bench_lens), &_bn_size); \
>> + KUNIT_ASSERT_NOT_ERR_OR_NULL(test, _bn_buf); \
>> + \
>> + fill_random_string(_bn_buf, _bn_size); \
>
>> + _bn_buf[_bn_size - 1] = '\0'; \
>
> You have already this there in the function, no?
>
Indeed, that's redundant. I'll remove it.
>> + for (_bn_i = 0; _bn_i < ARRAY_SIZE(bench_lens); _bn_i++) { \
>> + buf_size = bench_lens[_bn_i]; \
>> + buf_name = _bn_buf + _bn_size - buf_size - 1; \
>> + _bn_iters = STRING_BENCH_WORKLOAD / \
>> + max_t(size_t, buf_size, 1U); \
>
> max()
Fixed.
>> + _bn_t = STRING_BENCH(_bn_iters, func, ##__VA_ARGS__); \
>> + \
>> + if (_bn_t > 0) { \
>> + _bn_mbps = (u64)(buf_size) * _bn_iters * 1000; \
>> + _bn_mbps = div64_u64(_bn_mbps, _bn_t); \
>> + _bn_lat = div64_u64(_bn_t, _bn_iters); \
>> + } \
>> + kunit_info(test, "len=%zu: %llu MB/s (%llu ns/call)\n", \
>> + buf_size, _bn_mbps, _bn_lat); \
>> + } \
>> +} while (0)
>
Thanks for the catch. I will incorporate all your suggestions into v4.
--
With Best Regards,
Feng Jiang
* Re: [PATCH v3 5/8] lib/string_kunit: extend benchmarks to strnlen and chr searches
2026-01-20 7:48 ` Andy Shevchenko
@ 2026-01-21 5:48 ` Feng Jiang
0 siblings, 0 replies; 29+ messages in thread
From: Feng Jiang @ 2026-01-21 5:48 UTC (permalink / raw)
To: Andy Shevchenko
Cc: pjw, palmer, aou, alex, akpm, kees, andy, ebiggers,
martin.petersen, ardb, charlie, conor.dooley, ajones,
linus.walleij, nathan, linux-riscv, linux-kernel, linux-hardening,
Joel Stanley
On 2026/1/20 15:48, Andy Shevchenko wrote:
> On Tue, Jan 20, 2026 at 02:58:49PM +0800, Feng Jiang wrote:
>> Extend the string benchmarking suite to include strnlen(), strchr(),
>> and strrchr().
>>
>> For character search functions (strchr and strrchr), the benchmark
>
> strchr() and strrchr()
Will fix.
>> targets the null terminator. This ensures the entire string is scanned,
>
> NUL character
>
> (Also check terminology everywhere: NULL — is for NULL pointers, NUL is for
> '\0' characters.)
>
Thanks for the correction, I'll fix this and check other places as well.
>> providing a consistent measure of full-length processing efficiency
>> comparable to strlen().
>
> Acked-by: Andy Shevchenko <andy@kernel.org>
>
--
With Best Regards,
Feng Jiang
* Re: [PATCH v3 6/8] riscv: lib: add strnlen implementation
2026-01-20 7:31 ` Andy Shevchenko
@ 2026-01-21 5:52 ` Feng Jiang
0 siblings, 0 replies; 29+ messages in thread
From: Feng Jiang @ 2026-01-21 5:52 UTC (permalink / raw)
To: Andy Shevchenko
Cc: pjw, palmer, aou, alex, akpm, kees, andy, ebiggers,
martin.petersen, ardb, charlie, conor.dooley, ajones,
linus.walleij, nathan, linux-riscv, linux-kernel, linux-hardening,
Joel Stanley
On 2026/1/20 15:31, Andy Shevchenko wrote:
> On Tue, Jan 20, 2026 at 02:58:50PM +0800, Feng Jiang wrote:
>> Add an optimized strnlen() implementation for RISC-V. This version
>> includes a generic word-at-a-time optimization and a Zbb-powered
>> optimization using the 'orc.b' instruction, derived from the strlen
>> implementation.
>>
>> Benchmark results (QEMU TCG, rv64):
>> Length | Original (MB/s) | Optimized (MB/s) | Improvement
>> -------|-----------------|------------------|------------
>> 16 B | 189 | 310 | +64.0%
>> 512 B | 344 | 1535 | +346.2%
>> 4096 B | 363 | 1854 | +410.7%
>
>> Suggested-by: Andy Shevchenko <andy@kernel.org>
>
> Wrong tag, I have zero knowledge about RISC V.
>
Sorry for the confusion. I misunderstood the scope of the 'Suggested-by' tag.
I will remove it from the RISC-V specific implementation patches and only keep
relevant credits in the benchmarking/testing patches where your feedback was
applied.
Thanks for clarifying!
--
With Best Regards,
Feng Jiang
* Re: [PATCH v3 0/8] riscv: optimize string functions and add kunit tests
2026-01-20 7:36 ` [PATCH v3 0/8] riscv: optimize string functions and add kunit tests Andy Shevchenko
@ 2026-01-21 6:44 ` Feng Jiang
2026-01-21 7:01 ` Andy Shevchenko
0 siblings, 1 reply; 29+ messages in thread
From: Feng Jiang @ 2026-01-21 6:44 UTC (permalink / raw)
To: Andy Shevchenko
Cc: pjw, palmer, aou, alex, akpm, kees, andy, ebiggers,
martin.petersen, ardb, charlie, conor.dooley, ajones,
linus.walleij, nathan, linux-riscv, linux-kernel, linux-hardening
On 2026/1/20 15:36, Andy Shevchenko wrote:
> On Tue, Jan 20, 2026 at 02:58:44PM +0800, Feng Jiang wrote:
>> This series provides optimized implementations of strnlen(), strchr(),
>> and strrchr() for the RISC-V architecture. The strnlen implementation
>> is derived from the existing optimized strlen. For strchr and strrchr,
>
> strchr() and strrchr()
>> the current versions use simple byte-by-byte assembly logic, which
>> will serve as a baseline for future Zbb-based optimizations.
>>
>> The patch series is organized into three parts:
>> 1. Correctness Testing: The first three patches add KUnit test cases
>> for strlen, strnlen, and strrchr to ensure the baseline and optimized
>
> strlen(), strnlen(), and strrchr()
Will fix these to include parentheses consistently in the next version.
>> Performance Summary (Improvement %):
>> ---------------------------------------------------------------
>> Function | 16 B (Short) | 512 B (Mid) | 4096 B (Long)
>> ---------------------------------------------------------------
>> strnlen | +64.0% | +346.2% | +410.7%
>
> This is still suspicious.
>
Regarding the +410% gain, it becomes clearer when looking at the inner loop of the
Zbb implementation (https://docs.riscv.org/reference/isa/unpriv/b-st-ext.html#zbb).
For a 64-bit system, the core loop uses only 5 instructions to process 8 bytes.
Note that t3 is pre-loaded with -1 (0xFFFFFFFFFFFFFFFF):
1:
REG_L t1, SZREG(t0) /* Load 8 bytes */
addi t0, t0, SZREG /* Advance pointer */
orc.b t1, t1 /* Bitwise OR-Combine: 0x00 becomes 0x00, others 0xFF */
bgeu t0, t4, 4f /* Boundary check (max_len) */
beq t1, t3, 1b /* If t1 == 0xFFFFFFFFFFFFFFFF (no NUL), loop */
In contrast, the generic C implementation performs byte-by-byte comparisons, which involves
significantly more loads and branches for every single byte processed. The Zbb approach is
much leaner: the orc.b instruction collapses the NUL-check for an entire 8-byte word into a
single step. By shifting from a byte-oriented loop to this hardware-accelerated word-at-a-time
logic, we drastically reduce the instruction count and branch overhead, which explains the
massive jump in TCG throughput for long strings.
Beyond the main loop, the Zbb implementation also utilizes ctz (Count Trailing Zeros)
to handle the tail and alignment. Once orc.b identifies a NUL byte within a register,
we can precisely locate its position in just three instructions:
not t1, t1 /* Flip bits: NUL byte (0x00) becomes 0xFF */
ctz t1, t1 /* Count bits before the first NUL byte */
srli t1, t1, 3 /* Divide by 8 to get byte offset */
In a generic C implementation, calculating this byte offset typically requires a series
of shifts, masks, or a sub-loop, which adds significant overhead. By combining orc.b and
ctz, we eliminate all branching and lookup tables for the tail-end calculation, further
contributing to the performance gains observed in the benchmarks.
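To make the orc.b / not / ctz / srli sequence easier to follow, here is a plain C model of it. It is only an illustrative sketch, not code from the patch: orc_b() and first_nul_index() are hypothetical names, the word is assumed to be little-endian (byte 0 is the least significant byte), and __builtin_ctzll() is a GCC/Clang builtin.

```c
#include <stdint.h>

/* Software model of Zbb orc.b: each non-zero byte becomes 0xff, NUL stays 0x00. */
static uint64_t orc_b(uint64_t x)
{
	uint64_t out = 0;

	for (int i = 0; i < 8; i++) {
		if ((x >> (8 * i)) & 0xff)
			out |= (uint64_t)0xff << (8 * i);
	}
	return out;
}

/*
 * Byte index of the first NUL in a little-endian 64-bit word, or 8 if the
 * word contains no NUL. Mirrors the not / ctz / srli sequence.
 */
static unsigned int first_nul_index(uint64_t word)
{
	uint64_t mask = ~orc_b(word);	/* NUL bytes -> 0xff, others -> 0x00 */

	if (mask == 0)
		return 8;	/* the builtin is undefined for 0 */
	return (unsigned int)__builtin_ctzll(mask) >> 3;
}
```

The explicit zero check exists only because the compiler builtin is undefined for a zero argument. The Zbb ctz instruction returns XLEN in that case, so the hardware sequence needs no such guard.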
>> strchr | +4.0% | +6.4% | +1.5%
>> strrchr | +6.6% | +2.8% | +0.0%
As for strchr() and strrchr(), the relatively modest improvements are because the current
versions in this series are implemented using simple byte-by-byte assembly. These primarily
gain performance by reducing function call overhead and eliminating stack frame management
compared to the generic C version.
Unlike strnlen(), they do not yet utilize Zbb extensions. I plan to introduce Zbb-optimized
versions for these functions in a future patch set, which I expect will bring performance
gains similar to what we now see with strnlen().
>> word-at-a-time logic, showing significant gains as the string length
>> increases.
>
> Hmm... Have you tried to optimise the generic implementation to use
> word-at-a-time logic and compare?
>
Regarding the generic implementation, even if we were to optimize the C code
to use word-at-a-time logic (the has_zero() style bit-manipulation), it still
wouldn't match the Zbb version's efficiency.
The traditional C-based word-level detection requires a sequence of arithmetic
operations to identify NUL bytes. In contrast, the RISC-V orc.b instruction
collapses this entire check into a single instruction. I've focused on this
architectural approach to fully leverage these specific Zbb features, which
provides a level of instruction density that generic C math cannot achieve.
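For comparison, the "sequence of arithmetic operations" used by has_zero()-style detection looks roughly like the sketch below. This is the classic portable bit trick written out for illustration, not the kernel's word-at-a-time code. The names zero_byte_mask() and first_nul_generic() are hypothetical, and a little-endian 64-bit word is assumed.

```c
#include <stdint.h>

#define ONES	0x0101010101010101ULL
#define HIGHS	0x8080808080808080ULL

/*
 * Non-zero if and only if at least one byte of x is 0x00. The lowest set bit
 * always belongs to the first (least significant) zero byte; higher bits may
 * be false positives caused by borrow propagation.
 */
static uint64_t zero_byte_mask(uint64_t x)
{
	return (x - ONES) & ~x & HIGHS;
}

/* Byte index of the first NUL in a little-endian word, or 8 if none. */
static unsigned int first_nul_generic(uint64_t x)
{
	uint64_t mask = zero_byte_mask(x);

	if (!mask)
		return 8;
	return (unsigned int)__builtin_ctzll(mask) >> 3;
}
```

The detection step here costs a subtract, a not and two ands (plus two constants held in registers), where the Zbb path uses a single orc.b and a compare against -1.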
>> For strchr and strrchr, the handwritten assembly reduces
>
> strchr() and strrchr()
>
>> fixed overhead by eliminating stack frame management. The gain is most
>> prominent on short strings (1-16B) where function call overhead dominates,
>> while the performance converges with the C implementation for longer
>> strings in the TCG environment.
>
Thanks for your detailed review!
--
With Best Regards,
Feng Jiang
* Re: [PATCH v3 0/8] riscv: optimize string functions and add kunit tests
2026-01-21 6:44 ` Feng Jiang
@ 2026-01-21 7:01 ` Andy Shevchenko
2026-01-21 8:12 ` Feng Jiang
2026-01-21 10:57 ` David Laight
0 siblings, 2 replies; 29+ messages in thread
From: Andy Shevchenko @ 2026-01-21 7:01 UTC (permalink / raw)
To: Feng Jiang
Cc: Andy Shevchenko, pjw, palmer, aou, alex, akpm, kees, andy,
ebiggers, martin.petersen, ardb, charlie, conor.dooley, ajones,
linus.walleij, nathan, linux-riscv, linux-kernel, linux-hardening
On Wed, Jan 21, 2026 at 8:44 AM Feng Jiang <jiangfeng@kylinos.cn> wrote:
> On 2026/1/20 15:36, Andy Shevchenko wrote:
> > On Tue, Jan 20, 2026 at 02:58:44PM +0800, Feng Jiang wrote:
...
> >> Performance Summary (Improvement %):
> >> ---------------------------------------------------------------
> >> Function | 16 B (Short) | 512 B (Mid) | 4096 B (Long)
> >> ---------------------------------------------------------------
> >> strnlen | +64.0% | +346.2% | +410.7%
> >
> > This is still suspicious.
>
> Regarding the +410% gain, it becomes clearer when looking at the inner loop of the
> Zbb implementation (https://docs.riscv.org/reference/isa/unpriv/b-st-ext.html#zbb).
> For a 64-bit system, the core loop uses only 5 instructions to process 8 bytes.
> Note that t3 is pre-loaded with -1 (0xFFFFFFFFFFFFFFFF):
>
> 1:
> REG_L t1, SZREG(t0) /* Load 8 bytes */
> addi t0, t0, SZREG /* Advance pointer */
> orc.b t1, t1 /* Bitwise OR-Combine: 0x00 becomes 0x00, others 0xFF */
> bgeu t0, t4, 4f /* Boundary check (max_len) */
> beq t1, t3, 1b /* If t1 == 0xFFFFFFFFFFFFFFFF (no NUL), loop */
>
> In contrast, the generic C implementation performs byte-by-byte comparisons, which involves
> significantly more loads and branches for every single byte processed. The Zbb approach is
> much leaner: the orc.b instruction collapses the NUL-check for an entire 8-byte word into a
> single step. By shifting from a byte-oriented loop to this hardware-accelerated word-at-a-time
> logic, we drastically reduce the instruction count and branch overhead, which explains the
> massive jump in TCG throughput for long strings.
>
> Beyond the main loop, the Zbb implementation also utilizes ctz (Count Trailing Zeros)
> to handle the tail and alignment. Once orc.b identifies a NUL byte within a register,
> we can precisely locate its position in just three instructions:
>
> not t1, t1 /* Flip bits: NUL byte (0x00) becomes 0xFF */
> ctz t1, t1 /* Count bits before the first NUL byte */
> srli t1, t1, 3 /* Divide by 8 to get byte offset */
>
> In a generic C implementation, calculating this byte offset typically requires a series
> of shifts, masks, or a sub-loop, which adds significant overhead. By combining orc.b and
> ctz, we eliminate all branching and lookup tables for the tail-end calculation, further
> contributing to the performance gains observed in the benchmarks.
>
> >> strchr | +4.0% | +6.4% | +1.5%
> >> strrchr | +6.6% | +2.8% | +0.0%
>
> As for strchr() and strrchr(), the relatively modest improvements are because the current
> versions in this series are implemented using simple byte-by-byte assembly. These primarily
> gain performance by reducing function call overhead and eliminating stack frame management
> compared to the generic C version.
>
> Unlike strnlen(), they do not yet utilize Zbb extensions. I plan to introduce Zbb-optimized
> versions for these functions in a future patch set, which I expect will bring performance
> gains similar to what we now see with strnlen().
Thanks for the details regarding the native assembly implementation.
> >> word-at-a-time logic, showing significant gains as the string length
> >> increases.
> >
> > Hmm... Have you tried to optimise the generic implementation to use
> > word-at-a-time logic and compare?
>
> Regarding the generic implementation, even if we were to optimize the C code
> to use word-at-a-time logic (the has_zero() style bit-manipulation), it still
> wouldn't match the Zbb version's efficiency.
>
> The traditional C-based word-level detection requires a sequence of arithmetic
> operations to identify NUL bytes. In contrast, the RISC-V orc.b instruction
> collapses this entire check into a single instruction. I've focused on this
> architectural approach to fully leverage these specific Zbb features, which
> provides a level of instruction density that generic C math cannot achieve.
I understand that. My point is if we move the generic implementation
to use word-at-a-time technique the difference should not go 4x,
right? Perhaps 1.5x or so. I believe this will be a very useful
exercise.
--
With Best Regards,
Andy Shevchenko
* Re: [PATCH v3 6/8] riscv: lib: add strnlen implementation
2026-01-20 6:58 ` [PATCH v3 6/8] riscv: lib: add strnlen implementation Feng Jiang
2026-01-20 7:31 ` Andy Shevchenko
@ 2026-01-21 7:24 ` Qingfang Deng
2026-01-23 1:28 ` Feng Jiang
1 sibling, 1 reply; 29+ messages in thread
From: Qingfang Deng @ 2026-01-21 7:24 UTC (permalink / raw)
To: Feng Jiang
Cc: pjw, palmer, aou, alex, akpm, kees, andy, ebiggers,
martin.petersen, ardb, charlie, conor.dooley, ajones,
linus.walleij, nathan, linux-riscv, linux-kernel, linux-hardening,
Joel Stanley
On Tue, 20 Jan 2026 14:58:50 +0800, Feng Jiang wrote:
> diff --git a/arch/riscv/lib/strnlen.S b/arch/riscv/lib/strnlen.S
Branches that test maxlen can be replaced with Zbb minu instruction.
(see below)
> + /*
> + * Returns
> + * a0 - String length
> + *
> + * Parameters
> + * a0 - String to measure
> + * a1 - Max length of string
> + *
> + * Clobbers
> + * t0, t1, t2, t3, t4
> + */
> +
> + /* If maxlen is 0, return 0. */
> + beqz a1, 3f
> +
> + /* Number of irrelevant bytes in the first word. */
> + andi t2, a0, SZREG-1
> +
> + /* Align pointer. */
> + andi t0, a0, -SZREG
> +
> + li t3, SZREG
> + sub t3, t3, t2
> + slli t2, t2, 3
> +
> + /* Aligned boundary. */
> + add t4, a0, a1
> + andi t4, t4, -SZREG
> +
> + /* Get the first word. */
> + REG_L t1, 0(t0)
> +
> + /*
> + * Shift away the partial data we loaded to remove the irrelevant bytes
> + * preceding the string with the effect of adding NUL bytes at the
> + * end of the string's first word.
> + */
> + SHIFT t1, t1, t2
> +
> + /* Convert non-NUL into 0xff and NUL into 0x00. */
> + orc.b t1, t1
> +
> + /* Convert non-NUL into 0x00 and NUL into 0xff. */
> + not t1, t1
> +
> + /*
> + * Search for the first set bit (corresponding to a NUL byte in the
> + * original chunk).
> + */
> + CZ t1, t1
> +
> + /*
> + * The first chunk is special: compare against the number
> + * of valid bytes in this chunk.
> + */
> + srli a0, t1, 3
> +
> + /* Limit the result by maxlen. */
> + bleu a1, a0, 3f
minu a0, a0, a1
> +
> + bgtu t3, a0, 2f
> +
> + /* Prepare for the word comparison loop. */
> + addi t2, t0, SZREG
> + li t3, -1
> +
> + /*
> + * Our critical loop is 4 instructions and processes data in
> + * 4 byte or 8 byte chunks.
> + */
> + .p2align 3
> +1:
> + REG_L t1, SZREG(t0)
> + addi t0, t0, SZREG
> + orc.b t1, t1
> + bgeu t0, t4, 4f
> + beq t1, t3, 1b
> +4:
> + not t1, t1
> + CZ t1, t1
> + srli t1, t1, 3
> +
> + /* Get number of processed bytes. */
> + sub t2, t0, t2
> +
> + /* Add number of characters in the first word. */
> + add a0, a0, t2
> +
> + /* Add number of characters in the last word. */
> + add a0, a0, t1
> +
> + /* Ensure the final result does not exceed maxlen. */
> + bgeu a0, a1, 3f
minu a0, a0, a1
> +2:
> + ret
> +3:
> + mv a0, a1
> + ret
> +
> +.option pop
--
Qingfang
* Re: [PATCH v3 0/8] riscv: optimize string functions and add kunit tests
2026-01-21 7:01 ` Andy Shevchenko
@ 2026-01-21 8:12 ` Feng Jiang
2026-01-21 10:57 ` David Laight
1 sibling, 0 replies; 29+ messages in thread
From: Feng Jiang @ 2026-01-21 8:12 UTC (permalink / raw)
To: Andy Shevchenko
Cc: Andy Shevchenko, pjw, palmer, aou, alex, akpm, kees, andy,
ebiggers, martin.petersen, ardb, charlie, conor.dooley, ajones,
linus.walleij, nathan, linux-riscv, linux-kernel, linux-hardening
On 2026/1/21 15:01, Andy Shevchenko wrote:
> On Wed, Jan 21, 2026 at 8:44 AM Feng Jiang <jiangfeng@kylinos.cn> wrote:
>> On 2026/1/20 15:36, Andy Shevchenko wrote:
>>> On Tue, Jan 20, 2026 at 02:58:44PM +0800, Feng Jiang wrote:
>
> ...
>>>> word-at-a-time logic, showing significant gains as the string length
>>>> increases.
>>>
>>> Hmm... Have you tried to optimise the generic implementation to use
>>> word-at-a-time logic and compare?
>>
>> Regarding the generic implementation, even if we were to optimize the C code
>> to use word-at-a-time logic (the has_zero() style bit-manipulation), it still
>> wouldn't match the Zbb version's efficiency.
>>
>> The traditional C-based word-level detection requires a sequence of arithmetic
>> operations to identify NUL bytes. In contrast, the RISC-V orc.b instruction
>> collapses this entire check into a single instruction. I've focused on this
>> architectural approach to fully leverage these specific Zbb features, which
>> provides a level of instruction density that generic C math cannot achieve.
>
> I understand that. My point is if we move the generic implementation
> to use word-at-a-time technique the difference should not go 4x,
> right? Perhaps 1.5x or so. I believe this will be a very useful
> exercise.
>
That is a very insightful point, thanks for the suggestion. I'll look into
optimizing the generic string library as a follow-up task to see if we can
bring some improvements there as well.
Thanks again for the guidance.
--
With Best Regards,
Feng Jiang
* Re: [PATCH v3 0/8] riscv: optimize string functions and add kunit tests
2026-01-21 7:01 ` Andy Shevchenko
2026-01-21 8:12 ` Feng Jiang
@ 2026-01-21 10:57 ` David Laight
2026-01-23 3:12 ` Feng Jiang
1 sibling, 1 reply; 29+ messages in thread
From: David Laight @ 2026-01-21 10:57 UTC (permalink / raw)
To: Andy Shevchenko
Cc: Feng Jiang, Andy Shevchenko, pjw, palmer, aou, alex, akpm, kees,
andy, ebiggers, martin.petersen, ardb, charlie, conor.dooley,
ajones, linus.walleij, nathan, linux-riscv, linux-kernel,
linux-hardening
On Wed, 21 Jan 2026 09:01:29 +0200
Andy Shevchenko <andy.shevchenko@gmail.com> wrote:
...
> I understand that. My point is if we move the generic implementation
> to use word-at-a-time technique the difference should not go 4x,
> right? Perhaps 1.5x or so. I believe this will be a very useful
> exercise.
I posted a version earlier.
After the initial setup (aligning the base address and loading
some constants), the loop on x86-64 is 7 instructions (should be similar
for other architectures).
I think it will execute in 4 clocks.
You then need to find the byte in the word, easy enough on LE with
a fast ffs() - but harder otherwise.
The real problem is the cost for short strings.
Like memcpy() you need a hint from the source of the 'expected' length
(as a compile-time constant) to compile-time select the algorithm.
OTOH:
	do {
		if (!ptr[0])
			return ptr - start;
		ptr += 2;
	} while (ptr[-1]);
	return ptr - start - 1;
has two 'load+compare+branch' and one add per loop.
On x86 that might all overlap and give you a two-clock loop
that checks one byte every clock - faster than 'rep scasb'.
(You can get a two clock loop, but not a 1 clock loop.)
I think unrolling further will make little/no difference.
The break-even for the word-at-a-time version is probably at least 64
characters.
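Reading the fragment above as a do/while loop, it can be written out as a complete function. This is a sketch for experimentation only; the name strlen_by_two() is hypothetical.

```c
#include <stddef.h>

/*
 * Two bytes per iteration: one pointer add and two load+compare+branch.
 * Bytes are read strictly in order, so nothing past the NUL is touched.
 */
static size_t strlen_by_two(const char *start)
{
	const char *ptr = start;

	do {
		if (!ptr[0])
			return (size_t)(ptr - start);
		ptr += 2;
	} while (ptr[-1]);

	/* The NUL was the second byte of the pair, one behind ptr. */
	return (size_t)(ptr - start) - 1;
}
```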
David
* Re: [PATCH v3 6/8] riscv: lib: add strnlen implementation
2026-01-21 7:24 ` Qingfang Deng
@ 2026-01-23 1:28 ` Feng Jiang
0 siblings, 0 replies; 29+ messages in thread
From: Feng Jiang @ 2026-01-23 1:28 UTC (permalink / raw)
To: Qingfang Deng
Cc: pjw, palmer, aou, alex, akpm, kees, andy, ebiggers,
martin.petersen, ardb, charlie, conor.dooley, ajones,
linus.walleij, nathan, linux-riscv, linux-kernel, linux-hardening,
Joel Stanley
On 2026/1/21 15:24, Qingfang Deng wrote:
> On Tue, 20 Jan 2026 14:58:50 +0800, Feng Jiang wrote:
>> diff --git a/arch/riscv/lib/strnlen.S b/arch/riscv/lib/strnlen.S
>
> Branches that test maxlen can be replaced with Zbb minu instruction.
> (see below)
>
...
>> + /*
>> + * The first chunk is special: compare against the number
>> + * of valid bytes in this chunk.
>> + */
>> + srli a0, t1, 3
>> +
>> + /* Limit the result by maxlen. */
>> + bleu a1, a0, 3f
>
> minu a0, a0, a1
>
>> +
>> + bgtu t3, a0, 2f
>> +
>> + /* Prepare for the word comparison loop. */
>> + addi t2, t0, SZREG
>> + li t3, -1
>> +
>> + /*
>> + * Our critical loop is 4 instructions and processes data in
>> + * 4 byte or 8 byte chunks.
>> + */
>> + .p2align 3
>> +1:
>> + REG_L t1, SZREG(t0)
>> + addi t0, t0, SZREG
>> + orc.b t1, t1
>> + bgeu t0, t4, 4f
>> + beq t1, t3, 1b
>> +4:
>> + not t1, t1
>> + CZ t1, t1
>> + srli t1, t1, 3
>> +
>> + /* Get number of processed bytes. */
>> + sub t2, t0, t2
>> +
>> + /* Add number of characters in the first word. */
>> + add a0, a0, t2
>> +
>> + /* Add number of characters in the last word. */
>> + add a0, a0, t1
>> +
>> + /* Ensure the final result does not exceed maxlen. */
>> + bgeu a0, a1, 3f
>
> minu a0, a0, a1
>
Thanks for the great suggestion! I see your point now—using minu is indeed a much
more elegant and efficient way to handle the maxlen constraint. It nicely eliminates
unnecessary branches and simplifies the code while still allowing for early returns.
I'll incorporate this into a v4 patch and add a Suggested-by tag for you. Thanks
again for your insightful review!
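In C terms, the change amounts to replacing a compare-and-branch to a separate "return maxlen" exit with an unsigned minimum. The sketch below only illustrates that equivalence; both function names are hypothetical, and whether a compiler actually emits minu for the second form depends on the target and flags.

```c
#include <stddef.h>

/* Shape of the v3 code: branch to a separate exit that returns maxlen. */
static size_t clamp_with_branch(size_t len, size_t maxlen)
{
	if (len >= maxlen)
		goto limit;
	return len;
limit:
	return maxlen;
}

/* Shape of the suggestion: an unsigned minimum, which Zbb provides as minu. */
static size_t clamp_with_min(size_t len, size_t maxlen)
{
	return len < maxlen ? len : maxlen;
}
```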
--
With Best Regards,
Feng Jiang
* Re: [PATCH v3 0/8] riscv: optimize string functions and add kunit tests
2026-01-21 10:57 ` David Laight
@ 2026-01-23 3:12 ` Feng Jiang
2026-01-23 10:16 ` David Laight
0 siblings, 1 reply; 29+ messages in thread
From: Feng Jiang @ 2026-01-23 3:12 UTC (permalink / raw)
To: David Laight, Andy Shevchenko
Cc: Andy Shevchenko, pjw, palmer, aou, alex, akpm, kees, andy,
ebiggers, martin.petersen, ardb, charlie, conor.dooley, ajones,
linus.walleij, nathan, linux-riscv, linux-kernel, linux-hardening
On 2026/1/21 18:57, David Laight wrote:
> On Wed, 21 Jan 2026 09:01:29 +0200
> Andy Shevchenko <andy.shevchenko@gmail.com> wrote:
>
> ...
>> I understand that. My point is if we move the generic implementation
>> to use word-at-a-time technique the difference should not go 4x,
>> right? Perhaps 1.5x or so. I believe this will be a very useful
>> exercise.
>
> I posted a version earlier.
>
> After the initial setup (aligning the base address and loading
> some constants), the loop on x86-64 is 7 instructions (should be similar
> for other architectures).
> I think it will execute in 4 clocks.
> You then need to find the byte in the word, easy enough on LE with
> a fast ffs() - but harder otherwise.
> The real problem is the cost for short strings.
> Like memcpy() you need a hint from the source of the 'expected' length
> (as a compile-time constant) to compile-time select the algorithm.
>
> OTOH:
> 	do {
> 		if (!ptr[0])
> 			return ptr - start;
> 		ptr += 2;
> 	} while (ptr[-1]);
> 	return ptr - start - 1;
> has two 'load+compare+branch' and one add per loop.
> On x86 that might all overlap and give you a two-clock loop
> that checks one byte every clock - faster than 'rep scasb'.
> (You can get a two clock loop, but not a 1 clock loop.)
> I think unrolling further will make little/no difference.
>
> The break-even for the word-at-a-time version is probably at least 64
> characters.
>
Thanks for the profound analysis and the detailed suggestions.
I am still exploring some of the finer points of these low-level performance
trade-offs, so your input is very helpful. I will definitely spend some time
studying this further and experiment with the approaches you mentioned.
Thanks again for your help!
--
With Best Regards,
Feng Jiang
* Re: [PATCH v3 0/8] riscv: optimize string functions and add kunit tests
2026-01-23 3:12 ` Feng Jiang
@ 2026-01-23 10:16 ` David Laight
0 siblings, 0 replies; 29+ messages in thread
From: David Laight @ 2026-01-23 10:16 UTC (permalink / raw)
To: Feng Jiang
Cc: Andy Shevchenko, Andy Shevchenko, pjw, palmer, aou, alex, akpm,
kees, andy, ebiggers, martin.petersen, ardb, charlie,
conor.dooley, ajones, linus.walleij, nathan, linux-riscv,
linux-kernel, linux-hardening
On Fri, 23 Jan 2026 11:12:00 +0800
Feng Jiang <jiangfeng@kylinos.cn> wrote:
>...
> I am still exploring some of the finer points of these low-level performance
> trade-offs, so your input is very helpful. I will definitely spend some time
> studying this further and experiment with the approaches you mentioned.
You can spend a long time micro-optimising small functions.
I know, I've done it....
Then you need to test on as many cpu variants as you can find.
David
Thread overview: 29+ messages
2026-01-20 6:58 [PATCH v3 0/8] riscv: optimize string functions and add kunit tests Feng Jiang
2026-01-20 6:58 ` [PATCH v3 1/8] lib/string_kunit: add correctness test for strlen Feng Jiang
2026-01-20 7:28 ` Andy Shevchenko
2026-01-20 6:58 ` [PATCH v3 2/8] lib/string_kunit: add correctness test for strnlen Feng Jiang
2026-01-20 7:29 ` Andy Shevchenko
2026-01-20 6:58 ` [PATCH v3 3/8] lib/string_kunit: add correctness test for strrchr() Feng Jiang
2026-01-20 7:30 ` Andy Shevchenko
2026-01-20 6:58 ` [PATCH v3 4/8] lib/string_kunit: add performance benchmarks for strlen Feng Jiang
2026-01-20 7:46 ` Andy Shevchenko
2026-01-21 5:45 ` Feng Jiang
2026-01-20 6:58 ` [PATCH v3 5/8] lib/string_kunit: extend benchmarks to strnlen and chr searches Feng Jiang
2026-01-20 7:48 ` Andy Shevchenko
2026-01-21 5:48 ` Feng Jiang
2026-01-20 6:58 ` [PATCH v3 6/8] riscv: lib: add strnlen implementation Feng Jiang
2026-01-20 7:31 ` Andy Shevchenko
2026-01-21 5:52 ` Feng Jiang
2026-01-21 7:24 ` Qingfang Deng
2026-01-23 1:28 ` Feng Jiang
2026-01-20 6:58 ` [PATCH v3 7/8] riscv: lib: add strchr implementation Feng Jiang
2026-01-20 7:31 ` Andy Shevchenko
2026-01-20 6:58 ` [PATCH v3 8/8] riscv: lib: add strrchr implementation Feng Jiang
2026-01-20 7:32 ` Andy Shevchenko
2026-01-20 7:36 ` [PATCH v3 0/8] riscv: optimize string functions and add kunit tests Andy Shevchenko
2026-01-21 6:44 ` Feng Jiang
2026-01-21 7:01 ` Andy Shevchenko
2026-01-21 8:12 ` Feng Jiang
2026-01-21 10:57 ` David Laight
2026-01-23 3:12 ` Feng Jiang
2026-01-23 10:16 ` David Laight