public inbox for linux-kernel@vger.kernel.org
* [PATCH 0/5] riscv: Cleanup and deduplicate unaligned access speed probe
@ 2026-02-11 17:30 Nam Cao
  2026-02-11 17:30 ` [PATCH 1/5] riscv: Clean up & optimize unaligned scalar access probe Nam Cao
                   ` (5 more replies)
  0 siblings, 6 replies; 9+ messages in thread
From: Nam Cao @ 2026-02-11 17:30 UTC (permalink / raw)
  To: Paul Walmsley, Palmer Dabbelt, Albert Ou, Alexandre Ghiti,
	Andrew Jones, Clément Léger, linux-riscv, linux-kernel
  Cc: Nam Cao

Hi,

This series does some minor cleanup and deduplication of the riscv
unaligned access speed probe.

 arch/riscv/kernel/unaligned_access_speed.c | 221 ++++++++-------------
 1 file changed, 85 insertions(+), 136 deletions(-)

Nam Cao (5):
  riscv: Clean up & optimize unaligned scalar access probe
  riscv: Split out measure_cycles() for reuse
  riscv: Reuse measure_cycles() in check_vector_unaligned_access()
  riscv: Split out compare_unaligned_access()
  riscv: Reuse compare_unaligned_access() in
    check_vector_unaligned_access()

-- 
2.47.3


^ permalink raw reply	[flat|nested] 9+ messages in thread

* [PATCH 1/5] riscv: Clean up & optimize unaligned scalar access probe
  2026-02-11 17:30 [PATCH 0/5] riscv: Cleanup and deduplicate unaligned access speed probe Nam Cao
@ 2026-02-11 17:30 ` Nam Cao
  2026-02-11 17:30 ` [PATCH 2/5] riscv: Split out measure_cycles() for reuse Nam Cao
                   ` (4 subsequent siblings)
  5 siblings, 0 replies; 9+ messages in thread
From: Nam Cao @ 2026-02-11 17:30 UTC (permalink / raw)
  To: Paul Walmsley, Palmer Dabbelt, Albert Ou, Alexandre Ghiti,
	Andrew Jones, Clément Léger, linux-riscv, linux-kernel
  Cc: Nam Cao, Sebastian Andrzej Siewior

check_unaligned_access_speed_all_cpus() is more complicated than it should
be:

  - It uses on_each_cpu() to probe unaligned memory access on all CPUs but
    excludes CPU0 with a check in the callback function. So an IPI to CPU0
    is wasted.

  - Probing on CPU0 is done with smp_call_on_cpu(), which is not as fast as
    on_each_cpu().

The reason for this design is that the probe is timed with jiffies:
on_each_cpu() excludes CPU0 because that CPU needs to stay behind and tend
to jiffies.

Instead, replace jiffies usage with ktime_get_mono_fast_ns(). With jiffies
out of the way, on_each_cpu() can be used for all CPUs and
smp_call_on_cpu() can be dropped.

To make ktime_get_mono_fast_ns() usable, move this probe to late_initcall.
Anything after clocksource's fs_initcall works, but avoid depending on
clocksource staying at fs_initcall.

The probe time is now 8000000 ns, which is the same as before (2 jiffies)
with the riscv defconfig. This is excessive for the CPUs I have, and
probably should be reduced; but that's a different discussion.

Suggested-by: Sebastian Andrzej Siewior <bigeasy@linutronix.de>
Signed-off-by: Nam Cao <namcao@linutronix.de>
---
 arch/riscv/kernel/unaligned_access_speed.c | 28 ++++++++--------------
 1 file changed, 10 insertions(+), 18 deletions(-)

diff --git a/arch/riscv/kernel/unaligned_access_speed.c b/arch/riscv/kernel/unaligned_access_speed.c
index 70b5e6927620..8b744c4a41ea 100644
--- a/arch/riscv/kernel/unaligned_access_speed.c
+++ b/arch/riscv/kernel/unaligned_access_speed.c
@@ -17,6 +17,7 @@
 #include "copy-unaligned.h"
 
 #define MISALIGNED_ACCESS_JIFFIES_LG2 1
+#define MISALIGNED_ACCESS_NS 8000000
 #define MISALIGNED_BUFFER_SIZE 0x4000
 #define MISALIGNED_BUFFER_ORDER get_order(MISALIGNED_BUFFER_SIZE)
 #define MISALIGNED_COPY_SIZE ((MISALIGNED_BUFFER_SIZE / 2) - 0x80)
@@ -36,8 +37,8 @@ static int check_unaligned_access(void *param)
 	u64 start_cycles, end_cycles;
 	u64 word_cycles;
 	u64 byte_cycles;
+	u64 start_ns;
 	int ratio;
-	unsigned long start_jiffies, now;
 	struct page *page = param;
 	void *dst;
 	void *src;
@@ -55,15 +56,13 @@ static int check_unaligned_access(void *param)
 	/* Do a warmup. */
 	__riscv_copy_words_unaligned(dst, src, MISALIGNED_COPY_SIZE);
 	preempt_disable();
-	start_jiffies = jiffies;
-	while ((now = jiffies) == start_jiffies)
-		cpu_relax();
 
 	/*
 	 * For a fixed amount of time, repeatedly try the function, and take
 	 * the best time in cycles as the measurement.
 	 */
-	while (time_before(jiffies, now + (1 << MISALIGNED_ACCESS_JIFFIES_LG2))) {
+	start_ns = ktime_get_mono_fast_ns();
+	while (ktime_get_mono_fast_ns() < start_ns + MISALIGNED_ACCESS_NS) {
 		start_cycles = get_cycles64();
 		/* Ensure the CSR read can't reorder WRT to the copy. */
 		mb();
@@ -77,11 +76,9 @@ static int check_unaligned_access(void *param)
 
 	byte_cycles = -1ULL;
 	__riscv_copy_bytes_unaligned(dst, src, MISALIGNED_COPY_SIZE);
-	start_jiffies = jiffies;
-	while ((now = jiffies) == start_jiffies)
-		cpu_relax();
 
-	while (time_before(jiffies, now + (1 << MISALIGNED_ACCESS_JIFFIES_LG2))) {
+	start_ns = ktime_get_mono_fast_ns();
+	while (ktime_get_mono_fast_ns() < start_ns + MISALIGNED_ACCESS_NS) {
 		start_cycles = get_cycles64();
 		mb();
 		__riscv_copy_bytes_unaligned(dst, src, MISALIGNED_COPY_SIZE);
@@ -125,13 +122,12 @@ static int check_unaligned_access(void *param)
 	return 0;
 }
 
-static void __init check_unaligned_access_nonboot_cpu(void *param)
+static void __init _check_unaligned_access(void *param)
 {
 	unsigned int cpu = smp_processor_id();
 	struct page **pages = param;
 
-	if (smp_processor_id() != 0)
-		check_unaligned_access(pages[cpu]);
+	check_unaligned_access(pages[cpu]);
 }
 
 /* Measure unaligned access speed on all CPUs present at boot in parallel. */
@@ -158,11 +154,7 @@ static void __init check_unaligned_access_speed_all_cpus(void)
 		}
 	}
 
-	/* Check everybody except 0, who stays behind to tend jiffies. */
-	on_each_cpu(check_unaligned_access_nonboot_cpu, bufs, 1);
-
-	/* Check core 0. */
-	smp_call_on_cpu(0, check_unaligned_access, bufs[0], true);
+	on_each_cpu(_check_unaligned_access, bufs, 1);
 
 out:
 	for_each_cpu(cpu, cpu_online_mask) {
@@ -494,4 +486,4 @@ static int __init check_unaligned_access_all_cpus(void)
 	return 0;
 }
 
-arch_initcall(check_unaligned_access_all_cpus);
+late_initcall(check_unaligned_access_all_cpus);
-- 
2.47.3


* [PATCH 2/5] riscv: Split out measure_cycles() for reuse
  2026-02-11 17:30 [PATCH 0/5] riscv: Cleanup and deduplicate unaligned access speed probe Nam Cao
  2026-02-11 17:30 ` [PATCH 1/5] riscv: Clean up & optimize unaligned scalar access probe Nam Cao
@ 2026-02-11 17:30 ` Nam Cao
  2026-02-11 17:30 ` [PATCH 3/5] riscv: Reuse measure_cycles() in check_vector_unaligned_access() Nam Cao
                   ` (3 subsequent siblings)
  5 siblings, 0 replies; 9+ messages in thread
From: Nam Cao @ 2026-02-11 17:30 UTC (permalink / raw)
  To: Paul Walmsley, Palmer Dabbelt, Albert Ou, Alexandre Ghiti,
	Andrew Jones, Clément Léger, linux-riscv, linux-kernel
  Cc: Nam Cao

Byte cycle measurement and word cycle measurement of scalar misaligned
access are very similar. Split these parts out into a common
measure_cycles() function to avoid duplication.

This function will also be reused for vector misaligned access probe in a
follow-up commit.

Signed-off-by: Nam Cao <namcao@linutronix.de>
---
 arch/riscv/kernel/unaligned_access_speed.c | 69 +++++++++++-----------
 1 file changed, 33 insertions(+), 36 deletions(-)

diff --git a/arch/riscv/kernel/unaligned_access_speed.c b/arch/riscv/kernel/unaligned_access_speed.c
index 8b744c4a41ea..b964a666a973 100644
--- a/arch/riscv/kernel/unaligned_access_speed.c
+++ b/arch/riscv/kernel/unaligned_access_speed.c
@@ -31,30 +31,15 @@ static long unaligned_vector_speed_param = RISCV_HWPROBE_MISALIGNED_VECTOR_UNKNO
 static cpumask_t fast_misaligned_access;
 
 #ifdef CONFIG_RISCV_PROBE_UNALIGNED_ACCESS
-static int check_unaligned_access(void *param)
+static u64 measure_cycles(void (*func)(void *dst, const void *src, size_t len),
+			  void *dst, void *src, size_t len)
 {
-	int cpu = smp_processor_id();
-	u64 start_cycles, end_cycles;
-	u64 word_cycles;
-	u64 byte_cycles;
+	u64 start_cycles, end_cycles, cycles = -1ULL;
 	u64 start_ns;
-	int ratio;
-	struct page *page = param;
-	void *dst;
-	void *src;
-	long speed = RISCV_HWPROBE_MISALIGNED_SCALAR_SLOW;
 
-	if (per_cpu(misaligned_access_speed, cpu) != RISCV_HWPROBE_MISALIGNED_SCALAR_UNKNOWN)
-		return 0;
-
-	/* Make an unaligned destination buffer. */
-	dst = (void *)((unsigned long)page_address(page) | 0x1);
-	/* Unalign src as well, but differently (off by 1 + 2 = 3). */
-	src = dst + (MISALIGNED_BUFFER_SIZE / 2);
-	src += 2;
-	word_cycles = -1ULL;
 	/* Do a warmup. */
-	__riscv_copy_words_unaligned(dst, src, MISALIGNED_COPY_SIZE);
+	func(dst, src, len);
+
 	preempt_disable();
 
 	/*
@@ -66,29 +51,41 @@ static int check_unaligned_access(void *param)
 		start_cycles = get_cycles64();
 		/* Ensure the CSR read can't reorder WRT to the copy. */
 		mb();
-		__riscv_copy_words_unaligned(dst, src, MISALIGNED_COPY_SIZE);
+		func(dst, src, len);
 		/* Ensure the copy ends before the end time is snapped. */
 		mb();
 		end_cycles = get_cycles64();
-		if ((end_cycles - start_cycles) < word_cycles)
-			word_cycles = end_cycles - start_cycles;
+		if ((end_cycles - start_cycles) < cycles)
+			cycles = end_cycles - start_cycles;
 	}
 
-	byte_cycles = -1ULL;
-	__riscv_copy_bytes_unaligned(dst, src, MISALIGNED_COPY_SIZE);
+	preempt_enable();
 
-	start_ns = ktime_get_mono_fast_ns();
-	while (ktime_get_mono_fast_ns() < start_ns + MISALIGNED_ACCESS_NS) {
-		start_cycles = get_cycles64();
-		mb();
-		__riscv_copy_bytes_unaligned(dst, src, MISALIGNED_COPY_SIZE);
-		mb();
-		end_cycles = get_cycles64();
-		if ((end_cycles - start_cycles) < byte_cycles)
-			byte_cycles = end_cycles - start_cycles;
-	}
+	return cycles;
+}
 
-	preempt_enable();
+static int check_unaligned_access(void *param)
+{
+	int cpu = smp_processor_id();
+	u64 word_cycles;
+	u64 byte_cycles;
+	int ratio;
+	struct page *page = param;
+	void *dst;
+	void *src;
+	long speed = RISCV_HWPROBE_MISALIGNED_SCALAR_SLOW;
+
+	if (per_cpu(misaligned_access_speed, cpu) != RISCV_HWPROBE_MISALIGNED_SCALAR_UNKNOWN)
+		return 0;
+
+	/* Make an unaligned destination buffer. */
+	dst = (void *)((unsigned long)page_address(page) | 0x1);
+	/* Unalign src as well, but differently (off by 1 + 2 = 3). */
+	src = dst + (MISALIGNED_BUFFER_SIZE / 2);
+	src += 2;
+
+	word_cycles = measure_cycles(__riscv_copy_words_unaligned, dst, src, MISALIGNED_COPY_SIZE);
+	byte_cycles = measure_cycles(__riscv_copy_bytes_unaligned, dst, src, MISALIGNED_COPY_SIZE);
 
 	/* Don't divide by zero. */
 	if (!word_cycles || !byte_cycles) {
-- 
2.47.3


* [PATCH 3/5] riscv: Reuse measure_cycles() in check_vector_unaligned_access()
  2026-02-11 17:30 [PATCH 0/5] riscv: Cleanup and deduplicate unaligned access speed probe Nam Cao
  2026-02-11 17:30 ` [PATCH 1/5] riscv: Clean up & optimize unaligned scalar access probe Nam Cao
  2026-02-11 17:30 ` [PATCH 2/5] riscv: Split out measure_cycles() for reuse Nam Cao
@ 2026-02-11 17:30 ` Nam Cao
  2026-02-11 17:30 ` [PATCH 4/5] riscv: Split out compare_unaligned_access() Nam Cao
                   ` (2 subsequent siblings)
  5 siblings, 0 replies; 9+ messages in thread
From: Nam Cao @ 2026-02-11 17:30 UTC (permalink / raw)
  To: Paul Walmsley, Palmer Dabbelt, Albert Ou, Alexandre Ghiti,
	Andrew Jones, Clément Léger, linux-riscv, linux-kernel
  Cc: Nam Cao

check_vector_unaligned_access() duplicates the logic in measure_cycles().

Reuse measure_cycles() and deduplicate.

Signed-off-by: Nam Cao <namcao@linutronix.de>
---
 arch/riscv/kernel/unaligned_access_speed.c | 54 ++++------------------
 1 file changed, 8 insertions(+), 46 deletions(-)

diff --git a/arch/riscv/kernel/unaligned_access_speed.c b/arch/riscv/kernel/unaligned_access_speed.c
index b964a666a973..c0d39c4b2150 100644
--- a/arch/riscv/kernel/unaligned_access_speed.c
+++ b/arch/riscv/kernel/unaligned_access_speed.c
@@ -16,7 +16,6 @@
 
 #include "copy-unaligned.h"
 
-#define MISALIGNED_ACCESS_JIFFIES_LG2 1
 #define MISALIGNED_ACCESS_NS 8000000
 #define MISALIGNED_BUFFER_SIZE 0x4000
 #define MISALIGNED_BUFFER_ORDER get_order(MISALIGNED_BUFFER_SIZE)
@@ -30,9 +29,9 @@ static long unaligned_vector_speed_param = RISCV_HWPROBE_MISALIGNED_VECTOR_UNKNO
 
 static cpumask_t fast_misaligned_access;
 
-#ifdef CONFIG_RISCV_PROBE_UNALIGNED_ACCESS
-static u64 measure_cycles(void (*func)(void *dst, const void *src, size_t len),
-			  void *dst, void *src, size_t len)
+static u64 __maybe_unused
+measure_cycles(void (*func)(void *dst, const void *src, size_t len),
+	       void *dst, void *src, size_t len)
 {
 	u64 start_cycles, end_cycles, cycles = -1ULL;
 	u64 start_ns;
@@ -64,6 +63,7 @@ static u64 measure_cycles(void (*func)(void *dst, const void *src, size_t len),
 	return cycles;
 }
 
+#ifdef CONFIG_RISCV_PROBE_UNALIGNED_ACCESS
 static int check_unaligned_access(void *param)
 {
 	int cpu = smp_processor_id();
@@ -270,11 +270,9 @@ static int riscv_offline_cpu(unsigned int cpu)
 static void check_vector_unaligned_access(struct work_struct *work __always_unused)
 {
 	int cpu = smp_processor_id();
-	u64 start_cycles, end_cycles;
 	u64 word_cycles;
 	u64 byte_cycles;
 	int ratio;
-	unsigned long start_jiffies, now;
 	struct page *page;
 	void *dst;
 	void *src;
@@ -294,50 +292,14 @@ static void check_vector_unaligned_access(struct work_struct *work __always_unus
 	/* Unalign src as well, but differently (off by 1 + 2 = 3). */
 	src = dst + (MISALIGNED_BUFFER_SIZE / 2);
 	src += 2;
-	word_cycles = -1ULL;
 
-	/* Do a warmup. */
 	kernel_vector_begin();
-	__riscv_copy_vec_words_unaligned(dst, src, MISALIGNED_COPY_SIZE);
-
-	start_jiffies = jiffies;
-	while ((now = jiffies) == start_jiffies)
-		cpu_relax();
-
-	/*
-	 * For a fixed amount of time, repeatedly try the function, and take
-	 * the best time in cycles as the measurement.
-	 */
-	while (time_before(jiffies, now + (1 << MISALIGNED_ACCESS_JIFFIES_LG2))) {
-		start_cycles = get_cycles64();
-		/* Ensure the CSR read can't reorder WRT to the copy. */
-		mb();
-		__riscv_copy_vec_words_unaligned(dst, src, MISALIGNED_COPY_SIZE);
-		/* Ensure the copy ends before the end time is snapped. */
-		mb();
-		end_cycles = get_cycles64();
-		if ((end_cycles - start_cycles) < word_cycles)
-			word_cycles = end_cycles - start_cycles;
-	}
-
-	byte_cycles = -1ULL;
-	__riscv_copy_vec_bytes_unaligned(dst, src, MISALIGNED_COPY_SIZE);
-	start_jiffies = jiffies;
-	while ((now = jiffies) == start_jiffies)
-		cpu_relax();
 
-	while (time_before(jiffies, now + (1 << MISALIGNED_ACCESS_JIFFIES_LG2))) {
-		start_cycles = get_cycles64();
-		/* Ensure the CSR read can't reorder WRT to the copy. */
-		mb();
-		__riscv_copy_vec_bytes_unaligned(dst, src, MISALIGNED_COPY_SIZE);
-		/* Ensure the copy ends before the end time is snapped. */
-		mb();
-		end_cycles = get_cycles64();
-		if ((end_cycles - start_cycles) < byte_cycles)
-			byte_cycles = end_cycles - start_cycles;
-	}
+	word_cycles = measure_cycles(__riscv_copy_vec_words_unaligned,
+				     dst, src, MISALIGNED_COPY_SIZE);
 
+	byte_cycles = measure_cycles(__riscv_copy_vec_bytes_unaligned,
+				     dst, src, MISALIGNED_COPY_SIZE);
 	kernel_vector_end();
 
 	/* Don't divide by zero. */
-- 
2.47.3


* [PATCH 4/5] riscv: Split out compare_unaligned_access()
  2026-02-11 17:30 [PATCH 0/5] riscv: Cleanup and deduplicate unaligned access speed probe Nam Cao
                   ` (2 preceding siblings ...)
  2026-02-11 17:30 ` [PATCH 3/5] riscv: Reuse measure_cycles() in check_vector_unaligned_access() Nam Cao
@ 2026-02-11 17:30 ` Nam Cao
  2026-02-11 17:30 ` [PATCH 5/5] riscv: Reuse compare_unaligned_access() in check_vector_unaligned_access() Nam Cao
  2026-04-03 18:30 ` [PATCH 0/5] riscv: Cleanup and deduplicate unaligned access speed probe patchwork-bot+linux-riscv
  5 siblings, 0 replies; 9+ messages in thread
From: Nam Cao @ 2026-02-11 17:30 UTC (permalink / raw)
  To: Paul Walmsley, Palmer Dabbelt, Albert Ou, Alexandre Ghiti,
	Andrew Jones, Clément Léger, linux-riscv, linux-kernel
  Cc: Nam Cao

Scalar misaligned access probe and vector misaligned access probe share
very similar code. Split out this similar part from scalar probe into
compare_unaligned_access(), which will be reused for vector probe in a
follow-up commit.

Signed-off-by: Nam Cao <namcao@linutronix.de>
---
 arch/riscv/kernel/unaligned_access_speed.c | 59 +++++++++++++++-------
 1 file changed, 40 insertions(+), 19 deletions(-)

diff --git a/arch/riscv/kernel/unaligned_access_speed.c b/arch/riscv/kernel/unaligned_access_speed.c
index c0d39c4b2150..b3ed74b71d3e 100644
--- a/arch/riscv/kernel/unaligned_access_speed.c
+++ b/arch/riscv/kernel/unaligned_access_speed.c
@@ -63,58 +63,79 @@ measure_cycles(void (*func)(void *dst, const void *src, size_t len),
 	return cycles;
 }
 
-#ifdef CONFIG_RISCV_PROBE_UNALIGNED_ACCESS
-static int check_unaligned_access(void *param)
+/*
+ * Return:
+ *     1 if unaligned accesses are fast
+ *     0 if unaligned accesses are slow
+ *    -1 if check cannot be done
+ */
+static int __maybe_unused
+compare_unaligned_access(void (*word_copy)(void *dst, const void *src, size_t len),
+			 void (*byte_copy)(void *dst, const void *src, size_t len),
+			 void *buf)
 {
 	int cpu = smp_processor_id();
 	u64 word_cycles;
 	u64 byte_cycles;
+	void *dst, *src;
+	bool fast;
 	int ratio;
-	struct page *page = param;
-	void *dst;
-	void *src;
-	long speed = RISCV_HWPROBE_MISALIGNED_SCALAR_SLOW;
-
-	if (per_cpu(misaligned_access_speed, cpu) != RISCV_HWPROBE_MISALIGNED_SCALAR_UNKNOWN)
-		return 0;
 
 	/* Make an unaligned destination buffer. */
-	dst = (void *)((unsigned long)page_address(page) | 0x1);
+	dst = (void *)((unsigned long)buf | 0x1);
 	/* Unalign src as well, but differently (off by 1 + 2 = 3). */
 	src = dst + (MISALIGNED_BUFFER_SIZE / 2);
 	src += 2;
 
-	word_cycles = measure_cycles(__riscv_copy_words_unaligned, dst, src, MISALIGNED_COPY_SIZE);
-	byte_cycles = measure_cycles(__riscv_copy_bytes_unaligned, dst, src, MISALIGNED_COPY_SIZE);
+	word_cycles = measure_cycles(word_copy, dst, src, MISALIGNED_COPY_SIZE);
+	byte_cycles = measure_cycles(byte_copy, dst, src, MISALIGNED_COPY_SIZE);
 
 	/* Don't divide by zero. */
 	if (!word_cycles || !byte_cycles) {
 		pr_warn("cpu%d: rdtime lacks granularity needed to measure unaligned access speed\n",
 			cpu);
 
-		return 0;
+		return -1;
 	}
 
-	if (word_cycles < byte_cycles)
-		speed = RISCV_HWPROBE_MISALIGNED_SCALAR_FAST;
+	fast = word_cycles < byte_cycles;
 
 	ratio = div_u64((byte_cycles * 100), word_cycles);
 	pr_info("cpu%d: Ratio of byte access time to unaligned word access is %d.%02d, unaligned accesses are %s\n",
 		cpu,
 		ratio / 100,
 		ratio % 100,
-		(speed == RISCV_HWPROBE_MISALIGNED_SCALAR_FAST) ? "fast" : "slow");
+		fast ? "fast" : "slow");
 
-	per_cpu(misaligned_access_speed, cpu) = speed;
+	return fast;
+}
+
+#ifdef CONFIG_RISCV_PROBE_UNALIGNED_ACCESS
+static int check_unaligned_access(struct page *page)
+{
+	void *buf = page_address(page);
+	int cpu = smp_processor_id();
+	int ret;
+
+	if (per_cpu(misaligned_access_speed, cpu) != RISCV_HWPROBE_MISALIGNED_SCALAR_UNKNOWN)
+		return 0;
+
+	ret = compare_unaligned_access(__riscv_copy_words_unaligned,
+				       __riscv_copy_bytes_unaligned, buf);
+	if (ret < 0)
+		return 0;
 
 	/*
 	 * Set the value of fast_misaligned_access of a CPU. These operations
 	 * are atomic to avoid race conditions.
 	 */
-	if (speed == RISCV_HWPROBE_MISALIGNED_SCALAR_FAST)
+	if (ret) {
+		per_cpu(misaligned_access_speed, cpu) = RISCV_HWPROBE_MISALIGNED_SCALAR_FAST;
 		cpumask_set_cpu(cpu, &fast_misaligned_access);
-	else
+	} else {
+		per_cpu(misaligned_access_speed, cpu) = RISCV_HWPROBE_MISALIGNED_SCALAR_SLOW;
 		cpumask_clear_cpu(cpu, &fast_misaligned_access);
+	}
 
 	return 0;
 }
-- 
2.47.3


* [PATCH 5/5] riscv: Reuse compare_unaligned_access() in check_vector_unaligned_access()
  2026-02-11 17:30 [PATCH 0/5] riscv: Cleanup and deduplicate unaligned access speed probe Nam Cao
                   ` (3 preceding siblings ...)
  2026-02-11 17:30 ` [PATCH 4/5] riscv: Split out compare_unaligned_access() Nam Cao
@ 2026-02-11 17:30 ` Nam Cao
  2026-04-03 18:30 ` [PATCH 0/5] riscv: Cleanup and deduplicate unaligned access speed probe patchwork-bot+linux-riscv
  5 siblings, 0 replies; 9+ messages in thread
From: Nam Cao @ 2026-02-11 17:30 UTC (permalink / raw)
  To: Paul Walmsley, Palmer Dabbelt, Albert Ou, Alexandre Ghiti,
	Andrew Jones, Clément Léger, linux-riscv, linux-kernel
  Cc: Nam Cao

check_vector_unaligned_access() duplicates the logic in
compare_unaligned_access().

Use compare_unaligned_access() and deduplicate.

Signed-off-by: Nam Cao <namcao@linutronix.de>
---
 arch/riscv/kernel/unaligned_access_speed.c | 55 +++++++---------------
 1 file changed, 16 insertions(+), 39 deletions(-)

diff --git a/arch/riscv/kernel/unaligned_access_speed.c b/arch/riscv/kernel/unaligned_access_speed.c
index b3ed74b71d3e..8a9f261dc10b 100644
--- a/arch/riscv/kernel/unaligned_access_speed.c
+++ b/arch/riscv/kernel/unaligned_access_speed.c
@@ -72,7 +72,7 @@ measure_cycles(void (*func)(void *dst, const void *src, size_t len),
 static int __maybe_unused
 compare_unaligned_access(void (*word_copy)(void *dst, const void *src, size_t len),
 			 void (*byte_copy)(void *dst, const void *src, size_t len),
-			 void *buf)
+			 void *buf, const char *type)
 {
 	int cpu = smp_processor_id();
 	u64 word_cycles;
@@ -92,8 +92,8 @@ compare_unaligned_access(void (*word_copy)(void *dst, const void *src, size_t le
 
 	/* Don't divide by zero. */
 	if (!word_cycles || !byte_cycles) {
-		pr_warn("cpu%d: rdtime lacks granularity needed to measure unaligned access speed\n",
-			cpu);
+		pr_warn("cpu%d: rdtime lacks granularity needed to measure %s unaligned access speed\n",
+			cpu, type);
 
 		return -1;
 	}
@@ -101,8 +101,9 @@ compare_unaligned_access(void (*word_copy)(void *dst, const void *src, size_t le
 	fast = word_cycles < byte_cycles;
 
 	ratio = div_u64((byte_cycles * 100), word_cycles);
-	pr_info("cpu%d: Ratio of byte access time to unaligned word access is %d.%02d, unaligned accesses are %s\n",
+	pr_info("cpu%d: %s unaligned word access speed is %d.%02dx byte access speed (%s)\n",
 		cpu,
+		type,
 		ratio / 100,
 		ratio % 100,
 		fast ? "fast" : "slow");
@@ -121,7 +122,8 @@ static int check_unaligned_access(struct page *page)
 		return 0;
 
 	ret = compare_unaligned_access(__riscv_copy_words_unaligned,
-				       __riscv_copy_bytes_unaligned, buf);
+				       __riscv_copy_bytes_unaligned,
+				       buf, "scalar");
 	if (ret < 0)
 		return 0;
 
@@ -291,13 +293,8 @@ static int riscv_offline_cpu(unsigned int cpu)
 static void check_vector_unaligned_access(struct work_struct *work __always_unused)
 {
 	int cpu = smp_processor_id();
-	u64 word_cycles;
-	u64 byte_cycles;
-	int ratio;
 	struct page *page;
-	void *dst;
-	void *src;
-	long speed = RISCV_HWPROBE_MISALIGNED_VECTOR_SLOW;
+	int ret;
 
 	if (per_cpu(vector_misaligned_access, cpu) != RISCV_HWPROBE_MISALIGNED_VECTOR_UNKNOWN)
 		return;
@@ -308,40 +305,20 @@ static void check_vector_unaligned_access(struct work_struct *work __always_unus
 		return;
 	}
 
-	/* Make an unaligned destination buffer. */
-	dst = (void *)((unsigned long)page_address(page) | 0x1);
-	/* Unalign src as well, but differently (off by 1 + 2 = 3). */
-	src = dst + (MISALIGNED_BUFFER_SIZE / 2);
-	src += 2;
-
 	kernel_vector_begin();
 
-	word_cycles = measure_cycles(__riscv_copy_vec_words_unaligned,
-				     dst, src, MISALIGNED_COPY_SIZE);
-
-	byte_cycles = measure_cycles(__riscv_copy_vec_bytes_unaligned,
-				     dst, src, MISALIGNED_COPY_SIZE);
+	ret = compare_unaligned_access(__riscv_copy_vec_words_unaligned,
+				       __riscv_copy_vec_bytes_unaligned,
+				       page_address(page), "vector");
 	kernel_vector_end();
 
-	/* Don't divide by zero. */
-	if (!word_cycles || !byte_cycles) {
-		pr_warn("cpu%d: rdtime lacks granularity needed to measure unaligned vector access speed\n",
-			cpu);
-
+	if (ret < 0)
 		goto free;
-	}
 
-	if (word_cycles < byte_cycles)
-		speed = RISCV_HWPROBE_MISALIGNED_VECTOR_FAST;
-
-	ratio = div_u64((byte_cycles * 100), word_cycles);
-	pr_info("cpu%d: Ratio of vector byte access time to vector unaligned word access is %d.%02d, unaligned accesses are %s\n",
-		cpu,
-		ratio / 100,
-		ratio % 100,
-		(speed ==  RISCV_HWPROBE_MISALIGNED_VECTOR_FAST) ? "fast" : "slow");
-
-	per_cpu(vector_misaligned_access, cpu) = speed;
+	if (ret)
+		per_cpu(vector_misaligned_access, cpu) = RISCV_HWPROBE_MISALIGNED_VECTOR_FAST;
+	else
+		per_cpu(vector_misaligned_access, cpu) = RISCV_HWPROBE_MISALIGNED_VECTOR_SLOW;
 
 free:
 	__free_pages(page, MISALIGNED_BUFFER_ORDER);
-- 
2.47.3


* Re: [PATCH 0/5] riscv: Cleanup and deduplicate unaligned access speed probe
  2026-02-11 17:30 [PATCH 0/5] riscv: Cleanup and deduplicate unaligned access speed probe Nam Cao
                   ` (4 preceding siblings ...)
  2026-02-11 17:30 ` [PATCH 5/5] riscv: Reuse compare_unaligned_access() in check_vector_unaligned_access() Nam Cao
@ 2026-04-03 18:30 ` patchwork-bot+linux-riscv
  2026-04-07  1:38   ` Michael Neuling
  5 siblings, 1 reply; 9+ messages in thread
From: patchwork-bot+linux-riscv @ 2026-04-03 18:30 UTC (permalink / raw)
  To: Nam Cao; +Cc: linux-riscv, pjw, palmer, aou, alex, ajones, cleger, linux-kernel

Hello:

This series was applied to riscv/linux.git (for-next)
by Paul Walmsley <pjw@kernel.org>:

On Wed, 11 Feb 2026 18:30:30 +0100 you wrote:
> Hi,
> 
> This series does some minor cleanups and deduplication of riscv unaligned
> access speed probe.
> 
>  arch/riscv/kernel/unaligned_access_speed.c | 221 ++++++++-------------
>  1 file changed, 85 insertions(+), 136 deletions(-)
> 
> [...]

Here is the summary with links:
  - [1/5] riscv: Clean up & optimize unaligned scalar access probe
    https://git.kernel.org/riscv/c/c202d70b2244
  - [2/5] riscv: Split out measure_cycles() for reuse
    https://git.kernel.org/riscv/c/83eb6102c71d
  - [3/5] riscv: Reuse measure_cycles() in check_vector_unaligned_access()
    https://git.kernel.org/riscv/c/b2dd256b5783
  - [4/5] riscv: Split out compare_unaligned_access()
    https://git.kernel.org/riscv/c/72f578aa02f5
  - [5/5] riscv: Reuse compare_unaligned_access() in check_vector_unaligned_access()
    https://git.kernel.org/riscv/c/eb88916053bc

You are awesome, thank you!
-- 
Deet-doot-dot, I am a bot.
https://korg.docs.kernel.org/patchwork/pwbot.html



* Re: [PATCH 0/5] riscv: Cleanup and deduplicate unaligned access speed probe
  2026-04-03 18:30 ` [PATCH 0/5] riscv: Cleanup and deduplicate unaligned access speed probe patchwork-bot+linux-riscv
@ 2026-04-07  1:38   ` Michael Neuling
  2026-04-07  7:35     ` Nam Cao
  0 siblings, 1 reply; 9+ messages in thread
From: Michael Neuling @ 2026-04-07  1:38 UTC (permalink / raw)
  To: patchwork-bot+linux-riscv
  Cc: Nam Cao, linux-riscv, pjw, palmer, aou, alex, ajones, cleger,
	linux-kernel

> This series was applied to riscv/linux.git (for-next)
> by Paul Walmsley <pjw@kernel.org>:

> Here is the summary with links:
>   - [1/5] riscv: Clean up & optimize unaligned scalar access probe
>     https://git.kernel.org/riscv/c/c202d70b2244

I think this is causing a regression (the SHA1 is actually 6455c6c11827).
Fast unaligned accesses are never detected anymore.

Analysis from Claude (Opus 4.6) with Chris Mason's kernel patch review skills:
--
  > diff --git a/arch/riscv/kernel/unaligned_access_speed.c b/arch/riscv/kernel/unaligned_access_speed.c
  > index b36a6a56f4..1f4c128d73 100644
  > --- a/arch/riscv/kernel/unaligned_access_speed.c
  > +++ b/arch/riscv/kernel/unaligned_access_speed.c

  [ ... ]

  > -arch_initcall(check_unaligned_access_all_cpus);
  > +late_initcall(check_unaligned_access_all_cpus);

  With this change, check_unaligned_access_all_cpus() now runs at
  late_initcall (level 7), but lock_and_set_unaligned_access_static_branch()
  remains at arch_initcall_sync (level 3s):

  arch/riscv/kernel/unaligned_access_speed.c:
      static int __init lock_and_set_unaligned_access_static_branch(void)
      {
              cpus_read_lock();
              set_unaligned_access_static_branches();
              cpus_read_unlock();
              return 0;
      }
      arch_initcall_sync(lock_and_set_unaligned_access_static_branch);

  Before this patch, the ordering was:

      arch_initcall:      check_unaligned_access_all_cpus()  -> measures
      arch_initcall_sync: lock_and_set_unaligned_access_static_branch()
                                                           -> sets branch

  After this patch:

      arch_initcall_sync: lock_and_set_unaligned_access_static_branch()
                                                  -> empty cpumask -> off
     late_initcall:      check_unaligned_access_all_cpus()  -> measures
                                                  -> never re-evaluates

  Does this mean fast_unaligned_access_speed_key is never enabled at boot,
  even on hardware with fast unaligned access? The comment in
  set_unaligned_access_static_branches() says "This will be called after
  check_unaligned_access_all_cpus" which is no longer true with this
  ordering change.
--
I confirmed this by booting (in qemu) with the patch below, before and
after this change, and I get this:

Before this change, fast_misaligned_access is populated when the static
branches are set:
  [    0.117326] TEST: lock_and_set running, fast_misaligned_access weight: 4/4
After this change, fast_misaligned_access is empty when the static
branches are set:
  [    0.168865] TEST: lock_and_set running, fast_misaligned_access weight: 0/4

So I think this confirms the regression.

Mikey

--- a/arch/riscv/kernel/unaligned_access_speed.c
+++ b/arch/riscv/kernel/unaligned_access_speed.c
@@ @@
 static int __init lock_and_set_unaligned_access_static_branch(void)
 {
+	pr_info("TEST: lock_and_set running, fast_misaligned_access cpumask weight: %u/%u\n",
+		cpumask_weight(&fast_misaligned_access), num_online_cpus());
 	cpus_read_lock();
 	set_unaligned_access_static_branches();
 	cpus_read_unlock();

* Re: [PATCH 0/5] riscv: Cleanup and deduplicate unaligned access speed probe
  2026-04-07  1:38   ` Michael Neuling
@ 2026-04-07  7:35     ` Nam Cao
  0 siblings, 0 replies; 9+ messages in thread
From: Nam Cao @ 2026-04-07  7:35 UTC (permalink / raw)
  To: Michael Neuling, patchwork-bot+linux-riscv
  Cc: linux-riscv, pjw, palmer, aou, alex, ajones, cleger, linux-kernel

Michael Neuling <mikey@neuling.org> writes:
>> This series was applied to riscv/linux.git (for-next)
>> by Paul Walmsley <pjw@kernel.org>:
>
>> Here is the summary with links:
>>   - [1/5] riscv: Clean up & optimize unaligned scalar access probe
>>     https://git.kernel.org/riscv/c/c202d70b2244
>
> I think this is causing a regression (SHA1 actually 6455c6c11827) .
> Fast unaligned accesses are no longer being set ever.
>
> Analysis from Claude (Opus 4.6) with Chris Masons kernel patch review skills:

I should start using these AIs..

> --
>   > diff --git a/arch/riscv/kernel/unaligned_access_speed.c
> b/arch/riscv/kernel/unaligned_access_speed.c
>   > index b36a6a56f4..1f4c128d73 100644
>   > --- a/arch/riscv/kernel/unaligned_access_speed.c
>   > +++ b/arch/riscv/kernel/unaligned_access_speed.c
>
>   [ ... ]
>
>   > -arch_initcall(check_unaligned_access_all_cpus);
>   > +late_initcall(check_unaligned_access_all_cpus);
>
>   With this change, check_unaligned_access_all_cpus() now runs at
>   late_initcall (level 7), but lock_and_set_unaligned_access_static_branch()
>   remains at arch_initcall_sync (level 3s):
...
>   Does this mean fast_unaligned_access_speed_key is never enabled at boot,
>   even on hardware with fast unaligned access? The comment in
>   set_unaligned_access_static_branches() says "This will be called after
>   check_unaligned_access_all_cpus" which is no longer true with this
>   ordering change.

Thanks, you are indeed right. This affects do_csum()'s performance.

The below patch should resolve the issue. I will send a proper patch
later today after I have tested with my hardware.

Nam

diff --git a/arch/riscv/kernel/unaligned_access_speed.c b/arch/riscv/kernel/unaligned_access_speed.c
index 485ab1d105d3..96ba80e6ea32 100644
--- a/arch/riscv/kernel/unaligned_access_speed.c
+++ b/arch/riscv/kernel/unaligned_access_speed.c
@@ -244,7 +244,7 @@ static int __init lock_and_set_unaligned_access_static_branch(void)
 	return 0;
 }
 
-arch_initcall_sync(lock_and_set_unaligned_access_static_branch);
+late_initcall_sync(lock_and_set_unaligned_access_static_branch);
 
 static int riscv_online_cpu(unsigned int cpu)
 {
