public inbox for linux-kernel@vger.kernel.org
* [PATCH 0/5] riscv: Cleanup and deduplicate unaligned access speed probe
From: Nam Cao @ 2026-02-11 17:30 UTC (permalink / raw)
  To: Paul Walmsley, Palmer Dabbelt, Albert Ou, Alexandre Ghiti,
	Andrew Jones, Clément Léger, linux-riscv, linux-kernel
  Cc: Nam Cao

Hi,

This series makes some minor cleanups and deduplicates the riscv unaligned
access speed probe.

 arch/riscv/kernel/unaligned_access_speed.c | 221 ++++++++-------------
 1 file changed, 85 insertions(+), 136 deletions(-)

Nam Cao (5):
  riscv: Clean up & optimize unaligned scalar access probe
  riscv: Split out measure_cycles() for reuse
  riscv: Reuse measure_cycles() in check_vector_unaligned_access()
  riscv: Split out compare_unaligned_access()
  riscv: Reuse compare_unaligned_access() in
    check_vector_unaligned_access()

-- 
2.47.3



* [PATCH 1/5] riscv: Clean up & optimize unaligned scalar access probe
From: Nam Cao @ 2026-02-11 17:30 UTC (permalink / raw)
  To: Paul Walmsley, Palmer Dabbelt, Albert Ou, Alexandre Ghiti,
	Andrew Jones, Clément Léger, linux-riscv, linux-kernel
  Cc: Nam Cao, Sebastian Andrzej Siewior

check_unaligned_access_speed_all_cpus() is more complicated than it should
be:

  - It uses on_each_cpu() to probe unaligned memory access on all CPUs but
    excludes CPU0 with a check in the callback function. So an IPI to CPU0
    is wasted.

  - Probing on CPU0 is done with smp_call_on_cpu(), which is not as fast as
    on_each_cpu().

The reason for this design is that the probe is timed with jiffies:
on_each_cpu() excludes CPU0 because that CPU must stay behind to tend
jiffies.

Instead, replace jiffies usage with ktime_get_mono_fast_ns(). With jiffies
out of the way, on_each_cpu() can be used for all CPUs and
smp_call_on_cpu() can be dropped.

To make ktime_get_mono_fast_ns() usable, move this probe to late_initcall.
Anything after clocksource's fs_initcall works, but avoid depending on
clocksource staying at fs_initcall.

The probe duration is now 8000000 ns, the same as before (2 jiffies) for the
riscv defconfig. This is excessive for the CPUs I have and should probably
be reduced, but that is a separate discussion.

Suggested-by: Sebastian Andrzej Siewior <bigeasy@linutronix.de>
Signed-off-by: Nam Cao <namcao@linutronix.de>
---
 arch/riscv/kernel/unaligned_access_speed.c | 28 ++++++++--------------
 1 file changed, 10 insertions(+), 18 deletions(-)

diff --git a/arch/riscv/kernel/unaligned_access_speed.c b/arch/riscv/kernel/unaligned_access_speed.c
index 70b5e6927620..8b744c4a41ea 100644
--- a/arch/riscv/kernel/unaligned_access_speed.c
+++ b/arch/riscv/kernel/unaligned_access_speed.c
@@ -17,6 +17,7 @@
 #include "copy-unaligned.h"
 
 #define MISALIGNED_ACCESS_JIFFIES_LG2 1
+#define MISALIGNED_ACCESS_NS 8000000
 #define MISALIGNED_BUFFER_SIZE 0x4000
 #define MISALIGNED_BUFFER_ORDER get_order(MISALIGNED_BUFFER_SIZE)
 #define MISALIGNED_COPY_SIZE ((MISALIGNED_BUFFER_SIZE / 2) - 0x80)
@@ -36,8 +37,8 @@ static int check_unaligned_access(void *param)
 	u64 start_cycles, end_cycles;
 	u64 word_cycles;
 	u64 byte_cycles;
+	u64 start_ns;
 	int ratio;
-	unsigned long start_jiffies, now;
 	struct page *page = param;
 	void *dst;
 	void *src;
@@ -55,15 +56,13 @@ static int check_unaligned_access(void *param)
 	/* Do a warmup. */
 	__riscv_copy_words_unaligned(dst, src, MISALIGNED_COPY_SIZE);
 	preempt_disable();
-	start_jiffies = jiffies;
-	while ((now = jiffies) == start_jiffies)
-		cpu_relax();
 
 	/*
 	 * For a fixed amount of time, repeatedly try the function, and take
 	 * the best time in cycles as the measurement.
 	 */
-	while (time_before(jiffies, now + (1 << MISALIGNED_ACCESS_JIFFIES_LG2))) {
+	start_ns = ktime_get_mono_fast_ns();
+	while (ktime_get_mono_fast_ns() < start_ns + MISALIGNED_ACCESS_NS) {
 		start_cycles = get_cycles64();
 		/* Ensure the CSR read can't reorder WRT to the copy. */
 		mb();
@@ -77,11 +76,9 @@ static int check_unaligned_access(void *param)
 
 	byte_cycles = -1ULL;
 	__riscv_copy_bytes_unaligned(dst, src, MISALIGNED_COPY_SIZE);
-	start_jiffies = jiffies;
-	while ((now = jiffies) == start_jiffies)
-		cpu_relax();
 
-	while (time_before(jiffies, now + (1 << MISALIGNED_ACCESS_JIFFIES_LG2))) {
+	start_ns = ktime_get_mono_fast_ns();
+	while (ktime_get_mono_fast_ns() < start_ns + MISALIGNED_ACCESS_NS) {
 		start_cycles = get_cycles64();
 		mb();
 		__riscv_copy_bytes_unaligned(dst, src, MISALIGNED_COPY_SIZE);
@@ -125,13 +122,12 @@ static int check_unaligned_access(void *param)
 	return 0;
 }
 
-static void __init check_unaligned_access_nonboot_cpu(void *param)
+static void __init _check_unaligned_access(void *param)
 {
 	unsigned int cpu = smp_processor_id();
 	struct page **pages = param;
 
-	if (smp_processor_id() != 0)
-		check_unaligned_access(pages[cpu]);
+	check_unaligned_access(pages[cpu]);
 }
 
 /* Measure unaligned access speed on all CPUs present at boot in parallel. */
@@ -158,11 +154,7 @@ static void __init check_unaligned_access_speed_all_cpus(void)
 		}
 	}
 
-	/* Check everybody except 0, who stays behind to tend jiffies. */
-	on_each_cpu(check_unaligned_access_nonboot_cpu, bufs, 1);
-
-	/* Check core 0. */
-	smp_call_on_cpu(0, check_unaligned_access, bufs[0], true);
+	on_each_cpu(_check_unaligned_access, bufs, 1);
 
 out:
 	for_each_cpu(cpu, cpu_online_mask) {
@@ -494,4 +486,4 @@ static int __init check_unaligned_access_all_cpus(void)
 	return 0;
 }
 
-arch_initcall(check_unaligned_access_all_cpus);
+late_initcall(check_unaligned_access_all_cpus);
-- 
2.47.3



* [PATCH 2/5] riscv: Split out measure_cycles() for reuse
From: Nam Cao @ 2026-02-11 17:30 UTC (permalink / raw)
  To: Paul Walmsley, Palmer Dabbelt, Albert Ou, Alexandre Ghiti,
	Andrew Jones, Clément Léger, linux-riscv, linux-kernel
  Cc: Nam Cao

Byte cycle measurement and word cycle measurement of scalar misaligned
access are very similar. Split these parts out into a common
measure_cycles() function to avoid duplication.

This function will also be reused for vector misaligned access probe in a
follow-up commit.

Signed-off-by: Nam Cao <namcao@linutronix.de>
---
 arch/riscv/kernel/unaligned_access_speed.c | 69 +++++++++++-----------
 1 file changed, 33 insertions(+), 36 deletions(-)

diff --git a/arch/riscv/kernel/unaligned_access_speed.c b/arch/riscv/kernel/unaligned_access_speed.c
index 8b744c4a41ea..b964a666a973 100644
--- a/arch/riscv/kernel/unaligned_access_speed.c
+++ b/arch/riscv/kernel/unaligned_access_speed.c
@@ -31,30 +31,15 @@ static long unaligned_vector_speed_param = RISCV_HWPROBE_MISALIGNED_VECTOR_UNKNO
 static cpumask_t fast_misaligned_access;
 
 #ifdef CONFIG_RISCV_PROBE_UNALIGNED_ACCESS
-static int check_unaligned_access(void *param)
+static u64 measure_cycles(void (*func)(void *dst, const void *src, size_t len),
+			  void *dst, void *src, size_t len)
 {
-	int cpu = smp_processor_id();
-	u64 start_cycles, end_cycles;
-	u64 word_cycles;
-	u64 byte_cycles;
+	u64 start_cycles, end_cycles, cycles = -1ULL;
 	u64 start_ns;
-	int ratio;
-	struct page *page = param;
-	void *dst;
-	void *src;
-	long speed = RISCV_HWPROBE_MISALIGNED_SCALAR_SLOW;
 
-	if (per_cpu(misaligned_access_speed, cpu) != RISCV_HWPROBE_MISALIGNED_SCALAR_UNKNOWN)
-		return 0;
-
-	/* Make an unaligned destination buffer. */
-	dst = (void *)((unsigned long)page_address(page) | 0x1);
-	/* Unalign src as well, but differently (off by 1 + 2 = 3). */
-	src = dst + (MISALIGNED_BUFFER_SIZE / 2);
-	src += 2;
-	word_cycles = -1ULL;
 	/* Do a warmup. */
-	__riscv_copy_words_unaligned(dst, src, MISALIGNED_COPY_SIZE);
+	func(dst, src, len);
+
 	preempt_disable();
 
 	/*
@@ -66,29 +51,41 @@ static int check_unaligned_access(void *param)
 		start_cycles = get_cycles64();
 		/* Ensure the CSR read can't reorder WRT to the copy. */
 		mb();
-		__riscv_copy_words_unaligned(dst, src, MISALIGNED_COPY_SIZE);
+		func(dst, src, len);
 		/* Ensure the copy ends before the end time is snapped. */
 		mb();
 		end_cycles = get_cycles64();
-		if ((end_cycles - start_cycles) < word_cycles)
-			word_cycles = end_cycles - start_cycles;
+		if ((end_cycles - start_cycles) < cycles)
+			cycles = end_cycles - start_cycles;
 	}
 
-	byte_cycles = -1ULL;
-	__riscv_copy_bytes_unaligned(dst, src, MISALIGNED_COPY_SIZE);
+	preempt_enable();
 
-	start_ns = ktime_get_mono_fast_ns();
-	while (ktime_get_mono_fast_ns() < start_ns + MISALIGNED_ACCESS_NS) {
-		start_cycles = get_cycles64();
-		mb();
-		__riscv_copy_bytes_unaligned(dst, src, MISALIGNED_COPY_SIZE);
-		mb();
-		end_cycles = get_cycles64();
-		if ((end_cycles - start_cycles) < byte_cycles)
-			byte_cycles = end_cycles - start_cycles;
-	}
+	return cycles;
+}
 
-	preempt_enable();
+static int check_unaligned_access(void *param)
+{
+	int cpu = smp_processor_id();
+	u64 word_cycles;
+	u64 byte_cycles;
+	int ratio;
+	struct page *page = param;
+	void *dst;
+	void *src;
+	long speed = RISCV_HWPROBE_MISALIGNED_SCALAR_SLOW;
+
+	if (per_cpu(misaligned_access_speed, cpu) != RISCV_HWPROBE_MISALIGNED_SCALAR_UNKNOWN)
+		return 0;
+
+	/* Make an unaligned destination buffer. */
+	dst = (void *)((unsigned long)page_address(page) | 0x1);
+	/* Unalign src as well, but differently (off by 1 + 2 = 3). */
+	src = dst + (MISALIGNED_BUFFER_SIZE / 2);
+	src += 2;
+
+	word_cycles = measure_cycles(__riscv_copy_words_unaligned, dst, src, MISALIGNED_COPY_SIZE);
+	byte_cycles = measure_cycles(__riscv_copy_bytes_unaligned, dst, src, MISALIGNED_COPY_SIZE);
 
 	/* Don't divide by zero. */
 	if (!word_cycles || !byte_cycles) {
-- 
2.47.3



* [PATCH 3/5] riscv: Reuse measure_cycles() in check_vector_unaligned_access()
From: Nam Cao @ 2026-02-11 17:30 UTC (permalink / raw)
  To: Paul Walmsley, Palmer Dabbelt, Albert Ou, Alexandre Ghiti,
	Andrew Jones, Clément Léger, linux-riscv, linux-kernel
  Cc: Nam Cao

check_vector_unaligned_access() duplicates the logic in measure_cycles().

Reuse measure_cycles() and deduplicate.

Signed-off-by: Nam Cao <namcao@linutronix.de>
---
 arch/riscv/kernel/unaligned_access_speed.c | 54 ++++------------------
 1 file changed, 8 insertions(+), 46 deletions(-)

diff --git a/arch/riscv/kernel/unaligned_access_speed.c b/arch/riscv/kernel/unaligned_access_speed.c
index b964a666a973..c0d39c4b2150 100644
--- a/arch/riscv/kernel/unaligned_access_speed.c
+++ b/arch/riscv/kernel/unaligned_access_speed.c
@@ -16,7 +16,6 @@
 
 #include "copy-unaligned.h"
 
-#define MISALIGNED_ACCESS_JIFFIES_LG2 1
 #define MISALIGNED_ACCESS_NS 8000000
 #define MISALIGNED_BUFFER_SIZE 0x4000
 #define MISALIGNED_BUFFER_ORDER get_order(MISALIGNED_BUFFER_SIZE)
@@ -30,9 +29,9 @@ static long unaligned_vector_speed_param = RISCV_HWPROBE_MISALIGNED_VECTOR_UNKNO
 
 static cpumask_t fast_misaligned_access;
 
-#ifdef CONFIG_RISCV_PROBE_UNALIGNED_ACCESS
-static u64 measure_cycles(void (*func)(void *dst, const void *src, size_t len),
-			  void *dst, void *src, size_t len)
+static u64 __maybe_unused
+measure_cycles(void (*func)(void *dst, const void *src, size_t len),
+	       void *dst, void *src, size_t len)
 {
 	u64 start_cycles, end_cycles, cycles = -1ULL;
 	u64 start_ns;
@@ -64,6 +63,7 @@ static u64 measure_cycles(void (*func)(void *dst, const void *src, size_t len),
 	return cycles;
 }
 
+#ifdef CONFIG_RISCV_PROBE_UNALIGNED_ACCESS
 static int check_unaligned_access(void *param)
 {
 	int cpu = smp_processor_id();
@@ -270,11 +270,9 @@ static int riscv_offline_cpu(unsigned int cpu)
 static void check_vector_unaligned_access(struct work_struct *work __always_unused)
 {
 	int cpu = smp_processor_id();
-	u64 start_cycles, end_cycles;
 	u64 word_cycles;
 	u64 byte_cycles;
 	int ratio;
-	unsigned long start_jiffies, now;
 	struct page *page;
 	void *dst;
 	void *src;
@@ -294,50 +292,14 @@ static void check_vector_unaligned_access(struct work_struct *work __always_unus
 	/* Unalign src as well, but differently (off by 1 + 2 = 3). */
 	src = dst + (MISALIGNED_BUFFER_SIZE / 2);
 	src += 2;
-	word_cycles = -1ULL;
 
-	/* Do a warmup. */
 	kernel_vector_begin();
-	__riscv_copy_vec_words_unaligned(dst, src, MISALIGNED_COPY_SIZE);
-
-	start_jiffies = jiffies;
-	while ((now = jiffies) == start_jiffies)
-		cpu_relax();
-
-	/*
-	 * For a fixed amount of time, repeatedly try the function, and take
-	 * the best time in cycles as the measurement.
-	 */
-	while (time_before(jiffies, now + (1 << MISALIGNED_ACCESS_JIFFIES_LG2))) {
-		start_cycles = get_cycles64();
-		/* Ensure the CSR read can't reorder WRT to the copy. */
-		mb();
-		__riscv_copy_vec_words_unaligned(dst, src, MISALIGNED_COPY_SIZE);
-		/* Ensure the copy ends before the end time is snapped. */
-		mb();
-		end_cycles = get_cycles64();
-		if ((end_cycles - start_cycles) < word_cycles)
-			word_cycles = end_cycles - start_cycles;
-	}
-
-	byte_cycles = -1ULL;
-	__riscv_copy_vec_bytes_unaligned(dst, src, MISALIGNED_COPY_SIZE);
-	start_jiffies = jiffies;
-	while ((now = jiffies) == start_jiffies)
-		cpu_relax();
 
-	while (time_before(jiffies, now + (1 << MISALIGNED_ACCESS_JIFFIES_LG2))) {
-		start_cycles = get_cycles64();
-		/* Ensure the CSR read can't reorder WRT to the copy. */
-		mb();
-		__riscv_copy_vec_bytes_unaligned(dst, src, MISALIGNED_COPY_SIZE);
-		/* Ensure the copy ends before the end time is snapped. */
-		mb();
-		end_cycles = get_cycles64();
-		if ((end_cycles - start_cycles) < byte_cycles)
-			byte_cycles = end_cycles - start_cycles;
-	}
+	word_cycles = measure_cycles(__riscv_copy_vec_words_unaligned,
+				     dst, src, MISALIGNED_COPY_SIZE);
 
+	byte_cycles = measure_cycles(__riscv_copy_vec_bytes_unaligned,
+				     dst, src, MISALIGNED_COPY_SIZE);
 	kernel_vector_end();
 
 	/* Don't divide by zero. */
-- 
2.47.3



* [PATCH 4/5] riscv: Split out compare_unaligned_access()
From: Nam Cao @ 2026-02-11 17:30 UTC (permalink / raw)
  To: Paul Walmsley, Palmer Dabbelt, Albert Ou, Alexandre Ghiti,
	Andrew Jones, Clément Léger, linux-riscv, linux-kernel
  Cc: Nam Cao

The scalar and vector misaligned access probes share very similar code.
Split the shared part out of the scalar probe into
compare_unaligned_access(), which will be reused for the vector probe in a
follow-up commit.
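
The comparison step this commit isolates can be sketched as a standalone
helper (classify_unaligned() is a hypothetical name; the kernel code uses
div_u64() and pr_info() instead):

```c
#include <stdint.h>
#include <stdio.h>

/*
 * Compare best-case timings of a word-wise and a byte-wise unaligned copy.
 * Return 1 if unaligned accesses are fast, 0 if slow, -1 if the cycle
 * counter granularity was too coarse to measure either copy.
 */
static int classify_unaligned(uint64_t word_cycles, uint64_t byte_cycles)
{
	unsigned int ratio;

	/* Don't divide by zero: a zero reading means no resolution. */
	if (!word_cycles || !byte_cycles)
		return -1;

	/* Fixed-point byte/word ratio with two decimal places. */
	ratio = (unsigned int)(byte_cycles * 100 / word_cycles);
	printf("byte/word ratio %u.%02u, unaligned accesses are %s\n",
	       ratio / 100, ratio % 100,
	       word_cycles < byte_cycles ? "fast" : "slow");

	return word_cycles < byte_cycles;
}
```

Returning the three-way result lets each caller map it onto its own
per-CPU speed variable (scalar or vector) and skip the update on failure.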

Signed-off-by: Nam Cao <namcao@linutronix.de>
---
 arch/riscv/kernel/unaligned_access_speed.c | 59 +++++++++++++++-------
 1 file changed, 40 insertions(+), 19 deletions(-)

diff --git a/arch/riscv/kernel/unaligned_access_speed.c b/arch/riscv/kernel/unaligned_access_speed.c
index c0d39c4b2150..b3ed74b71d3e 100644
--- a/arch/riscv/kernel/unaligned_access_speed.c
+++ b/arch/riscv/kernel/unaligned_access_speed.c
@@ -63,58 +63,79 @@ measure_cycles(void (*func)(void *dst, const void *src, size_t len),
 	return cycles;
 }
 
-#ifdef CONFIG_RISCV_PROBE_UNALIGNED_ACCESS
-static int check_unaligned_access(void *param)
+/*
+ * Return:
+ *     1 if unaligned accesses are fast
+ *     0 if unaligned accesses are slow
+ *    -1 if check cannot be done
+ */
+static int __maybe_unused
+compare_unaligned_access(void (*word_copy)(void *dst, const void *src, size_t len),
+			 void (*byte_copy)(void *dst, const void *src, size_t len),
+			 void *buf)
 {
 	int cpu = smp_processor_id();
 	u64 word_cycles;
 	u64 byte_cycles;
+	void *dst, *src;
+	bool fast;
 	int ratio;
-	struct page *page = param;
-	void *dst;
-	void *src;
-	long speed = RISCV_HWPROBE_MISALIGNED_SCALAR_SLOW;
-
-	if (per_cpu(misaligned_access_speed, cpu) != RISCV_HWPROBE_MISALIGNED_SCALAR_UNKNOWN)
-		return 0;
 
 	/* Make an unaligned destination buffer. */
-	dst = (void *)((unsigned long)page_address(page) | 0x1);
+	dst = (void *)((unsigned long)buf | 0x1);
 	/* Unalign src as well, but differently (off by 1 + 2 = 3). */
 	src = dst + (MISALIGNED_BUFFER_SIZE / 2);
 	src += 2;
 
-	word_cycles = measure_cycles(__riscv_copy_words_unaligned, dst, src, MISALIGNED_COPY_SIZE);
-	byte_cycles = measure_cycles(__riscv_copy_bytes_unaligned, dst, src, MISALIGNED_COPY_SIZE);
+	word_cycles = measure_cycles(word_copy, dst, src, MISALIGNED_COPY_SIZE);
+	byte_cycles = measure_cycles(byte_copy, dst, src, MISALIGNED_COPY_SIZE);
 
 	/* Don't divide by zero. */
 	if (!word_cycles || !byte_cycles) {
 		pr_warn("cpu%d: rdtime lacks granularity needed to measure unaligned access speed\n",
 			cpu);
 
-		return 0;
+		return -1;
 	}
 
-	if (word_cycles < byte_cycles)
-		speed = RISCV_HWPROBE_MISALIGNED_SCALAR_FAST;
+	fast = word_cycles < byte_cycles;
 
 	ratio = div_u64((byte_cycles * 100), word_cycles);
 	pr_info("cpu%d: Ratio of byte access time to unaligned word access is %d.%02d, unaligned accesses are %s\n",
 		cpu,
 		ratio / 100,
 		ratio % 100,
-		(speed == RISCV_HWPROBE_MISALIGNED_SCALAR_FAST) ? "fast" : "slow");
+		fast ? "fast" : "slow");
 
-	per_cpu(misaligned_access_speed, cpu) = speed;
+	return fast;
+}
+
+#ifdef CONFIG_RISCV_PROBE_UNALIGNED_ACCESS
+static int check_unaligned_access(struct page *page)
+{
+	void *buf = page_address(page);
+	int cpu = smp_processor_id();
+	int ret;
+
+	if (per_cpu(misaligned_access_speed, cpu) != RISCV_HWPROBE_MISALIGNED_SCALAR_UNKNOWN)
+		return 0;
+
+	ret = compare_unaligned_access(__riscv_copy_words_unaligned,
+				       __riscv_copy_bytes_unaligned, buf);
+	if (ret < 0)
+		return 0;
 
 	/*
 	 * Set the value of fast_misaligned_access of a CPU. These operations
 	 * are atomic to avoid race conditions.
 	 */
-	if (speed == RISCV_HWPROBE_MISALIGNED_SCALAR_FAST)
+	if (ret) {
+		per_cpu(misaligned_access_speed, cpu) = RISCV_HWPROBE_MISALIGNED_SCALAR_FAST;
 		cpumask_set_cpu(cpu, &fast_misaligned_access);
-	else
+	} else {
+		per_cpu(misaligned_access_speed, cpu) = RISCV_HWPROBE_MISALIGNED_SCALAR_SLOW;
 		cpumask_clear_cpu(cpu, &fast_misaligned_access);
+	}
 
 	return 0;
 }
-- 
2.47.3



* [PATCH 5/5] riscv: Reuse compare_unaligned_access() in check_vector_unaligned_access()
From: Nam Cao @ 2026-02-11 17:30 UTC (permalink / raw)
  To: Paul Walmsley, Palmer Dabbelt, Albert Ou, Alexandre Ghiti,
	Andrew Jones, Clément Léger, linux-riscv, linux-kernel
  Cc: Nam Cao

check_vector_unaligned_access() duplicates the logic in
compare_unaligned_access().

Use compare_unaligned_access() and deduplicate.

Signed-off-by: Nam Cao <namcao@linutronix.de>
---
 arch/riscv/kernel/unaligned_access_speed.c | 55 +++++++---------------
 1 file changed, 16 insertions(+), 39 deletions(-)

diff --git a/arch/riscv/kernel/unaligned_access_speed.c b/arch/riscv/kernel/unaligned_access_speed.c
index b3ed74b71d3e..8a9f261dc10b 100644
--- a/arch/riscv/kernel/unaligned_access_speed.c
+++ b/arch/riscv/kernel/unaligned_access_speed.c
@@ -72,7 +72,7 @@ measure_cycles(void (*func)(void *dst, const void *src, size_t len),
 static int __maybe_unused
 compare_unaligned_access(void (*word_copy)(void *dst, const void *src, size_t len),
 			 void (*byte_copy)(void *dst, const void *src, size_t len),
-			 void *buf)
+			 void *buf, const char *type)
 {
 	int cpu = smp_processor_id();
 	u64 word_cycles;
@@ -92,8 +92,8 @@ compare_unaligned_access(void (*word_copy)(void *dst, const void *src, size_t le
 
 	/* Don't divide by zero. */
 	if (!word_cycles || !byte_cycles) {
-		pr_warn("cpu%d: rdtime lacks granularity needed to measure unaligned access speed\n",
-			cpu);
+		pr_warn("cpu%d: rdtime lacks granularity needed to measure %s unaligned access speed\n",
+			cpu, type);
 
 		return -1;
 	}
@@ -101,8 +101,9 @@ compare_unaligned_access(void (*word_copy)(void *dst, const void *src, size_t le
 	fast = word_cycles < byte_cycles;
 
 	ratio = div_u64((byte_cycles * 100), word_cycles);
-	pr_info("cpu%d: Ratio of byte access time to unaligned word access is %d.%02d, unaligned accesses are %s\n",
+	pr_info("cpu%d: %s unaligned word access speed is %d.%02dx byte access speed (%s)\n",
 		cpu,
+		type,
 		ratio / 100,
 		ratio % 100,
 		fast ? "fast" : "slow");
@@ -121,7 +122,8 @@ static int check_unaligned_access(struct page *page)
 		return 0;
 
 	ret = compare_unaligned_access(__riscv_copy_words_unaligned,
-				       __riscv_copy_bytes_unaligned, buf);
+				       __riscv_copy_bytes_unaligned,
+				       buf, "scalar");
 	if (ret < 0)
 		return 0;
 
@@ -291,13 +293,8 @@ static int riscv_offline_cpu(unsigned int cpu)
 static void check_vector_unaligned_access(struct work_struct *work __always_unused)
 {
 	int cpu = smp_processor_id();
-	u64 word_cycles;
-	u64 byte_cycles;
-	int ratio;
 	struct page *page;
-	void *dst;
-	void *src;
-	long speed = RISCV_HWPROBE_MISALIGNED_VECTOR_SLOW;
+	int ret;
 
 	if (per_cpu(vector_misaligned_access, cpu) != RISCV_HWPROBE_MISALIGNED_VECTOR_UNKNOWN)
 		return;
@@ -308,40 +305,20 @@ static void check_vector_unaligned_access(struct work_struct *work __always_unus
 		return;
 	}
 
-	/* Make an unaligned destination buffer. */
-	dst = (void *)((unsigned long)page_address(page) | 0x1);
-	/* Unalign src as well, but differently (off by 1 + 2 = 3). */
-	src = dst + (MISALIGNED_BUFFER_SIZE / 2);
-	src += 2;
-
 	kernel_vector_begin();
 
-	word_cycles = measure_cycles(__riscv_copy_vec_words_unaligned,
-				     dst, src, MISALIGNED_COPY_SIZE);
-
-	byte_cycles = measure_cycles(__riscv_copy_vec_bytes_unaligned,
-				     dst, src, MISALIGNED_COPY_SIZE);
+	ret = compare_unaligned_access(__riscv_copy_vec_words_unaligned,
+				       __riscv_copy_vec_bytes_unaligned,
+				       page_address(page), "vector");
 	kernel_vector_end();
 
-	/* Don't divide by zero. */
-	if (!word_cycles || !byte_cycles) {
-		pr_warn("cpu%d: rdtime lacks granularity needed to measure unaligned vector access speed\n",
-			cpu);
-
+	if (ret < 0)
 		goto free;
-	}
 
-	if (word_cycles < byte_cycles)
-		speed = RISCV_HWPROBE_MISALIGNED_VECTOR_FAST;
-
-	ratio = div_u64((byte_cycles * 100), word_cycles);
-	pr_info("cpu%d: Ratio of vector byte access time to vector unaligned word access is %d.%02d, unaligned accesses are %s\n",
-		cpu,
-		ratio / 100,
-		ratio % 100,
-		(speed ==  RISCV_HWPROBE_MISALIGNED_VECTOR_FAST) ? "fast" : "slow");
-
-	per_cpu(vector_misaligned_access, cpu) = speed;
+	if (ret)
+		per_cpu(vector_misaligned_access, cpu) = RISCV_HWPROBE_MISALIGNED_VECTOR_FAST;
+	else
+		per_cpu(vector_misaligned_access, cpu) = RISCV_HWPROBE_MISALIGNED_VECTOR_SLOW;
 
 free:
 	__free_pages(page, MISALIGNED_BUFFER_ORDER);
-- 
2.47.3


