* [PATCH 0/5] riscv: Cleanup and deduplicate unaligned access speed probe
@ 2026-02-11 17:30 Nam Cao
From: Nam Cao @ 2026-02-11 17:30 UTC (permalink / raw)
To: Paul Walmsley, Palmer Dabbelt, Albert Ou, Alexandre Ghiti,
Andrew Jones, Clément Léger, linux-riscv, linux-kernel
Cc: Nam Cao
Hi,
This series does some minor cleanups and deduplicates the riscv unaligned
access speed probe.
arch/riscv/kernel/unaligned_access_speed.c | 221 ++++++++-------------
1 file changed, 85 insertions(+), 136 deletions(-)
Nam Cao (5):
riscv: Clean up & optimize unaligned scalar access probe
riscv: Split out measure_cycles() for reuse
riscv: Reuse measure_cycles() in check_vector_unaligned_access()
riscv: Split out compare_unaligned_access()
riscv: Reuse compare_unaligned_access() in
check_vector_unaligned_access()
--
2.47.3
* [PATCH 1/5] riscv: Clean up & optimize unaligned scalar access probe
From: Nam Cao @ 2026-02-11 17:30 UTC (permalink / raw)
To: Paul Walmsley, Palmer Dabbelt, Albert Ou, Alexandre Ghiti,
Andrew Jones, Clément Léger, linux-riscv, linux-kernel
Cc: Nam Cao, Sebastian Andrzej Siewior
check_unaligned_access_speed_all_cpus() is more complicated than it should
be:
- It uses on_each_cpu() to probe unaligned memory access on all CPUs but
excludes CPU0 with a check in the callback function. So an IPI to CPU0
is wasted.
- Probing on CPU0 is done with smp_call_on_cpu(), which is not as fast as
on_each_cpu().
The reason for this design is that the probe is timed with jiffies:
on_each_cpu() excludes CPU0 because that CPU needs to stay behind to tend
to jiffies.
Instead, replace jiffies usage with ktime_get_mono_fast_ns(). With jiffies
out of the way, on_each_cpu() can be used for all CPUs and
smp_call_on_cpu() can be dropped.
To make ktime_get_mono_fast_ns() usable, move this probe to late_initcall.
Anything after clocksource's fs_initcall works, but avoid depending on
clocksource staying at fs_initcall.
The choice of probe time is now 8000000 ns, which is the same as before (2
jiffies) for riscv defconfig. This is excessive for the CPUs I have, and
probably should be reduced; but that's a different discussion.
Suggested-by: Sebastian Andrzej Siewior <bigeasy@linutronix.de>
Signed-off-by: Nam Cao <namcao@linutronix.de>
---
arch/riscv/kernel/unaligned_access_speed.c | 28 ++++++++--------------
1 file changed, 10 insertions(+), 18 deletions(-)
diff --git a/arch/riscv/kernel/unaligned_access_speed.c b/arch/riscv/kernel/unaligned_access_speed.c
index 70b5e6927620..8b744c4a41ea 100644
--- a/arch/riscv/kernel/unaligned_access_speed.c
+++ b/arch/riscv/kernel/unaligned_access_speed.c
@@ -17,6 +17,7 @@
#include "copy-unaligned.h"
#define MISALIGNED_ACCESS_JIFFIES_LG2 1
+#define MISALIGNED_ACCESS_NS 8000000
#define MISALIGNED_BUFFER_SIZE 0x4000
#define MISALIGNED_BUFFER_ORDER get_order(MISALIGNED_BUFFER_SIZE)
#define MISALIGNED_COPY_SIZE ((MISALIGNED_BUFFER_SIZE / 2) - 0x80)
@@ -36,8 +37,8 @@ static int check_unaligned_access(void *param)
u64 start_cycles, end_cycles;
u64 word_cycles;
u64 byte_cycles;
+ u64 start_ns;
int ratio;
- unsigned long start_jiffies, now;
struct page *page = param;
void *dst;
void *src;
@@ -55,15 +56,13 @@ static int check_unaligned_access(void *param)
/* Do a warmup. */
__riscv_copy_words_unaligned(dst, src, MISALIGNED_COPY_SIZE);
preempt_disable();
- start_jiffies = jiffies;
- while ((now = jiffies) == start_jiffies)
- cpu_relax();
/*
* For a fixed amount of time, repeatedly try the function, and take
* the best time in cycles as the measurement.
*/
- while (time_before(jiffies, now + (1 << MISALIGNED_ACCESS_JIFFIES_LG2))) {
+ start_ns = ktime_get_mono_fast_ns();
+ while (ktime_get_mono_fast_ns() < start_ns + MISALIGNED_ACCESS_NS) {
start_cycles = get_cycles64();
/* Ensure the CSR read can't reorder WRT to the copy. */
mb();
@@ -77,11 +76,9 @@ static int check_unaligned_access(void *param)
byte_cycles = -1ULL;
__riscv_copy_bytes_unaligned(dst, src, MISALIGNED_COPY_SIZE);
- start_jiffies = jiffies;
- while ((now = jiffies) == start_jiffies)
- cpu_relax();
- while (time_before(jiffies, now + (1 << MISALIGNED_ACCESS_JIFFIES_LG2))) {
+ start_ns = ktime_get_mono_fast_ns();
+ while (ktime_get_mono_fast_ns() < start_ns + MISALIGNED_ACCESS_NS) {
start_cycles = get_cycles64();
mb();
__riscv_copy_bytes_unaligned(dst, src, MISALIGNED_COPY_SIZE);
@@ -125,13 +122,12 @@ static int check_unaligned_access(void *param)
return 0;
}
-static void __init check_unaligned_access_nonboot_cpu(void *param)
+static void __init _check_unaligned_access(void *param)
{
unsigned int cpu = smp_processor_id();
struct page **pages = param;
- if (smp_processor_id() != 0)
- check_unaligned_access(pages[cpu]);
+ check_unaligned_access(pages[cpu]);
}
/* Measure unaligned access speed on all CPUs present at boot in parallel. */
@@ -158,11 +154,7 @@ static void __init check_unaligned_access_speed_all_cpus(void)
}
}
- /* Check everybody except 0, who stays behind to tend jiffies. */
- on_each_cpu(check_unaligned_access_nonboot_cpu, bufs, 1);
-
- /* Check core 0. */
- smp_call_on_cpu(0, check_unaligned_access, bufs[0], true);
+ on_each_cpu(_check_unaligned_access, bufs, 1);
out:
for_each_cpu(cpu, cpu_online_mask) {
@@ -494,4 +486,4 @@ static int __init check_unaligned_access_all_cpus(void)
return 0;
}
-arch_initcall(check_unaligned_access_all_cpus);
+late_initcall(check_unaligned_access_all_cpus);
--
2.47.3
* [PATCH 2/5] riscv: Split out measure_cycles() for reuse
From: Nam Cao @ 2026-02-11 17:30 UTC (permalink / raw)
To: Paul Walmsley, Palmer Dabbelt, Albert Ou, Alexandre Ghiti,
Andrew Jones, Clément Léger, linux-riscv, linux-kernel
Cc: Nam Cao
Byte cycle measurement and word cycle measurement of scalar misaligned
access are very similar. Split these parts out into a common
measure_cycles() function to avoid duplication.
This function will also be reused for vector misaligned access probe in a
follow-up commit.
Signed-off-by: Nam Cao <namcao@linutronix.de>
---
arch/riscv/kernel/unaligned_access_speed.c | 69 +++++++++++-----------
1 file changed, 33 insertions(+), 36 deletions(-)
diff --git a/arch/riscv/kernel/unaligned_access_speed.c b/arch/riscv/kernel/unaligned_access_speed.c
index 8b744c4a41ea..b964a666a973 100644
--- a/arch/riscv/kernel/unaligned_access_speed.c
+++ b/arch/riscv/kernel/unaligned_access_speed.c
@@ -31,30 +31,15 @@ static long unaligned_vector_speed_param = RISCV_HWPROBE_MISALIGNED_VECTOR_UNKNO
static cpumask_t fast_misaligned_access;
#ifdef CONFIG_RISCV_PROBE_UNALIGNED_ACCESS
-static int check_unaligned_access(void *param)
+static u64 measure_cycles(void (*func)(void *dst, const void *src, size_t len),
+ void *dst, void *src, size_t len)
{
- int cpu = smp_processor_id();
- u64 start_cycles, end_cycles;
- u64 word_cycles;
- u64 byte_cycles;
+ u64 start_cycles, end_cycles, cycles = -1ULL;
u64 start_ns;
- int ratio;
- struct page *page = param;
- void *dst;
- void *src;
- long speed = RISCV_HWPROBE_MISALIGNED_SCALAR_SLOW;
- if (per_cpu(misaligned_access_speed, cpu) != RISCV_HWPROBE_MISALIGNED_SCALAR_UNKNOWN)
- return 0;
-
- /* Make an unaligned destination buffer. */
- dst = (void *)((unsigned long)page_address(page) | 0x1);
- /* Unalign src as well, but differently (off by 1 + 2 = 3). */
- src = dst + (MISALIGNED_BUFFER_SIZE / 2);
- src += 2;
- word_cycles = -1ULL;
/* Do a warmup. */
- __riscv_copy_words_unaligned(dst, src, MISALIGNED_COPY_SIZE);
+ func(dst, src, len);
+
preempt_disable();
/*
@@ -66,29 +51,41 @@ static int check_unaligned_access(void *param)
start_cycles = get_cycles64();
/* Ensure the CSR read can't reorder WRT to the copy. */
mb();
- __riscv_copy_words_unaligned(dst, src, MISALIGNED_COPY_SIZE);
+ func(dst, src, len);
/* Ensure the copy ends before the end time is snapped. */
mb();
end_cycles = get_cycles64();
- if ((end_cycles - start_cycles) < word_cycles)
- word_cycles = end_cycles - start_cycles;
+ if ((end_cycles - start_cycles) < cycles)
+ cycles = end_cycles - start_cycles;
}
- byte_cycles = -1ULL;
- __riscv_copy_bytes_unaligned(dst, src, MISALIGNED_COPY_SIZE);
+ preempt_enable();
- start_ns = ktime_get_mono_fast_ns();
- while (ktime_get_mono_fast_ns() < start_ns + MISALIGNED_ACCESS_NS) {
- start_cycles = get_cycles64();
- mb();
- __riscv_copy_bytes_unaligned(dst, src, MISALIGNED_COPY_SIZE);
- mb();
- end_cycles = get_cycles64();
- if ((end_cycles - start_cycles) < byte_cycles)
- byte_cycles = end_cycles - start_cycles;
- }
+ return cycles;
+}
- preempt_enable();
+static int check_unaligned_access(void *param)
+{
+ int cpu = smp_processor_id();
+ u64 word_cycles;
+ u64 byte_cycles;
+ int ratio;
+ struct page *page = param;
+ void *dst;
+ void *src;
+ long speed = RISCV_HWPROBE_MISALIGNED_SCALAR_SLOW;
+
+ if (per_cpu(misaligned_access_speed, cpu) != RISCV_HWPROBE_MISALIGNED_SCALAR_UNKNOWN)
+ return 0;
+
+ /* Make an unaligned destination buffer. */
+ dst = (void *)((unsigned long)page_address(page) | 0x1);
+ /* Unalign src as well, but differently (off by 1 + 2 = 3). */
+ src = dst + (MISALIGNED_BUFFER_SIZE / 2);
+ src += 2;
+
+ word_cycles = measure_cycles(__riscv_copy_words_unaligned, dst, src, MISALIGNED_COPY_SIZE);
+ byte_cycles = measure_cycles(__riscv_copy_bytes_unaligned, dst, src, MISALIGNED_COPY_SIZE);
/* Don't divide by zero. */
if (!word_cycles || !byte_cycles) {
--
2.47.3
* [PATCH 3/5] riscv: Reuse measure_cycles() in check_vector_unaligned_access()
From: Nam Cao @ 2026-02-11 17:30 UTC (permalink / raw)
To: Paul Walmsley, Palmer Dabbelt, Albert Ou, Alexandre Ghiti,
Andrew Jones, Clément Léger, linux-riscv, linux-kernel
Cc: Nam Cao
check_vector_unaligned_access() duplicates the logic in measure_cycles().
Reuse measure_cycles() and deduplicate.
Signed-off-by: Nam Cao <namcao@linutronix.de>
---
arch/riscv/kernel/unaligned_access_speed.c | 54 ++++------------------
1 file changed, 8 insertions(+), 46 deletions(-)
diff --git a/arch/riscv/kernel/unaligned_access_speed.c b/arch/riscv/kernel/unaligned_access_speed.c
index b964a666a973..c0d39c4b2150 100644
--- a/arch/riscv/kernel/unaligned_access_speed.c
+++ b/arch/riscv/kernel/unaligned_access_speed.c
@@ -16,7 +16,6 @@
#include "copy-unaligned.h"
-#define MISALIGNED_ACCESS_JIFFIES_LG2 1
#define MISALIGNED_ACCESS_NS 8000000
#define MISALIGNED_BUFFER_SIZE 0x4000
#define MISALIGNED_BUFFER_ORDER get_order(MISALIGNED_BUFFER_SIZE)
@@ -30,9 +29,9 @@ static long unaligned_vector_speed_param = RISCV_HWPROBE_MISALIGNED_VECTOR_UNKNO
static cpumask_t fast_misaligned_access;
-#ifdef CONFIG_RISCV_PROBE_UNALIGNED_ACCESS
-static u64 measure_cycles(void (*func)(void *dst, const void *src, size_t len),
- void *dst, void *src, size_t len)
+static u64 __maybe_unused
+measure_cycles(void (*func)(void *dst, const void *src, size_t len),
+ void *dst, void *src, size_t len)
{
u64 start_cycles, end_cycles, cycles = -1ULL;
u64 start_ns;
@@ -64,6 +63,7 @@ static u64 measure_cycles(void (*func)(void *dst, const void *src, size_t len),
return cycles;
}
+#ifdef CONFIG_RISCV_PROBE_UNALIGNED_ACCESS
static int check_unaligned_access(void *param)
{
int cpu = smp_processor_id();
@@ -270,11 +270,9 @@ static int riscv_offline_cpu(unsigned int cpu)
static void check_vector_unaligned_access(struct work_struct *work __always_unused)
{
int cpu = smp_processor_id();
- u64 start_cycles, end_cycles;
u64 word_cycles;
u64 byte_cycles;
int ratio;
- unsigned long start_jiffies, now;
struct page *page;
void *dst;
void *src;
@@ -294,50 +292,14 @@ static void check_vector_unaligned_access(struct work_struct *work __always_unus
/* Unalign src as well, but differently (off by 1 + 2 = 3). */
src = dst + (MISALIGNED_BUFFER_SIZE / 2);
src += 2;
- word_cycles = -1ULL;
- /* Do a warmup. */
kernel_vector_begin();
- __riscv_copy_vec_words_unaligned(dst, src, MISALIGNED_COPY_SIZE);
-
- start_jiffies = jiffies;
- while ((now = jiffies) == start_jiffies)
- cpu_relax();
-
- /*
- * For a fixed amount of time, repeatedly try the function, and take
- * the best time in cycles as the measurement.
- */
- while (time_before(jiffies, now + (1 << MISALIGNED_ACCESS_JIFFIES_LG2))) {
- start_cycles = get_cycles64();
- /* Ensure the CSR read can't reorder WRT to the copy. */
- mb();
- __riscv_copy_vec_words_unaligned(dst, src, MISALIGNED_COPY_SIZE);
- /* Ensure the copy ends before the end time is snapped. */
- mb();
- end_cycles = get_cycles64();
- if ((end_cycles - start_cycles) < word_cycles)
- word_cycles = end_cycles - start_cycles;
- }
-
- byte_cycles = -1ULL;
- __riscv_copy_vec_bytes_unaligned(dst, src, MISALIGNED_COPY_SIZE);
- start_jiffies = jiffies;
- while ((now = jiffies) == start_jiffies)
- cpu_relax();
- while (time_before(jiffies, now + (1 << MISALIGNED_ACCESS_JIFFIES_LG2))) {
- start_cycles = get_cycles64();
- /* Ensure the CSR read can't reorder WRT to the copy. */
- mb();
- __riscv_copy_vec_bytes_unaligned(dst, src, MISALIGNED_COPY_SIZE);
- /* Ensure the copy ends before the end time is snapped. */
- mb();
- end_cycles = get_cycles64();
- if ((end_cycles - start_cycles) < byte_cycles)
- byte_cycles = end_cycles - start_cycles;
- }
+ word_cycles = measure_cycles(__riscv_copy_vec_words_unaligned,
+ dst, src, MISALIGNED_COPY_SIZE);
+ byte_cycles = measure_cycles(__riscv_copy_vec_bytes_unaligned,
+ dst, src, MISALIGNED_COPY_SIZE);
kernel_vector_end();
/* Don't divide by zero. */
--
2.47.3
* [PATCH 4/5] riscv: Split out compare_unaligned_access()
From: Nam Cao @ 2026-02-11 17:30 UTC (permalink / raw)
To: Paul Walmsley, Palmer Dabbelt, Albert Ou, Alexandre Ghiti,
Andrew Jones, Clément Léger, linux-riscv, linux-kernel
Cc: Nam Cao
The scalar misaligned access probe and the vector misaligned access probe
share very similar code. Split the common part out of the scalar probe into
compare_unaligned_access(), which will be reused for the vector probe in a
follow-up commit.
Signed-off-by: Nam Cao <namcao@linutronix.de>
---
arch/riscv/kernel/unaligned_access_speed.c | 59 +++++++++++++++-------
1 file changed, 40 insertions(+), 19 deletions(-)
diff --git a/arch/riscv/kernel/unaligned_access_speed.c b/arch/riscv/kernel/unaligned_access_speed.c
index c0d39c4b2150..b3ed74b71d3e 100644
--- a/arch/riscv/kernel/unaligned_access_speed.c
+++ b/arch/riscv/kernel/unaligned_access_speed.c
@@ -63,58 +63,79 @@ measure_cycles(void (*func)(void *dst, const void *src, size_t len),
return cycles;
}
-#ifdef CONFIG_RISCV_PROBE_UNALIGNED_ACCESS
-static int check_unaligned_access(void *param)
+/*
+ * Return:
+ * 1 if unaligned accesses are fast
+ * 0 if unaligned accesses are slow
+ * -1 if check cannot be done
+ */
+static int __maybe_unused
+compare_unaligned_access(void (*word_copy)(void *dst, const void *src, size_t len),
+ void (*byte_copy)(void *dst, const void *src, size_t len),
+ void *buf)
{
int cpu = smp_processor_id();
u64 word_cycles;
u64 byte_cycles;
+ void *dst, *src;
+ bool fast;
int ratio;
- struct page *page = param;
- void *dst;
- void *src;
- long speed = RISCV_HWPROBE_MISALIGNED_SCALAR_SLOW;
-
- if (per_cpu(misaligned_access_speed, cpu) != RISCV_HWPROBE_MISALIGNED_SCALAR_UNKNOWN)
- return 0;
/* Make an unaligned destination buffer. */
- dst = (void *)((unsigned long)page_address(page) | 0x1);
+ dst = (void *)((unsigned long)buf | 0x1);
/* Unalign src as well, but differently (off by 1 + 2 = 3). */
src = dst + (MISALIGNED_BUFFER_SIZE / 2);
src += 2;
- word_cycles = measure_cycles(__riscv_copy_words_unaligned, dst, src, MISALIGNED_COPY_SIZE);
- byte_cycles = measure_cycles(__riscv_copy_bytes_unaligned, dst, src, MISALIGNED_COPY_SIZE);
+ word_cycles = measure_cycles(word_copy, dst, src, MISALIGNED_COPY_SIZE);
+ byte_cycles = measure_cycles(byte_copy, dst, src, MISALIGNED_COPY_SIZE);
/* Don't divide by zero. */
if (!word_cycles || !byte_cycles) {
pr_warn("cpu%d: rdtime lacks granularity needed to measure unaligned access speed\n",
cpu);
- return 0;
+ return -1;
}
- if (word_cycles < byte_cycles)
- speed = RISCV_HWPROBE_MISALIGNED_SCALAR_FAST;
+ fast = word_cycles < byte_cycles;
ratio = div_u64((byte_cycles * 100), word_cycles);
pr_info("cpu%d: Ratio of byte access time to unaligned word access is %d.%02d, unaligned accesses are %s\n",
cpu,
ratio / 100,
ratio % 100,
- (speed == RISCV_HWPROBE_MISALIGNED_SCALAR_FAST) ? "fast" : "slow");
+ fast ? "fast" : "slow");
- per_cpu(misaligned_access_speed, cpu) = speed;
+ return fast;
+}
+
+#ifdef CONFIG_RISCV_PROBE_UNALIGNED_ACCESS
+static int check_unaligned_access(struct page *page)
+{
+ void *buf = page_address(page);
+ int cpu = smp_processor_id();
+ int ret;
+
+ if (per_cpu(misaligned_access_speed, cpu) != RISCV_HWPROBE_MISALIGNED_SCALAR_UNKNOWN)
+ return 0;
+
+ ret = compare_unaligned_access(__riscv_copy_words_unaligned,
+ __riscv_copy_bytes_unaligned, buf);
+ if (ret < 0)
+ return 0;
/*
* Set the value of fast_misaligned_access of a CPU. These operations
* are atomic to avoid race conditions.
*/
- if (speed == RISCV_HWPROBE_MISALIGNED_SCALAR_FAST)
+ if (ret) {
+ per_cpu(misaligned_access_speed, cpu) = RISCV_HWPROBE_MISALIGNED_SCALAR_FAST;
cpumask_set_cpu(cpu, &fast_misaligned_access);
- else
+ } else {
+ per_cpu(misaligned_access_speed, cpu) = RISCV_HWPROBE_MISALIGNED_SCALAR_SLOW;
cpumask_clear_cpu(cpu, &fast_misaligned_access);
+ }
return 0;
}
--
2.47.3
* [PATCH 5/5] riscv: Reuse compare_unaligned_access() in check_vector_unaligned_access()
From: Nam Cao @ 2026-02-11 17:30 UTC (permalink / raw)
To: Paul Walmsley, Palmer Dabbelt, Albert Ou, Alexandre Ghiti,
Andrew Jones, Clément Léger, linux-riscv, linux-kernel
Cc: Nam Cao
check_vector_unaligned_access() duplicates the logic in
compare_unaligned_access(). Reuse compare_unaligned_access() and
deduplicate.
Signed-off-by: Nam Cao <namcao@linutronix.de>
---
arch/riscv/kernel/unaligned_access_speed.c | 55 +++++++---------------
1 file changed, 16 insertions(+), 39 deletions(-)
diff --git a/arch/riscv/kernel/unaligned_access_speed.c b/arch/riscv/kernel/unaligned_access_speed.c
index b3ed74b71d3e..8a9f261dc10b 100644
--- a/arch/riscv/kernel/unaligned_access_speed.c
+++ b/arch/riscv/kernel/unaligned_access_speed.c
@@ -72,7 +72,7 @@ measure_cycles(void (*func)(void *dst, const void *src, size_t len),
static int __maybe_unused
compare_unaligned_access(void (*word_copy)(void *dst, const void *src, size_t len),
void (*byte_copy)(void *dst, const void *src, size_t len),
- void *buf)
+ void *buf, const char *type)
{
int cpu = smp_processor_id();
u64 word_cycles;
@@ -92,8 +92,8 @@ compare_unaligned_access(void (*word_copy)(void *dst, const void *src, size_t le
/* Don't divide by zero. */
if (!word_cycles || !byte_cycles) {
- pr_warn("cpu%d: rdtime lacks granularity needed to measure unaligned access speed\n",
- cpu);
+ pr_warn("cpu%d: rdtime lacks granularity needed to measure %s unaligned access speed\n",
+ cpu, type);
return -1;
}
@@ -101,8 +101,9 @@ compare_unaligned_access(void (*word_copy)(void *dst, const void *src, size_t le
fast = word_cycles < byte_cycles;
ratio = div_u64((byte_cycles * 100), word_cycles);
- pr_info("cpu%d: Ratio of byte access time to unaligned word access is %d.%02d, unaligned accesses are %s\n",
+ pr_info("cpu%d: %s unaligned word access speed is %d.%02dx byte access speed (%s)\n",
cpu,
+ type,
ratio / 100,
ratio % 100,
fast ? "fast" : "slow");
@@ -121,7 +122,8 @@ static int check_unaligned_access(struct page *page)
return 0;
ret = compare_unaligned_access(__riscv_copy_words_unaligned,
- __riscv_copy_bytes_unaligned, buf);
+ __riscv_copy_bytes_unaligned,
+ buf, "scalar");
if (ret < 0)
return 0;
@@ -291,13 +293,8 @@ static int riscv_offline_cpu(unsigned int cpu)
static void check_vector_unaligned_access(struct work_struct *work __always_unused)
{
int cpu = smp_processor_id();
- u64 word_cycles;
- u64 byte_cycles;
- int ratio;
struct page *page;
- void *dst;
- void *src;
- long speed = RISCV_HWPROBE_MISALIGNED_VECTOR_SLOW;
+ int ret;
if (per_cpu(vector_misaligned_access, cpu) != RISCV_HWPROBE_MISALIGNED_VECTOR_UNKNOWN)
return;
@@ -308,40 +305,20 @@ static void check_vector_unaligned_access(struct work_struct *work __always_unus
return;
}
- /* Make an unaligned destination buffer. */
- dst = (void *)((unsigned long)page_address(page) | 0x1);
- /* Unalign src as well, but differently (off by 1 + 2 = 3). */
- src = dst + (MISALIGNED_BUFFER_SIZE / 2);
- src += 2;
-
kernel_vector_begin();
- word_cycles = measure_cycles(__riscv_copy_vec_words_unaligned,
- dst, src, MISALIGNED_COPY_SIZE);
-
- byte_cycles = measure_cycles(__riscv_copy_vec_bytes_unaligned,
- dst, src, MISALIGNED_COPY_SIZE);
+ ret = compare_unaligned_access(__riscv_copy_vec_words_unaligned,
+ __riscv_copy_vec_bytes_unaligned,
+ page_address(page), "vector");
kernel_vector_end();
- /* Don't divide by zero. */
- if (!word_cycles || !byte_cycles) {
- pr_warn("cpu%d: rdtime lacks granularity needed to measure unaligned vector access speed\n",
- cpu);
-
+ if (ret < 0)
goto free;
- }
- if (word_cycles < byte_cycles)
- speed = RISCV_HWPROBE_MISALIGNED_VECTOR_FAST;
-
- ratio = div_u64((byte_cycles * 100), word_cycles);
- pr_info("cpu%d: Ratio of vector byte access time to vector unaligned word access is %d.%02d, unaligned accesses are %s\n",
- cpu,
- ratio / 100,
- ratio % 100,
- (speed == RISCV_HWPROBE_MISALIGNED_VECTOR_FAST) ? "fast" : "slow");
-
- per_cpu(vector_misaligned_access, cpu) = speed;
+ if (ret)
+ per_cpu(vector_misaligned_access, cpu) = RISCV_HWPROBE_MISALIGNED_VECTOR_FAST;
+ else
+ per_cpu(vector_misaligned_access, cpu) = RISCV_HWPROBE_MISALIGNED_VECTOR_SLOW;
free:
__free_pages(page, MISALIGNED_BUFFER_ORDER);
--
2.47.3