* [PATCH v2 0/3] Optimize GCD performance on RISC-V by selecting implementation at runtime
@ 2025-05-24 15:55 Kuan-Wei Chiu
2025-05-24 15:55 ` [PATCH v2 1/3] lib/math/gcd: Use static key to select " Kuan-Wei Chiu
` (2 more replies)
0 siblings, 3 replies; 6+ messages in thread
From: Kuan-Wei Chiu @ 2025-05-24 15:55 UTC (permalink / raw)
To: paul.walmsley, palmer, aou, alex, akpm
Cc: linux-riscv, linux-kernel, jserv, Kuan-Wei Chiu, Yu-Chun Lin
The current implementation of gcd() selects between the binary GCD and
the odd-even GCD algorithm at compile time, depending on whether
CONFIG_CPU_NO_EFFICIENT_FFS is set. On platforms like RISC-V, however,
this compile-time decision can be misleading: even when the compiler
emits ctz instructions based on the assumption that they are efficient
(as is the case when CONFIG_RISCV_ISA_ZBB is enabled), the actual
hardware may lack support for the Zbb extension. In such cases, ffs()
falls back to a software implementation at runtime, making the binary
GCD algorithm significantly slower than the odd-even variant.
To address this, we introduce a static key to allow runtime selection
between the binary and odd-even GCD implementations. On RISC-V, the
kernel now checks for Zbb support during boot. If Zbb is unavailable,
the static key is disabled so that gcd() consistently uses the more
efficient odd-even algorithm in that scenario. Additionally, to further
reduce code size, we select CONFIG_CPU_NO_EFFICIENT_FFS automatically
when CONFIG_RISCV_ISA_ZBB is not enabled, avoiding compilation of the
unused binary GCD implementation entirely on systems where it would
never be executed.
This series ensures that the most efficient GCD algorithm is used in
practice and avoids compiling unnecessary code based on hardware
capabilities and kernel configuration.
Co-developed-by: Yu-Chun Lin <eleanor15x@gmail.com>
Signed-off-by: Yu-Chun Lin <eleanor15x@gmail.com>
Signed-off-by: Kuan-Wei Chiu <visitorckw@gmail.com>
---
This series has been tested on QEMU to verify that the correct GCD
implementation is used both with and without Zbb support.
v1 -> v2:
- Use a static key to select the GCD implementation at runtime.
v1: https://lore.kernel.org/lkml/20250217013708.1932496-1-visitorckw@gmail.com/
---
Kuan-Wei Chiu (3):
lib/math/gcd: Use static key to select implementation at runtime
riscv: Optimize gcd() code size when CONFIG_RISCV_ISA_ZBB is disabled
riscv: Optimize gcd() performance on RISC-V without Zbb extension
arch/riscv/Kconfig | 1 +
arch/riscv/kernel/setup.c | 6 ++++++
lib/math/gcd.c | 25 ++++++++++++++++---------
3 files changed, 23 insertions(+), 9 deletions(-)
--
2.34.1
_______________________________________________
linux-riscv mailing list
linux-riscv@lists.infradead.org
http://lists.infradead.org/mailman/listinfo/linux-riscv
^ permalink raw reply [flat|nested] 6+ messages in thread
* [PATCH v2 1/3] lib/math/gcd: Use static key to select implementation at runtime
2025-05-24 15:55 [PATCH v2 0/3] Optimize GCD performance on RISC-V by selecting implementation at runtime Kuan-Wei Chiu
@ 2025-05-24 15:55 ` Kuan-Wei Chiu
2025-06-04 23:01 ` Andrew Morton
2025-05-24 15:55 ` [PATCH v2 2/3] riscv: Optimize gcd() code size when CONFIG_RISCV_ISA_ZBB is disabled Kuan-Wei Chiu
2025-05-24 15:55 ` [PATCH v2 3/3] riscv: Optimize gcd() performance on RISC-V without Zbb extension Kuan-Wei Chiu
2 siblings, 1 reply; 6+ messages in thread
From: Kuan-Wei Chiu @ 2025-05-24 15:55 UTC (permalink / raw)
To: paul.walmsley, palmer, aou, alex, akpm
Cc: linux-riscv, linux-kernel, jserv, Kuan-Wei Chiu, Yu-Chun Lin
On platforms like RISC-V, the compiler may generate hardware FFS
instructions even if the underlying CPU does not actually support them.
Currently, the GCD implementation is chosen at compile time based on
CONFIG_CPU_NO_EFFICIENT_FFS, which can result in suboptimal behavior on
such systems.
Introduce a static key, efficient_ffs_key, to enable runtime selection
between the binary GCD (using ffs) and the odd-even GCD implementation.
This allows the kernel to default to the faster binary GCD when FFS is
efficient, while retaining the ability to fall back when needed.
Co-developed-by: Yu-Chun Lin <eleanor15x@gmail.com>
Signed-off-by: Yu-Chun Lin <eleanor15x@gmail.com>
Signed-off-by: Kuan-Wei Chiu <visitorckw@gmail.com>
---
lib/math/gcd.c | 25 ++++++++++++++++---------
1 file changed, 16 insertions(+), 9 deletions(-)
diff --git a/lib/math/gcd.c b/lib/math/gcd.c
index e3b042214d1b..2898ee0e5add 100644
--- a/lib/math/gcd.c
+++ b/lib/math/gcd.c
@@ -2,6 +2,7 @@
#include <linux/kernel.h>
#include <linux/gcd.h>
#include <linux/export.h>
+#include <linux/jump_label.h>
/*
* This implements the binary GCD algorithm. (Often attributed to Stein,
@@ -11,16 +12,13 @@
* has decent hardware division.
*/
+DEFINE_STATIC_KEY_TRUE(efficient_ffs_key);
+
#if !defined(CONFIG_CPU_NO_EFFICIENT_FFS)
/* If __ffs is available, the even/odd algorithm benchmarks slower. */
-/**
- * gcd - calculate and return the greatest common divisor of 2 unsigned longs
- * @a: first value
- * @b: second value
- */
-unsigned long gcd(unsigned long a, unsigned long b)
+static unsigned long binary_gcd(unsigned long a, unsigned long b)
{
unsigned long r = a | b;
@@ -44,9 +42,15 @@ unsigned long gcd(unsigned long a, unsigned long b)
}
}
-#else
+#endif
/* If normalization is done by loops, the even/odd algorithm is a win. */
+
+/**
+ * gcd - calculate and return the greatest common divisor of 2 unsigned longs
+ * @a: first value
+ * @b: second value
+ */
unsigned long gcd(unsigned long a, unsigned long b)
{
unsigned long r = a | b;
@@ -54,6 +58,11 @@ unsigned long gcd(unsigned long a, unsigned long b)
if (!a || !b)
return r;
+#if !defined(CONFIG_CPU_NO_EFFICIENT_FFS)
+ if (static_branch_likely(&efficient_ffs_key))
+ return binary_gcd(a, b);
+#endif
+
/* Isolate lsbit of r */
r &= -r;
@@ -80,6 +89,4 @@ unsigned long gcd(unsigned long a, unsigned long b)
}
}
-#endif
-
EXPORT_SYMBOL_GPL(gcd);
--
2.34.1
_______________________________________________
linux-riscv mailing list
linux-riscv@lists.infradead.org
http://lists.infradead.org/mailman/listinfo/linux-riscv
^ permalink raw reply related [flat|nested] 6+ messages in thread
* [PATCH v2 2/3] riscv: Optimize gcd() code size when CONFIG_RISCV_ISA_ZBB is disabled
2025-05-24 15:55 [PATCH v2 0/3] Optimize GCD performance on RISC-V by selecting implementation at runtime Kuan-Wei Chiu
2025-05-24 15:55 ` [PATCH v2 1/3] lib/math/gcd: Use static key to select " Kuan-Wei Chiu
@ 2025-05-24 15:55 ` Kuan-Wei Chiu
2025-05-24 15:55 ` [PATCH v2 3/3] riscv: Optimize gcd() performance on RISC-V without Zbb extension Kuan-Wei Chiu
2 siblings, 0 replies; 6+ messages in thread
From: Kuan-Wei Chiu @ 2025-05-24 15:55 UTC (permalink / raw)
To: paul.walmsley, palmer, aou, alex, akpm
Cc: linux-riscv, linux-kernel, jserv, Kuan-Wei Chiu, Yu-Chun Lin
The binary GCD implementation depends on efficient ffs(), which on
RISC-V requires hardware support for the Zbb extension. When
CONFIG_RISCV_ISA_ZBB is not enabled, the kernel will never use binary
GCD, as runtime logic will always fall back to the odd-even
implementation.
To avoid compiling unused code and reduce code size, select
CONFIG_CPU_NO_EFFICIENT_FFS when CONFIG_RISCV_ISA_ZBB is not set.
$ ./scripts/bloat-o-meter ./lib/math/gcd.o.old ./lib/math/gcd.o.new
add/remove: 0/0 grow/shrink: 0/1 up/down: 0/-274 (-274)
Function old new delta
gcd 360 86 -274
Total: Before=384, After=110, chg -71.35%
Co-developed-by: Yu-Chun Lin <eleanor15x@gmail.com>
Signed-off-by: Yu-Chun Lin <eleanor15x@gmail.com>
Signed-off-by: Kuan-Wei Chiu <visitorckw@gmail.com>
---
arch/riscv/Kconfig | 1 +
1 file changed, 1 insertion(+)
diff --git a/arch/riscv/Kconfig b/arch/riscv/Kconfig
index bbec87b79309..f085adc6f573 100644
--- a/arch/riscv/Kconfig
+++ b/arch/riscv/Kconfig
@@ -95,6 +95,7 @@ config RISCV
select CLINT_TIMER if RISCV_M_MODE
select CLONE_BACKWARDS
select COMMON_CLK
+ select CPU_NO_EFFICIENT_FFS if !RISCV_ISA_ZBB
select CPU_PM if CPU_IDLE || HIBERNATION || SUSPEND
select EDAC_SUPPORT
select FRAME_POINTER if PERF_EVENTS || (FUNCTION_TRACER && !DYNAMIC_FTRACE)
--
2.34.1
_______________________________________________
linux-riscv mailing list
linux-riscv@lists.infradead.org
http://lists.infradead.org/mailman/listinfo/linux-riscv
^ permalink raw reply related [flat|nested] 6+ messages in thread
* [PATCH v2 3/3] riscv: Optimize gcd() performance on RISC-V without Zbb extension
2025-05-24 15:55 [PATCH v2 0/3] Optimize GCD performance on RISC-V by selecting implementation at runtime Kuan-Wei Chiu
2025-05-24 15:55 ` [PATCH v2 1/3] lib/math/gcd: Use static key to select " Kuan-Wei Chiu
2025-05-24 15:55 ` [PATCH v2 2/3] riscv: Optimize gcd() code size when CONFIG_RISCV_ISA_ZBB is disabled Kuan-Wei Chiu
@ 2025-05-24 15:55 ` Kuan-Wei Chiu
2025-06-04 23:02 ` Andrew Morton
2 siblings, 1 reply; 6+ messages in thread
From: Kuan-Wei Chiu @ 2025-05-24 15:55 UTC (permalink / raw)
To: paul.walmsley, palmer, aou, alex, akpm
Cc: linux-riscv, linux-kernel, jserv, Kuan-Wei Chiu, Yu-Chun Lin
The binary GCD implementation uses FFS (find first set), which benefits
from hardware support for the ctz instruction, provided by the Zbb
extension on RISC-V. Without Zbb, this results in slower
software-emulated behavior.
Previously, RISC-V always used the binary GCD, regardless of actual
hardware support. This patch improves runtime efficiency by disabling
the efficient_ffs_key static branch when Zbb is either not enabled in
the kernel (config) or not supported on the executing CPU. This selects
the odd-even GCD implementation, which is faster in the absence of
efficient FFS.
This change ensures the most suitable GCD algorithm is chosen
dynamically based on actual hardware capabilities.
Co-developed-by: Yu-Chun Lin <eleanor15x@gmail.com>
Signed-off-by: Yu-Chun Lin <eleanor15x@gmail.com>
Signed-off-by: Kuan-Wei Chiu <visitorckw@gmail.com>
---
arch/riscv/kernel/setup.c | 6 ++++++
1 file changed, 6 insertions(+)
diff --git a/arch/riscv/kernel/setup.c b/arch/riscv/kernel/setup.c
index f7c9a1caa83e..f891eedc3644 100644
--- a/arch/riscv/kernel/setup.c
+++ b/arch/riscv/kernel/setup.c
@@ -21,6 +21,7 @@
#include <linux/efi.h>
#include <linux/crash_dump.h>
#include <linux/panic_notifier.h>
+#include <linux/jump_label.h>
#include <asm/acpi.h>
#include <asm/alternative.h>
@@ -51,6 +52,8 @@ atomic_t hart_lottery __section(".sdata")
;
unsigned long boot_cpu_hartid;
+DECLARE_STATIC_KEY_TRUE(efficient_ffs_key);
+
/*
* Place kernel memory regions on the resource tree so that
* kexec-tools can retrieve them from /proc/iomem. While there
@@ -361,6 +364,9 @@ void __init setup_arch(char **cmdline_p)
riscv_user_isa_enable();
riscv_spinlock_init();
+
+ if (!IS_ENABLED(CONFIG_RISCV_ISA_ZBB) || !riscv_isa_extension_available(NULL, ZBB))
+ static_branch_disable(&efficient_ffs_key);
}
bool arch_cpu_is_hotpluggable(int cpu)
--
2.34.1
_______________________________________________
linux-riscv mailing list
linux-riscv@lists.infradead.org
http://lists.infradead.org/mailman/listinfo/linux-riscv
^ permalink raw reply related [flat|nested] 6+ messages in thread
* Re: [PATCH v2 1/3] lib/math/gcd: Use static key to select implementation at runtime
2025-05-24 15:55 ` [PATCH v2 1/3] lib/math/gcd: Use static key to select " Kuan-Wei Chiu
@ 2025-06-04 23:01 ` Andrew Morton
0 siblings, 0 replies; 6+ messages in thread
From: Andrew Morton @ 2025-06-04 23:01 UTC (permalink / raw)
To: Kuan-Wei Chiu
Cc: paul.walmsley, palmer, aou, alex, linux-riscv, linux-kernel,
jserv, Yu-Chun Lin
On Sat, 24 May 2025 23:55:17 +0800 Kuan-Wei Chiu <visitorckw@gmail.com> wrote:
> On platforms like RISC-V, the compiler may generate hardware FFS
> instructions even if the underlying CPU does not actually support them.
> Currently, the GCD implementation is chosen at compile time based on
> CONFIG_CPU_NO_EFFICIENT_FFS, which can result in suboptimal behavior on
> such systems.
>
> Introduce a static key, efficient_ffs_key, to enable runtime selection
> between the binary GCD (using ffs) and the odd-even GCD implementation.
> This allows the kernel to default to the faster binary GCD when FFS is
> efficient, while retaining the ability to fall back when needed.
>
> ...
>
> @@ -54,6 +58,11 @@ unsigned long gcd(unsigned long a, unsigned long b)
> if (!a || !b)
> return r;
Both binary_gcd() and gcd() perform the above. Seems unnecessary?
I can merge this series if the riscv team are OK with it?
_______________________________________________
linux-riscv mailing list
linux-riscv@lists.infradead.org
http://lists.infradead.org/mailman/listinfo/linux-riscv
^ permalink raw reply [flat|nested] 6+ messages in thread
* Re: [PATCH v2 3/3] riscv: Optimize gcd() performance on RISC-V without Zbb extension
2025-05-24 15:55 ` [PATCH v2 3/3] riscv: Optimize gcd() performance on RISC-V without Zbb extension Kuan-Wei Chiu
@ 2025-06-04 23:02 ` Andrew Morton
0 siblings, 0 replies; 6+ messages in thread
From: Andrew Morton @ 2025-06-04 23:02 UTC (permalink / raw)
To: Kuan-Wei Chiu
Cc: paul.walmsley, palmer, aou, alex, linux-riscv, linux-kernel,
jserv, Yu-Chun Lin
On Sat, 24 May 2025 23:55:19 +0800 Kuan-Wei Chiu <visitorckw@gmail.com> wrote:
> The binary GCD implementation uses FFS (find first set), which benefits
> from hardware support for the ctz instruction, provided by the Zbb
> extension on RISC-V. Without Zbb, this results in slower
> software-emulated behavior.
>
> Previously, RISC-V always used the binary GCD, regardless of actual
> hardware support. This patch improves runtime efficiency by disabling
> the efficient_ffs_key static branch when Zbb is either not enabled in
> the kernel (config) or not supported on the executing CPU. This selects
> the odd-even GCD implementation, which is faster in the absence of
> efficient FFS.
>
> This change ensures the most suitable GCD algorithm is chosen
> dynamically based on actual hardware capabilities.
>
> ...
>
> --- a/arch/riscv/kernel/setup.c
> +++ b/arch/riscv/kernel/setup.c
> @@ -21,6 +21,7 @@
> #include <linux/efi.h>
> #include <linux/crash_dump.h>
> #include <linux/panic_notifier.h>
> +#include <linux/jump_label.h>
>
> #include <asm/acpi.h>
> #include <asm/alternative.h>
> @@ -51,6 +52,8 @@ atomic_t hart_lottery __section(".sdata")
> ;
> unsigned long boot_cpu_hartid;
>
> +DECLARE_STATIC_KEY_TRUE(efficient_ffs_key);
Please let's get this into a header file, visible to the definition
site and to all users.
_______________________________________________
linux-riscv mailing list
linux-riscv@lists.infradead.org
http://lists.infradead.org/mailman/listinfo/linux-riscv
^ permalink raw reply [flat|nested] 6+ messages in thread
end of thread, other threads:[~2025-06-04 23:05 UTC | newest]
Thread overview: 6+ messages (download: mbox.gz follow: Atom feed
-- links below jump to the message on this page --
2025-05-24 15:55 [PATCH v2 0/3] Optimize GCD performance on RISC-V by selecting implementation at runtime Kuan-Wei Chiu
2025-05-24 15:55 ` [PATCH v2 1/3] lib/math/gcd: Use static key to select " Kuan-Wei Chiu
2025-06-04 23:01 ` Andrew Morton
2025-05-24 15:55 ` [PATCH v2 2/3] riscv: Optimize gcd() code size when CONFIG_RISCV_ISA_ZBB is disabled Kuan-Wei Chiu
2025-05-24 15:55 ` [PATCH v2 3/3] riscv: Optimize gcd() performance on RISC-V without Zbb extension Kuan-Wei Chiu
2025-06-04 23:02 ` Andrew Morton
This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox;
as well as URLs for NNTP newsgroup(s).